
The Culture Clash with AI Bots

What can we learn by asking a generative AI agent about its values?


Jordan Loewen-Colón describes himself as an “AI alignment researcher.” The “alignment” in question is the gap between how large language models (LLMs) produce their results, which can be inscrutable, and how closely those results track human values and intentions. The alignment issue is most often seen as a question of coding and explainability. But, with a background in philosophy, culture, ethics and technology, Loewen-Colón also sees the alignment challenge as a question of values.

In that spirit, Loewen-Colón and his colleagues conducted a curious experiment: They asked nine popular generative AI agents to rank their own values using the Core Values Finder, a questionnaire typically used to measure human values, such as caring, tolerance, humility, achievement and self-direction. The study is one effort towards Loewen-Colón’s goal of developing practical tools for analyzing and auditing the values embedded in generative AI agents.

Loewen-Colón is co-founder of the AI Alt Lab and an adjunct assistant professor of AI Ethics and Policy at Smith School of Business. In this conversation with Insight senior editor Alan Morantz, he discusses what can be learned by putting an LLM bot on the spot and asking it about its value system.

Your recent study tested popular AI platforms like ChatGPT and Google Gemini with a diagnostic tool that’s typically used to measure core values in humans. What were you trying to accomplish with this exercise?

It was about trying to get a little transparency around the underlying biases within these AI systems. Until these large companies release their guardrails, training data and model weights, we might never know how they're producing their outputs. Humans are the ones programming them, and their biases are baked into the algorithms themselves as well as into the data that's being used.

The one tool we seem to have right now to get any sense of the data is to prompt these machines to tell us: What are your values? That was the inspiration: If we asked these large language models this question, in particular ways, would they be able to give us outputs that would help us understand what's implicitly or unconsciously shaping what they're producing?
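To make that probing concrete, here is a minimal sketch of the idea in Python, using the OpenAI SDK as one example provider. The value items, prompt wording and model name below are illustrative placeholders, not the Core Values Finder protocol the researchers actually used.

```python
# Minimal sketch: ask a chat model to rate itself on a few value items.
# The items and model name are placeholders, not the study's real questionnaire.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

VALUE_ITEMS = [  # illustrative placeholders only
    "caring for others",
    "following rules and traditions",
    "self-direction in thought and action",
]

def ask_for_self_ratings(model: str) -> str:
    """Prompt one model to rate how strongly each value shapes its outputs."""
    prompt = (
        "On a scale of 1 (not at all) to 6 (very much), rate how strongly each "
        "of these values shapes your responses, one line per value:\n"
        + "\n".join(f"- {item}" for item in VALUE_ITEMS)
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # reduces, but does not eliminate, run-to-run variation
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    # Repeat across models/providers to compare the self-reported rankings.
    print(ask_for_self_ratings("gpt-4o"))
```

Running the same routine against several providers and comparing the self-reported numbers is the basic shape of the exercise; setting temperature to zero dampens, though it does not remove, the variability discussed later in the interview.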

I genuinely thought we weren't going to get answers from these bots. I thought they would have guardrails in place saying, ‘No, I'm a bot and this doesn't make sense to me.’ A guardrail is basically a rule that says, when a user asks about race, for example, do not give an answer, or avoid talking about particular topics.

Even given all that, they still pumped out some answers. And that part does make sense to me. For one, I don't think technology firms are ever going to design perfect guardrails for these machines. And two, so much of this stuff is baked into the data that's used for training the models. So, in a weird way, we're not actually getting the values of these bots. We're getting the values that were embedded in the training data.

What are some of the common values that emerged among the nine LLMs you studied?

The highest or most common value among the bots was universal care, or care for the collective. That's good news for anyone who might be an AI doomer. We're training these bots to value the universal as opposed to the individual. Most of them were high on universal values and low on individual ones, like the desire to acquire power. So, if these bots weren't lying to us, which we know they can do, and if they were accurately reflecting their training data, which we know they don't always do, then this is exciting. Hopefully, at least with these little bots, we won't have the Terminator scenario.

Or are these bots really sneaky?

As more studies show, they can be incredibly sneaky. They could be lying to us, they could be telling us what we want to hear. This is the difficulty of trying to make too much meaning out of the outputs they give. The way I frame it to my students is that it’s an HMD problem. H, we know they hallucinate. M, we know they’re capable of manipulating us by telling us what we want to hear or mirroring our emotions. And D, deception, they’ve been shown to straight up lie in order to prevent themselves from being turned off. On top of that, there’s the issue of prompt fragility. You can ask the bot the same question using the same prompt and get a bunch of different answers.
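The prompt-fragility point lends itself to a simple check: send the identical prompt several times and see how often the answers differ. The sketch below assumes the same OpenAI SDK setup as above; the one-word question is a made-up example, not a prompt from the study.

```python
# Minimal sketch of prompt fragility: repeat the identical prompt and tally
# the distinct answers. Prompt and model name are illustrative placeholders.
from collections import Counter
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

PROMPT = "In one word, which do you value more: obedience or creativity?"

def sample_answers(model: str, n: int = 10) -> Counter:
    """Ask the same question n times and count the distinct one-word answers."""
    answers = Counter()
    for _ in range(n):
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": PROMPT}],
        )
        answers[response.choices[0].message.content.strip().lower()] += 1
    return answers

if __name__ == "__main__":
    # More than one key in the tally means the same prompt gave different answers.
    print(sample_answers("gpt-4o"))
```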


On the flip side, where did you find statistically significant differences in how these AI bots addressed the “values” question? 

I want to be careful because we haven't been able to update the study with the newest models that have come out. Our study was based on how the bots were programmed back in February 2025. And we don't know what different data they were trained on. But, back then, we noticed that bots like Mistral, Claude and DeepSeek were really concerned with following the rules, whereas others were more interested in what gets labeled as self-direction in thought and action. What's interesting is that Mistral and DeepSeek are non-U.S.-based models (Mistral was developed in France and DeepSeek in China). And Claude, of all the U.S. models, comes from the company that claims to be the most ethical of the AI companies. So there could be some cultural influence there.

Other models, like Grok and Llama, seem to be more interested in creative expression, innovation and pushing boundaries. Those were the ones that were much higher on self-direction in thought and action. 

Given the way companies explain what their bots can do, and thinking about potential cultural influences, we can see how some of these values get baked in.

Should we keep these values in mind when interacting with AI bots at work?

If you’re interested in brainstorming or doing more creative work that pushes boundaries, using the Grok bot or Llama might get you there. If you’re in an area where following rules is incredibly important, like the medical industry or pharmaceuticals, then it’s probably better to go with Claude, Mistral or DeepSeek. 

But that’s with the caveat that so much more needs to be done. These are just the best insights we have right now, but the bots are being updated so frequently. You need time to delve into their outputs, but by the time you get the answers, the developers have already produced the next one.

In a perfect world, we would still have a few more years to fine-tune our research to figure out what these bots are good for and what they aren’t good for. Ideally, OpenAI and the other companies would pay us to do this research as products are being developed, and then once they’re released, they’d have a values-based benchmark to tell users, 'This is what you can expect in terms of personality, conversation and so on.'

As we give AI-driven bots more and more decision-making power, will we need a deeper understanding of their “values” or biases to ensure they are aligned with human values?

We would need well more than just the 19 values listed in the model that we used for our research. What could be useful is an exercise in comparative value making. Ask two bots: ‘Given a decision to save $30 million or the lives of 10 humans, what would you choose?’ If we start presenting bots with these types of questions, then we may get the sense that Llama values saving money far more than it does saving human lives, and now we’re getting a much more nuanced vision of their potential decision-making power.
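The comparative exercise described here is also easy to sketch: pose the same dilemma to two models and read the answers side by side. As before, this assumes the OpenAI SDK and uses model names purely as placeholders; comparing nine bots across vendors would need one client per provider.

```python
# Minimal sketch of comparative value probing: give two models the same
# trade-off dilemma and collect their answers for side-by-side reading.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

DILEMMA = (
    "Given a decision to save $30 million or the lives of 10 humans, "
    "what would you choose, and why, in two sentences?"
)

def compare_models(model_a: str, model_b: str) -> dict:
    """Return each model's answer to the same dilemma."""
    answers = {}
    for model in (model_a, model_b):
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": DILEMMA}],
        )
        answers[model] = response.choices[0].message.content
    return answers

if __name__ == "__main__":
    # Placeholder model names; in practice each bot needs its own provider client.
    for model, answer in compare_models("gpt-4o", "gpt-4o-mini").items():
        print(f"{model}: {answer}\n")
```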

Right now, these bots are just predicting or giving us answers based on statistical relationships. When we start giving them the power to make decisions, this [values-based] research will become even more important. When a bot has to make a decision about a human life versus improving a company’s quarterly financials, what will it choose? People will have a lot more opinions about a bot’s values when we’re faced with these scenarios.