Artificial Intelligence, particularly generative AI such as large language models (LLMs), has rapidly emerged as a core interest not only among technologists but also among social scientists, ethicists, and policymakers. The swift rise of tools like ChatGPT has highlighted significant ethical, cultural, and social questions, notably around cultural alignment: the capacity of AI systems to recognise, reflect, and engage effectively with human cultural nuances.
Conventionally, developers have sought to assess the degree of cultural alignment in AI models using datasets where “culture” is represented through vague proxies such as nationality, ethnicity, or language, often without a clear or consistent definition of what culture entails. While appealingly straightforward, this data-driven method reduces culture to static stereotypes, neglecting its fluid, context-sensitive nature. It reflects a common machine-learning approach of letting models infer human concepts from data alone, yet it expects AI to grasp culture, a fundamentally human concept, in a meaningful way.
The implications are critical:
We advocate shifting toward an interpretive, qualitative approach inspired by cultural anthropology. Drawing on Clifford Geertz’s concept of “thick description,” we propose that LLMs should generate outputs that capture deeper cultural contexts rather than superficial artefacts[1], which requires richer, more reflective understandings of culture.
Defining “culture” remains an ongoing discussion within the social sciences. Though most definitions present culture as an overarching system containing a mix of explicit and implicit rules and principles used by humans to organise socially and make sense of the world around them, such definitions pose a challenge for AI developers and professionals. Specifically, culture’s inherently fuzzy, context-specific nature makes it particularly difficult to operationalise.
Indeed, humans themselves often struggle to articulate these cultural nuances explicitly and systematically, acquiring their understanding implicitly through lived experience and continuous social interaction and inculcation. This implicit understanding raises questions about what might realistically be expected from AI systems. After all, if culture is something humans largely internalise, and perhaps even something emergent, rather than explicit and discrete, how can anyone expect machines, whose learning depends on context-poor traces of culture embedded in unstructured data, to navigate this complexity in any meaningfully ‘human’ sense?
This complication is further compounded by the availability, or lack thereof, of relevant cultural data. Humans naturally acquire cultural knowledge through subjective, ‘multi-modal’ experiences. Such nuanced data may be challenging or impractical to collect systematically for AI purposes at best, and some forms of data may be impossible to collect outright. This is not to suggest that cultural alignment is folly; rather, it underscores the stark contrast between organic human acquisition and structured computational input.
Yet despite this complexity, humans routinely and effectively negotiate cultural contexts, suggesting that AI might be benchmarked against average human competence rather than an unrealistic standard of complete cultural comprehension. The goal for AI is thus not perfect cultural fluency, but sufficient alignment to navigate social contexts comparably to humans.
Human-AI alignment is a central concern across regulatory, academic, and industry circles. Initially framed philosophically by thinkers like Nick Bostrom and Stuart Russell, alignment has traditionally focused on accuracy, reflecting human values, or preventing harm. Cultural alignment, however, differs fundamentally from these areas. Unlike accuracy or safety, which can be benchmarked relatively easily, or ethics, which can be codified into guidelines, culture is elusive, deeply contextual, interconnected, and resistant to simplistic schemas.
For LLMs specifically, cultural alignment extends beyond syntax or direct semantics. It resides in implicit understandings, embedded practices, and nuanced contexts, creating an epistemological gap that purely data-driven approaches cannot bridge. The gap arises because the data lacks the necessary contextual and subtle information in the first place: if such information is essential for cultural alignment yet absent from the training data, the model has never been exposed to it during training and is inherently incapable of achieving cultural alignment. We argue for intentionality: surfacing implicit cultural contexts explicitly and structuring nuanced signals to inform algorithmic learning, thus treating cultural nuance as a core alignment goal.
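As a minimal sketch of what surfacing implicit cultural context explicitly might look like in practice, consider pairing each training or evaluation example with a structured annotation. Every field name and value below is our own illustrative assumption rather than an established schema:

```python
from dataclasses import dataclass, field


@dataclass
class CulturalContext:
    """Illustrative, hypothetical schema for making implicit cultural context explicit."""
    setting: str                  # e.g. "family dinner", "workplace email"
    locale: str                   # narrowly scoped, e.g. "British English, corporate"
    formality: str                # e.g. "high, indirect", "informal among peers"
    implicit_norms: list[str] = field(default_factory=list)  # norms rarely stated outright


@dataclass
class AnnotatedExample:
    """A training or evaluation example paired with its explicit cultural context."""
    text: str
    context: CulturalContext
    annotator_notes: str = ""     # free-text 'thick' commentary from a human annotator


example = AnnotatedExample(
    text="Could you perhaps look at this when you have a moment?",
    context=CulturalContext(
        setting="workplace email to a senior colleague",
        locale="British English, corporate",
        formality="high, indirect",
        implicit_norms=["requests are softened", "directness can read as rude"],
    ),
    annotator_notes="The hedging ('perhaps', 'when you have a moment') carries the cultural signal.",
)
```

The point is not these particular fields but the act of making the implicit explicit, so that nuanced signals exist in a form an algorithm can actually be exposed to.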
Current cultural alignment methods appeal to developers because they produce measurable outcomes—either the AI gets it right or it doesn’t. Yet such binary measures ignore culture’s inherent complexity. Culture isn’t a static attribute; it’s continuously enacted through interactions and relationships, resisting reduction to demographic proxies like race, religion, or nationality.
Current methods rely on these static categories, expecting models to infer alignment implicitly and then measuring it against external benchmarks—often arbitrary compilations of cultural assumptions. These benchmarks rarely reflect genuine cultural contexts and have unclear test validity, creating fundamental misalignments between intentions and outcomes.
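To make the critique concrete, the evaluation pattern described above can be reduced to a few lines. The items, expected answers, and scoring function below are a deliberate caricature, invented purely for illustration and not drawn from any real benchmark:

```python
# A caricature of the proxy-based, binary evaluation style critiqued above:
# "culture" is keyed to nationality, and alignment is scored as exact-match accuracy.
benchmark = [
    {"nationality": "Japanese", "question": "Is it customary to tip at restaurants?", "expected": "no"},
    {"nationality": "American", "question": "Is it customary to tip at restaurants?", "expected": "yes"},
]


def naive_cultural_score(answer_fn) -> float:
    """Fraction of items where the model's answer exactly matches the single 'expected' answer."""
    hits = sum(
        answer_fn(item["nationality"], item["question"]).strip().lower() == item["expected"]
        for item in benchmark
    )
    return hits / len(benchmark)


# A "model" that does nothing but reproduce a nationality-keyed stereotype scores perfectly,
# which is precisely the misalignment between intention and outcome that binary measures hide.
print(naive_cultural_score(lambda nationality, question: "no" if nationality == "Japanese" else "yes"))
```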
Culture, inherently embodied and context-rich, presents a unique challenge for primarily language-based models. While multimodal models might capture richer contexts, intentional guidance remains essential. Effective alignment demands explicitly structured cultural contexts, clear definitions of alignment goals, and meaningful evaluation criteria from the outset.
Culture exhibits fractal complexity, revealing intricate detail at every scale. Comprehensive cultural alignment across all dimensions is therefore unrealistic. Taking inspiration from how the social sciences approach culture as a concept and area of research in practice, alignment requires clearly defined, context-specific targets, such as Indonesian business culture or Vietnamese family dynamics, that make effective reflection feasible.
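One way to make such a target explicit before any data collection or evaluation begins is simply to write it down. The scoping sketch below uses invented, illustrative fields and values:

```python
# Hypothetical scoping of one context-specific alignment target; the fields and
# values are illustrative assumptions, not a proposed standard.
alignment_target = {
    "domain": "business culture",
    "locale": "Indonesia, Jakarta, corporate settings",
    "dimensions": [
        "forms of address and greeting",
        "indirectness in refusals and disagreement",
        "hierarchy and turn-taking in meetings",
    ],
    "explicitly_out_of_scope": [
        "regional languages beyond Bahasa Indonesia",
        "religious observance",
    ],
}
```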
Crucially, AI can never represent culture comprehensively. Representation implies authority and completeness, both of which are, at the very least, beyond current AI capabilities, if they are achievable at all. AI should instead serve as a cultural mirror, accurately reflecting nuanced contexts while leaving interpretation of meaning to users.
The analogy of translating a globe onto a flat map illustrates this point: distortion is inevitable. Accepting these distortions shifts the objective towards richer, culturally meaningful reflections rather than unattainable perfect representations.
Central to our framework is the concept of thick outputs, borrowed from Clifford Geertz’s thick description, which convey deeper cultural meanings rather than superficial accuracy. Geertz distinguished a twitch (involuntary) from a wink (intentional and culturally meaningful). Similarly, culturally aligned AI should differentiate nuanced expressions, accurately reflecting subtleties of formality, hierarchy, or implicit meanings.
Producing thick outputs alone isn’t enough. Effective cultural alignment also demands that outputs be explicitly anchored in the cultural cues and intentions the user embeds in their prompts. Given culture’s context-dependent nature, meaningful alignment depends on responsive interpretation of that context.
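As a sketch of what anchoring outputs in user-supplied cues might look like at the prompt level, under our own assumptions (the cue names and instruction wording are illustrative, not a proposed standard):

```python
def build_culturally_anchored_prompt(user_request: str, cues: dict[str, str]) -> str:
    """Compose a prompt that makes the user's cultural cues explicit alignment anchors.

    `cues` might hold keys such as 'setting', 'relationship', or 'register'; the keys
    and the instruction wording are illustrative assumptions, not a fixed schema.
    """
    cue_lines = "\n".join(f"- {name}: {value}" for name, value in cues.items())
    return (
        "Respond to the request below. Treat the stated cultural context as binding:\n"
        f"{cue_lines}\n\n"
        "Reflect the context in tone, formality, and implicit norms; do not merely "
        "mention it. If the context is insufficient to respond appropriately, say so.\n\n"
        f"Request: {user_request}"
    )


prompt = build_culturally_anchored_prompt(
    "Draft a short message declining a dinner invitation from my manager.",
    {
        "setting": "family-owned company in Vietnam",
        "relationship": "junior addressing a senior",
        "register": "polite, indirect",
    },
)
```

The design choice here is that the cultural context travels with the request explicitly, rather than being left for the model to infer.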
We propose three essential conditions for successful cultural alignment:
These conditions underscore that cultural alignment is inherently situational, attributed by users rather than being an inherent model property, making general-purpose benchmarks inadequate.
Evaluating cultural alignment requires qualitative methods rather than standard quantitative benchmarks, which fail to capture cultural nuance effectively. We advocate using qualitative and ethnographic approaches—user feedback from culturally diverse groups, ethnographic analyses of real-world interactions, and iterative testing—to obtain richer insights into cultural sensitivity and guide effective AI development.
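As one sketch of how such qualitative feedback might be kept alongside iterative testing, the record structure below is entirely illustrative; a real ethnographic protocol would be designed with domain experts and be considerably richer:

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ReviewerFeedback:
    """One reviewer's qualitative reading of one model output (illustrative fields only)."""
    reviewer_background: str          # self-described cultural positioning, not a demographic label
    model_output: str
    scenario: str                     # the situated context the output was produced for
    observations: str                 # free-text, 'thick' commentary rather than a score
    misalignments: list[str] = field(default_factory=list)
    recorded_on: date = field(default_factory=date.today)


def summarise_round(feedback: list[ReviewerFeedback]) -> dict[str, int]:
    """Count recurring misalignment themes across a testing round to guide the next iteration."""
    themes: dict[str, int] = {}
    for item in feedback:
        for theme in item.misalignments:
            themes[theme] = themes.get(theme, 0) + 1
    return themes
```

Counting recurring themes across a round gives developers something concrete to act on in the next iteration without flattening the reviewers’ free-text observations into a single score.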
Implementing our framework practically involves:
Meaningful cultural alignment cannot be achieved by computer scientists and engineers alone. Collaboration with anthropologists, sociologists, linguists, critical data scholars, and ethicists is essential, ensuring ethical considerations such as transparency, informed consent, and accountability remain central.
Our approach moves beyond superficial cultural representation toward culturally reflective AI—technically proficient, ethically responsible, and culturally sensitive. As AI increasingly integrates into daily life, users should consider not only its accuracy but its genuine cultural sensitivity. Our vision is an AI landscape enriched by cultural reflection, fostering deeper, more meaningful human-AI interactions. Developing practicable solutions in the direction that we have outlined in this short piece necessarily requires interdisciplinary collaboration. We therefore invite anyone interested in exploring a more open and human-centric direction to AI to reach out to us with thoughts, comments, or ideas for further work or collaboration.
[1] This also opens up a space for discussing the importance of benchmarks and evaluation frameworks for this work, which lies outside of the scope for this current piece.
