Core Concepts
To truly understand human language, language models must directly integrate the rich and dynamic human context that shapes how people express themselves, rather than relying on linguistic signals alone.
Abstract
The content argues that large language models (LLMs) should better incorporate human context: the personal, social, and situational factors that influence how people use language. It makes three key arguments for such large human language models (LHLMs):
LM training should include the human context: Current LLMs treat text sequences as independent of one another, missing the opportunity to capture the dependence between an individual's language use and their unique human context. Integrating the human context directly into LM training can lead to better language understanding.
LHLMs should recognize that people are more than their group(s): Human context is not limited to discrete group memberships; it is a rich mixture of continuous individual traits and characteristics. LHLMs should model this diversity and intersectionality of human factors, rather than relying on narrow group-based representations.
LHLMs should account for the dynamic and temporally dependent nature of human context: A person's language expresses their changing states of being over time, influenced by factors like mood, personality, and temporal rhythms. LHLMs should capture these dynamic and temporal aspects of the human context to better model human language.
The content reviews relevant past work, discusses key challenges, and proposes potential solutions for realizing this vision of large human language models (LHLMs). It emphasizes the need for representative datasets, scalable modeling approaches, and responsible development strategies to address privacy and ethical concerns.
Quotes
"Serious errors can result when an investigator makes the seemingly natural assumption that the inference from an ecological analysis must pertain either to individuals within the group or to individuals across groups."
"[P]eople from the collectivist culture produc[e] significantly more group and fewer idiocentric self-descriptions than ... people from the individualist cultures"
"[P]eople are embedded within time, ... time is fundamentally important to life as it is lived, and ... personality processes take place over time."