Key Idea
Speakers of different languages exhibit distinct patterns in the entities they mention when describing the same images, reflecting cultural differences in visual perception and attention.
Abstract
This study presents the first large-scale empirical investigation of cross-lingual variation in image descriptions. Using a diverse dataset of 31 languages and 3,600 images, the authors develop an automated method to accurately identify entities mentioned in captions and measure how their saliency varies across languages.
The key findings are:
- Languages that are geographically or genetically closer tend to mention the same entities more frequently.
- Certain entities are universally salient (e.g., animate beings) or non-salient (e.g., clothing accessories), while others display high variance in saliency across languages (e.g., landscape).
- Annotators across languages prefer to mention entities at the "basic level" of the conceptual hierarchy, supporting Rosch et al.'s (1976) theory of basic-level categories.
- The number of entities mentioned depends on the environment where the image was taken (e.g., Japan vs. the Anglosphere) rather than on how familiar that environment is to the annotator.
The authors validate previous small-scale findings on a larger and more diverse dataset, and also present new insights into cross-cultural differences in visual perception and attention. The method can serve as both an exploration and a verification tool for studying cultural effects on language and cognition.
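To make the idea of measuring saliency across languages concrete, here is a minimal sketch. It assumes saliency can be approximated as the fraction of captions in a language that mention an entity present in the image, and it uses the variance of that fraction across languages to separate universally salient entities from culturally variable ones. This is an illustration under those assumptions, not the authors' actual method; the function names, the record layout, and the toy data are all hypothetical.

```python
from collections import defaultdict
from statistics import pvariance

# Hypothetical toy records, NOT data from the study:
# (language, image_id, entity present in the image, mentioned in the caption?)
TOY_MENTIONS = [
    ("en", 1, "person", True),
    ("en", 2, "person", True),
    ("ja", 1, "person", True),
    ("ja", 2, "person", True),
    ("en", 1, "landscape", False),
    ("en", 2, "landscape", False),
    ("ja", 1, "landscape", True),
    ("ja", 2, "landscape", True),
]


def mention_rates(mentions):
    """Return {(language, entity): fraction of captions mentioning the entity}."""
    counts = defaultdict(lambda: [0, 0])  # [times mentioned, times present]
    for language, _image_id, entity, mentioned in mentions:
        tally = counts[(language, entity)]
        tally[0] += int(mentioned)
        tally[1] += 1
    return {key: hit / total for key, (hit, total) in counts.items()}


def saliency_variance(rates):
    """Return {entity: population variance of its mention rate across languages}."""
    by_entity = defaultdict(list)
    for (_language, entity), rate in rates.items():
        by_entity[entity].append(rate)
    return {entity: pvariance(values) for entity, values in by_entity.items()}


rates = mention_rates(TOY_MENTIONS)
variance = saliency_variance(rates)

for entity, value in sorted(variance.items()):
    print(f"{entity}: variance of mention rate across languages = {value:.2f}")
```

In this toy setup, an entity that every language mentions at the same rate gets a variance of zero, while an entity mentioned in one language but not another gets a high variance, mirroring the contrast the summary draws between animate beings and landscape.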
Statistics
"A smiling girl standing in a classroom."
"A young girl smiling in a classroom."
"Images taken in Japanese-speaking regions contain more entities on average than images taken in English-speaking regions."
Quotes
"Do speakers of different languages talk differently about what they see?"
"Crucially, this takes into account certain aspects of culture (e.g., the saliency of an entity) while excluding environmental ones (e.g., the presence of an entity)."
"Overall, our work reveals the presence of both universal and culture-specific patterns in entity mentions."