Core Concepts
Syntactic similarity between languages correlates with their geographic proximity.
Summary
The article studies relations between languages through syntactic distances and geographic proximity. Linguistic distances are computed from part-of-speech (POS) trigrams extracted from the Universal Dependencies dataset. The analysis reveals clusters that correspond to known language families and groups, with exceptions explained by distinct morphological typologies. It also finds a significant correlation between linguistic distance and geographic distance, which underscores the role of spatial proximity in language kinship.
I. Introduction
- Linguistic diversity across an estimated 7,000 languages
- Historical linguistics and language evolution
- Importance of quantitative approaches in language classification
II. Methods
- Data sourced from the Universal Dependencies dataset
- Analysis of POS trigrams for syntactic variations
- Information-theoretic approach for linguistic distances
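The steps in this section can be illustrated with a minimal sketch. It assumes that each language is represented by the relative frequencies of its POS trigrams and that the distance is the Jensen-Shannon distance (the square root of the base-2 divergence), which is suggested by the d_JS notation in the quoted statistics but is not confirmed here. The function names and the toy tag sequences are hypothetical and are not taken from the article.

```python
from collections import Counter
from math import log2, sqrt


def trigram_distribution(sentences):
    """Relative frequencies of POS trigrams over a list of tag sequences."""
    counts = Counter()
    for tags in sentences:
        for i in range(len(tags) - 2):
            counts[tuple(tags[i:i + 3])] += 1
    total = sum(counts.values())
    if total == 0:
        return {}  # no sentence was long enough to contain a trigram
    return {trigram: n / total for trigram, n in counts.items()}


def js_distance(p, q):
    """Jensen-Shannon distance between two distributions given as dicts.

    Assumption: square root of the base-2 JS divergence, so the value
    lies between 0 (identical) and 1 (no shared trigrams).
    """
    divergence = 0.0
    for key in set(p) | set(q):
        pk, qk = p.get(key, 0.0), q.get(key, 0.0)
        mk = 0.5 * (pk + qk)  # mixture distribution
        if pk > 0:
            divergence += 0.5 * pk * log2(pk / mk)
        if qk > 0:
            divergence += 0.5 * qk * log2(qk / mk)
    return sqrt(max(divergence, 0.0))


# Toy tag sequences (invented for illustration, not real treebank data).
lang_a = [["DET", "NOUN", "VERB", "DET", "NOUN"], ["PRON", "VERB", "DET", "NOUN"]]
lang_b = [["DET", "NOUN", "VERB", "ADP", "NOUN"], ["PRON", "NOUN", "ADP", "VERB"]]

dist_ab = js_distance(trigram_distribution(lang_a), trigram_distribution(lang_b))
print(f"d_JS(A, B) = {dist_ab:.3f}")
```

With real data, the tag sequences would come from the UPOS column of the Universal Dependencies treebanks rather than from hand-written lists.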
III. Results
- Hierarchical clustering reveals language groupings
- Minimum spanning tree visualization of language connections
- Positive correlation between linguistic and geographic distances
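The minimum spanning tree mentioned above can be sketched from a matrix of pairwise distances. The example below uses Prim's algorithm in plain Python; the choice of algorithm, the labels A to D, and the distance values are all assumptions made for illustration, not results or methods reported in the article.

```python
def minimum_spanning_tree(labels, dist):
    """Prim's algorithm on a dense, symmetric distance matrix.

    Returns a list of (label_u, label_v, distance) edges.
    """
    n = len(labels)
    if n == 0:
        return []
    in_tree = {0}  # start from the first language
    edges = []
    while len(in_tree) < n:
        best = None
        # Find the shortest edge leaving the current tree.
        for u in in_tree:
            for v in range(n):
                if v not in in_tree and (best is None or dist[u][v] < best[2]):
                    best = (u, v, dist[u][v])
        u, v, d = best
        in_tree.add(v)
        edges.append((labels[u], labels[v], d))
    return edges


# Hypothetical languages and made-up pairwise distances.
labels = ["A", "B", "C", "D"]
dist = [
    [0.0, 0.2, 0.6, 0.7],
    [0.2, 0.0, 0.5, 0.8],
    [0.6, 0.5, 0.0, 0.3],
    [0.7, 0.8, 0.3, 0.0],
]

tree = minimum_spanning_tree(labels, dist)
for u, v, d in tree:
    print(f"{u} - {v}: {d}")
```

The tree keeps only the n - 1 shortest links that connect all languages, which is why it is a convenient way to visualize the closest relatives of each language.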
IV. Conclusions
- Logarithmic relation between linguistic and geographic distances
- Potential for further exploration in linguistic evolution and historical connections
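A logarithmic relation of this kind can be checked with an ordinary least-squares fit against the logarithm of the geographic distance. The sketch below assumes the functional form d_ling ≈ a + b ln(d_geo), which this summary does not spell out, and it uses synthetic data generated from that same form purely to demonstrate the procedure. The function name and all numbers are hypothetical.

```python
from math import log, sqrt


def fit_log_relation(geo_km, ling):
    """Least-squares fit of ling = a + b * ln(geo_km).

    Returns (a, b, r), where r is the Pearson correlation between
    ln(geo_km) and ling.
    """
    x = [log(g) for g in geo_km]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(ling) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in ling)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, ling))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return intercept, slope, r


# Synthetic data: distances in km and values built from a known log law.
geo_km = [100.0, 500.0, 2000.0, 8000.0]
ling = [0.2 + 0.05 * log(g) for g in geo_km]

a, b, r = fit_log_relation(geo_km, ling)
print(f"a = {a:.3f}, b = {b:.3f}, r = {r:.3f}")
```

On real data the points would scatter around the fitted curve, and r would quantify how well the logarithmic form describes them.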
Statistics
"The number of languages in the world is estimated to be around 7,000."
"We find that r = 3 suffices to correctly characterize any of the studied languages."
"We observe that the values of $\hat{G}_0$ significantly vary for each language."
"We find $d_{JS}(E, J) = 0.79$, which is a high value due to the strong morphosyntactic differences between Japanese and English."
"We compute the estimated stationary and transition probabilities, as specified in Eq. (9) and (10) respectively."
Quotes
"Languages are grouped into families that share common linguistic traits."
"Quantitative measures of linguistic distances are useful not only for fundamental reasons but also in applied linguistics."
"Our analysis reveals definite clusters that correspond to well known language families and groups."