Core Concepts
Applying topological data analysis to the shapes of South American languages reveals significant distinctions within language families.
Abstract
The paper uses multiple correspondence analysis (MCA) and topological data analysis (TDA) to analyze the shapes of South American languages. It covers the introduction, related work, methodology, data analysis procedure, applications of TDA to the Nuclear-Macro-Jê and Quechuan families, a discussion of the results, and acknowledgments. Key insights include:
Difficulty in visualizing categorical-valued linguistic data.
Application of MCA for dimensionality reduction.
Use of TDA to analyze topological structures in language distributions.
Distinctions between Jê-proper and non-Jê-proper languages in the Nuclear-Macro-Jê (NMJ) family.
Significance of circular structures in sub-point clouds.
Differences between northern and southern Quechuan languages.
Permutation tests for statistical inference.
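The permutation tests mentioned in the list above can be illustrated with a minimal sketch. It assumes a generic two-group difference-in-means statistic; this summary does not state which test statistic the paper actually uses, and the function name `permutation_test` and its parameters are hypothetical.

```python
import numpy as np

def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the absolute difference of group means.

    Assumption: the statistic is a plain difference of means, chosen for
    illustration only; it is not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    observed = abs(a.mean() - b.mean())

    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_permutations):
        # Reassign group labels at random by shuffling the pooled values.
        shuffled = rng.permutation(pooled)
        stat = abs(shuffled[:a.size].mean() - shuffled[a.size:].mean())
        if stat >= observed:
            count += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (count + 1) / (n_permutations + 1)
    return observed, p_value
```

In a TDA setting, the two groups could hold a per-language summary value (for example, a quantity derived from a persistence diagram) for two candidate subgroups of a family; a small p-value then suggests the observed difference is unlikely under random relabeling.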
Stats
In the Grambank dataset, 189 of the 195 features are binary.
The MCA method encodes frequency information in the positions of feature values.
The TDA framework detects higher-dimensional topological structures such as holes and voids.
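To make the MCA statement above concrete, here is a minimal sketch of MCA computed as a correspondence analysis of the one-hot indicator matrix, using only NumPy. It is an illustration under assumptions, not the paper's implementation: the helper names `one_hot` and `mca` are hypothetical, and the input is assumed to be a complete table of category codes with no missing values.

```python
import numpy as np

def one_hot(data):
    """Indicator matrix for an (n_samples, n_features) array of category codes."""
    data = np.asarray(data)
    columns, labels = [], []
    for j in range(data.shape[1]):
        for value in np.unique(data[:, j]):
            columns.append((data[:, j] == value).astype(float))
            labels.append((j, value.item()))  # (feature index, feature value)
    return np.column_stack(columns), labels

def mca(data, n_components=2):
    """Return principal coordinates of samples and of feature values."""
    Z, labels = one_hot(data)
    P = Z / Z.sum()            # correspondence matrix
    r = P.sum(axis=1)          # row masses (one per sample)
    c = P.sum(axis=0)          # column masses (one per feature value)
    # Standardized residuals from the independence model r c^T.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
    rows = (U * sigma) / np.sqrt(r)[:, None]
    cols = (Vt.T * sigma) / np.sqrt(c)[:, None]
    return rows[:, :n_components], cols[:, :n_components], labels

# Hypothetical toy table: 6 languages x 3 binary features.
toy = np.array([[1, 0, 1], [0, 0, 1], [0, 1, 0],
                [0, 1, 1], [0, 0, 0], [0, 1, 0]])
language_coords, value_coords, value_labels = mca(toy)
```

In this formulation, a feature value shared by n_j of n samples sits at squared distance n/n_j - 1 from the origin when all components are kept, so rare values land far from the origin and common ones near it. The resulting `language_coords` form a numeric point cloud of the kind a TDA pipeline can take as input.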
Quotes
"In this paper we describe a workflow to analyze the topological shapes of South American languages."
"We restrict our analysis to South American languages focusing on Nuclear-Macro-Jê and Quechuan families."