Core Concepts
Multiple correspondence analysis (MCA) and topological data analysis (TDA) are applied to analyze the shapes of South American languages.
Abstract
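The importance of a topological perspective in linguistics is emphasized.
MCA and TDA methods are used to analyze the shapes of South American languages.
Data preprocessing, MCA processing, and the procedure for analyzing language shapes are described in detail.
Comparison results between different groups, such as Jê-proper and non-Jê-proper, are presented.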
I. Introduction
Typological data in linguistics is usually categorical-valued.
Difficulty in measuring differences between languages due to Gower distance limitations.
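As a rough illustration of the distance mentioned above: for purely categorical features, the Gower distance reduces to the share of jointly observed features on which two languages disagree. The sketch below assumes that reduction; the function name `gower_categorical`, the use of `None` for missing values, and the toy feature vectors are illustrative assumptions, not taken from the paper.

```python
def gower_categorical(a, b):
    """Gower distance for two vectors of categorical feature values.

    Assumption: missing values are encoded as None and are skipped,
    so only features observed in both languages are compared.
    """
    compared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not compared:
        raise ValueError("no feature is observed in both vectors")
    mismatches = sum(1 for x, y in compared if x != y)
    return mismatches / len(compared)


# Hypothetical feature vectors for two languages (made-up values).
lang_a = ["1", "0", "1", None]
lang_b = ["1", "1", "0", "1"]
distance = gower_categorical(lang_a, lang_b)
```

Each feature contributes only 0 or 1 to the sum, so the result records how many features differ but not which values they take.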
II. Related work
Previous applications of topological methods in linguistics using PCA.
Comparison with the current study's approach using MCA for individual language visualization.
III. Multiple correspondence analysis
MCA as a dimension reduction technique for categorical-valued features.
Encoding feature frequency information into point clouds for each language.
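The dimension reduction step can be sketched as a textbook MCA: one-hot encode the categorical table into an indicator matrix, then apply correspondence analysis through a singular value decomposition. This is a minimal sketch under that assumption and not necessarily the paper's exact procedure; the function name `mca` and the toy table are hypothetical.

```python
import numpy as np


def mca(table):
    """Indicator-matrix MCA of a table (rows = observations, columns = categorical features)."""
    table = np.asarray(table)
    # One-hot encode every feature and stack the blocks side by side.
    blocks = []
    for col in table.T:
        levels = np.unique(col)
        blocks.append((col[:, None] == levels[None, :]).astype(float))
    Z = np.hstack(blocks)

    P = Z / Z.sum()            # correspondence matrix
    r = P.sum(axis=1)          # row masses
    c = P.sum(axis=0)          # column masses
    # Standardized residuals from the independence model r c^T.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sing, Vt = np.linalg.svd(S, full_matrices=False)

    row_coords = (U * sing) / np.sqrt(r)[:, None]     # principal coordinates of rows
    col_coords = (Vt.T * sing) / np.sqrt(c)[:, None]  # principal coordinates of categories
    inertia = sing ** 2                               # principal inertias per axis
    return row_coords, col_coords, inertia


# Made-up table: 4 observations, 3 binary features.
toy = [["1", "0", "1"],
       ["1", "1", "0"],
       ["0", "0", "1"],
       ["0", "1", "0"]]
row_coords, col_coords, inertia = mca(toy)
explained = inertia / inertia.sum()
```

The `explained` ratios are the quantities a scree plot displays, and the leading columns of `col_coords` give a low-dimensional position for each feature value.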
IV. Topological data analysis
A. A toy example of pretzel
Explanation of the pretzel-shaped point cloud and its circular structures.
B. The TDA workflow
Introduction to the Vietoris-Rips simplicial complex and persistent homology concepts.
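To make the two concepts concrete, the sketch below builds a Vietoris-Rips filtration on a small point cloud (each simplex enters at the length of its longest edge) and computes persistent homology in dimensions 0 and 1 with the standard column reduction over Z/2. It is a pure-Python toy for a handful of points, not the paper's pipeline; the function name `rips_persistence`, the `max_edge` parameter, and the octagon sample are assumptions made for illustration.

```python
import itertools
import math


def rips_persistence(points, max_edge):
    """Return (dimension, birth, death) bars for H0 and H1 of a Vietoris-Rips filtration."""
    n = len(points)
    # Vertices enter at scale 0; edges and triangles enter at their longest edge.
    simplices = [((i,), 0.0) for i in range(n)]
    for k in (2, 3):
        for s in itertools.combinations(range(n), k):
            value = max(math.dist(points[i], points[j])
                        for i, j in itertools.combinations(s, 2))
            if value <= max_edge:
                simplices.append((s, value))
    # Order by scale, then dimension, so every face precedes its cofaces.
    simplices.sort(key=lambda sv: (sv[1], len(sv[0])))
    index = {s: idx for idx, (s, _) in enumerate(simplices)}

    pivot_to_col = {}   # lowest row index -> column that owns it
    columns = {}        # reduced boundary columns, stored as sets of row indices
    paired = set()
    bars = []
    for j, (s, value) in enumerate(simplices):
        col = set()
        if len(s) > 1:
            col = {index[f] for f in itertools.combinations(s, len(s) - 1)}
        # Add earlier columns (mod 2) until the lowest entry is unique or the column is empty.
        while col and max(col) in pivot_to_col:
            col ^= columns[pivot_to_col[max(col)]]
        if col:
            i = max(col)
            pivot_to_col[i] = j
            columns[j] = col
            paired.update((i, j))
            bars.append((len(simplices[i][0]) - 1, simplices[i][1], value))
    # Unpaired vertices and edges are classes that never die within max_edge.
    for j, (s, value) in enumerate(simplices):
        if j not in paired and len(s) <= 2:
            bars.append((len(s) - 1, value, math.inf))
    return bars


# Eight points evenly spaced on the unit circle.
circle = [(math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8)) for k in range(8)]
bars = rips_persistence(circle, max_edge=2.5)
long_loops = [b for b in bars if b[0] == 1 and b[2] - b[1] > 0.5]
```

For this sample, `long_loops` holds the single long-lived one-dimensional bar that corresponds to the circle; the remaining one-dimensional bars have near-zero length. The brute-force enumeration of triangles grows cubically with the number of points, so real analyses typically rely on dedicated TDA libraries.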
V. Data analysis procedure
A. Data preprocessing
Cleaning and imputation procedures on the Grambank dataset.
B. MCA processing
Scree plot and visualization of feature values using MCA projection.
C. The shapes of languages
Visualization and comparison of language shapes within the Nuclear-Macro-Jê (NMJ) and Quechuan families.
VI. Applications of TDA
A. Nuclear-Macro-Jê family
Distinction between Jê-proper and non-Jê-proper languages based on circular structures.
B. Quechuan family
Analysis of north vs south Quechuan languages through persistence diagrams and 2-Wasserstein distance.
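The 2-Wasserstein distance between two persistence diagrams is the square root of the smallest total squared cost over all matchings, where any point may also be matched to the diagonal. The sketch below assumes a Euclidean ground distance (other conventions, such as the supremum norm, are also in use, and the source does not say which one the paper adopts) and searches all matchings by brute force, which is only practical for very small diagrams. The function name `wasserstein2` and the toy diagrams are hypothetical.

```python
import itertools
import math


def wasserstein2(dgm_a, dgm_b):
    """2-Wasserstein distance between two small persistence diagrams.

    Each diagram is a list of (birth, death) pairs with finite deaths.
    Assumption: Euclidean ground distance; unmatched points go to the diagonal.
    """
    n, m = len(dgm_a), len(dgm_b)
    size = n + m  # each diagram is padded with diagonal slots for the other's points

    def diagonal_cost(p):
        # Squared Euclidean distance from (birth, death) to the line birth = death.
        return (p[1] - p[0]) ** 2 / 2

    def cost(i, j):
        if i < n and j < m:
            return (dgm_a[i][0] - dgm_b[j][0]) ** 2 + (dgm_a[i][1] - dgm_b[j][1]) ** 2
        if i < n:
            return diagonal_cost(dgm_a[i])
        if j < m:
            return diagonal_cost(dgm_b[j])
        return 0.0  # diagonal slot matched to diagonal slot

    best = min(sum(cost(i, perm[i]) for i in range(size))
               for perm in itertools.permutations(range(size)))
    return math.sqrt(best)


# Made-up diagrams, for illustration only.
dgm_x = [(0.0, 1.0), (0.2, 0.5)]
dgm_y = [(0.0, 1.5)]
w2 = wasserstein2(dgm_x, dgm_y)
```

Here the long bars are matched to each other and the short bar in `dgm_x` is sent to the diagonal. For diagrams with more than a few points, the same cost table can be handed to an assignment solver instead of enumerating permutations.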
VII. Discussion & Acknowledgements
Significance of TDA in linguistic research for analyzing language shapes.
Acknowledgements to collaborators, funding sources, and workshop contributions.
Stats
MCA and TDA methods are applied to analyze the shapes of South American languages.