Core Concepts
Many widely used dependence measures for binary data, such as the phi coefficient, are inadequate for measuring dependence strength: they confound dependence with event equality and often cannot attain their bounds of -1 and 1, even under perfect dependence. This paper advocates "proper" measures such as Yule's Q and Cole's coefficient, which satisfy key properties including attainability and monotonicity and therefore reflect dependence strength more accurately.
This research paper presents a critical analysis of dependence measures for binary random variables, focusing on their theoretical properties and practical implications.
Introduction and Motivation
The paper highlights the lack of theoretical guidance on choosing appropriate dependence measures for binary data, leading to the frequent use of popular but statistically flawed measures. It uses the example of the phi coefficient, often misinterpreted as a correlation coefficient, to demonstrate how its limitations can lead to misleading conclusions about dependence strength.
Dependence Concepts and Desirable Properties
The authors establish clear definitions for positive/negative dependence, stronger/weaker dependence, and perfect dependence in the context of 2x2 contingency tables. They propose a set of desirable properties for dependence measures, defining a "proper" measure as one that satisfies normalization, independence, attainability, monotonicity, and symmetry.
Improper Measures: The Case of the Phi Coefficient
The paper dissects the shortcomings of the phi coefficient, showing that it lacks attainability, meaning it often fails to reach -1 or 1 even under perfect negative or positive dependence. This limitation stems from the phi coefficient's inherent nature as a measure of event equality rather than pure dependence. The authors argue that this flaw makes it unsuitable for reliably assessing dependence strength.
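The attainability problem can be illustrated numerically. The sketch below is not taken from the paper: it assumes the standard textbook formula for the phi coefficient and assumes that perfect positive dependence means one event implies the other (so that P(A and B) equals the smaller marginal). The function name `phi_coefficient` and the probabilities are illustrative choices.

```python
import math

def phi_coefficient(p11, p, q):
    """Phi coefficient of events A and B, given P(A and B)=p11, P(A)=p, P(B)=q."""
    return (p11 - p * q) / math.sqrt(p * (1 - p) * q * (1 - q))

# Assumed perfect positive dependence: A implies B, so P(A and B) = P(A).
# The marginals differ (0.1 vs 0.5), i.e. the events are not equal.
phi_unequal = phi_coefficient(p11=0.1, p=0.1, q=0.5)

# Same kind of dependence, but now the two events coincide (equal marginals).
phi_equal = phi_coefficient(p11=0.3, p=0.3, q=0.3)

print(round(phi_unequal, 3), round(phi_equal, 3))
```

With unequal marginals, phi stays at about one third even though A fully determines that B occurs. It reaches 1 only when the two events coincide, which is the sense in which phi measures event equality.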
Proper Measures: Yule's Q, Cole's Coefficient, and the Odds Ratio
The paper advocates "proper" measures such as Yule's Q and Cole's coefficient, demonstrating that they fulfill all the desired properties. It also discusses the odds ratio, a widely used measure that is not normalized to [-1, 1] but takes values between 0 and infinity, and shows its connection to Yule's Q and its own desirable properties.
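A minimal sketch of these three measures on a 2x2 table of counts follows. It uses the commonly cited textbook definitions, which may differ in notation from the paper: Yule's Q as (OR - 1) / (OR + 1), and Cole's coefficient as the covariance of the two event indicators divided by its largest attainable magnitude given the marginals. The function names and example counts are hypothetical, and the code assumes non-degenerate marginals.

```python
import math

def odds_ratio(n11, n10, n01, n00):
    """Odds ratio of a 2x2 table; infinite when an off-diagonal cell is zero."""
    ad, bc = n11 * n00, n10 * n01
    return math.inf if bc == 0 else ad / bc

def yule_q(n11, n10, n01, n00):
    """Yule's Q, algebraically equal to (OR - 1) / (OR + 1)."""
    ad, bc = n11 * n00, n10 * n01
    return (ad - bc) / (ad + bc)

def cole_coefficient(n11, n10, n01, n00):
    """Covariance of the event indicators, scaled by its attainable bound."""
    n = n11 + n10 + n01 + n00
    p11 = n11 / n
    p = (n11 + n10) / n          # P(A)
    q = (n11 + n01) / n          # P(B)
    cov = p11 - p * q
    if cov >= 0:
        bound = min(p, q) - p * q              # largest possible covariance
    else:
        bound = p * q - max(0.0, p + q - 1.0)  # largest possible |negative covariance|
    return cov / bound

# Hypothetical counts: moderate positive dependence.
table = (30, 10, 20, 40)
print(odds_ratio(*table), yule_q(*table), cole_coefficient(*table))

# Hypothetical counts where A implies B: both normalized measures reach 1.
nested = (10, 0, 40, 50)
print(yule_q(*nested), cole_coefficient(*nested))
```

Because Q is a monotone transformation of the odds ratio, the two always rank tables in the same order; Q simply maps the range from 0 to infinity onto [-1, 1].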
Statistical Inference and Applications
The authors develop statistical inference procedures for the discussed measures, deriving their asymptotic distributions and proposing methods for constructing confidence intervals. They illustrate the practical implications of using proper versus improper measures through an application to drug use data, showing how proper measures reveal stronger interdependence patterns consistent with the gateway drug hypothesis.
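To give a flavor of what such an interval can look like, the sketch below builds a confidence interval for Yule's Q from the standard Wald interval for the log odds ratio. This is a common textbook construction and is not necessarily the procedure derived in the paper. It relies on the identity Q = tanh(log(OR) / 2), assumes all four cell counts are positive, and uses a hypothetical function name and hypothetical counts.

```python
import math
from statistics import NormalDist

def yule_q_wald_ci(n11, n10, n01, n00, level=0.95):
    """Approximate CI for Yule's Q via the Wald interval of the log odds ratio.

    Assumes all four counts are strictly positive.
    """
    log_or = math.log((n11 * n00) / (n10 * n01))
    # Standard large-sample standard error of the log odds ratio.
    se = math.sqrt(1 / n11 + 1 / n10 + 1 / n01 + 1 / n00)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lo, hi = log_or - z * se, log_or + z * se
    # Q = (OR - 1) / (OR + 1) = tanh(log(OR) / 2) is increasing in log(OR),
    # so the endpoints can be transformed directly.
    return math.tanh(lo / 2), math.tanh(hi / 2)

lower, upper = yule_q_wald_ci(30, 10, 20, 40)
print(lower, upper)
```

Transforming the endpoints keeps the interval inside [-1, 1], which a symmetric interval built directly on the Q scale would not guarantee.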
Conclusion
The paper concludes that binary data analysis should rely on "proper" dependence measures. Popular but flawed measures such as the phi coefficient can misstate dependence strength and lead to misleading conclusions. The authors' framework offers guidance for selecting and interpreting dependence measures in 2x2 contingency tables.
Stats
Google Scholar hits for the phi coefficient: 94,200
Google Scholar hits for Cramér’s V: 26,400
Google Scholar hits for Yule’s Q: 2,120
Google Scholar hits for Cole’s coefficient: 131
Phi coefficient value for the smallpox data example: 0.23
Yule’s Q value for the smallpox data example: 0.86
Cole’s coefficient value for the smallpox data example: 0.83