
Estimating Racial Disparities in Outcomes When Individual Race is Unobserved


Core Concepts
Racial disparities in various outcomes can be estimated even when individual race is not observed, by using surnames as an instrumental variable for unobserved race and combining this with Bayesian Improved Surname Geocoding (BISG) probabilities.
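As a rough illustration of what a BISG probability is, the sketch below applies Bayes' rule to combine surname and location evidence for one person. It assumes the standard BISG simplification that surname and location are independent given race. The function name `bisg_posterior`, the three groups, and all input numbers are hypothetical and chosen only for illustration.

```python
import numpy as np

def bisg_posterior(p_race_given_surname, p_geo_given_race):
    """Combine surname and location evidence with Bayes' rule.

    Assumes surname and location are independent given race, so that
    Pr(R=r | S, G) is proportional to Pr(R=r | S) * Pr(G | R=r).
    """
    prior = np.asarray(p_race_given_surname, dtype=float)
    likelihood = np.asarray(p_geo_given_race, dtype=float)
    unnorm = prior * likelihood
    return unnorm / unnorm.sum()

# Hypothetical inputs for one person and three illustrative groups.
p_race_given_surname = [0.70, 0.20, 0.10]  # Pr(R=r | surname), made up
p_geo_given_race = [0.001, 0.004, 0.002]   # Pr(location | R=r), made up

posterior = bisg_posterior(p_race_given_surname, p_geo_given_race)
print(posterior.round(3))
```

In this made-up case the surname alone favors the first group, but the location evidence shifts the most probable group to the second one.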
Abstract
The key challenge in estimating racial disparities is the lack of individual-level racial information, as the law often prohibits the collection of such data to prevent direct racial discrimination. Existing methods like Bayesian Improved Surname Geocoding (BISG) can produce accurate race predictions, but the standard approaches for using these predictions to estimate racial disparities are biased. The authors propose an alternative identification strategy that uses surname as an instrumental variable for unobserved race. They introduce a new class of models called Bayesian Instrumental Regression for Disparity Estimation (BIRDiE) that take BISG probabilities as inputs and produce unbiased estimates of racial disparities. The BIRDiE approach combines a user-specified outcome model with the BISG model, exploiting the conditional independence of surname and outcome given race, residence location, and other observed characteristics. This allows BIRDiE to accurately estimate the distribution of the outcome variable by race, even when race directly affects the outcome. The authors validate the BIRDiE methodology using voter data from North Carolina, where self-reported race is observed. Compared to standard BISG-based estimators, BIRDiE substantially reduces bias in estimating racial disparities in party registration. The authors then apply BIRDiE to estimate racial differences in who claims the home mortgage interest deduction using IRS tax data, which lacks individual race information.
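The contrast between weighting by BISG probabilities and a model that exploits conditional independence can be sketched on simulated data. The code below is not the authors' implementation: it assumes the simplest possible outcome model (one categorical distribution per race, no covariates or geography), fits it with a plain EM loop, and uses synthetic "BISG" probabilities that are correct by construction. All function names and simulation settings are hypothetical.

```python
import numpy as np

def weighting_estimator(p, y, n_y):
    """Estimate Pr(Y=y | R=r) by weighting outcomes with race probabilities.

    p: (n, n_r) race probabilities; y: (n,) integer outcomes in 0..n_y-1.
    Returns an (n_y, n_r) array whose columns sum to 1.
    """
    onehot = np.eye(n_y)[y]
    return (onehot.T @ p) / p.sum(axis=0)

def em_pooled(p, y, n_y, n_iter=500):
    """EM for a pooled categorical outcome model with race probabilities as priors.

    Relies on the assumption that the outcome is independent of surname
    given race, so Pr(R=r | Y, S) is proportional to p[i, r] * theta[Y_i, r].
    """
    onehot = np.eye(n_y)[y]
    theta = weighting_estimator(p, y, n_y)      # starting values
    for _ in range(n_iter):
        q = p * theta[y]                        # E-step: update race posteriors
        q /= q.sum(axis=1, keepdims=True)
        theta = (onehot.T @ q) / q.sum(axis=0)  # M-step: re-estimate Pr(Y | R)
    return theta

def simulate(n=20000, seed=0):
    """Synthetic data: two groups, binary outcome, calibrated probabilities."""
    rng = np.random.default_rng(seed)
    p1 = rng.beta(0.5, 0.5, size=n)             # Pr(R=1 | surname), made up
    race = (rng.random(n) < p1).astype(int)
    true_rate = np.array([0.3, 0.8])            # Pr(Y=1 | R=r), made up
    y = (rng.random(n) < true_rate[race]).astype(int)
    return np.column_stack([1 - p1, p1]), y, true_rate

p, y, true_rate = simulate()
gap_true = true_rate[1] - true_rate[0]
theta_w = weighting_estimator(p, y, 2)
theta_em = em_pooled(p, y, 2)
gap_w = theta_w[1, 1] - theta_w[1, 0]
gap_em = theta_em[1, 1] - theta_em[1, 0]
print(f"true gap {gap_true:.3f}, weighting {gap_w:.3f}, EM {gap_em:.3f}")
```

In this setup the weighting estimator is pulled toward the overall average because it ignores what each person's outcome reveals about their race, while the EM fit uses that information. The simulation satisfies the conditional independence assumption exactly, which real data may not.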
Stats
"The actual gap is 54.6 percentage points, while the most popular existing BISG-only disparity estimator pegs it at 24.1 percentage points, less than half the true value." "Our preferred BIRDiE model using the same BISG probabilities yields an estimate of 48.5 percentage points, representing about an 80% reduction in bias."
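The "about an 80% reduction in bias" figure can be checked from the three quoted numbers. The short calculation below treats bias as the absolute difference between an estimate and the actual gap, which is an assumption about how the figure was computed.

```python
# Quoted figures, in percentage points.
true_gap = 54.6
bisg_gap = 24.1
birdie_gap = 48.5

# Bias taken as absolute error against the actual gap (an assumption).
bias_bisg = abs(bisg_gap - true_gap)
bias_birdie = abs(birdie_gap - true_gap)

reduction = 1 - bias_birdie / bias_bisg
print(f"BISG-only bias: {bias_bisg:.1f} pp")
print(f"BIRDiE bias: {bias_birdie:.1f} pp")
print(f"Bias reduction: {reduction:.0%}")
```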
Quotes
"The estimation of racial disparities in various fields is often hampered by the lack of individual-level racial information. In many cases, the law prohibits the collection of such information to prevent direct racial discrimination." "To correct this bias, we propose an alternative identification strategy under the assumption that surname is conditionally independent of the outcome given (unobserved) race, residence location, and other observed characteristics." "BIRDiE substantially outperforms existing estimators across different error measures and multiple levels of geolocation specificity."

Key Insights Distilled From

by Cory McCarta... at arxiv.org 04-18-2024

https://arxiv.org/pdf/2303.02580.pdf
Estimating Racial Disparities When Race is Not Observed

Deeper Inquiries

How might the proposed BIRDiE methodology be extended to handle continuous outcome variables?

The BIRDiE methodology could be extended to continuous outcome variables by replacing the categorical outcome model with one suited to continuous data. Instead of a categorical regression, a model such as linear regression with a Gaussian error term could describe the relationship between the outcome and the predictors within each racial group. The likelihood in the Bayesian framework would then be a continuous density rather than a categorical probability, and the posterior distribution over each individual's race would be updated using that density. This would allow racial disparities to be estimated, for example as differences in group means, when the outcome is continuous.
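One way such an extension might look is sketched below: an EM loop in which each racial group has its own Gaussian outcome distribution and the race probabilities act as per-person prior weights. This is a hypothetical illustration, not a method described in the paper. The function names, the Gaussian assumption, and the simulated data are all assumptions made for the example.

```python
import numpy as np

def weighted_means(p, y):
    """Group means of y using race probabilities as weights."""
    return (p / p.sum(axis=0)).T @ y

def em_gaussian(p, y, n_iter=300):
    """EM for group-specific Gaussian outcomes with fixed per-person priors.

    p: (n, n_r) race probabilities; y: (n,) continuous outcome.
    Returns estimated group means and standard deviations.
    """
    mu = weighted_means(p, y)                   # starting values
    sigma = np.full(p.shape[1], y.std())
    for _ in range(n_iter):
        # E-step: posterior race probabilities given the outcome.
        z = (y[:, None] - mu) / sigma
        q = p * np.exp(-0.5 * z**2) / sigma
        q /= q.sum(axis=1, keepdims=True)
        # M-step: weighted mean and standard deviation per group.
        w = q / q.sum(axis=0)
        mu = w.T @ y
        sigma = np.sqrt((w * (y[:, None] - mu) ** 2).sum(axis=0))
    return mu, sigma

def simulate(n=20000, seed=1):
    """Synthetic data: two groups whose outcome means differ by 2."""
    rng = np.random.default_rng(seed)
    p1 = rng.beta(0.5, 0.5, size=n)             # Pr(R=1 | surname), made up
    race = (rng.random(n) < p1).astype(int)
    true_mu = np.array([0.0, 2.0])              # group means, made up
    y = rng.normal(true_mu[race], 1.0)
    return np.column_stack([1 - p1, p1]), y, true_mu

p, y, true_mu = simulate()
mu_w = weighted_means(p, y)
mu_em, sigma_em = em_gaussian(p, y)
gap_true = true_mu[1] - true_mu[0]
gap_w = mu_w[1] - mu_w[0]
gap_em = mu_em[1] - mu_em[0]
print(f"true gap {gap_true:.2f}, weighting {gap_w:.2f}, EM {gap_em:.2f}")
```

The Gaussian form is only one choice. A skewed outcome such as income would likely need a different density or a transformation, and the result depends on that choice being reasonable.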

What are some potential limitations or drawbacks of using surname as an instrumental variable for race, and how could these be addressed?

Using surname as an instrumental variable for race rests on the assumption that surname is conditionally independent of the outcome given race, residence location, and other observed characteristics. This assumption can fail when surname is related to the outcome through channels other than race, for example when confounding factors affect both surname and the outcome of interest, or when decision-makers discriminate on names directly. Biases in the surname-based race predictions can also introduce errors into the estimated disparities.

To address these limitations, researchers can consider incorporating additional information or variables that may help improve the accuracy of race prediction. This could involve using more detailed demographic data, genetic information, or other proxy variables that are less correlated with the outcome. Sensitivity analyses and robustness checks can assess the impact of potential violations of the key assumptions. Ongoing validation and refinement of the surname prediction models can further mitigate biases and improve the accuracy of the estimates.

How could the BIRDiE framework be adapted to estimate racial disparities in settings where there are concerns about name-based discrimination affecting the outcome of interest?

In settings where there are concerns about name-based discrimination affecting the outcome of interest, the BIRDiE framework can be adapted by incorporating additional safeguards and sensitivity analyses to address these issues. One approach could involve conducting subgroup analyses based on the likelihood of name-based discrimination, such as comparing outcomes for individuals with common surnames versus those with less common surnames. Furthermore, researchers can explore the possibility of including interaction terms or additional covariates in the outcome model to account for potential biases related to name-based discrimination. By adjusting the model specifications and considering the impact of name-based discrimination on the outcome variable, the BIRDiE framework can provide more nuanced and accurate estimates of racial disparities in these sensitive settings. Regular monitoring and validation of the results, along with transparency in reporting any limitations or biases, are essential in addressing concerns about name-based discrimination in the estimation of racial disparities.