
Continuous Voice Gender Prediction System for Evaluating Transgender Voice Transition


Key Concepts
This paper presents a software system that can describe voices using a continuous Voice Femininity Percentage (VFP) metric. The system is intended to support transgender speakers during their voice transition and voice therapists assisting them.
Summary

The paper describes the development and evaluation of a voice gender prediction system that provides a continuous Voice Femininity Percentage (VFP) estimate. Key points:

  • A corpus of 41 French cisgender and transgender speakers was recorded and perceptually evaluated by 57 participants to obtain VFP estimates.
  • Binary gender classification models were trained on external gender-balanced data and used with sliding windows to obtain average gender prediction estimates. These were then calibrated to predict the VFP.
  • The speaking style of the training data and the DNN architecture were both shown to affect VFP estimation accuracy. Model accuracy was also affected by speaker age.
  • The proposed system outperformed baseline approaches based on fundamental frequency (F0) or vocal tract length (VTL) in predicting the VFP, especially for transgender voices.
  • The best-performing model used X-vector features with a 4-layer MLP, trained on the French Common Voice corpus, and achieved an R^2 of 0.94 in predicting the VFP of transgender voices.
  • The results highlight the importance of considering speaking style, age, and the conception of gender as binary or not, to build adequate statistical representations of this cultural concept.
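
The pipeline in the second bullet (binary predictions over sliding windows, averaged, then calibrated to VFP) can be sketched as follows. This is a minimal illustration, not the authors' implementation: `classify` is a hypothetical callable returning the probability of "female" for one segment, the window and hop sizes are arbitrary, and the linear calibration is an assumption, since the summary does not say which calibration method the paper uses.

```python
import numpy as np

def average_window_prediction(signal, classify, win, hop):
    """Apply a binary classifier to sliding windows and average its outputs.

    `classify` is a hypothetical callable: segment -> P(female) in [0, 1].
    Signals shorter than `win` are scored as a single window.
    """
    signal = np.asarray(signal, dtype=float)
    starts = range(0, max(len(signal) - win, 0) + 1, hop)
    preds = [classify(signal[s:s + win]) for s in starts]
    return float(np.mean(preds))

def fit_linear_calibration(raw_scores, perceived_vfp):
    """Least-squares line from raw scores to perceived VFP (0-100).

    Linear calibration is an assumption made for this sketch.
    """
    slope, intercept = np.polyfit(raw_scores, perceived_vfp, deg=1)
    return slope, intercept

def predict_vfp(raw_scores, slope, intercept):
    """Map raw scores to VFP and clip to the valid percentage range."""
    raw_scores = np.asarray(raw_scores, dtype=float)
    return np.clip(slope * raw_scores + intercept, 0.0, 100.0)

def r_squared(y_true, y_pred):
    """Coefficient of determination between perceived and predicted VFP."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```

In this sketch, the calibration would be fitted on raw scores paired with the perceptual VFP ratings, and `r_squared` would then compare predicted and perceived VFP on speakers not used for fitting.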
Statistics
The average reaction time for gender categorization peaked for voices around 50% Voice Femininity Percentage. The harmonic accuracy of binary gender classification decreased with speaker age, from 99.3% for speakers aged 20-35 to 96.0% for speakers over 65. The gender bias (the difference in accuracy between male and female speakers) increased with age, from -1.0% for speakers aged 20-35 to +4.3% for speakers over 65.
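
The two classification metrics above can be illustrated with a short sketch. It assumes that harmonic accuracy is the harmonic mean of the per-gender accuracies and that bias is computed as female accuracy minus male accuracy. The summary gives neither the exact formula nor the sign convention, so both are assumptions, and the function names are hypothetical.

```python
def harmonic_accuracy(acc_female, acc_male):
    """Harmonic mean of the two per-gender accuracies (assumed definition).

    It penalizes a model that does well on one gender only.
    """
    total = acc_female + acc_male
    if total == 0:
        return 0.0
    return 2 * acc_female * acc_male / total

def gender_bias(acc_female, acc_male):
    """Signed accuracy gap; the female-minus-male order is an assumption."""
    return acc_female - acc_male
```

Under these assumptions, a bias close to zero means the classifier is about equally accurate for both groups, while the harmonic accuracy drops whenever either group is poorly recognized.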
Quotes
  • "This paper describes the setup and evaluation of a tool trying to close the gap between the reality of voice therapy practice and gender perception."
  • "Our working assumption is to favor systems producing continuous gender estimates fitted to human perception of gender."
  • "Results described in this study are currently limited to read speech in French. Ongoing work consists in building Human-Machine Interfaces to investigate if these theoretical results match end-users expectations and allow to provide constructive voice-passing feedback to be used in addition or instead of F0 estimates."

Deeper Inquiries

How could the proposed system be extended to handle spontaneous speech and other languages beyond French?

To extend the proposed system to spontaneous speech and to languages beyond French, several steps can be taken:

  • Data Collection: Gather a diverse dataset of spontaneous speech in various languages. This dataset should include different speaking styles, accents, and age groups to ensure robustness.
  • Feature Extraction: Modify the feature extraction process to capture nuances present in spontaneous speech. This may involve incorporating additional acoustic features that are relevant for spontaneous speech analysis.
  • Model Training: Train the machine learning models on the new dataset that includes spontaneous speech in multiple languages. This will require adapting the models to handle the variability in speech patterns across languages.
  • Evaluation and Validation: Conduct thorough evaluations to ensure the system performs well across different languages and speech styles. This may involve cross-validation techniques and testing on diverse datasets.
  • Language Adaptation: Implement language adaptation techniques to fine-tune the models for specific languages. This could involve transfer learning or multi-task learning approaches.

By following these steps, the system can be extended to handle spontaneous speech and to cover a broader range of languages beyond French.
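
The evaluation step above could, for example, take the form of a leave-one-group-out check, where each language or speaking style is held out in turn. The sketch below is hypothetical and not part of the paper: it refits a linear score-to-VFP calibration (an assumption) on all other groups and reports R^2 on the held-out one. The group labels and the function name are illustrative.

```python
import numpy as np

def leave_one_group_out_r2(raw_scores, perceived_vfp, groups):
    """Hold out each group in turn and report R^2 on it.

    A linear calibration is fitted on the remaining groups; this choice
    is an assumption of the sketch. Returns {group label: R^2}.
    """
    raw = np.asarray(raw_scores, dtype=float)
    vfp = np.asarray(perceived_vfp, dtype=float)
    groups = np.asarray(groups)
    results = {}
    for g in np.unique(groups):
        held = groups == g
        # Fit the calibration line without the held-out group.
        slope, intercept = np.polyfit(raw[~held], vfp[~held], deg=1)
        pred = slope * raw[held] + intercept
        ss_res = np.sum((vfp[held] - pred) ** 2)
        ss_tot = np.sum((vfp[held] - vfp[held].mean()) ** 2)
        # R^2 is undefined when the held-out ratings do not vary.
        results[str(g)] = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return results
```

A large drop in R^2 for one held-out language or style would suggest that the calibration, or the underlying classifier, does not transfer to it.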

What are the potential ethical considerations and regulatory requirements for deploying such a system in real-world applications?

Deploying a system for evaluating transgender voice transition raises several ethical considerations and regulatory requirements:

  • Informed Consent: Ensure that individuals providing voice samples for training and testing purposes give informed consent for the use of their data.
  • Privacy and Data Security: Implement robust data security measures to protect the sensitive voice data collected. Adhere to data protection regulations such as GDPR.
  • Bias and Fairness: Mitigate bias in the system to ensure fair evaluation of transgender voices. Regularly monitor and address any biases that may arise during system deployment.
  • Transparency: Provide transparency regarding how the system works, the data used for training, and the limitations of the technology to users and stakeholders.
  • Regulatory Compliance: Comply with regulations related to healthcare data, especially when the system is used in a clinical setting for voice therapy support.
  • User Empowerment: Empower users by providing them with control over their data and the ability to understand and interpret the system's outputs.

By addressing these ethical considerations and regulatory requirements, such a system can be deployed responsibly.

How could the insights from this work on the relationship between acoustic features, age, and perceived gender be leveraged to improve voice therapy techniques for transgender individuals?

The insights from the study on the relationship between acoustic features, age, and perceived gender can be leveraged to enhance voice therapy techniques for transgender individuals in the following ways:

  • Personalized Therapy: Tailor voice therapy programs based on individual characteristics such as age and specific acoustic features that impact perceived gender.
  • Targeted Interventions: Focus on specific acoustic features that have a significant influence on gender perception, such as vocal pitch and quality, to guide therapy interventions effectively.
  • Progress Monitoring: Use the continuous Voice Femininity Percentage (VFP) metric to track progress during voice transition therapy and provide feedback to individuals on their voice feminization or masculinization.
  • Age Considerations: Take into account the impact of age on voice characteristics and gender perception when designing therapy plans, especially for older transgender individuals.
  • Technology Integration: Integrate the non-binary voice gender prediction system into voice therapy sessions to provide real-time feedback and objective assessments of voice femininity or masculinity.

By leveraging these insights, voice therapy techniques can be optimized to better support transgender individuals in achieving their desired voice transition goals.