
A Systematic Review of Aspect-based Sentiment Analysis (ABSA): Trends, Domains, and Dataset Diversity

Core Concepts
The review identifies significant skew in both the research application domains and the dataset domains of ABSA studies: the majority focus on product/service reviews and rely heavily on a few benchmark datasets. This mismatch between research and dataset domains, together with the lack of dataset diversity, may hinder the development of ABSA solutions for other domains.
This systematic literature review examines trends and high-level patterns in aspect-based sentiment analysis (ABSA) research over nearly two decades. The key findings are:

- Research application domains and dataset domains are significantly skewed: the majority of studies focus on product/service reviews and use datasets from this domain, while only a small proportion target other domains such as healthcare, education, and policy.
- The SemEval restaurant and laptop review datasets dominate the ABSA literature, accounting for over 78% of the studies using datasets with 10 or more samples. This heavy reliance on a few benchmark datasets, especially from a narrow domain, may limit the generalizability of ABSA solutions.
- Deep learning (DL) approaches have rapidly overtaken traditional machine learning and linguistic methods since 2017, although the latter remain a steady presence in ABSA research. DL models, especially those based on recurrent neural networks and attention mechanisms, are the most common solution approach.
- Dataset diversity is lacking, especially in important public sector domains like healthcare, education, and policy. Many studies in these domains had to create their own datasets, indicating a scarcity of open-access resources.

The findings suggest that the domain-dependent nature of ABSA, combined with the skewed distribution of research effort and dataset resources, could be hindering the development of generalizable ABSA solutions across application areas. The ABSA research community should focus on expanding dataset diversity, especially in underrepresented domains, to enable more robust and versatile ABSA systems.
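The attention mechanisms the review identifies as the most common DL building block can be sketched minimally: an aspect-aware attention layer scores each word of a sentence against an aspect vector, then pools an aspect-specific sentence representation. The embeddings and aspect vector below are toy values for illustration, not any surveyed model:

```python
import math

def softmax(scores):
    """Convert raw scores into a probability distribution."""
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def aspect_attention(word_vecs, aspect_vec):
    """Score each word by dot-product similarity to the aspect vector,
    then return the attention weights and the weighted sentence vector."""
    scores = [sum(w * a for w, a in zip(vec, aspect_vec)) for vec in word_vecs]
    weights = softmax(scores)
    dim = len(aspect_vec)
    pooled = [sum(weights[i] * word_vecs[i][d] for i in range(len(word_vecs)))
              for d in range(dim)]
    return weights, pooled

# Toy 3-dimensional embeddings for "the battery drains fast"
words = [[0.1, 0.0, 0.2],   # the
         [0.9, 0.8, 0.1],   # battery
         [0.7, 0.6, 0.3],   # drains
         [0.2, 0.5, 0.4]]   # fast
aspect = [1.0, 0.9, 0.0]    # hypothetical "battery" aspect query

weights, pooled = aspect_attention(words, aspect)
# "battery" receives the highest attention weight for the battery aspect
```

The same scoring-then-pooling pattern underlies most attention-based ABSA architectures, whatever the encoder that produces the word vectors.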
"The majority (65.32%, N=339) of the 519 studies did not aim for a specific research domain, but a greater proportion (70.95%, N=447) used datasets from the 'product/service review' domain."

"Among the 757 unique 'research-domain, dataset-domain, dataset-name' combinations with ten or more studies: 90.48% (N=656) of the studies in the 'non-specific' research domain (95.77%, N=725) used datasets from the 'product/service review' domain."

"Only 12 studies (2.31%) focused on the 'student feedback/education review' domain since 2008, and there were 8 studies (1.54%) in 'politics/policy-reaction' and 7 (1.35%) in 'healthcare/medicine'."

"78.20% (N=592) of the 757 unique 'research-domain, dataset-domain, dataset-name' combinations with ten or more studies are taken up by the four SemEval datasets."

"The SemEval challenge datasets ... are the most extensively used corpora for aspect-based sentiment analysis"

"The lack of non-commercial-domain datasets could have forced generic technical studies to use benchmark datasets from a single popular domain. Given ABSA problem's domain-dependent nature, this could have indirectly hindered the solution development and evaluation across domains."
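The headline proportions in these quotes can be reproduced directly from the reported counts; a quick arithmetic sketch, using only numbers stated above:

```python
# Recompute the quoted percentages from the raw counts in the review.
def pct(n, total):
    return round(100 * n / total, 2)

non_specific_pct = pct(339, 519)  # studies with no specific research domain
semeval_pct = pct(592, 757)       # combinations using the four SemEval datasets

print(non_specific_pct)  # 65.32, matching the quoted 65.32%
print(semeval_pct)       # 78.2, matching the quoted 78.20%
```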

Deeper Inquiries

How can the ABSA research community collaborate to create and share high-quality, open-access datasets across a diverse range of application domains, especially in the public sector?

To address the limited dataset diversity in ABSA research, especially in under-researched public sector domains, collaboration within the research community is crucial. Several strategies can help create and share high-quality, open-access datasets:

- Establish Collaborative Networks: Researchers can form collaborative networks or consortia focused on specific application domains like healthcare, education, and policy. These networks can pool resources, expertise, and data to create comprehensive and diverse datasets.
- Data Sharing Initiatives: Encourage researchers to share their datasets openly through platforms like GitHub, Kaggle, or dedicated ABSA repositories. This promotes transparency and reproducibility, and allows benchmarking across different datasets.
- Crowdsourcing and Citizen Science: Engage the public in data collection efforts through crowdsourcing platforms. This approach can help gather diverse perspectives and feedback, especially in domains like education and public policy.
- Public-Private Partnerships: Collaborate with government agencies, non-profit organizations, and industry partners to access existing datasets and co-create new ones. This can ensure a broader representation of real-world data in ABSA research.
- Data Standardization and Annotation Guidelines: Develop standardized data formats and annotation guidelines for specific domains to ensure consistency and interoperability across datasets. This can facilitate data sharing and comparison.
- Funding for Data Collection: Seek funding from government grants, research institutions, and philanthropic organizations to support data collection efforts in underrepresented domains. This can incentivize researchers to create datasets in these areas.

By implementing these collaborative strategies, the ABSA research community can work towards a more diverse and inclusive dataset landscape, enabling more robust and generalizable research outcomes.
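The data standardization point can be made concrete. Below is a minimal sketch of what a shared ABSA annotation record might look like; the field names, polarity labels, and the example review are hypothetical illustrations, not an established community standard:

```python
import json

# A hypothetical minimal ABSA annotation record: one text, its domain,
# and a list of aspect terms with character spans and polarity labels.
record = {
    "text": "The vaccine rollout was fast, but communication was poor.",
    "domain": "healthcare",
    "aspects": [
        {"term": "vaccine rollout", "span": [4, 19], "polarity": "positive"},
        {"term": "communication", "span": [34, 47], "polarity": "negative"},
    ],
}

# Round-tripping through JSON keeps the annotation intact, so records
# in this shape can be shared and merged across research groups.
serialized = json.dumps(record)
restored = json.loads(serialized)
```

Character-level spans (rather than token indices) keep annotations independent of any particular tokenizer, which matters when datasets are reused across toolchains.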

How can ABSA research be better integrated with real-world applications in domains like healthcare, education, and policy to drive more impactful and practical solutions?

Integrating ABSA research with real-world applications in domains like healthcare, education, and policy requires a multidisciplinary approach and close collaboration between researchers, domain experts, and stakeholders. Several ways to drive more impactful and practical solutions:

- Domain-Specific Customization: Tailor ABSA models and algorithms to the specific needs and nuances of each domain. Work closely with domain experts to identify relevant aspects, sentiments, and use cases that can drive actionable insights.
- Ethical Considerations: Ensure that ABSA solutions in sensitive domains like healthcare and education adhere to ethical guidelines regarding data privacy, consent, and security. Collaborate with legal and ethical experts to navigate regulatory requirements.
- User-Centric Design: Involve end-users, such as healthcare providers, educators, and policymakers, in the design and evaluation of ABSA tools. Incorporate user feedback to create user-friendly interfaces and actionable insights that meet their needs.
- Impact Assessment: Conduct rigorous impact assessments to measure the effectiveness and outcomes of ABSA solutions in real-world settings. Evaluate how these solutions contribute to improved decision-making, patient outcomes, educational outcomes, or policy formulation.
- Partnerships with Stakeholders: Forge partnerships with government agencies, educational institutions, healthcare providers, and policy organizations to co-create and implement ABSA solutions. Collaborate on pilot projects and case studies to demonstrate the value of ABSA in practical settings.
- Continuous Learning and Improvement: Establish feedback loops to continuously learn from real-world applications and refine ABSA models. Monitor performance metrics, gather user feedback, and iterate on the solutions to ensure relevance and effectiveness.
By following these strategies, ABSA research can be effectively translated into impactful and practical solutions that address real-world challenges in healthcare, education, and policy domains.

What novel techniques or approaches could be developed to improve the generalizability of ABSA solutions beyond the limitations of current benchmark datasets?

To enhance the generalizability of ABSA solutions and overcome the limitations of current benchmark datasets, researchers can explore novel techniques and approaches:

- Domain Adaptation and Transfer Learning: Develop domain adaptation techniques that allow ABSA models to transfer knowledge from one domain to another. Utilize transfer learning to leverage models pre-trained on large datasets and fine-tune them for specific domains.
- Multi-Task Learning: Implement multi-task learning frameworks that jointly optimize ABSA tasks with related NLP tasks such as sentiment analysis, entity recognition, or document classification. This can improve model robustness and generalizability across tasks and domains.
- Zero-Shot and Few-Shot Learning: Explore zero-shot and few-shot learning methods that enable ABSA models to generalize to new domains with minimal labeled data. Utilize techniques like meta-learning and model adaptation to adapt quickly to unseen domains.
- Adversarial Training: Incorporate adversarial training techniques to enhance model robustness and reduce domain shift. Adversarial domain adaptation methods can help the model learn domain-invariant features and improve generalization across diverse datasets.
- Ensemble Learning: Employ ensemble strategies to combine multiple ABSA models trained on different datasets or domains. Ensemble methods can improve prediction accuracy, reduce overfitting, and enhance generalizability by leveraging diverse model predictions.
- Semi-Supervised and Self-Supervised Learning: Explore semi-supervised and self-supervised approaches to leverage unlabeled data and domain-specific knowledge. Techniques like pseudo-labeling, self-training, and contrastive learning can enhance model performance when labeled data is limited.
- Interpretable and Explainable Models: Develop interpretable ABSA models that provide insights into how predictions are made across different domains. This can enhance trust, transparency, and generalizability by revealing model decision-making processes.

By incorporating these techniques, ABSA researchers can improve the generalizability of their solutions, adapt to diverse domains, and address the challenges posed by current benchmark datasets.
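As one concrete illustration of the ensemble idea above, a minimal majority-vote combiner over per-model aspect-polarity predictions might look like the sketch below; the models, labels, and tie-breaking rule are illustrative assumptions, not a prescribed design:

```python
from collections import Counter

def ensemble_vote(predictions):
    """Majority vote over per-model polarity labels for a single aspect.
    `predictions` is a list of labels, one per model; a tie for first
    place falls back to 'neutral' as a conservative default (an
    assumption of this sketch, not a standard rule)."""
    counts = Counter(predictions).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return "neutral"
    return counts[0][0]

# Three hypothetical ABSA models, each trained on a different domain,
# predict the polarity of the aspect "wait time" in a clinic review.
model_outputs = ["negative", "negative", "positive"]
ensemble_vote(model_outputs)  # → "negative"
```

In practice the component models would be trained on different source domains, so the vote smooths over any single model's domain-specific biases.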