Evaluating the Suitability of OpenAlex for Bibliometric Analyses: A Comparative Study with Scopus


Core Concepts
OpenAlex can serve as a reliable alternative to traditional bibliographic databases like Scopus for certain types of bibliometric analyses, particularly at the country level, but additional research is needed to fully understand and address its limitations in metadata accuracy and completeness.
Abstract
This study, conducted in collaboration with the OpenAlex team, compares the coverage and suitability of OpenAlex and Scopus for bibliometric analyses. The key findings are:
Coverage: OpenAlex contains 168.2M works published between 2000 and 2022, the majority of which (82%) are classified as articles. A substantial share of works in OpenAlex (67%) have no references, and 66% have received no citations. OpenAlex indexes 83% of the unique journals in Scopus, and 94% of the "active" Scopus journals are matched.
Comparable Analyses: The proportion of works by Open Access status is largely comparable between OpenAlex and Scopus, except for the Hybrid category, where OpenAlex identifies 81% more works. The number of works classified as "articles" is very similar between the two databases, but differences are observed for other document types. Country-level analyses show a high degree of correlation (Spearman ρ > 0.95) between OpenAlex and Scopus. Citation counts per country are lower in OpenAlex than in Scopus, likely because of the high proportion of works without indexed references. Language-level analyses show the lowest correlations, potentially because the two databases detect language differently.
Limitations and Future Work: Additional research is needed to fully understand and address issues of metadata accuracy and completeness, particularly for author affiliations, references, citations, and language detection. Expanding coverage and improving metadata quality for non-article document types would further enhance the usefulness of OpenAlex. The community is encouraged to systematically assess the recall and precision of key metadata fields in OpenAlex, and to explore the characteristics of the scholarship found only in OpenAlex.
Overall, this study provides evidence that OpenAlex can be a reliable alternative to traditional databases for certain types of bibliometric analyses, while also highlighting areas that require further investigation and improvement.
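The country-level comparison reported above rests on Spearman's rank correlation. As a minimal sketch of how such a comparison could be computed, the example below ranks per-country work counts from each database (ties receive the average rank) and then takes the Pearson correlation of the ranks. The country codes and counts are invented for illustration and are not figures from the study; the function names are likewise hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical works-per-country counts (NOT data from the study).
openalex_counts = {"US": 1200, "CN": 1100, "DE": 400, "BR": 310, "NG": 45}
scopus_counts = {"US": 1000, "CN": 1050, "DE": 380, "BR": 150, "NG": 20}

# Compare only countries present in both databases, in a fixed order.
countries = sorted(set(openalex_counts) & set(scopus_counts))
rho = spearman_rho(
    [openalex_counts[c] for c in countries],
    [scopus_counts[c] for c in countries],
)
print(round(rho, 3))  # prints 0.9
```

Because Spearman's ρ depends only on rank order, a high value indicates that the two databases order countries similarly; it does not imply that the absolute counts agree.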
Stats
168.2M works in OpenAlex published between 2000-2022, with a peak of 10.2M in 2020.
67% of works in OpenAlex do not have any references, and 66% do not receive any citations.
83% of the unique journals in Scopus can be matched to a source in OpenAlex, and 94% of the "active" Scopus journals are matched.
Scopus has more citations per country compared to the subset of works in OpenAlex that are also in Scopus.
Quotes
"OpenAlex is a superset of Scopus and can be a reliable alternative for some analyses, particularly at the country level."
"Additional research is needed to fully comprehend and address OpenAlex's limitations in metadata accuracy and completeness."

Deeper Inquiries

What are the characteristics of the scholarship found in OpenAlex but not in other databases, and how can this expanded coverage contribute to a more inclusive understanding of global research?

OpenAlex offers a more inclusive approach to indexing scholarly works compared to traditional databases like Scopus and Web of Science. One key characteristic of the scholarship found in OpenAlex but not in other databases is its broader representation of diverse disciplines and regions. OpenAlex captures a wider range of document types beyond just journal articles, including datasets, peer reviews, and other non-traditional research outputs. This expanded coverage allows for a more comprehensive view of global research activities, especially in fields and regions that are often underrepresented in mainstream databases.
To leverage this expanded coverage for a more inclusive understanding of global research, the community can focus on several strategies:
Field Normalization: Develop standardized approaches to classify works by subject and discipline, ensuring that diverse fields are adequately represented and normalized across databases. This will help in identifying interdisciplinary research trends and collaborations that may be overlooked in traditional databases.
Language Detection Improvement: Enhance language detection algorithms to accurately identify the language of works in OpenAlex. This improvement will facilitate better analysis of multilingual scholarship and ensure that language diversity is accounted for in research assessments.
Affiliation Data Enhancement: Collaborate with the OpenAlex team to improve the accuracy and completeness of author affiliation metadata. By ensuring that author affiliations are correctly attributed and linked to countries and institutions, researchers can gain insights into the global distribution of research contributions.
Inclusive Indexing Policies: Encourage OpenAlex to continue its inclusive indexing policies, capturing works from regions and disciplines that are often marginalized in traditional databases. This will promote diversity and equity in research visibility and recognition.
By embracing the characteristics of scholarship found in OpenAlex and working towards enhancing its coverage and metadata quality, the research community can foster a more inclusive and comprehensive understanding of global research landscapes.

How can the community work with the OpenAlex team to improve the accuracy and completeness of key metadata fields, such as author affiliations, references, citations, and language detection?

Collaboration between the research community and the OpenAlex team is essential to enhance the accuracy and completeness of key metadata fields in the database. Several strategies can be employed to work towards improving these aspects:
Data Validation Workshops: Organize workshops or collaborative initiatives where researchers can validate and correct author affiliations, references, and citations in OpenAlex. This collective effort can help in rectifying inaccuracies and filling gaps in the metadata.
Feedback Mechanisms: Establish feedback mechanisms within the OpenAlex platform for users to report errors or missing information in metadata fields. This direct communication channel can enable quick corrections and updates to enhance data quality.
Metadata Standardization: Work with the OpenAlex team to develop standardized metadata formats and guidelines for author affiliations, references, and citations. By aligning metadata practices with industry standards, consistency and accuracy can be improved across the database.
Language Detection Algorithms: Collaborate with experts in natural language processing and machine learning to refine language detection algorithms used in OpenAlex. By fine-tuning these algorithms, the database can more accurately identify the language of works, facilitating better language-based analyses.
Community Contributions: Encourage researchers to contribute missing metadata information, such as author affiliations and references, to OpenAlex. Crowdsourcing efforts can help in filling data gaps and enhancing the overall completeness of the database.
By fostering a collaborative environment between the research community and the OpenAlex team, continuous improvements can be made to key metadata fields, ensuring the accuracy and completeness of the database for robust bibliometric analyses.
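A validation exercise of this kind needs a way to score a metadata field against a manually checked sample. The sketch below shows one possible way to compute precision and recall for author-affiliation countries, treating each (work, country) pair as one item. The work identifiers, country codes, and the `precision_recall` helper are hypothetical and serve only to illustrate the calculation; they are not drawn from the study or from the OpenAlex API.

```python
def precision_recall(predicted, gold):
    """Score database pairs against a manually verified gold standard.

    precision = correct database pairs / all database pairs
    recall    = correct database pairs / all gold-standard pairs
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical gold standard: (work id, affiliation country) pairs checked by hand.
gold_pairs = {("W1", "US"), ("W1", "DE"), ("W2", "BR"), ("W3", "NG")}

# Hypothetical pairs recorded in the database for the same sample of works.
db_pairs = {("W1", "US"), ("W2", "BR"), ("W2", "AR")}

precision, recall = precision_recall(db_pairs, gold_pairs)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy sample the database records one wrong country (lowering precision) and misses two verified ones (lowering recall), which is the distinction a validation workshop would need to report separately.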

Given the differences in how OpenAlex and Scopus classify works by document type, how can the community develop standardized approaches to ensure consistent and meaningful comparisons across these databases?

Standardizing approaches for classifying works by document type is crucial to enable consistent and meaningful comparisons between OpenAlex and Scopus. To achieve this, the community can implement the following strategies:
Harmonization of Document Types: Develop a common taxonomy or classification scheme that aligns document types used in OpenAlex with those in Scopus. By mapping equivalent document types between the two databases, researchers can ensure that comparisons are based on similar categories.
Metadata Mapping Guidelines: Create guidelines or best practices for mapping document types from OpenAlex to Scopus and vice versa. These guidelines should outline the criteria for categorizing works and provide clear instructions on how to match document types across databases.
Cross-Database Validation Studies: Conduct validation studies to compare the classification of document types in OpenAlex and Scopus for a sample dataset. By analyzing discrepancies and inconsistencies, the community can identify areas where standardization is needed and refine the mapping process.
Community Consensus Building: Engage researchers, bibliometricians, and database experts in discussions and workshops to establish consensus on standardized approaches for classifying document types. By involving diverse perspectives, the community can develop robust and widely accepted practices for cross-database comparisons.
Continuous Monitoring and Updates: Regularly review and update the classification schemes for document types in both OpenAlex and Scopus to adapt to evolving research practices and publication trends. This ongoing monitoring ensures that comparisons remain relevant and accurate over time.
By implementing these standardized approaches and guidelines, the research community can facilitate consistent and meaningful comparisons of document types between OpenAlex and Scopus, enabling researchers to conduct reliable bibliometric analyses across databases.
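The harmonization idea above can be sketched as a simple crosswalk table that maps each database's own labels onto a shared category before counting. The mapping below is an assumption made purely for illustration: the type labels, the shared category names, and the choice to send unknown labels to "unmapped" are hypothetical and do not represent an agreed taxonomy or either database's official vocabulary.

```python
from collections import Counter

# Hypothetical crosswalk: (database, native label) -> shared category.
CROSSWALK = {
    ("openalex", "article"): "article",
    ("scopus", "Article"): "article",
    ("openalex", "review"): "review",
    ("scopus", "Review"): "review",
    ("openalex", "book-chapter"): "book_chapter",
    ("scopus", "Book Chapter"): "book_chapter",
}


def harmonize(source, doc_type):
    """Map a native document type to the shared category, or 'unmapped'."""
    return CROSSWALK.get((source, doc_type), "unmapped")


def counts_by_common_type(source, doc_types):
    """Count works per shared category for one database."""
    return Counter(harmonize(source, t) for t in doc_types)


# Hypothetical samples of native document-type labels.
openalex_sample = ["article", "article", "dataset", "book-chapter"]
scopus_sample = ["Article", "Review", "Book Chapter"]

openalex_counts = counts_by_common_type("openalex", openalex_sample)
scopus_counts = counts_by_common_type("scopus", scopus_sample)

# Compare the two databases category by category.
for category in sorted(set(openalex_counts) | set(scopus_counts)):
    print(category, openalex_counts[category], scopus_counts[category])
```

Keeping an explicit "unmapped" bucket, rather than dropping unknown labels, makes gaps in the crosswalk visible so they can be reviewed as classification schemes change.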