
Uncovering Biases in Sustainable Development Goal Classifications Across Major Bibliometric Databases


Core Concepts
Large bibliometric databases exhibit performative biases in their classification of publications under the United Nations' Sustainable Development Goals (SDGs), significantly affecting the visibility and measured impact of scientific outputs.
Abstract
This study investigates the performative nature of SDG classifications in major bibliometric databases: Web of Science, Scopus, and OpenAlex. The researchers built a jointly indexed dataset of 15,471,336 publications from 2015 through July 2023 to ensure even conditions for comparison across the databases.

Key findings:

1. The overlap in publications classified under the same SDGs by the different databases is extremely small, ranging from 1.3% to 7.2%, indicating that the databases interpret the SDGs in substantially divergent ways.
2. Fine-tuning a large language model (DistilGPT-2) on each database's SDG-classified publications (see the sketch below) shows that the generated output is highly sensitive to the model architecture, the fine-tuning process, and the decoding strategy, highlighting the arbitrariness of the classifications.
3. The noun phrases generated by the fine-tuned models exhibit distinct linguistic features and perspectives for each database's SDG classifications, suggesting that the classifications reflect database-specific biases.

These findings raise concerns about the use of such bibliometric classifications in research practice and policy decision-making, as they can significantly influence the visibility and perceived impact of SDG-related scientific outputs.
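To make the fine-tuning step concrete, here is a minimal sketch using the Hugging Face transformers library to fine-tune DistilGPT-2 on the titles of publications that one database assigned to a given SDG. The file name, column name, and hyperparameters are illustrative assumptions, not the study's actual setup.

```python
# Minimal sketch: fine-tune DistilGPT-2 on the titles of publications that
# one database assigned to one SDG. File/column names are hypothetical.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("distilgpt2")

# Hypothetical CSV: one row per publication, with a "title" column.
dataset = load_dataset("csv", data_files={"train": "scopus_sdg13_titles.csv"})

def tokenize(batch):
    return tokenizer(batch["title"], truncation=True, max_length=64)

tokenized = dataset["train"].map(
    tokenize, batched=True, remove_columns=dataset["train"].column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="distilgpt2-scopus-sdg13",
                           num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=tokenized,
    # Causal LM objective: labels are the shifted inputs, no masking.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Training one such model per database and per SDG, as the study does, then lets the generated text be compared across databases under identical conditions.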
Stats
The jointly indexed publication dataset of Web of Science, OpenAlex, and Scopus contains 15,471,336 publications published between 2015 and July 2023. Fewer than 8% of publications are assigned to the same SDG by all three databases.
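As a rough illustration of how such an agreement figure can be computed, here is a small sketch. The data structure and the overlap definition (share of labelled publications with at least one common SDG) are simplifying assumptions, not necessarily the study's exact metric.

```python
# Sketch: pairwise SDG-label overlap across databases for jointly indexed
# publications. `labels` maps database -> {publication_id: set of SDG numbers};
# the toy values below are illustrative only.
from itertools import combinations

def sdg_overlap(a: dict, b: dict) -> float:
    """Share of publications sharing at least one SDG label, among those
    that either database assigned to any SDG."""
    labelled = {pid for pid in a if a[pid]} | {pid for pid in b if b[pid]}
    agree = sum(1 for pid in labelled
                if a.get(pid, set()) & b.get(pid, set()))
    return agree / len(labelled) if labelled else 0.0

labels = {
    "wos":      {"p1": {3}, "p2": {13}},
    "scopus":   {"p1": {3, 7}, "p2": set()},
    "openalex": {"p1": {7}, "p2": {13}},
}
for x, y in combinations(labels, 2):
    print(f"{x} vs {y}: {sdg_overlap(labels[x], labels[y]):.1%}")
```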
Quotes
"Bibliometric classifications, while striving to offer objective measures, seem to present a specific focus, which is crucial in the attribution of social relevance via SDG classifications." "Depending on the applied classification, scientists and institutions working in the aforementioned fields might, or might not, be able to empirically underline their impact to policy makers."

Deeper Inquiries

How can the identified biases in SDG classifications be mitigated to ensure more accurate and consistent representation of scientific contributions towards the SDGs?

The identified biases in SDG classifications can be mitigated through several strategies:

1. Standardization of classification criteria: Establishing clear and standardized criteria for classifying publications under each SDG can help reduce discrepancies. Consistent guidelines and definitions can ensure that publications are categorized accurately across different databases.
2. Collaborative efforts: Encouraging collaboration among database providers, researchers, and policymakers to develop a unified classification system can help harmonize SDG classifications. By working together, stakeholders can address inconsistencies and improve the accuracy of categorization.
3. Transparency and accountability: Implementing transparency measures in the classification process, such as providing detailed explanations for classification decisions, can enhance accountability and help identify and rectify biases in the classification system.
4. Regular audits and reviews: Conducting regular audits and reviews of the classification process can help identify and correct biases. Independent assessments can ensure that classifications align with the intended goals of the SDGs and reflect the true impact of scientific contributions.
5. Training and education: Providing training and education to database curators and users on the SDGs and the classification criteria can improve understanding and consistency in categorizing publications, reducing subjective interpretations and biases.

By implementing these strategies, stakeholders can work towards a more accurate and consistent representation of scientific contributions to the SDGs, ultimately enhancing the credibility and reliability of bibliometric analyses in the context of sustainable development.

What are the potential implications of these biases on funding decisions, policy-making, and the overall direction of research related to sustainable development?

The biases in SDG classifications can have significant implications for funding decisions, policy-making, and the direction of research related to sustainable development:

1. Funding allocation: Biases in SDG classifications can influence funding decisions by misrepresenting the impact of scientific contributions. Inaccurate categorization may lead to misallocation of resources, with funding directed towards areas that are perceived as more impactful based on biased classifications.
2. Policy formulation: Biases in SDG classifications can distort the evidence base used for policy formulation. Policymakers rely on bibliometric analyses to inform policy decisions, and inaccurate classifications can result in policies that are not aligned with the true societal impact of research in sustainable development.
3. Research prioritization: Biased classifications can skew the perception of which research areas are most relevant or impactful in the context of sustainable development. This can misalign research priorities with the actual needs of society, hindering progress towards achieving the SDGs.
4. Public perception: Biases in classifications can affect public perception of the effectiveness and relevance of scientific research in addressing sustainable development challenges. Inaccurate representations of research impact may erode public trust in the scientific community and impede efforts to engage the public in sustainability initiatives.

Addressing biases in SDG classifications is crucial to ensuring that funding decisions, policy-making, and research directions rest on accurate and reliable data, ultimately advancing progress towards the sustainable development goals.

How can large language models be further leveraged to uncover and address biases in other types of bibliometric and scientific classifications beyond the SDGs?

Large language models (LLMs) can be leveraged to uncover and address biases in other types of bibliometric and scientific classifications beyond the SDGs through the following approaches:

1. Fine-tuning for specific classifications: Similar to the approach taken in the study on SDG classifications, LLMs can be fine-tuned on specific classification criteria to analyze and compare how different databases categorize scientific publications. By training LLMs on diverse classification systems, researchers can uncover inconsistencies and biases in various domains.
2. Text analysis techniques: Utilizing text analysis techniques such as noun phrase extraction and topic modeling, LLMs can identify patterns and discrepancies in how publications are classified. By analyzing the language used in classifications, LLMs can reveal underlying biases and inconsistencies that may impact the accuracy of bibliometric analyses.
3. Decoding strategies: Experimenting with different decoding strategies, as done in the study, can help researchers understand how LLMs generate responses based on the input data. Exploring various decoding methods gives insight into the models' decision-making and can expose potential sources of bias in classification tasks (see the sketch after this list).
4. Collaborative research: Engaging in collaborative research efforts that involve experts in both natural language processing and bibliometrics can enhance the effectiveness of LLMs in uncovering biases. Combining domain knowledge with advanced language modeling techniques enables innovative approaches to addressing biases in scientific classifications.

By applying these strategies and leveraging the capabilities of LLMs, researchers can advance the field of bibliometrics and scientific classification, leading to more accurate and unbiased representations of research impact across various disciplines.
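As a sketch of the second and third points, the snippet below generates text from a fine-tuned model under three common decoding strategies and extracts noun phrases with spaCy. The model path and the prompt are hypothetical placeholders; the spaCy model en_core_web_sm must be installed separately.

```python
# Sketch: compare decoding strategies on a fine-tuned model and extract the
# noun phrases of each generation. The model path is hypothetical.
import spacy
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilgpt2-scopus-sdg13")
model = AutoModelForCausalLM.from_pretrained("distilgpt2-scopus-sdg13")
nlp = spacy.load("en_core_web_sm")  # pipeline with a noun-chunk parser

prompt = tokenizer("Climate", return_tensors="pt")
decoding = {
    "greedy":  dict(do_sample=False),
    "top-k":   dict(do_sample=True, top_k=50),
    "nucleus": dict(do_sample=True, top_p=0.9),
}
for name, kwargs in decoding.items():
    out = model.generate(**prompt, max_new_tokens=40,
                         pad_token_id=tokenizer.eos_token_id, **kwargs)
    text = tokenizer.decode(out[0], skip_special_tokens=True)
    chunks = {c.text.lower() for c in nlp(text).noun_chunks}
    print(name, sorted(chunks))
```

Comparing the extracted noun phrases across models fine-tuned on different databases' classifications, and across decoding strategies, is one way to surface the database-specific linguistic perspectives the study reports.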