
Fractional Classification of Papers Based on Two Generations of References and the ASJC Scopus Scheme


Core Concepts
This paper presents and evaluates methods for fractionally assigning ASJC categories to Scopus publications based on their first- and second-generation references, and compares the results with the original Scopus journal-level classification and with an author-assigned classification.
Abstract

The paper presents and evaluates various methods to classify Scopus publications into ASJC categories based on their references. The key points are:

  • Three reference-generation schemes are used: M1 uses only first-generation references, M2 uses only second-generation references, and M3 uses both first- and second-generation references.

  • Reference counts are computed with both full counting and weighted counting; weighted counting additionally averages the second-generation references.

  • Three different thresholds (0.5, 0.67, 0.8) are used to determine the number of ASJC categories assigned to each publication.

  • The resulting classifications are compared to the original Scopus ASJC journal-level classification and an "Author's Assignation Collection" (AAC) where authors assigned their own publications to ASJC categories.

  • The classifications are evaluated on metrics such as the number and size of categories, the homogeneity of categories in terms of citation patterns, and the alignment with the AAC.

  • The results show that combining first- and second-generation references with weighted counting (averaged over the number of references) and the highest assignment threshold (0.8) yields the most promising classifications: they align better with the AAC and produce more homogeneous categories than the original Scopus classification.
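The counting and threshold steps summarized above can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration, not the paper's implementation: the toy data (`CATEGORIES_OF`, `REFERENCES_OF`) and all helper names are hypothetical, references without ASJC codes are treated as inactive, "averaging" divides each first-generation reference's second-generation counts by its number of active references, and the threshold is read as "keep every category whose score is at least the threshold times the top score". The paper's exact rules may differ.

```python
from collections import defaultdict

# Hypothetical toy data (not from the paper): the ASJC codes of each cited
# item's journal, and each cited item's own reference list.
CATEGORIES_OF = {"A": ["1706"], "B": ["1706", "1710"], "C": ["2604"],
                 "D": ["1710"], "E": ["2604"]}
REFERENCES_OF = {"A": ["D", "E"], "B": ["D"], "C": ["E"]}


def count_refs(refs, weighted):
    """Score ASJC codes over one list of references.

    Full counting adds 1 to every code of a reference; weighted counting
    splits that 1 evenly across the reference's codes. References without
    codes are treated as inactive and skipped. Returns (scores, n_active).
    """
    scores = defaultdict(float)
    n_active = 0
    for ref in refs:
        cats = CATEGORIES_OF.get(ref, [])
        if not cats:
            continue
        n_active += 1
        share = 1.0 / len(cats) if weighted else 1.0
        for cat in cats:
            scores[cat] += share
    return scores, n_active


def score_paper(refs, scheme, weighted):
    """Combine generations: M1 = first only, M2 = second only, M3 = both."""
    total = defaultdict(float)
    if scheme in ("M1", "M3"):
        first, _ = count_refs(refs, weighted)
        for cat, s in first.items():
            total[cat] += s
    if scheme in ("M2", "M3"):
        for ref in refs:
            second, n_active = count_refs(REFERENCES_OF.get(ref, []), weighted)
            # Assumed averaging: under weighted counting, each first-generation
            # reference passes on one unit in total, spread over its own
            # active references.
            norm = n_active if (weighted and n_active) else 1
            for cat, s in second.items():
                total[cat] += s / norm
    return total


def assign(scores, threshold):
    """Keep codes scoring >= threshold * top score; split the paper among them."""
    if not scores:
        return {}
    cutoff = threshold * max(scores.values())
    kept = {cat: s for cat, s in scores.items() if s >= cutoff}
    norm = sum(kept.values())
    return {cat: s / norm for cat, s in kept.items()}


paper_refs = ["A", "B", "C"]
for threshold in (0.5, 0.67, 0.8):
    print(threshold, assign(score_paper(paper_refs, "M3", weighted=True), threshold))
```

Under this reading, raising the threshold makes assignments to several categories less frequent, because secondary categories must score closer to the top category to be kept.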

Stats
Of the Scopus publications from 2020, 93.10% have at least 3 active references across the first and second generations combined. The share of publications with at most 2 active references is highest in Arts and Humanities (38.69%), followed by Social Sciences (15.52%) and Nursing (11.83%). Publications with at most 2 active references have an average normalized citation impact of 0.36.
Quotes
"Classifications in which a high threshold is set for allowing assignments to multiple categories, combined with the use of first- and second-generation references and averaging over the number of references, provide the most promising results, improving over other reference-based reclassification proposals in terms of granularity, and over the Scopus classification itself in such aspects as the homogeneity of the publications assigned to a category."

"Increasing the threshold also increases the coefficient of variation and reduces the granularity. With regard to the counting methods, full-counting leads to a lower coefficient of variation and greater granularity than weighted-counting."
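The coefficient of variation mentioned in the second quote is the standard deviation of a set of values divided by their mean. The sketch below assumes the values are category sizes, meaning the fractional number of publications assigned to each ASJC category, and uses the population standard deviation; both are assumptions, and the paper may define the metric differently. The input format (one dict of category shares per publication) and the function names are hypothetical.

```python
import statistics
from collections import defaultdict


def category_sizes(assignments):
    """Sum fractional shares per ASJC code over all publications."""
    sizes = defaultdict(float)
    for shares in assignments:
        for cat, weight in shares.items():
            sizes[cat] += weight
    return dict(sizes)


def coefficient_of_variation(sizes):
    """Population standard deviation of category sizes divided by their mean."""
    values = list(sizes.values())
    return statistics.pstdev(values) / statistics.mean(values)


# Hypothetical fractional assignments for four publications.
assignments = [
    {"1706": 1.0},
    {"1706": 0.6, "2604": 0.4},
    {"2604": 1.0},
    {"1710": 0.5, "2604": 0.5},
]
sizes = category_sizes(assignments)
print(sizes, coefficient_of_variation(sizes))
```

A larger value indicates that publications are spread less evenly across categories.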

Deeper Inquiries

What other data sources or features could be incorporated to further improve the accuracy and granularity of the classification methods?

Several additional data sources and features could improve accuracy and granularity. Text mining of titles, abstracts, and keywords (or of the full text, where available) could extract key terms and concepts that describe each paper's thematic focus, complementing the reference-based signal. Author keywords and other publication metadata express the authors' intended subject matter directly, which helps capture the multidimensional nature of scientific publications. Finally, cross-referencing publication records with external databases or repositories that specialize in specific scientific domains could help align categories with current disciplinary standards.

How do the citation patterns and interdisciplinary nature of research differ across scientific disciplines, and how could this be better accounted for in the classification approaches?

Citation patterns vary significantly across disciplines because research practices, publication norms, and scholarly communication differ from field to field. Medicine and Physics, for example, often show high citation rates and build heavily on prior studies, whereas Mathematics and Computer Science may show lower citation rates and place more weight on new theoretical frameworks and algorithms. Interdisciplinary research complicates classification further, because such publications draw on several disciplines and combine their methodologies and concepts. Classification methods could account for this by explicitly accommodating cross-disciplinary references and thematic overlaps. Network analysis could help identify interdisciplinary connections in citation patterns, and machine learning and natural language processing tools could help detect interdisciplinary themes within publications, supporting a more nuanced categorization of interdisciplinary research.

What are the potential applications and implications of having more accurate and granular classification of scientific publications, beyond just normalization of citation indicators?

A more accurate and granular classification of scientific publications has applications well beyond the normalization of citation indicators, including:

  • Literature review: more precise categories let researchers run more targeted literature reviews and gain a deeper understanding of the state of the art in specific research areas.

  • Collaboration: better classification can connect researchers with similar interests across disciplines, fostering interdisciplinary collaboration and knowledge exchange.

  • Resource allocation: funding agencies and institutions can use detailed classification data to identify emerging research trends and high-impact areas, and allocate resources accordingly.

  • Policy development: policymakers can draw on granular classification data for evidence-based decisions, especially in areas where scientific research plays a central role.

  • Discovery: finer classifications can help researchers uncover new connections within the literature and generate novel research hypotheses.

Overall, more accurate and granular classification can promote collaboration, innovation, and informed decision-making across the research ecosystem.