The authors present the ICLR dataset, which consists of abstracts and metadata for over 24,000 papers submitted to the ICLR conference from 2017 to 2024. The dataset includes information such as author names, keywords, review scores, and acceptance decisions.
The authors use this dataset to conduct a metascience study of the machine learning field. They find that the gender balance has improved over the years: the inferred female ratio for first and last authors rose from around 10% in 2017 to 21% and 18%, respectively, in 2024. They find no systematic differences in gender ratio across machine learning subfields.
The authors also use the dataset to frame an NLP challenge: train a language model that substantially outperforms a simple TF-IDF representation in k-nearest-neighbor (kNN) classification accuracy on the ICLR abstracts. Surprisingly, the authors find that most dedicated sentence transformer models perform worse than TF-IDF, and none outperform it by a large margin. This suggests that current state-of-the-art language models do not capture kNN graph quality well, which matters for the authors' application of 2D visualization.
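The TF-IDF baseline in this challenge can be illustrated with a minimal, standard-library-only sketch. It is not the authors' pipeline: the whitespace tokenization, the smoothed IDF formula, the leave-one-out evaluation, the value of `k`, and the toy abstracts and labels are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return L2-normalised sparse TF-IDF vectors (dicts) for tokenised docs."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumed variant; other formulas are equally valid).
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {t: f * idf[t] for t, f in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two L2-normalised sparse vectors."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def knn_accuracy(vectors, labels, k=1):
    """Leave-one-out kNN accuracy using a majority vote over the k most similar documents."""
    correct = 0
    for i, vi in enumerate(vectors):
        sims = [(cosine(vi, vj), j) for j, vj in enumerate(vectors) if j != i]
        sims.sort(reverse=True)
        votes = Counter(labels[j] for _, j in sims[:k])
        if votes.most_common(1)[0][0] == labels[i]:
            correct += 1
    return correct / len(vectors)


# Hypothetical toy abstracts and labels, not taken from the ICLR dataset.
toy_abstracts = [
    "diffusion models generate images by denoising",
    "score based diffusion for image generation denoising",
    "recurrent neural networks for sequence modeling",
    "lstm recurrent networks model long sequence data",
]
toy_labels = ["diffusion", "diffusion", "rnn", "rnn"]

toy_vectors = tfidf_vectors([a.split() for a in toy_abstracts])
toy_accuracy = knn_accuracy(toy_vectors, toy_labels, k=1)
```

Swapping `tfidf_vectors` for embeddings from a sentence transformer, while keeping `knn_accuracy` fixed, would give the kind of like-for-like comparison the challenge describes.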
The authors use the SBERT representation of the ICLR abstracts and apply t-SNE to embed them in 2D. This 2D embedding reveals rich structure, with related topics clustering together. By overlaying the conference year and topic labels, the authors are able to identify trends in machine learning research, such as the rise of diffusion models and the decline of recurrent neural networks and adversarial examples.
Finally, the authors analyze the distribution of papers containing certain keywords in their titles, such as "understanding", "rethinking", and "?", to identify potentially controversial topics within machine learning. They also examine the most prolific authors, distinguishing between "hedgehogs" who focus on a single topic and "foxes" who work across multiple areas.
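The title-keyword analysis can be sketched as a per-topic share of titles containing a given string. This is only an illustration under stated assumptions: the function name `keyword_share_by_label`, the case-insensitive substring match, and the toy titles and labels are hypothetical and do not come from the dataset.

```python
from collections import defaultdict


def keyword_share_by_label(papers, keyword):
    """Return, per label, the fraction of titles containing `keyword` (case-insensitive)."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    needle = keyword.lower()
    for title, label in papers:
        totals[label] += 1
        if needle in title.lower():
            hits[label] += 1
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical (title, topic label) pairs for illustration only.
toy_papers = [
    ("Rethinking example method A", "theory"),
    ("Rethinking example method B", "optimization"),
    ("Does example method C converge?", "gans"),
    ("A faster example optimizer", "optimization"),
]

rethinking_share = keyword_share_by_label(toy_papers, "rethinking")
question_share = keyword_share_by_label(toy_papers, "?")
```

Topics with an unusually high share for a keyword such as "rethinking" or "?" would be the candidates for the potentially controversial areas mentioned above.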