Core Concepts
Large language models offer a viable alternative to traditional topic modelling, providing more nuanced and relevant topics.
Abstract
The paper examines the limitations of classic topic modelling approaches such as LDA and BERTopic, highlighting the need for more sophisticated methods. It introduces large language models (LLMs) as an alternative for uncovering topics within text corpora. The experiments demonstrate that, with appropriate prompting, LLMs generate relevant and interpretable topics. Evaluation metrics are proposed to assess the quality and granularity of the topics LLMs extract. A case study on COVID-19 vaccine hesitancy showcases the use of LLMs for temporal analysis.
Directory:
- Abstract
  - Investigates limitations of classic topic modelling.
  - Introduces large language models (LLMs) as an alternative.
- Introduction
  - Importance of understanding topics in documents.
  - Limitations of traditional topic analysis approaches.
- Related Work
  - Overview of topic modelling and closed-set classification.
- Experiments
  - Experiment 1: The out-of-the-box approach struggles with granularity (see the prompt sketch after this directory).
  - Experiment 2: Incorporating seed topics improves granularity.
  - Experiment 3: Summarisation enhances interpretability.
- Topic Extraction Evaluation
  - Proposed evaluation metrics for assessing topic quality.
- Case Study: Temporal Analysis of COVID-19 Vaccine Hesitancy
  - Utilizing LLMs for temporal analysis of changing reasons for vaccine hesitancy.
- Discussion & Conclusion
  - Summary of key takeaways and future directions.
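The directory entries for Experiments 1 and 2 describe a purely prompt-based workflow. The sketch below is a minimal illustration of that idea under stated assumptions, not the paper's exact setup: the prompt wording, the `gpt-4o-mini` model identifier, and the function names are placeholders, and the OpenAI chat-completions client stands in for whichever LLM the experiments actually used.

```python
# Minimal sketch of prompt-based topic extraction (Experiment 1 vs. Experiment 2 style).
# Prompt wording, model choice, and function names are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
MODEL = "gpt-4o-mini"  # placeholder model identifier


def extract_topics(document: str) -> str:
    """Experiment-1 style 'out-of-the-box' prompt with no guidance on granularity."""
    prompt = (
        "List the main topics discussed in the document below as short topic "
        "titles, one per line.\n\nDocument:\n" + document
    )
    response = client.chat.completions.create(
        model=MODEL, messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content


def extract_topics_with_seeds(document: str, seed_topics: list[str]) -> str:
    """Experiment-2 style prompt: seed topics steer the model towards the desired granularity."""
    prompt = (
        "List the main topics discussed in the document below as short topic "
        "titles, one per line. Match the granularity of these seed topics, "
        "reusing them where they apply and adding new ones where needed:\n"
        + "\n".join(f"- {t}" for t in seed_topics)
        + "\n\nDocument:\n" + document
    )
    response = client.chat.completions.create(
        model=MODEL, messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content
```

In this sketch the seed topics simply act as in-context examples of the granularity the user wants, which is the mechanism the quoted findings attribute to Experiment 2.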
Stats
"Our findings indicate that LLMs with appropriate prompts can stand out as a viable alternative, capable of generating relevant topic titles and adhering to human guidelines to refine and merge topics."
"We empirically show that LLMs are capable of not just generating topics but also condensing overarching topics from their outputs."
Quotes
"Owing to differences in the pre-training corpus and RLHF strategies, various LLMs can exhibit variability in ‘zero-shot’ topic extraction, especially when utilizing only basic prompts."
"By incorporating seed topics, LLMs can generate topics with the desired granularity as specified by users."