Core Concepts
A systematic comparison of RoBERTa, BART, and LLaMA large language models fine-tuned for molecular property prediction, highlighting the importance of model architecture and scale.
Abstract
The study introduces a framework to systematically compare the efficacy of large language models (LLMs), namely RoBERTa, BART, and LLaMA, when fine-tuned for molecular property prediction tasks. The authors pre-trained 18 configurations of these models with varying parameter counts and dataset scales, then fine-tuned them on six benchmarking tasks from DeepChem.
Key insights:
- LLaMA-based models generally offered the lowest validation loss, suggesting their superior adaptability across tasks and scales.
- Absolute validation loss is not a definitive indicator of model performance for fine-tuning tasks; model size plays a crucial role.
- For regression tasks, larger ChemBART models pre-trained on smaller datasets emerge as among the best configurations.
- ChemLLaMA exhibits a clear proportional relationship between performance and model size, a pattern not observed in the other models.
- When sufficient computational resources and pre-training datasets are available, training large-scale ChemLLaMA models proves to be the most effective strategy for both regression and classification tasks.
The study underscores the importance of considering model architecture and dataset characteristics when deploying AI for molecular property prediction, paving the way for more informed and effective use of AI in drug discovery and related fields.
Stats
The authors pre-trained 18 configurations of RoBERTa, BART, and LLaMA models with varying parameter sizes (13M and 30M) and dataset sizes (10M, 20M, and 30M instances).
The fine-tuning datasets comprised 6 benchmarking tasks from DeepChem: 3 regression tasks (BACE, Delaney, Lipo) and 3 classification tasks (BACE, HIV, Tox21).
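To make the scale of these configurations concrete, the sketch below instantiates a small RoBERTa-style regression head of roughly the size range discussed, as one might set up for a DeepChem regression task such as Delaney. It assumes the HuggingFace `transformers` library; all hyperparameters (vocabulary size, hidden size, layer count) are illustrative assumptions, not values taken from the paper.

```python
import torch
from transformers import RobertaConfig, RobertaForSequenceClassification

# Illustrative small configuration; the paper's exact 13M/30M settings are not
# reproduced here, these numbers are assumptions for demonstration only.
config = RobertaConfig(
    vocab_size=600,               # assumed size of a SMILES tokenizer vocabulary
    hidden_size=256,
    num_hidden_layers=6,
    num_attention_heads=8,
    intermediate_size=1024,
    max_position_embeddings=514,
    num_labels=1,                 # single regression target (e.g. solubility)
    problem_type="regression",
)
model = RobertaForSequenceClassification(config)

# Dummy batch of token ids standing in for two tokenized SMILES strings.
input_ids = torch.randint(5, config.vocab_size, (2, 32))
logits = model(input_ids=input_ids).logits
print(logits.shape)  # torch.Size([2, 1]) -- one predicted value per molecule
```

In an actual fine-tuning run, the randomly initialized weights above would instead be loaded from a pre-trained checkpoint, and the model trained on the labeled benchmark data.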
Quotes
"ChemLLaMA consistently demonstrated the lowest validation loss across all model sizes and epochs."
"Absolute validation loss is not a definitive indicator of model performance - at least for fine-tuning tasks; instead, model size plays a crucial role."
"When sufficient computational resources and MTR datasets are available, training large-scale ChemLLaMA models with extensive datasets proves to be the most effective strategy for both regression and classification tasks."