The paper presents nach0, a new foundation model designed to handle a wide range of natural language processing (NLP) and chemistry tasks. The model uses an encoder-decoder transformer architecture and is pre-trained on both textual data (scientific literature and patents) and chemical data (SMILES strings).
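As a rough illustration of how such an encoder-decoder model is used, the sketch below queries a T5-style checkpoint through the Hugging Face transformers API. The checkpoint name "nach0-base" is a placeholder rather than the released identifier, and the prompt wording is illustrative, not the paper's exact template.

```python
# Minimal sketch of querying a T5-style encoder-decoder with a natural
# language prompt. "nach0-base" is a placeholder checkpoint name; substitute
# the actual released identifier.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("nach0-base")
model = AutoModelForSeq2SeqLM.from_pretrained("nach0-base")

# A chemistry task phrased in natural language, in the spirit of the
# paper's prompt-based multi-task setup (wording is illustrative).
prompt = "Predict the product of the reaction: CCO.CC(=O)O >>"
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```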
The key highlights of the paper are:
- The nach0 model is pre-trained on a diverse set of data sources, including scientific literature, patents, and molecular structures, so that it incorporates both chemical and linguistic knowledge.
- The model is fine-tuned in a multi-task setup, with each task specified through a natural language prompt. These tasks include NLP problems (e.g., named entity recognition, question answering), chemistry tasks (e.g., molecular property prediction, reaction prediction), and cross-domain tasks (e.g., description-guided molecule design); a sketch of such prompts follows this list.
- Extensive experiments demonstrate that nach0 outperforms state-of-the-art baselines on both single-domain and cross-domain tasks, generating high-quality outputs in both textual and molecular (SMILES) form and showcasing its effectiveness in multi-domain setups.
- The authors also present two case studies illustrating nach0's capabilities in drug discovery and generative chemistry applications.
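To make the prompt-based multi-task formulation concrete, here is a minimal sketch of how heterogeneous tasks can be cast as text-to-text pairs for a single sequence-to-sequence model. The prompt wordings and answers are invented examples, not the paper's actual templates or data.

```python
# Illustrative text-to-text casting of heterogeneous tasks, as used in
# prompt-based multi-task fine-tuning. Prompts and targets are invented
# examples, not the paper's templates.
training_examples = [
    # NLP task: named entity recognition over chemical text.
    ("Extract all chemical entity mentions from the text: "
     "Aspirin inhibits cyclooxygenase.",
     "aspirin, cyclooxygenase"),
    # Chemistry task: molecular property prediction from SMILES
    # (CC(=O)Oc1ccccc1C(=O)O is aspirin; the label is a toy example).
    ("Does the molecule with SMILES CC(=O)Oc1ccccc1C(=O)O "
     "cross the blood-brain barrier? Answer yes or no.",
     "yes"),
    # Cross-domain task: description-guided molecule design.
    ("Generate a SMILES string for a small aromatic carboxylic acid.",
     "c1ccccc1C(=O)O"),
]

# Each (prompt, target) pair is tokenized and consumed as an ordinary
# sequence-to-sequence training example, so one model covers all tasks.
for prompt, target in training_examples:
    print(f"INPUT:  {prompt}\nTARGET: {target}\n")
```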
Overall, the paper introduces a novel multimodal foundation model that can effectively leverage both natural language and chemical data to tackle a diverse range of tasks, paving the way for advancements in areas such as drug discovery and materials design.
Source: https://arxiv.org/pdf/2311.12410.pdf