Key Concepts
Associating astronomical observations with natural language using a neural network model.
Abstract
The paper introduces PAPERCLIP, a method that associates astronomical observations imaged by telescopes with natural language using a neural network model. By fine-tuning a pre-trained Contrastive Language–Image Pre-training (CLIP) model, the study shows that observations and natural language can be embedded in a meaningful joint representation. The methodology covers dataset construction from Hubble Space Telescope data, contrastive language-image fine-tuning, and evaluation metrics for image and text retrieval tasks. Results show improved performance over the base CLIP model, both on quantitative retrieval metrics and in the quality of text-to-image and image-to-text retrieval.
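The contrastive objective underlying CLIP-style fine-tuning can be sketched as follows. This is a minimal NumPy illustration of the symmetric InfoNCE loss (not the paper's actual training code); the function name, embedding shapes, and temperature value are illustrative assumptions.

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss used in CLIP-style contrastive training.

    img_emb, txt_emb: (N, D) arrays of paired embeddings; row i of each
    side corresponds to the same observation/abstract pair.
    """
    # L2-normalise so dot products become cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)

    # (N, N) similarity matrix; matching pairs lie on the diagonal
    logits = img @ txt.T / temperature

    def xent(l):
        # Row-wise softmax cross-entropy, with the diagonal as the target class
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.diag(logp).mean()

    # Average the image->text and text->image directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

During training the loss pulls each observation toward its own abstract and pushes it away from all other abstracts in the batch, which is what yields the joint representation described above.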
Statistics
The fine-tuning dataset includes 31,859 images corresponding to 4,438 abstracts.
Training takes approximately 3 hours on 4 Nvidia A100 GPUs.
The base CLIP model uses a Vision Transformer (ViT) with a 16×16 patch size for image encoding.
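The 16×16 patching step can be illustrated with a small sketch. This is an assumed, simplified NumPy version of how a ViT splits an image into non-overlapping patches before the linear projection; the function name and image shape are illustrative.

```python
import numpy as np

def patchify(image, patch_size=16):
    """Split an (H, W, C) image into flattened non-overlapping patches,
    as a ViT image encoder does before projecting them to tokens.
    Assumes H and W are divisible by patch_size."""
    H, W, C = image.shape
    ph, pw = H // patch_size, W // patch_size
    return (image
            .reshape(ph, patch_size, pw, patch_size, C)
            .transpose(0, 2, 1, 3, 4)          # group pixels by patch
            .reshape(ph * pw, patch_size * patch_size * C))
```

For a standard 224×224 RGB input this yields 14×14 = 196 patches of 16·16·3 = 768 values each, which the transformer then treats as a token sequence.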