Finding the Most Compute-Optimal Recipe for Repurposing Language Models into Embedding Models
This paper presents an algorithm that, for a given computational budget, determines the optimal combination of model size, data quantity, and fine-tuning method for turning a pre-trained language model into a high-quality text embedding model.