Core Concepts
Foundational models can generate dictionary example sentences that outperform existing expert-curated examples when paired with a novel method for identifying the sentences that best exemplify a word's meaning.
Summary
The paper introduces a new method called FM-MLM (Foundational Model - Masked Language Model) for generating and evaluating dictionary example sentences in a low-cost, zero-shot manner.
Key highlights:
- FM-MLM uses foundational models, i.e. large language models (LLMs) such as Claude and Llama-2, to generate candidate sentences that illustrate the definition of a given word.
- It then employs a novel adaptation of pre-trained masked language models to score how well each candidate sentence exemplifies the meaning of the target word.
- The sentence with the highest exemplification score is selected as the final output.
- Experiments show that sentences generated by FM-MLM achieve an 85.1% win-rate when evaluated competitively against example sentences from the Oxford Dictionary, significantly outperforming prior model-generated sentences.
- The approach is shown to be cost-effective, with the full end-to-end process for 8,000 word senses estimated to cost less than $50.
- Ablation studies show how individual modeling choices affect quality, including the choice of LLM, the sentence generation strategy, and whether word definitions and part-of-speech (POS) information are supplied.
- The work provides a refreshed low-cost baseline for generating high-quality dictionary example sentences that can benefit language learners.
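The generate, score, and select steps in the highlights above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the summary does not give the scoring formula, so `toy_score` is a stand-in that counts word overlap between the masked sentence and the definition, where the real method would use a pre-trained masked language model. The candidate sentences are hard-coded in place of LLM output, and all function names (`mask_target`, `toy_score`, `select_best_example`) are hypothetical.

```python
import re
from typing import Callable, Iterable, Tuple

MASK_TOKEN = "[MASK]"


def mask_target(sentence: str, word: str) -> str:
    """Replace the first whole-word occurrence of `word` with the mask token."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", flags=re.IGNORECASE)
    return pattern.sub(MASK_TOKEN, sentence, count=1)


def toy_score(word: str, definition: str, sentence: str) -> float:
    """Stand-in scorer (an assumption, NOT the paper's MLM-based score).

    Returns the fraction of definition words that also appear in the
    sentence once the target word has been masked out.
    """
    masked = mask_target(sentence, word)
    if masked == sentence:  # the target word does not occur in the sentence
        return 0.0
    context = set(re.findall(r"[a-z]+", masked.replace(MASK_TOKEN, " ").lower()))
    definition_words = set(re.findall(r"[a-z]+", definition.lower()))
    if not definition_words:
        return 0.0
    return len(context & definition_words) / len(definition_words)


def select_best_example(
    word: str,
    definition: str,
    candidates: Iterable[str],
    score_fn: Callable[[str, str, str], float],
) -> Tuple[str, float]:
    """Score every candidate and return the highest-scoring (sentence, score)."""
    scored = [(s, score_fn(word, definition, s)) for s in candidates]
    if not scored:
        raise ValueError("at least one candidate sentence is required")
    return max(scored, key=lambda pair: pair[1])


# Hard-coded candidates standing in for sentences an LLM would generate.
candidates = [
    "She deposited the check at the bank.",
    "We had a picnic on the bank of the river.",
    "The river flooded last spring.",
]
best, score = select_best_example(
    "bank", "the land alongside a river or lake", candidates, toy_score
)
print(best)
```

Because the scorer is passed in as a function, the toy overlap score could be swapped for a masked-language-model score without changing the selection logic.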
Statistics
The Oxford Dictionary dataset contains 105,818 word senses across training, validation and test splits.
The validation set has 7,931 word senses with an average of 11.0 example sentences per sense.
The test set has 7,843 word senses with an average of 11.1 example sentences per sense.
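As a quick sanity check on these figures, the short sketch below derives two numbers that the summary does not state directly: the size of the training split and the approximate number of example sentences per split. It assumes the three splits are disjoint and together make up the 105,818 total. Because the per-sense averages are rounded to one decimal place, the sentence counts are estimates only.

```python
# Figures reported in the statistics above.
TOTAL_SENSES = 105_818
SPLITS = {
    "validation": {"senses": 7_931, "avg_examples": 11.0},
    "test": {"senses": 7_843, "avg_examples": 11.1},
}

# Assumption: the splits are disjoint, so the training split is the remainder.
train_senses = TOTAL_SENSES - sum(s["senses"] for s in SPLITS.values())

# Approximate sentence counts; the averages are rounded, so these are estimates.
approx_examples = {
    name: round(s["senses"] * s["avg_examples"]) for name, s in SPLITS.items()
}

print(f"training senses (derived): {train_senses}")
for name, count in approx_examples.items():
    print(f"{name} example sentences (approx.): {count}")
```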
Quotes
"Dictionary example sentences play a vital role in illustrating the meanings and usage of headwords for dictionary users."
"Rapid advancements in foundational models (FMs) now offer new possibilities for more flexible and creative generation of dictionary example sentences at low cost."