The paper proposes a novel syntax-based in-context example selection strategy for machine translation (MT) tasks. It computes the syntactic similarity between dependency trees using Polynomial Distance to select the most informative examples for in-context learning. Additionally, the authors present an ensemble strategy that combines examples selected by both word-level and syntax-level criteria.
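The exact form of the Polynomial Distance is not spelled out in this summary, so the sketch below is an assumption: it uses a standard tree-distinguishing polynomial (P(leaf) = x; P(internal node) = y plus the product of its children's polynomials) and measures distance as the Euclidean distance between the two trees' coefficient vectors. Tree shapes are encoded as nested lists, with a leaf written as `[]`.

```python
import math

def tree_poly(node):
    """Bivariate polynomial of an unlabeled tree, stored as {(x_deg, y_deg): coeff}.

    Assumed convention: P(leaf) = x; P(internal) = y + product of children's polys.
    """
    if not node:                      # leaf: the polynomial x
        return {(1, 0): 1}
    prod = {(0, 0): 1}                # multiplicative identity (the constant 1)
    for child in node:
        cp = tree_poly(child)
        nxt = {}
        for (a, b), c1 in prod.items():
            for (c, d), c2 in cp.items():
                key = (a + c, b + d)  # multiply monomials: degrees add
                nxt[key] = nxt.get(key, 0) + c1 * c2
        prod = nxt
    prod[(0, 1)] = prod.get((0, 1), 0) + 1   # add the y term for this node
    return prod

def poly_distance(t1, t2):
    """Euclidean distance between the coefficient vectors of the two trees."""
    p1, p2 = tree_poly(t1), tree_poly(t2)
    monomials = set(p1) | set(p2)
    return math.sqrt(sum((p1.get(m, 0) - p2.get(m, 0)) ** 2 for m in monomials))
```

In a real pipeline, dependency parses (e.g. from an off-the-shelf parser) would first be reduced to such shape trees; identical structures give distance 0, so candidate examples can be ranked by ascending distance to the source sentence's tree.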
The key highlights are:
The authors introduce the first syntax-based in-context example selection method for MT, going beyond previous approaches that relied on superficial word-level features.
The proposed ensemble strategy, which concatenates examples selected by BM25 and the syntax-based Polynomial Distance, takes advantage of both word-level closeness and deep syntactic similarity.
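The ensemble step can be sketched as below. The scoring callables are hypothetical placeholders: a real setup would plug in BM25 for `word_score` and the syntax-based Polynomial Distance for `syntax_score`, and the ordering of the two groups in the prompt is an assumption, not the paper's specification.

```python
def select_examples(source, pool, word_score, syntax_score, k=2):
    """Ensemble selection sketch: top-k candidates by word-level score
    concatenated with top-k by syntax-level score, duplicates removed.

    word_score: higher = closer (stand-in for BM25).
    syntax_score: lower = closer (stand-in for Polynomial Distance).
    """
    by_word = sorted(pool, key=lambda ex: -word_score(source, ex))[:k]
    by_syntax = sorted(pool, key=lambda ex: syntax_score(source, ex))[:k]
    seen, selected = set(), []
    for ex in by_word + by_syntax:    # word-level group first (an assumption)
        if ex not in seen:
            seen.add(ex)
            selected.append(ex)
    return selected
```

The selected examples would then be formatted as demonstration pairs and prepended to the translation prompt.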
Experimental results on translation between English and six common languages show that the syntax-based methods and the ensemble strategy outperform a range of baselines, achieving the highest COMET scores on 11 of 12 translation directions.
The authors call on the NLP community to pay more attention to syntactic knowledge when embracing large language models, as syntax can effectively enhance in-context learning for syntax-rich tasks like MT.
Key insights distilled from the paper by Chenming Tan..., published at arxiv.org on 03-29-2024: https://arxiv.org/pdf/2403.19285.pdf