Main Idea
Prompt syntax and supplementary information have a significant impact on the knowledge retrieval performance of pretrained language models. Clausal syntax prompts outperform appositive syntax prompts, and range information is more helpful than domain information for improving retrieval performance.
Abstract
The paper investigates the impact of prompt syntax and supplementary information on the knowledge retrieval performance of pretrained language models (PLMs). The authors introduce CONPARE-LAMA, a controlled paraphrasing probe that enables the systematic study of these factors.
Key highlights:
- Clausal syntax prompts (compound, complex) outperform appositive syntax prompts across different PLMs and datasets. Clausal prompts lead to more consistent knowledge retrieval and lower response uncertainty.
- Adding range information (e.g., "Paris is the capital of [MASK], which is a country") boosts performance more than adding domain information (e.g., "Paris is a city and is the capital of [MASK]"), though domain information is more reliably helpful across syntactic forms.
- The authors find that information helpful in isolation can be detrimental when combined, suggesting that PLMs struggle to efficiently incorporate supplementary information, especially when presented in appositive syntax.
- Consistency of knowledge retrieval is higher for clausal prompts compared to appositive prompts, indicating that syntax plays a crucial role in how PLMs process and retrieve relational knowledge.
The findings provide insights into the fragility of information flow in language representations achieved by PLMs and suggest that specialized training approaches leveraging controlled prompt engineering could improve their knowledge retrieval capabilities.
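The domain/range contrast from the highlights is easy to reproduce with an off-the-shelf masked LM. Below is a minimal sketch using Hugging Face's fill-mask pipeline; the model choice (bert-base-cased) is an illustrative assumption, since the paper evaluates several PLMs, while the prompts are the paper's own examples.

```python
# Minimal sketch: probing a masked LM with base, domain-augmented, and
# range-augmented prompts. Model choice is an assumption, not the paper's.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-cased")

prompts = {
    "base":   "Paris is the capital of [MASK].",
    "domain": "Paris is a city and is the capital of [MASK].",
    "range":  "Paris is the capital of [MASK], which is a country.",
}

for name, text in prompts.items():
    # Swap in the model's own mask token before querying.
    text = text.replace("[MASK]", fill.tokenizer.mask_token)
    top = fill(text, top_k=3)
    print(name, [(p["token_str"], round(p["score"], 3)) for p in top])
```

Comparing the top predictions and their scores across the three variants gives a quick, informal view of the effect the paper measures at scale with CONPARE-LAMA.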
Example Prompts
- Paris is a city and is the capital of [MASK].
- Paris is the capital of [MASK], which is a country.
- The native language of [S] is [MASK].
- [S] natively speaks [MASK].
- [S] can be described as [MASK].
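To see how templates like these are used in practice, the sketch below instantiates two paraphrases of the native-language relation for a subject and reads off the model's probability for a gold answer at the [MASK] position, roughly how retrieval consistency across paraphrases can be compared. The model, subject, and gold object are illustrative assumptions, not values from the paper, and single-token answers are assumed.

```python
# Sketch: scoring one gold answer under two paraphrases of the same relation.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-cased")
model.eval()

templates = [
    "The native language of [S] is [MASK].",
    "[S] natively speaks [MASK].",
]
subject, gold = "Marie Curie", "Polish"  # illustrative triple, assumed
gold_id = tok.convert_tokens_to_ids(gold)  # assumes gold is one vocab token

for tpl in templates:
    text = tpl.replace("[S]", subject).replace("[MASK]", tok.mask_token)
    inputs = tok(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Locate the mask position and read off the gold token's probability.
    mask_pos = (inputs["input_ids"] == tok.mask_token_id).nonzero()[0, 1]
    probs = logits[0, mask_pos].softmax(dim=-1)
    print(f"{tpl!r}: P({gold}) = {probs[gold_id].item():.4f}")
```

Large swings in the gold-token probability between paraphrases are the kind of inconsistency the paper attributes to prompt syntax.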
Quotes
"Clausal syntax prompts (compound, complex) outperform appositive syntax prompts across different PLMs and datasets."
"Adding range information (e.g., "Paris is the capital of [MASK], which is a country") boosts performance more than adding domain information (e.g., "Paris is a city and is the capital of [MASK]"), though domain information is more reliably helpful across syntactic forms."
"The authors find that information helpful in isolation can be detrimental when combined, suggesting that PLMs struggle to efficiently incorporate supplementary information, especially when presented in appositive syntax."