
WebCiteS: Chinese Web Search Summarization Dataset with Citations


Core Concepts
Large language models face challenges in correctly citing sources, emphasizing the need for improvement.
Summary

WebCiteS introduces attributed query-focused summarization (AQFS) for Chinese web search results. The dataset features human-annotated summaries with citations derived from real-world user queries and search results. Evaluation metrics distinguish groundedness errors and citation errors, highlighting the challenge of explicit attribution in large language models. Models struggle with accurate citations, but supervised fine-tuning improves both summarization utility and attribution quality. Long-context settings reduce model performance, especially in accurately pinpointing supporting evidence within the context.


Stats
WebCiteS offers 7k human-annotated summaries with citations. Existing datasets such as ALCE lack high-quality citation annotations, while WebGLM controls citation quality via ROUGE score calculations.
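The ROUGE-based citation control mentioned for WebGLM can be sketched in a few lines. The LCS-based ROUGE-L implementation and the 0.3 threshold below are illustrative assumptions, not WebGLM's actual configuration:

```python
def lcs_len(a, b):
    # dynamic-programming longest common subsequence length over token lists
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            dp[i + 1][j + 1] = (dp[i][j] + 1 if a[i] == b[j]
                                else max(dp[i][j + 1], dp[i + 1][j]))
    return dp[m][n]

def rouge_l_f1(candidate, reference):
    # ROUGE-L F1 between two whitespace-tokenized strings
    c, r = candidate.split(), reference.split()
    lcs = lcs_len(c, r)
    if lcs == 0:
        return 0.0
    p, rec = lcs / len(c), lcs / len(r)
    return 2 * p * rec / (p + rec)

def filter_citations(sentence, sources, threshold=0.3):
    # keep only the indices of sources that overlap the sentence enough
    return [i for i, src in enumerate(sources)
            if rouge_l_f1(sentence, src) >= threshold]
```

Lexical-overlap filters like this are cheap but cannot detect paraphrased support, which is one reason WebCiteS argues for NLI-based evaluation instead.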

Key insights from

by Haolin Deng,... at arxiv.org 03-05-2024

https://arxiv.org/pdf/2403.01774.pdf
WebCiteS

Deeper Inquiries

How can models be improved to better handle accurate citations in long-context settings?

To enhance citation accuracy in long-context settings, models can be improved through several strategies:

1. Fine-tuning for attribution: supervised fine-tuning on attribution tasks teaches the model to cite sources accurately and ground its generations within a longer context.
2. Claim-splitting strategies: more advanced claim-splitting techniques break complex sentences into sub-claims, making it easier to verify partial support from multiple sources.
3. Context-window optimization: adjusting the maximum document length or chunk sizes intelligently lets the model focus on citing specific segments of information within lengthy documents.
4. Integration of retrieval mechanisms: incorporating retrieval into the model architecture helps fetch relevant information from external sources, improving citation accuracy and grounding in extensive contexts.
5. Enhanced NLI models: more robust Natural Language Inference (NLI) models, trained on diverse datasets and capable of handling longer sequences, improve the evaluator's ability to detect partial support accurately.
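The claim-splitting idea above can be sketched as follows. The conjunction-based splitter and the substring "entailment" check are toy stand-ins for the fine-tuned claim-split and NLI models a real evaluator would use:

```python
def split_claims(sentence):
    # naive claim splitting on conjunctions and semicolons;
    # real systems use a fine-tuned claim-split model
    parts = []
    for chunk in sentence.split(" and "):
        parts.extend(chunk.split("; "))
    return [p.strip() for p in parts if p.strip()]

def entails(source, claim):
    # placeholder for an NLI model: simple case-insensitive containment
    return claim.lower() in source.lower()

def support_level(sentence, cited_sources):
    # classify a sentence as fully, partially, or not supported
    # by its cited sources, at sub-claim granularity
    claims = split_claims(sentence)
    supported = sum(any(entails(s, c) for s in cited_sources) for c in claims)
    if supported == len(claims):
        return "full"
    return "partial" if supported > 0 else "none"
```

The point of splitting first is that a sentence combining one supported and one unsupported claim is flagged as "partial" rather than being scored as a single all-or-nothing unit.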

What impact does retrieval quality have on the effectiveness of attributed query-focused summarization?

Retrieval quality plays a significant role in determining the effectiveness of attributed query-focused summarization:

1. Relevance of retrieved information: high-quality retrieval ensures that only relevant, reliable information is extracted from external sources, which directly affects the accuracy and credibility of generated summaries and their attributions.
2. Supporting-evidence verification: reliable retrieval provides trustworthy supporting evidence for the claims a model generates, enabling accurate citations and enhancing overall trustworthiness.
3. Fewer hallucinations and factual errors: improved retrieval quality reduces hallucinated or factually wrong content by keeping generations aligned with verified external sources.
4. Task-completion efficiency: efficient retrieval quickly gathers the details needed to generate well-supported summaries with appropriate attributions.

How can automatic evaluators be enhanced to detect partial support more effectively?

Automatic evaluators play a crucial role in assessing attribution accuracy in large language models, and they can be enhanced in several ways:

1. Advanced claim-split model integration: integrating sophisticated claim-splitting techniques lets evaluators identify partial support more effectively by breaking sentences into sub-claims for detailed verification.
2. Fine-tuned NLI models: fine-tuning Natural Language Inference (NLI) models specifically for partial-support scenarios improves their ability to recognize nuanced relationships between claims and cited evidence.
3. Citation-mask prediction: evaluators should predict dynamically which sentences require citations based on contextual relevance, rather than assuming all sentences do, allowing targeted assessment where necessary.
4. Human-centric evaluation alignment: aligning automatic metrics closely with human annotations ensures they capture nuances such as partial support while staying consistent with human judgment.
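The recall/precision flavor of citation evaluation described here can be sketched as follows. The `verifier` callback stands in for an NLI model, and the "necessity" test used for precision is a common simplification (in the spirit of ALCE-style metrics), not WebCiteS's exact formula:

```python
def citation_scores(sentences, verifier):
    """Toy citation recall and precision over a summary.

    sentences: list of (text, citation_ids) pairs.
    verifier(text, ids) -> bool: True if the cited sources jointly
    support the sentence (a stand-in for an NLI-based evaluator).
    """
    cited = [(t, ids) for t, ids in sentences if ids]
    if not cited:
        return 0.0, 0.0
    # recall: fraction of cited sentences whose citations support them
    recall = sum(verifier(t, ids) for t, ids in cited) / len(cited)
    # precision: an individual citation counts as correct only if
    # removing it would break support (i.e. it is necessary)
    total = necessary = 0
    for t, ids in cited:
        if not verifier(t, ids):
            continue
        for i in ids:
            total += 1
            rest = [j for j in ids if j != i]
            if not rest or not verifier(t, rest):
                necessary += 1
    precision = necessary / total if total else 0.0
    return recall, precision
```

Swapping the toy verifier for a fine-tuned NLI model, and adding a predicted citation mask so uncited-but-checkworthy sentences also count against recall, are exactly the enhancements listed above.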