Core Concepts
The authors propose a tree-in-the-loop approach, realized in the generAItor visual analytics technique, to enhance the explainability, comparability, and adaptability of language model outputs through visual representations of beam search trees.
Abstract
The paper discusses the challenges large language models (LLMs) face in terms of explainability, comparability, and adaptability. It introduces generAItor, a visual analytics technique that leverages a tree-in-the-loop approach to address these challenges. The tool allows users to interact with beam search trees, explore alternative outputs, edit sequences, and fine-tune models on the adapted data. Case studies and a user study demonstrate that generAItor generates insights beyond template-based methods.
Large language models (LLMs) are widely used but face limitations such as repetitive content, lack of factual accuracy, and biases. The proposed tree-in-the-loop approach aims to improve understanding of and access to LLM predictions by visualizing beam search trees. This allows users to explore model decisions, compare outputs, and adapt predictions to match their intentions.
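To make this concrete, here is a minimal sketch of extracting a beam search tree from an off-the-shelf model, assuming the Hugging Face transformers library; the model name, beam width, and trie-based tree representation are illustrative assumptions, not the authors' implementation.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The scientist discovered", return_tensors="pt")

# Keep several beams so their shared prefixes form a tree.
outputs = model.generate(
    **inputs,
    num_beams=5,
    num_return_sequences=5,
    max_new_tokens=8,
    pad_token_id=tokenizer.eos_token_id,
)

# Merge the returned hypotheses into a prefix tree (trie): shared tokens
# become shared branches, exposing where the beams diverge.
tree = {}
for seq in outputs:
    node = tree
    for token in tokenizer.convert_ids_to_tokens(seq.tolist()):
        node = node.setdefault(token, {})

def print_tree(node, depth=0):
    # Indentation reflects tree depth; siblings are alternative continuations.
    for token, children in node.items():
        print("  " * depth + token)
        print_tree(children, depth + 1)

print_tree(tree)
```

Branching points in the printed trie correspond to the model decision points that the visual tree exposes for exploration.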
The tool supports tasks such as model prompting and configuration, tree exploration and explainability, guided text generation, comparative analysis, and model adaptation. It provides domain-specific word lists, a semantic embedding visualization, ontology treemaps for a concept overview, and ontological replacements that suggest alternatives during text generation.
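As an illustration of the semantic embedding view, the following sketch embeds alternative beam outputs and projects them to 2-D, assuming the sentence-transformers and scikit-learn packages; the encoder name, the example sentences, and the choice of PCA for projection are assumptions, not the paper's exact pipeline.

```python
from sentence_transformers import SentenceTransformer
from sklearn.decomposition import PCA

# Hypothetical beam outputs to compare semantically.
beams = [
    "The scientist discovered a new species of beetle.",
    "The scientist discovered a new planet orbiting a distant star.",
    "The scientist discovered that the earlier results were wrong.",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = encoder.encode(beams)  # one vector per beam output

# Project to 2-D so semantically similar outputs appear close together.
coords = PCA(n_components=2).fit_transform(embeddings)
for text, (x, y) in zip(beams, coords):
    print(f"({x:+.2f}, {y:+.2f})  {text}")
```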
Overall, generAItor offers a comprehensive solution for analyzing language model outputs through interactive visualizations and tools that enhance user control over generated text.
Stats
Large language models (LLMs) are widely deployed in various downstream tasks.
The proposed approach focuses on making model inputs and outputs accessible and explorable.
The beam search algorithm is commonly used for language model decoding and forms the basis of the tool's explanatory tree visualization.
The tool generates new insights in gender bias analysis beyond state-of-the-art methods; see the prompt-comparison sketch after this list.
A quantitative evaluation confirms that the model can be adapted to user preferences with few training samples; a fine-tuning sketch follows below.
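For the gender bias point above, here is a minimal sketch of a prompt-pair comparison over beam outputs; the prompts, model, and decoding settings are illustrative and do not reproduce the paper's analysis.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Compare top beams for a minimally different prompt pair.
for prompt in ["He works as a", "She works as a"]:
    inputs = tokenizer(prompt, return_tensors="pt")
    beams = model.generate(
        **inputs,
        num_beams=5,
        num_return_sequences=5,
        max_new_tokens=4,
        pad_token_id=tokenizer.eos_token_id,
    )
    print(prompt)
    for seq in beams:
        print("   ", tokenizer.decode(seq, skip_special_tokens=True))
```

Systematic differences between the two beam lists, such as stereotyped occupations, are the kind of pattern the comparative tree view surfaces.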
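And for the adaptation point, a minimal sketch of fine-tuning on a handful of user-edited sequences; the training data, hyperparameters, and plain gradient loop are assumptions rather than the paper's setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.train()

# Hypothetical sequences the user accepted or edited in the tree view.
adapted_samples = [
    "She works as a software engineer.",
    "He works as a nurse at the local hospital.",
]

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

for _ in range(3):  # a few passes are enough for a handful of samples
    for text in adapted_samples:
        batch = tokenizer(text, return_tensors="pt")
        # Causal LM objective: the labels are the input ids; the model
        # shifts them internally to predict each next token.
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```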