Language models (LMs) complete facts through a mix of mechanisms, including exact recall, heuristics, and guesswork, and distinguishing between these mechanisms is essential for interpreting LM behavior correctly.
Language models like BERT and RoBERTa develop internal subnetworks that correspond to theoretical linguistic categories, demonstrating a degree of learned grammatical understanding that can be analyzed using Shapley Head Values and pruning techniques.
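The head-attribution idea can be sketched as a Monte Carlo Shapley estimate over attention heads. Below is an illustrative sketch, not the paper's implementation: `evaluate_with_heads` is a hypothetical stand-in for evaluating the model with only the given heads active on a linguistic probe task.

```python
# Monte Carlo estimate of per-head Shapley values (illustrative sketch).
import random

def evaluate_with_heads(active_heads: frozenset) -> float:
    # Toy value function: pretend every third head matters most. In practice
    # this would be, e.g., subject-verb agreement accuracy of the pruned model.
    return sum(1.0 if h % 3 == 0 else 0.1 for h in active_heads)

def shapley_head_values(n_heads: int, n_samples: int = 200) -> list[float]:
    values = [0.0] * n_heads
    for _ in range(n_samples):
        order = list(range(n_heads))
        random.shuffle(order)
        active: set[int] = set()
        prev_score = evaluate_with_heads(frozenset(active))
        for head in order:
            active.add(head)
            score = evaluate_with_heads(frozenset(active))
            values[head] += score - prev_score  # marginal contribution of this head
            prev_score = score
    return [v / n_samples for v in values]

if __name__ == "__main__":
    vals = shapley_head_values(n_heads=12, n_samples=100)
    top = sorted(range(len(vals)), key=lambda h: -vals[h])[:3]
    print("highest-value heads:", top)
```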
This work improves the sensitive-direction analysis techniques used to understand the inner workings of language models and, in particular, clarifies the effectiveness and limitations of Sparse Autoencoder (SAE)-based feature analysis methods.
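As a rough illustration of the SAE setup such feature analyses build on, the sketch below (with random placeholder weights and activations, not the paper's code) shows the standard encode, ReLU, decode pass with a reconstruction loss and an L1 sparsity penalty.

```python
# Minimal sparse autoencoder (SAE) forward pass; all tensors are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int = 768, d_features: int = 4096):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, activations: torch.Tensor):
        features = F.relu(self.encoder(activations))  # sparse, non-negative feature activations
        reconstruction = self.decoder(features)
        return features, reconstruction

sae = SparseAutoencoder()
acts = torch.randn(8, 768)                            # stand-in residual-stream activations
features, recon = sae(acts)
recon_loss = F.mse_loss(recon, acts)
sparsity_penalty = features.abs().mean()              # L1 term encouraging sparse features
print(recon_loss.item(), sparsity_penalty.item())
```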
Contrary to the linear representation hypothesis, language models can and do learn inherently multi-dimensional features, as evidenced by the discovery of circular representations for concepts like days of the week and months of the year in GPT-2 and Mistral 7B using sparse autoencoders and novel irreducibility metrics.
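A minimal way to see what a "circular" feature looks like is to project a set of concept vectors onto their top two principal components and check whether they sit at a roughly constant radius. The day-of-week vectors below are synthetic placeholders, not activations from GPT-2 or Mistral 7B, and the radius check is a simple proxy rather than the paper's irreducibility metric.

```python
# Check whether seven concept vectors lie near a circle in a 2-D subspace.
import numpy as np

rng = np.random.default_rng(0)
angles = 2 * np.pi * np.arange(7) / 7
circle_2d = np.stack([np.cos(angles), np.sin(angles)], axis=1)
basis = rng.normal(size=(2, 768))                                    # random 2-D plane in model space
day_vectors = circle_2d @ basis + 0.05 * rng.normal(size=(7, 768))   # noisy circular layout (synthetic)

centered = day_vectors - day_vectors.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
proj = centered @ vt[:2].T                                           # top-2 principal components
radii = np.linalg.norm(proj, axis=1)
print("radius spread (std/mean):", radii.std() / radii.mean())       # near 0 => circular arrangement
```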
This paper identifies the components inside a language model that play an important role in specific tasks and presents a method for leveraging those components to effectively steer the model's predictions.
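The general recipe of steering predictions through an identified component can be illustrated with a forward hook that adds a fixed direction to an intermediate activation. The tiny model and steering vector below are illustrative placeholders, not the paper's method.

```python
# Steer a model's output by adding a direction to a hidden activation via a hook.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
x = torch.randn(1, 16)

steer_direction = torch.randn(32)          # stand-in for an identified "important component" direction
steer_direction /= steer_direction.norm()

def add_steering(module, inputs, output, alpha=3.0):
    # Shift the hidden activation along the chosen direction; returning a value
    # from a forward hook replaces the module's output.
    return output + alpha * steer_direction

handle = model[1].register_forward_hook(add_steering)
print("steered logits: ", model(x))
handle.remove()
print("original logits:", model(x))
```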
Retrieve to Explain (R2E) introduces a retrieval-based language model that prioritizes evidence for predictions, improving explainability and performance in complex tasks.
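A hedged sketch of the general retrieve-then-score pattern, not R2E's actual architecture: embed a query and a set of evidence passages, rank the evidence by similarity, and keep the per-passage scores so a prediction can be traced back to the evidence that supported it.

```python
# Retrieve-then-score sketch with placeholder embeddings for evidence attribution.
import numpy as np

rng = np.random.default_rng(0)
d = 64
evidence = ["passage A", "passage B", "passage C"]
evidence_emb = rng.normal(size=(len(evidence), d))       # stand-in passage embeddings
query_emb = rng.normal(size=d)                           # stand-in query embedding

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

scores = np.array([cosine(query_emb, e) for e in evidence_emb])
for idx in np.argsort(-scores):
    print(f"{evidence[idx]}: score={scores[idx]:.3f}")   # per-passage score doubles as an explanation
prediction_score = scores.max()                          # aggregate however the downstream task requires
```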