Liu, T. J. B., Boullé, N., Sarfati, R., & Earls, C. J. (2024). Density estimation with LLMs: A geometric investigation of in-context learning trajectories. arXiv preprint arXiv:2410.05218.
This paper investigates the capacity of large language models (LLMs) to perform density estimation (DE) directly from in-context data, with the aim of understanding the mechanisms behind this emergent ability.
The researchers prompt LLaMA-2 models with sequences of numbers sampled from target distributions and analyze the models' predicted probability density functions (PDFs) at increasing context lengths. They employ Intensive Principal Component Analysis (InPCA) to visualize and analyze the in-context DE trajectories in a low-dimensional probability space, comparing them to trajectories of classical DE methods like kernel density estimation (KDE) and Bayesian histograms. Furthermore, they develop a "bespoke KDE" model with adaptive kernel shape and bandwidth, optimizing it to emulate LLaMA's learning trajectory.
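To make the geometric analysis concrete, below is a minimal sketch of how a DE trajectory can be embedded in a low-dimensional probability space: pairwise squared Hellinger distances between successive density estimates are double-centered and eigendecomposed, classical-MDS style. This is a simplified stand-in for InPCA, which uses an intensive, Minkowski-like embedding and retains negative-eigenvalue directions; the Gaussian-KDE trajectory in the usage example is illustrative rather than the paper's actual setup.

```python
import numpy as np

def hellinger_sq(p, q, dx):
    """Squared Hellinger distance between two PDFs on a common grid."""
    return 0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2) * dx

def embed_trajectory(pdfs, dx):
    """Classical-MDS embedding of a sequence of PDFs from pairwise
    squared distances (a simplified stand-in for InPCA)."""
    n = len(pdfs)
    D2 = np.array([[hellinger_sq(p, q, dx) for q in pdfs] for p in pdfs])
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    W = -0.5 * J @ D2 @ J                 # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(W)
    order = np.argsort(vals)[::-1]        # dominant components first
    coords = vecs[:, order] * np.sqrt(np.abs(vals[order]))
    return coords[:, :2]                  # 2-D trajectory

# Usage: embed a KDE trajectory at growing context length n.
x = np.linspace(-5, 5, 512)
dx = x[1] - x[0]
data = np.random.default_rng(0).normal(0, 1, 200)
pdfs = []
for n in (5, 10, 20, 50, 100, 200):
    h = n ** (-1 / 5)  # Silverman-style bandwidth decay
    k = np.exp(-0.5 * ((x[:, None] - data[None, :n]) / h) ** 2)
    pdfs.append(k.sum(axis=1) / (n * h * np.sqrt(2 * np.pi)))
coords = embed_trajectory(pdfs, dx)  # one 2-D point per context length
```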
The study provides evidence that LLaMA-2 models implicitly employ a form of adaptive kernel density estimation for in-context DE. This finding suggests the presence of a "dispersive induction head" mechanism in LLMs, extending the concept of induction heads to continuous domains.
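As a rough illustration of what such an adaptive-kernel estimator looks like, the sketch below uses a generalized-Gaussian kernel whose exponent interpolates between a Gaussian shape (alpha = 2) and increasingly top-hat-like shapes as alpha grows. The specific parameterization, and the idea of fitting (h, alpha) separately at each context length, are assumptions for illustration, not the authors' exact "bespoke KDE".

```python
import numpy as np
from scipy.special import gamma

def adaptive_kde(x, samples, h, alpha):
    """KDE with a generalized-Gaussian kernel exp(-|u|**alpha).

    alpha = 2 recovers the Gaussian kernel; larger alpha approaches a
    top-hat. In a bespoke fit, h and alpha would be re-optimized at each
    context length to track the LLM's trajectory (fitting loop omitted;
    this parameterization is an illustrative assumption).
    """
    n = len(samples)
    # Normalization of the generalized Gaussian: alpha / (2 h Gamma(1/alpha))
    c = alpha / (2.0 * h * gamma(1.0 / alpha))
    u = np.abs(x[:, None] - samples[None, :]) / h
    return c * np.exp(-u ** alpha).sum(axis=1) / n

# Usage: Gaussian-kernel baseline on 50 in-context samples.
x = np.linspace(-3, 3, 400)
samples = np.random.default_rng(1).normal(0, 1, 50)
pdf = adaptive_kde(x, samples, h=0.4, alpha=2.0)
```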
This research contributes to understanding the emergent mathematical abilities of LLMs, particularly their capacity for probabilistic reasoning and in-context learning of continuous stochastic systems.
The study focuses on unconditional density estimation and a limited set of target distributions. Future research could explore LLMs' performance on conditional DE tasks and more complex data distributions. Investigating the "dispersive induction head" hypothesis through analysis of internal LLM representations is another promising direction.