The study demonstrates that language models can effectively transfer knowledge across diverse languages, with the transfer being largely independent of language proximity. This suggests the presence of language-agnostic representations that enable cross-lingual generalization.
HGRN2 introduces a simple outer-product-based state expansion mechanism that significantly increases the recurrent state size of HGRN without adding parameters, leading to improved performance in language modeling, image classification, and long-range tasks.
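A minimal sketch of what such an outer-product state expansion can look like, assuming a gated linear recurrence in the HGRN style; the projections `Wq`, `Wv`, `Wf` and the exact gating form are illustrative, not the paper's implementation:

```python
import numpy as np

def hgrn2_layer(x, Wq, Wv, Wf):
    # x: (T, d) inputs; Wq, Wv, Wf: (d, d) projections.
    # The recurrent state S is a (d, d) matrix rather than a (d,) vector:
    # it is expanded via an outer product, so the state grows d-fold while
    # the parameter count stays the same.
    T, d = x.shape
    S = np.zeros((d, d))
    outputs = []
    for t in range(T):
        q = x[t] @ Wq                                # query for readout
        v = x[t] @ Wv                                # value to be stored
        f = 1.0 / (1.0 + np.exp(-(x[t] @ Wf)))      # forget gate in (0, 1)
        # The gate's complement (1 - f) doubles as the "key", so the
        # expansion introduces no new parameters.
        S = S * f[None, :] + np.outer(v, 1.0 - f)
        outputs.append(S @ q)                        # read the matrix state
    return np.stack(outputs)                         # (T, d)
```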
This work introduces an efficient attention mechanism called Infini-attention that enables Transformer-based Large Language Models (LLMs) to effectively process infinitely long inputs with bounded memory and computation.
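A single-head, hedged sketch of the segment-level computation: local softmax attention over the current segment is blended with a retrieval from a fixed-size linear-attention memory, which is then updated in place. The scalar `beta`, the ELU+1 feature map, and all shapes are simplifications of the paper's formulation:

```python
import numpy as np

def elu1(x):
    # ELU(x) + 1: a positive feature map commonly used for linear attention.
    return np.where(x > 0, x + 1.0, np.exp(x))

def softmax(z):
    e = np.exp(z - z.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def infini_segment(Q, K, V, M, z, beta=0.0):
    # Q, K, V: (seg_len, d) for the current segment; M: (d, d) and z: (d,)
    # form the compressive memory. Their size is fixed, so total memory is
    # bounded no matter how many segments are streamed through.
    d = Q.shape[-1]
    local = softmax(Q @ K.T / np.sqrt(d)) @ V         # in-segment attention
    # (causal masking omitted for brevity)
    qf = elu1(Q)
    mem = (qf @ M) / ((qf @ z)[:, None] + 1e-6)       # read from memory
    g = 1.0 / (1.0 + np.exp(-beta))                   # scalar blend gate
    out = g * mem + (1.0 - g) * local                 # (learned in the paper)
    kf = elu1(K)
    M = M + kf.T @ V                                  # write new associations
    z = z + kf.sum(axis=0)                            # update normalizer
    return out, M, z
```

Streaming works by carrying `M` and `z` across successive calls: their footprint stays at d*d + d floats regardless of total input length.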
MiniCPM, a series of small language models with 1.2B and 2.4B non-embedding parameters, demonstrates capabilities on par with 7B-13B large language models through meticulous model wind tunnel experiments, a novel Warmup-Stable-Decay learning rate scheduler, and a two-stage pre-training strategy.
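The Warmup-Stable-Decay schedule itself is straightforward to sketch; the phase fractions and the linear decay shape below are illustrative defaults rather than MiniCPM's tuned settings:

```python
def wsd_lr(step, total_steps, peak_lr,
           warmup_frac=0.1, decay_frac=0.1, min_lr=0.0):
    # Three phases: linear warmup, a long constant "stable" plateau at the
    # peak rate, and a short final decay (linear here for simplicity).
    warmup_steps = max(int(total_steps * warmup_frac), 1)
    decay_steps = max(int(total_steps * decay_frac), 1)
    stable_end = total_steps - decay_steps
    if step < warmup_steps:
        return peak_lr * step / warmup_steps            # warmup
    if step < stable_end:
        return peak_lr                                  # stable plateau
    frac = min((step - stable_end) / decay_steps, 1.0)  # final decay
    return peak_lr + (min_lr - peak_lr) * frac
```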
StreamingLLM is an efficient framework that enables large language models trained with a finite attention window to handle text of effectively infinite length, without fine-tuning, by leveraging attention sinks.
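A hedged sketch of the cache-eviction rule this implies, assuming the KV cache is held as per-token lists; `n_sink` and `window` are illustrative sizes:

```python
def evict_kv_cache(keys, values, n_sink=4, window=1024):
    # Keep the first n_sink tokens (the "attention sinks") plus a sliding
    # window of the most recent tokens, and drop everything in between.
    # The sink tokens absorb the large attention mass that would otherwise
    # be misallocated once early positions are evicted.
    T = len(keys)
    if T <= n_sink + window:
        return keys, values                  # nothing to evict yet
    keep = list(range(n_sink)) + list(range(T - window, T))
    return [keys[i] for i in keep], [values[i] for i in keep]
```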
This paper proposes the use of "multicalibration" to yield interpretable and reliable confidence scores for outputs generated by large language models (LLMs), which can help detect hallucinations.
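Multicalibration itself can be sketched as an iterative "patching" loop over group-by-confidence-bin cells, following the generic algorithm rather than this paper's exact procedure; `groups` is assumed to be a list of boolean masks over examples, and the bin count and tolerance are illustrative:

```python
import numpy as np

def multicalibrate(scores, labels, groups, n_bins=10, tol=0.02, max_iters=50):
    # Within each (group, confidence-bin) cell, if the mean score differs
    # from the empirical accuracy by more than tol, shift the cell's scores
    # toward the empirical rate; repeat until no cell violates.
    s = np.clip(scores.astype(float).copy(), 0.0, 1.0)
    for _ in range(max_iters):
        violated = False
        for g in groups:
            bins = np.minimum((s * n_bins).astype(int), n_bins - 1)
            for b in range(n_bins):
                cell = g & (bins == b)
                if cell.sum() < 2:
                    continue
                gap = labels[cell].mean() - s[cell].mean()
                if abs(gap) > tol:
                    s[cell] = np.clip(s[cell] + gap, 0.0, 1.0)
                    violated = True
        if not violated:
            break
    return s
```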
This work employs dense training and sparse inference to improve the parameter efficiency of Mixture-of-Experts (MoE) language models while maintaining performance comparable to dense models.
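A sketch of the dense-training / sparse-inference switch for a single token, assuming a softmax router over callable expert FFNs; the top-k selection rule is a common choice and a stand-in for the paper's exact recipe:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def moe_layer(x, router_W, experts, top_k=2, training=True):
    # x: (d,) token activation; router_W: (d, n_experts); experts: list of
    # callables (the expert FFNs). Training runs every expert, weighted by
    # its routing probability (dense); inference runs only the top-k
    # experts and renormalizes their weights (sparse), saving compute.
    gates = softmax(x @ router_W)
    if training:
        return sum(g * e(x) for g, e in zip(gates, experts))  # dense pass
    idx = np.argsort(gates)[-top_k:]                          # sparse pass
    w = gates[idx] / gates[idx].sum()
    return sum(wi * experts[i](x) for wi, i in zip(w, idx))
```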
Auto-regressive large language models (LLMs) trained on sentences of the form "A is B" fail to generalize to the reverse direction "B is A".
This work proposes a model-aware approach that leverages the language model's token embeddings to efficiently determine when retrieval augmentation is necessary, without requiring access to sensitive pre-training data.
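One plausible shape for such a gate, assuming a mean-pooled token-embedding feature and a small logistic probe; the probe parameters `clf_w`, `clf_b` and the threshold are hypothetical stand-ins for the paper's method:

```python
import numpy as np

def needs_retrieval(token_ids, embedding_matrix, clf_w, clf_b, threshold=0.5):
    # Mean-pool the LM's own token embeddings for the query and score them
    # with a tiny logistic probe that predicts whether the model likely
    # lacks the relevant knowledge; retrieval is triggered only then.
    # No pre-training data is consulted at decision time.
    feats = embedding_matrix[token_ids].mean(axis=0)     # (d,) pooled query
    p = 1.0 / (1.0 + np.exp(-(feats @ clf_w + clf_b)))   # P(needs retrieval)
    return p > threshold
```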
Language models sometimes use a simple vector arithmetic mechanism to solve relational tasks by leveraging regularities encoded in their hidden representations.
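The mechanism is the classic word2vec-style offset applied inside the model's hidden space; a toy illustration, with a dict of vectors standing in for hidden representations:

```python
import numpy as np

def apply_relation(emb, a, b, x):
    # Approximate a relation (e.g. country -> capital) by a single offset
    # vector estimated from one example pair (a, b), then apply it to a
    # new argument x and decode by nearest-neighbor (cosine) lookup.
    relation = emb[b] - emb[a]            # e.g. emb["Warsaw"] - emb["Poland"]
    target = emb[x] + relation            # move x along the same direction
    def cos(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)
    return max((w for w in emb if w not in (a, b, x)),
               key=lambda w: cos(emb[w], target))
```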