Language imbalance during training can boost cross-lingual generalization in multilingual language models, leading to better performance on less frequent languages.
A mathematical theory is developed to explain the emergence of learned skills in large language models once the number of model parameters and the size of the training data surpass certain thresholds.
N-gram language models are modernized by scaling the training data to 5 trillion tokens and allowing n to be unbounded, enabling novel analyses of human-written and machine-generated text and improving the performance of large neural language models.
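A minimal Python sketch of the unbounded-n idea: back off from the longest suffix of the context that occurs in the training corpus and read off the counts of the tokens that follow it. The brute-force scan below is purely illustrative; a trillion-token system would rely on suffix-array indexes, and the function and variable names here are invented.

```python
from collections import Counter

def next_token_counts(corpus_tokens, context):
    """Back off from the longest suffix of `context` found in the corpus and
    return the distribution of tokens that follow it (toy, brute-force)."""
    for i in range(len(context)):            # longest suffix first
        suffix = context[i:]
        k = len(suffix)
        counts = Counter(
            corpus_tokens[j + k]
            for j in range(len(corpus_tokens) - k)
            if corpus_tokens[j:j + k] == suffix
        )
        if counts:
            total = sum(counts.values())
            return {tok: c / total for tok, c in counts.items()}
    return {}

corpus = "the cat sat on the mat the cat sat on the rug".split()
print(next_token_counts(corpus, "the cat sat on the".split()))
# {'mat': 0.5, 'rug': 0.5}
```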
Prefixing each number with its digit count, a technique called NumeroLogic, enhances the numerical capabilities of large language models by helping them track place value and reason about a number's magnitude before generating its digits.
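A small sketch of the digit-count prefixing idea. The "<count>:<number>" format and the `add_digit_counts` helper are illustrative assumptions, not necessarily the exact encoding or tokenization used in the paper.

```python
import re

def add_digit_counts(text):
    """Prefix every integer in `text` with its digit count.
    The "<count>:<number>" format is an illustrative choice."""
    return re.sub(r"\d+", lambda m: f"{len(m.group())}:{m.group()}", text)

print(add_digit_counts("The invoice total is 1250 dollars, tax is 87."))
# The invoice total is 4:1250 dollars, tax is 2:87.
```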
This paper demonstrates how Large Language Models (LLMs) can be effectively used to perform reference resolution, a crucial task for conversational agents, by converting it into a language modeling problem. The authors propose a novel approach to encode on-screen entities as text, enabling the LLM to handle both conversational and on-screen references.
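A hedged sketch of how on-screen entities might be serialized into plain text for the model. The entity fields, the ordering by screen position, and the numbered-line format are assumptions for illustration, not the authors' exact encoding.

```python
def encode_screen_entities(entities):
    """Serialize on-screen entities into numbered text lines so an LLM can
    resolve references such as "the second one" or "the number at the top".
    Sorting by vertical then horizontal position approximates reading order."""
    ordered = sorted(entities, key=lambda e: (e["y"], e["x"]))
    return "\n".join(
        f'[{i}] {e["type"]}: "{e["text"]}"' for i, e in enumerate(ordered, 1)
    )

screen = [
    {"type": "phone_number", "text": "555-0142", "x": 10, "y": 120},
    {"type": "address", "text": "1 Infinite Loop", "x": 10, "y": 40},
]
print(encode_screen_entities(screen))
# [1] address: "1 Infinite Loop"
# [2] phone_number: "555-0142"
```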
Jamba is a novel hybrid language model architecture that combines Transformer and Mamba (state-space) layers, along with a mixture-of-experts (MoE) component, to achieve improved performance and efficiency compared to pure Transformer models.
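A toy schedule generator illustrating the hybrid idea: mostly state-space (Mamba) layers with periodic attention layers, and a mixture-of-experts feed-forward block on a subset of layers. The ratios and defaults below are illustrative, not Jamba's published configuration.

```python
def hybrid_layer_schedule(n_layers, attn_every=8, moe_every=2):
    """Sketch of a Jamba-style hybrid stack: each layer gets a token mixer
    (attention or Mamba) and an MLP type (dense or mixture-of-experts)."""
    schedule = []
    for i in range(n_layers):
        mixer = "attention" if i % attn_every == attn_every - 1 else "mamba"
        mlp = "moe" if i % moe_every == moe_every - 1 else "dense"
        schedule.append((mixer, mlp))
    return schedule

for i, layer in enumerate(hybrid_layer_schedule(8)):
    print(i, layer)
```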
LongLoRA presents an efficient fine-tuning approach to extend the context of large language models, reducing computational costs while maintaining performance.
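A minimal NumPy sketch of the LoRA component involved: a frozen pretrained weight plus a trainable low-rank update, so only a small fraction of parameters is tuned. LongLoRA pairs adapters of this kind with a sparse attention pattern during context-extension training; only the low-rank part is sketched here, and the class name and hyperparameters are illustrative.

```python
import numpy as np

class LoRALinear:
    """Frozen weight W augmented with a trainable low-rank update B @ A,
    so only r * (d_in + d_out) parameters are fine-tuned."""
    def __init__(self, W, r=8, alpha=16):
        self.W = W                                   # frozen pretrained weight
        d_out, d_in = W.shape
        self.A = np.random.randn(r, d_in) * 0.01     # trainable
        self.B = np.zeros((d_out, r))                # trainable, zero-initialized
        self.scale = alpha / r

    def __call__(self, x):
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T

layer = LoRALinear(np.random.randn(16, 32))
print(layer(np.random.randn(4, 32)).shape)  # (4, 16)
```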
High-level semantic concepts are encoded linearly in large language models due to the next token prediction objective and the implicit bias of gradient descent.
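A toy NumPy illustration of what "encoded linearly" means: if a binary concept shifts hidden states along a single direction, the difference of class means recovers that direction and a simple projection separates the classes. The data here is synthetic, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 64, 200
concept_dir = rng.standard_normal(d)          # hypothetical "concept" direction
labels = rng.integers(0, 2, n)                # e.g. two values of a binary concept
# Hidden states: isotropic noise plus a shift along the concept direction.
hidden = rng.standard_normal((n, d)) + np.outer(labels, concept_dir)

# If the concept is linearly encoded, the difference of class means recovers
# a direction whose projection separates the two classes.
direction = hidden[labels == 1].mean(0) - hidden[labels == 0].mean(0)
scores = hidden @ direction
accuracy = ((scores > scores.mean()) == labels).mean()
print(f"linear probe accuracy: {accuracy:.2f}")   # close to 1.0 on this toy data
```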
The release of the Invalsi dataset provides a challenging benchmark for evaluating language models in Italian, paving the way for future improvements in mathematical and language understanding.
Data mixing laws allow model performance to be predicted quantitatively from the proportions of the training data mixture before running the full training run.
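A hedged sketch of fitting such a law: assume validation loss is an exponential function of one mixture proportion, fit it on a few small-scale mixture runs, and extrapolate. The functional form, parameter names, and toy numbers below are assumptions for illustration and may differ from the laws proposed in the paper.

```python
import numpy as np

def mixing_law(r_web, a, b, t):
    """Assumed form: predicted validation loss as an exponential function of
    the web-data proportion r_web (illustrative, single-domain case)."""
    return a + b * np.exp(t * r_web)

# Toy "observed" losses from a few small-scale mixture runs.
r = np.array([0.2, 0.4, 0.6, 0.8])
loss = np.array([3.10, 2.95, 2.86, 2.81])

# Fit the three parameters by least squares over a coarse grid; in practice
# one would use a proper optimizer such as scipy.optimize.curve_fit.
best = min(
    ((a, b, t) for a in np.linspace(2.0, 3.0, 21)
               for b in np.linspace(0.1, 1.0, 19)
               for t in np.linspace(-5.0, -0.5, 19)),
    key=lambda p: np.sum((mixing_law(r, *p) - loss) ** 2),
)
print("fitted params:", best)
print("predicted loss at r_web=1.0:", mixing_law(1.0, *best))
```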