JetMoE-8B, a new 8B-parameter Mixture-of-Experts Large Language Model (LLM), was trained for less than $0.1 million yet outperforms Llama2-7B, with its chat variant surpassing the larger Llama2-13B-Chat.
Applying user-defined constraints to the format and semantics of LLM outputs can streamline prompt-based development, integrate LLMs into existing workflows, satisfy product requirements, and enhance user trust and experience.
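A minimal sketch of one common enforcement mechanism, constrained decoding: mask the next-token logits so only tokens satisfying the constraint remain eligible. The vocabulary, the `fake_logits` model stub, and the choice-set constraint below are all illustrative stand-ins, not any particular library's API.

```python
import math
import random

# Toy vocabulary and "model": in a real system the logits would come from
# an LLM's decoding step; here they are deterministic pseudo-random scores.
VOCAB = ["yes", "no", "maybe", "banana", "{", "}"]

def fake_logits(prefix):
    # Hypothetical stand-in for a model forward pass over the prefix.
    rng = random.Random(len(prefix))
    return [rng.uniform(-1.0, 1.0) for _ in VOCAB]

def constrained_next_token(prefix, allowed):
    """Mask logits so only tokens in `allowed` can be chosen."""
    logits = fake_logits(prefix)
    masked = [score if tok in allowed else -math.inf
              for score, tok in zip(logits, VOCAB)]
    return VOCAB[max(range(len(VOCAB)), key=masked.__getitem__)]

# Constrain the model's answer to a fixed choice set.
print(constrained_next_token([], allowed={"yes", "no", "maybe"}))
```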
Leveraging the self-consistency of multiple language model samples to assess the reliability and factuality of generated text.
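A sketch of the core recipe under simple assumptions: draw several samples for the same prompt, then score a claim by how consistently the other samples support it. Here `support` is a crude word-overlap proxy; real systems typically use an NLI model or an LLM judge for that step.

```python
def support(claim: str, sample: str) -> float:
    """Crude proxy for entailment: fraction of claim words found in the
    sample. A production system would use an NLI model or an LLM judge."""
    claim_words = set(claim.lower().split())
    sample_words = set(sample.lower().split())
    return len(claim_words & sample_words) / max(len(claim_words), 1)

def consistency_score(claim: str, samples: list[str]) -> float:
    """Average support for the claim across independently drawn samples;
    low scores flag content the model does not reproduce consistently."""
    return sum(support(claim, s) for s in samples) / len(samples)

# Hypothetical samples drawn from the same prompt at temperature > 0.
samples = [
    "The Eiffel Tower was completed in 1889 in Paris.",
    "Construction of the Eiffel Tower finished in 1889.",
    "The Eiffel Tower opened in Paris in 1889.",
]
print(consistency_score("The Eiffel Tower was completed in 1889", samples))
print(consistency_score("The Eiffel Tower was completed in 1912", samples))
```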
Introducing a novel method called "incremental utility" to estimate how much additional knowledge a demonstration brings to a large language model for few-shot in-context learning tasks, and showing its effectiveness compared to previous utility estimation approaches.
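A hedged sketch of the general shape of such an estimator, not the paper's exact formulation: measure a demonstration's utility as the change in the answer's score once the demonstration is prepended to the query. The `answer_logprob` scorer below is a toy stand-in for summing an LLM's token log-probabilities.

```python
def answer_logprob(prompt: str, answer: str) -> float:
    # Toy stand-in for an LLM scoring call: this "model" just rewards
    # answers whose words appear in the prompt, so a helpful demonstration
    # measurably raises the score. A real estimator would sum the LLM's
    # token log-probabilities for `answer` conditioned on `prompt`.
    words = set(prompt.lower().split())
    hits = sum(1 for w in answer.lower().split() if w in words)
    return hits - len(answer.split())  # pseudo log-probability

def incremental_utility(demo: str, query: str, answer: str) -> float:
    """Utility of a demo = answer score with the demo minus score without."""
    return (answer_logprob(demo + "\n" + query, answer)
            - answer_logprob(query, answer))

demo = "Q: capital of France? A: Paris"
print(incremental_utility(demo, "Q: capital of France? A:", "Paris"))
```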
Mixture-of-Experts (MoE) language models can scale model size without a proportional increase in training cost, but they pose challenges for inference efficiency. This work studies optimal training budget allocation for MoE models by treating both model performance and inference cost as key metrics.
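To make the trade-off concrete, a sketch with standard FLOP approximations and invented workload numbers, not the paper's fitted scaling law: fold lifetime inference compute into the budget alongside training compute, which is where MoE's small active-parameter count pays off.

```python
# Illustrative cost accounting with made-up workload constants. Common
# approximations: training costs ~6*N*D FLOPs and inference ~2*N FLOPs per
# token, where N is the *active* parameter count; MoE keeps N small while
# total parameters grow.

def total_flops(n_active: float, train_tokens: float, infer_tokens: float) -> float:
    return 6 * n_active * train_tokens + 2 * n_active * infer_tokens

# Compare a dense 7e9-parameter model with an MoE activating 2e9 of 8e9
# parameters, both trained on 1e12 tokens and serving 1e13 tokens.
dense = total_flops(7e9, 1e12, 1e13)
moe = total_flops(2e9, 1e12, 1e13)
print(f"dense: {dense:.2e} FLOPs, MoE: {moe:.2e} FLOPs "
      f"({moe / dense:.0%} of dense)")
```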
GRIFFIN, a novel training-free Mixture-of-Experts (MoE) method that selects unique feedforward experts at the sequence level, enabling efficient generation across a variety of large language models with non-ReLU activation functions while preserving the original model's performance.
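A rough sketch of the sequence-level selection idea under simplified assumptions (the toy layer sizes, GELU activation, and 25% keep ratio are illustrative, not the paper's exact statistics): pool each feedforward neuron's activation magnitude over the prompt, keep the top-scoring slice as that sequence's experts, and generate using only that slice.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff, prompt_len = 64, 256, 10

W_in = rng.normal(size=(d_model, d_ff))    # toy FF up-projection
W_out = rng.normal(size=(d_ff, d_model))   # toy FF down-projection
prompt_hidden = rng.normal(size=(prompt_len, d_model))

def gelu(x):
    # tanh approximation of GELU, standing in for a non-ReLU activation
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

# 1) Prompt phase: pool each FF neuron's activation magnitude over the sequence.
acts = gelu(prompt_hidden @ W_in)               # (prompt_len, d_ff)
neuron_scores = np.linalg.norm(acts, axis=0)    # one score per FF neuron

# 2) Keep the top 25% of neurons as this sequence's "experts".
k = d_ff // 4
experts = np.argsort(neuron_scores)[-k:]

# 3) Generation phase: run the FF layer with only the selected slice.
def ff_pruned(x):
    return gelu(x @ W_in[:, experts]) @ W_out[experts, :]

new_token_hidden = rng.normal(size=(1, d_model))
print(ff_pruned(new_token_hidden).shape)        # (1, d_model)
```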
LLM ATTRIBUTOR is a Python library that provides interactive visualizations to help LLM developers understand and improve how their models' text generation is attributed to training data.
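LLM ATTRIBUTOR's own API is not reproduced here; as background, a generic sketch of one attribution signal such tools commonly build on: gradient similarity between a training example and the generated output, shown on a toy linear model.

```python
import numpy as np

# Generic training-data attribution signal (not LLM ATTRIBUTOR's API):
# rank training examples by how similar their loss gradients are to the
# gradient induced by the generated output.
rng = np.random.default_rng(1)
w = rng.normal(size=8)                      # toy model weights

def loss_grad(x, y):
    """Gradient of squared loss 0.5*(w.x - y)^2 for one example."""
    return (w @ x - y) * x

X_train = rng.normal(size=(5, 8))
y_train = rng.normal(size=5)
x_gen, y_gen = rng.normal(size=8), 0.3      # stands in for a generated output

g_gen = loss_grad(x_gen, y_gen)
scores = [
    float(np.dot(loss_grad(x, y), g_gen)
          / (np.linalg.norm(loss_grad(x, y)) * np.linalg.norm(g_gen)))
    for x, y in zip(X_train, y_train)
]
print("most influential training example:", int(np.argmax(scores)))
```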
Incorporating a citation mechanism in large language models can enhance content transparency, verifiability, and accountability, addressing intellectual property and ethical concerns.