Sentence transformers fine-tuned on general question-answering datasets demonstrate some zero-shot ability to associate subjective queries about hiking experiences with synthetically generated route descriptions, but performance is mixed and model-dependent.
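As a rough illustration of this kind of zero-shot matching, the sketch below encodes a subjective hiking query and a few synthetic route descriptions with a QA-tuned sentence transformer and ranks the routes by cosine similarity. The checkpoint and example texts are illustrative choices, not the paper's setup.

```python
# Minimal sketch (not the paper's pipeline): rank synthetic route descriptions
# against a subjective hiking query with a QA-fine-tuned sentence transformer.
from sentence_transformers import SentenceTransformer, util

# Model choice is illustrative; any QA-tuned checkpoint could be swapped in.
model = SentenceTransformer("multi-qa-MiniLM-L6-cos-v1")

query = "a quiet hike with great views and not too steep"
routes = [
    "A gentle loop through pine forest with panoramic ridge viewpoints.",
    "A steep scramble over exposed rock, popular and often crowded.",
    "A flat lakeside walk with little shade and frequent road noise.",
]

q_emb = model.encode(query, convert_to_tensor=True)
r_emb = model.encode(routes, convert_to_tensor=True)

# Cosine similarity serves as the zero-shot relevance score per route.
scores = util.cos_sim(q_emb, r_emb)[0]
for route, score in sorted(zip(routes, scores.tolist()), key=lambda x: -x[1]):
    print(f"{score:.3f}  {route}")
```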
The capacity and effectiveness of pre-trained multilingual models (MLMs) for zero-shot cross-lingual transfer are well established, but the phenomena of positive or negative transfer and the effect of language choice are still not fully understood, especially in the complex setting of massively multilingual LMs. This work proposes an efficient method to study how the choice of transfer language influences zero-shot performance on a target language.
The agent capabilities of open-source, low-parameter language models can be significantly improved through supervised fine-tuning on agent-specific data, combined with techniques such as task decomposition and backtracking that strengthen their reasoning.
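To make "agent-specific data" concrete, here is a hedged sketch of turning one agent trajectory, with an explicit decomposition step and a backtracking step, into a single SFT training record. The field names, chat format, and tool call are invented for illustration, not a specific paper's schema.

```python
# Hedged sketch: serialize an agent trajectory (with task decomposition and a
# backtracking step) into an SFT example. Schema and tool names are assumed.
import json

trajectory = {
    "task": "Find the cheapest flight from Oslo to Rome next Friday.",
    "steps": [
        {"thought": "Decompose: (1) resolve the date, (2) search flights, (3) compare prices."},
        {"action": "search_flights(origin='OSL', dest='FCO', date='2024-06-14')"},
        {"observation": "API error: unknown airport code FCO for this provider."},
        {"thought": "Backtrack: retry with the city code ROM instead of FCO."},
        {"action": "search_flights(origin='OSL', dest='ROM', date='2024-06-14')"},
        {"observation": "Cheapest fare: 89 EUR with AcmeAir."},
    ],
    "final_answer": "The cheapest flight is 89 EUR with AcmeAir.",
}

def to_sft_example(traj):
    # Keep the full reasoning/acting trace as the assistant target so the
    # fine-tuned model learns decomposition and backtracking, not just answers.
    target = "\n".join(json.dumps(s, ensure_ascii=False) for s in traj["steps"])
    target += f"\nFinal answer: {traj['final_answer']}"
    return {"messages": [
        {"role": "user", "content": traj["task"]},
        {"role": "assistant", "content": target},
    ]}

print(json.dumps(to_sft_example(trajectory), indent=2))
```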
By combining frequency-domain transformations with weighted Quasi-Monte Carlo sampling, the proposed Frequency Domain Kernelization approach (DiJiang) efficiently approximates the attention mechanism in Transformer models, yielding significant reductions in training cost and inference time while maintaining comparable performance.
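The general recipe behind such methods is kernelized "linear" attention: map queries and keys through a feature function and reorder the matrix products to avoid the quadratic attention matrix. The sketch below is not the authors' DiJiang implementation; it uses a DCT as a stand-in frequency-domain transform and random per-frequency weights as a placeholder for weighted Quasi-Monte Carlo sampling.

```python
# Hedged sketch of kernelized linear attention with a frequency-domain feature
# map: O(n * d^2) instead of O(n^2 * d). Illustrative only, not DiJiang itself.
import numpy as np
from scipy.fft import dct

def phi(x, weights):
    # DCT along the head dimension, per-frequency weights (placeholder for
    # weighted quasi-Monte Carlo sampling), and exp() to keep features positive.
    z = dct(x, norm="ortho", axis=-1) * weights
    return np.exp(z - z.max())  # global shift for numerical stability

def linear_attention(Q, K, V, weights):
    Qf, Kf = phi(Q, weights), phi(K, weights)          # (n, d) feature maps
    KV = Kf.T @ V                                      # (d, d_v), shared across queries
    normalizer = Qf @ Kf.sum(axis=0, keepdims=True).T  # (n, 1)
    return (Qf @ KV) / (normalizer + 1e-6)

rng = np.random.default_rng(0)
n, d, dv = 128, 16, 16
Q, K, V = rng.normal(size=(n, d)), rng.normal(size=(n, d)), rng.normal(size=(n, dv))
weights = rng.uniform(0.5, 1.5, size=d)                # placeholder frequency weights
print(linear_attention(Q, K, V, weights).shape)        # (128, 16)
```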
Transformer-based language models employ a sequential process to achieve factual recall, involving argument extraction by task-specific attention heads, activation of the extracted argument by the MLP layer, and task-aware function application.
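The toy below only restates that claimed three-stage pipeline as runnable stand-in code (attention heads extract the argument, an MLP "activates" it, a task-aware function produces the answer); every name and the lookup table are invented, and none of it reflects the paper's actual model internals.

```python
# Toy illustration of the claimed factual-recall stages; purely conceptual.
CAPITAL_OF = {"France": "Paris", "Japan": "Tokyo"}

def extract_argument(prompt: str) -> str:
    # Stand-in for task-specific attention heads copying the subject token.
    return next(tok for tok in prompt.split() if tok in CAPITAL_OF)

def mlp_activate(argument: str) -> str:
    # Stand-in for the MLP layer amplifying the extracted argument's representation.
    return argument

def apply_task_function(task: str, argument: str) -> str:
    # Stand-in for task-aware function application in later layers.
    return {"capital_of": CAPITAL_OF}[task][argument]

prompt = "The capital of France is"
print(apply_task_function("capital_of", mlp_activate(extract_argument(prompt))))  # Paris
```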
A novel intrinsic self-correction framework, Learning from Correctness (LECO), can significantly improve the reasoning performance of large language models across various tasks by progressively accumulating correct reasoning steps without relying on external feedback or handcrafted prompts.
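A LECO-style loop can be pictured as: score each generated reasoning step with an intrinsic confidence signal, keep the longest confident prefix, and regenerate only the remainder. The sketch below assumes hypothetical `generate_steps` and `step_confidence` callables (here mocked so it runs); it is a paraphrase of the idea, not the paper's implementation.

```python
# Hedged sketch of correctness-accumulating self-correction (LECO-style).
from typing import Callable, List

def leco_loop(
    question: str,
    generate_steps: Callable[[str, List[str]], List[str]],  # continues from a prefix
    step_confidence: Callable[[str], float],                 # intrinsic confidence score
    threshold: float = 0.8,
    max_rounds: int = 3,
) -> List[str]:
    prefix: List[str] = []
    for _ in range(max_rounds):
        steps = prefix + generate_steps(question, prefix)
        # Locate the first step whose confidence drops below the threshold.
        cut = next((i for i, s in enumerate(steps) if step_confidence(s) < threshold), len(steps))
        if cut == len(steps):       # every step looks confident: stop early
            return steps
        prefix = steps[:cut]        # accumulate the trusted ("correct") prefix
    return steps

# Toy stand-ins so the sketch runs end to end.
def fake_generate(question, prefix):
    if not prefix:
        return ["Compute 3 * 4 = 12", "Add 5: 12 + 5 = 18 (low conf)", "Answer: 18"]
    return ["Add 5: 12 + 5 = 17", "Answer: 17"]  # regeneration conditioned on the prefix

fake_confidence = lambda step: 0.5 if "low conf" in step else 0.95
print(leco_loop("toy question", fake_generate, fake_confidence))
```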
Improving LLM comprehension through optimal paraphrasing and [PAUSE] injection can reduce hallucination in generated content.
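As a minimal illustration of the prompt-side mechanics, the snippet below inserts literal "[PAUSE]" markers after each sentence of an already-paraphrased instruction. The marker placement strategy is an assumption for illustration and not the paper's exact recipe.

```python
# Hedged sketch: inject "[PAUSE]" markers into a paraphrased prompt.
import re

def inject_pause(prompt: str, marker: str = "[PAUSE]") -> str:
    # Insert the marker after each sentence boundary to give the model
    # extra positions to settle before it continues decoding.
    sentences = re.split(r"(?<=[.!?])\s+", prompt.strip())
    return f" {marker} ".join(sentences)

paraphrased = ("Summarize the patient's symptoms. "
               "List only facts stated in the note. Do not speculate.")
print(inject_pause(paraphrased))
# Summarize the patient's symptoms. [PAUSE] List only facts stated in the note. [PAUSE] Do not speculate.
```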
Amharic LLaMA and LLaVA aim to enhance language models for low-resource languages like Amharic through data augmentation and multimodal capabilities.
Activation steering can effectively reduce specific skills and behaviors in language models without significant negative impacts on overall performance.
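The core mechanic of activation steering is adding or subtracting a direction in the residual stream at inference time. The sketch below does this with a forward hook on one GPT-2 block; the steering vector is random purely for illustration, whereas in practice it would be derived from contrastive activations for the targeted skill or behavior.

```python
# Minimal activation-steering sketch (assumed setup, not any paper's exact code).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

layer_idx, alpha = 6, 4.0
steer = torch.randn(model.config.n_embd)   # placeholder; real vectors come from contrastive activations
steer = steer / steer.norm()

def hook(module, inputs, output):
    hidden = output[0]
    # Subtracting the direction suppresses the associated behavior;
    # adding it instead would amplify it.
    return (hidden - alpha * steer.to(hidden.dtype),) + output[1:]

handle = model.transformer.h[layer_idx].register_forward_hook(hook)
ids = tok("The quickest way to learn a language is", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**ids, max_new_tokens=20, do_sample=False)
handle.remove()
print(tok.decode(out[0], skip_special_tokens=True))
```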
Given a dataset of prompts and a set of LLMs, the models can be ranked without access to ground truth by considering triplets of models.
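One plausible instantiation of the triplet idea is sketched below: within each ordered triplet, one model judges which of the other two answered a prompt better, and the pairwise wins are aggregated into a global ranking. The judging and aggregation scheme here is an assumption for illustration, not necessarily the paper's exact procedure.

```python
# Hedged sketch of reference-free, triplet-based ranking of LLMs.
from itertools import permutations
from collections import Counter
from typing import Callable, Dict, List

def rank_by_triplets(
    prompts: List[str],
    answers: Dict[str, Dict[str, str]],          # answers[model][prompt]
    judge: Callable[[str, str, str, str], str],  # judge(judge_model, prompt, ans_a, ans_b) -> "a" or "b"
) -> List[str]:
    models = list(answers)
    wins = Counter({m: 0 for m in models})
    for j, a, b in permutations(models, 3):      # j judges the ordered pair (a, b)
        for p in prompts:
            pick = judge(j, p, answers[a][p], answers[b][p])
            wins[a if pick == "a" else b] += 1
    return [m for m, _ in wins.most_common()]

# Toy stand-ins so the sketch runs; a real judge would query the judging LLM.
prompts = ["p1", "p2"]
answers = {"m1": {"p1": "good", "p2": "good"},
           "m2": {"p1": "ok",   "p2": "good"},
           "m3": {"p1": "bad",  "p2": "bad"}}
quality = {"good": 2, "ok": 1, "bad": 0}
judge = lambda j, p, ans_a, ans_b: "a" if quality[ans_a] >= quality[ans_b] else "b"
print(rank_by_triplets(prompts, answers, judge))  # e.g. ['m1', 'm2', 'm3']
```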