Sentence transformers fine-tuned on general question-answering datasets demonstrate some zero-shot ability to associate subjective queries about hiking experiences with synthetically generated route descriptions, but performance is mixed and model-dependent.
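As a rough illustration of this kind of zero-shot matching, the sketch below encodes a subjective hiking query and a few synthetic route descriptions with a QA-tuned sentence transformer and ranks the routes by cosine similarity. The checkpoint and example texts are illustrative choices, not the paper's setup.

```python
# Minimal sketch (not the paper's pipeline): rank synthetic route descriptions
# against a subjective hiking query with a QA-fine-tuned sentence transformer.
from sentence_transformers import SentenceTransformer, util

# Model choice is illustrative; any QA-tuned checkpoint could be swapped in.
model = SentenceTransformer("multi-qa-MiniLM-L6-cos-v1")

query = "a quiet hike with great views and not too steep"
routes = [
    "A gentle loop through pine forest with panoramic ridge viewpoints.",
    "A steep scramble over exposed rock, popular and often crowded.",
    "A flat lakeside walk with little shade and frequent road noise.",
]

q_emb = model.encode(query, convert_to_tensor=True)
r_emb = model.encode(routes, convert_to_tensor=True)

# Cosine similarity serves as the zero-shot relevance score per route.
scores = util.cos_sim(q_emb, r_emb)[0]
for route, score in sorted(zip(routes, scores.tolist()), key=lambda x: -x[1]):
    print(f"{score:.3f}  {route}")
```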
The capacity and effectiveness of pre-trained multilingual models (MLMs) for zero-shot cross-lingual transfer are well established, but the phenomena of positive or negative transfer and the effect of language choice are still not fully understood, especially in the complex setting of massively multilingual LMs. This work proposes an efficient method to study how the choice of transfer language influences zero-shot performance on a target language.
The agent capabilities of open-source, low-parameter language models can be significantly improved through supervised fine-tuning on agent-specific data, combined with techniques such as task decomposition and backtracking that strengthen their reasoning.
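To make "agent-specific data" concrete, here is a hedged sketch of turning one agent trajectory, with an explicit decomposition step and a backtracking step, into a single SFT training record. The field names, chat format, and tool call are invented for illustration, not a specific paper's schema.

```python
# Hedged sketch: serialize an agent trajectory (with task decomposition and a
# backtracking step) into an SFT example. Schema and tool names are assumed.
import json

trajectory = {
    "task": "Find the cheapest flight from Oslo to Rome next Friday.",
    "steps": [
        {"thought": "Decompose: (1) resolve the date, (2) search flights, (3) compare prices."},
        {"action": "search_flights(origin='OSL', dest='FCO', date='2024-06-14')"},
        {"observation": "API error: unknown airport code FCO for this provider."},
        {"thought": "Backtrack: retry with the city code ROM instead of FCO."},
        {"action": "search_flights(origin='OSL', dest='ROM', date='2024-06-14')"},
        {"observation": "Cheapest fare: 89 EUR with AcmeAir."},
    ],
    "final_answer": "The cheapest flight is 89 EUR with AcmeAir.",
}

def to_sft_example(traj):
    # Keep the full reasoning/acting trace as the assistant target so the
    # fine-tuned model learns decomposition and backtracking, not just answers.
    target = "\n".join(json.dumps(s, ensure_ascii=False) for s in traj["steps"])
    target += f"\nFinal answer: {traj['final_answer']}"
    return {"messages": [
        {"role": "user", "content": traj["task"]},
        {"role": "assistant", "content": target},
    ]}

print(json.dumps(to_sft_example(trajectory), indent=2))
```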
By combining frequency-domain transformations with weighted Quasi-Monte Carlo sampling, the proposed Frequency Domain Kernelization approach (DiJiang) efficiently approximates the attention mechanism in Transformer models, yielding significant reductions in training cost and inference time while maintaining comparable performance.
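The general recipe behind such methods is kernelized "linear" attention: map queries and keys through a feature function and reorder the matrix products to avoid the quadratic attention matrix. The sketch below is not the authors' DiJiang implementation; it uses a DCT as a stand-in frequency-domain transform and random per-frequency weights as a placeholder for weighted Quasi-Monte Carlo sampling.

```python
# Hedged sketch of kernelized linear attention with a frequency-domain feature
# map: O(n * d^2) instead of O(n^2 * d). Illustrative only, not DiJiang itself.
import numpy as np
from scipy.fft import dct

def phi(x, weights):
    # DCT along the head dimension, per-frequency weights (placeholder for
    # weighted quasi-Monte Carlo sampling), and exp() to keep features positive.
    z = dct(x, norm="ortho", axis=-1) * weights
    return np.exp(z - z.max())  # global shift for numerical stability

def linear_attention(Q, K, V, weights):
    Qf, Kf = phi(Q, weights), phi(K, weights)          # (n, d) feature maps
    KV = Kf.T @ V                                      # (d, d_v), shared across queries
    normalizer = Qf @ Kf.sum(axis=0, keepdims=True).T  # (n, 1)
    return (Qf @ KV) / (normalizer + 1e-6)

rng = np.random.default_rng(0)
n, d, dv = 128, 16, 16
Q, K, V = rng.normal(size=(n, d)), rng.normal(size=(n, d)), rng.normal(size=(n, dv))
weights = rng.uniform(0.5, 1.5, size=d)                # placeholder frequency weights
print(linear_attention(Q, K, V, weights).shape)        # (128, 16)
```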
Transformer-based language models employ a sequential process to achieve factual recall, involving argument extraction by task-specific attention heads, activation of the extracted argument by the MLP layer, and task-aware function application.
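The toy below only restates that claimed three-stage pipeline as runnable stand-in code (attention heads extract the argument, an MLP "activates" it, a task-aware function produces the answer); every name and the lookup table are invented, and none of it reflects the paper's actual model internals.

```python
# Toy illustration of the claimed factual-recall stages; purely conceptual.
CAPITAL_OF = {"France": "Paris", "Japan": "Tokyo"}

def extract_argument(prompt: str) -> str:
    # Stand-in for task-specific attention heads copying the subject token.
    return next(tok for tok in prompt.split() if tok in CAPITAL_OF)

def mlp_activate(argument: str) -> str:
    # Stand-in for the MLP layer amplifying the extracted argument's representation.
    return argument

def apply_task_function(task: str, argument: str) -> str:
    # Stand-in for task-aware function application in later layers.
    return {"capital_of": CAPITAL_OF}[task][argument]

prompt = "The capital of France is"
print(apply_task_function("capital_of", mlp_activate(extract_argument(prompt))))  # Paris
```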
A novel intrinsic self-correction framework, Learning from Correctness (LECO), can significantly improve the reasoning performance of large language models across various tasks by progressively accumulating correct reasoning steps without relying on external feedback or handcrafted prompts.
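A LECO-style loop can be pictured as: score each generated reasoning step with an intrinsic confidence signal, keep the longest confident prefix, and regenerate only the remainder. The sketch below assumes hypothetical `generate_steps` and `step_confidence` callables (here mocked so it runs); it is a paraphrase of the idea, not the paper's implementation.

```python
# Hedged sketch of correctness-accumulating self-correction (LECO-style).
from typing import Callable, List

def leco_loop(
    question: str,
    generate_steps: Callable[[str, List[str]], List[str]],  # continues from a prefix
    step_confidence: Callable[[str], float],                 # intrinsic confidence score
    threshold: float = 0.8,
    max_rounds: int = 3,
) -> List[str]:
    prefix: List[str] = []
    for _ in range(max_rounds):
        steps = prefix + generate_steps(question, prefix)
        # Locate the first step whose confidence drops below the threshold.
        cut = next((i for i, s in enumerate(steps) if step_confidence(s) < threshold), len(steps))
        if cut == len(steps):       # every step looks confident: stop early
            return steps
        prefix = steps[:cut]        # accumulate the trusted ("correct") prefix
    return steps

# Toy stand-ins so the sketch runs end to end.
def fake_generate(question, prefix):
    if not prefix:
        return ["Compute 3 * 4 = 12", "Add 5: 12 + 5 = 18 (low conf)", "Answer: 18"]
    return ["Add 5: 12 + 5 = 17", "Answer: 17"]  # regeneration conditioned on the prefix

fake_confidence = lambda step: 0.5 if "low conf" in step else 0.95
print(leco_loop("toy question", fake_generate, fake_confidence))
```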
Improving LLM comprehension through optimal paraphrasing and [PAUSE] injection can reduce hallucination in generated content.
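As a minimal illustration of the prompt-side mechanics, the snippet below inserts literal "[PAUSE]" markers after each sentence of an already-paraphrased instruction. The marker placement strategy is an assumption for illustration and not the paper's exact recipe.

```python
# Hedged sketch: inject "[PAUSE]" markers into a paraphrased prompt.
import re

def inject_pause(prompt: str, marker: str = "[PAUSE]") -> str:
    # Insert the marker after each sentence boundary to give the model
    # extra positions to settle before it continues decoding.
    sentences = re.split(r"(?<=[.!?])\s+", prompt.strip())
    return f" {marker} ".join(sentences)

paraphrased = ("Summarize the patient's symptoms. "
               "List only facts stated in the note. Do not speculate.")
print(inject_pause(paraphrased))
# Summarize the patient's symptoms. [PAUSE] List only facts stated in the note. [PAUSE] Do not speculate.
```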
Amharic LLaMA and LLaVA aim to enhance language models for low-resource languages like Amharic through data augmentation and multimodal capabilities.
Activation steering can effectively reduce specific skills and behaviors in language models without significant negative impacts on overall performance.
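The core mechanic of activation steering is adding or subtracting a direction in the residual stream at inference time. The sketch below does this with a forward hook on one GPT-2 block; the steering vector is random purely for illustration, whereas in practice it would be derived from contrastive activations for the targeted skill or behavior.

```python
# Minimal activation-steering sketch (assumed setup, not any paper's exact code).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

layer_idx, alpha = 6, 4.0
steer = torch.randn(model.config.n_embd)   # placeholder; real vectors come from contrastive activations
steer = steer / steer.norm()

def hook(module, inputs, output):
    hidden = output[0]
    # Subtracting the direction suppresses the associated behavior;
    # adding it instead would amplify it.
    return (hidden - alpha * steer.to(hidden.dtype),) + output[1:]

handle = model.transformer.h[layer_idx].register_forward_hook(hook)
ids = tok("The quickest way to learn a language is", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**ids, max_new_tokens=20, do_sample=False)
handle.remove()
print(tok.decode(out[0], skip_special_tokens=True))
```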
Given a dataset of prompts and a set of LLMs, the models can be ranked without access to ground truth by considering triplets of models.
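One plausible instantiation of the triplet idea is sketched below: within each ordered triplet, one model judges which of the other two answered a prompt better, and the pairwise wins are aggregated into a global ranking. The judging and aggregation scheme here is an assumption for illustration, not necessarily the paper's exact procedure.

```python
# Hedged sketch of reference-free, triplet-based ranking of LLMs.
from itertools import permutations
from collections import Counter
from typing import Callable, Dict, List

def rank_by_triplets(
    prompts: List[str],
    answers: Dict[str, Dict[str, str]],          # answers[model][prompt]
    judge: Callable[[str, str, str, str], str],  # judge(judge_model, prompt, ans_a, ans_b) -> "a" or "b"
) -> List[str]:
    models = list(answers)
    wins = Counter({m: 0 for m in models})
    for j, a, b in permutations(models, 3):      # j judges the ordered pair (a, b)
        for p in prompts:
            pick = judge(j, p, answers[a][p], answers[b][p])
            wins[a if pick == "a" else b] += 1
    return [m for m, _ in wins.most_common()]

# Toy stand-ins so the sketch runs; a real judge would query the judging LLM.
prompts = ["p1", "p2"]
answers = {"m1": {"p1": "good", "p2": "good"},
           "m2": {"p1": "ok",   "p2": "good"},
           "m3": {"p1": "bad",  "p2": "bad"}}
quality = {"good": 2, "ok": 1, "bad": 0}
judge = lambda j, p, ans_a, ans_b: "a" if quality[ans_a] >= quality[ans_b] else "b"
print(rank_by_triplets(prompts, answers, judge))  # e.g. ['m1', 'm2', 'm3']
```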