BIBench is a comprehensive benchmark for assessing Large Language Models (LLMs) in the context of Business Intelligence (BI). It evaluates LLMs along three dimensions: BI foundational knowledge, BI knowledge application, and BI technical skills. It comprises 11 sub-tasks spanning classification, extraction, and generation. In addition, a domain-specific dataset called BIChat, with over a million data points, was developed for fine-tuning LLMs. The goal is to provide a measure of LLM data-analysis ability within the BI domain and to foster advances in this field.
Key insights extracted from
by Shu Liu, Shan... at arxiv.org, 02-29-2024
https://arxiv.org/pdf/2401.02982.pdf
Deeper Questions