BIBench is a comprehensive benchmark for assessing Large Language Models (LLMs) in Business Intelligence (BI). It evaluates LLMs along three dimensions: BI foundational knowledge, BI knowledge application, and BI technical skills. These dimensions are covered by 11 sub-tasks spanning classification, extraction, and generation. The authors also develop BIChat, a domain-specific dataset of over a million data points, for fine-tuning LLMs. The goal is to provide a measure of LLM data-analysis ability in the BI domain and to foster progress in this field.
Key insights drawn from
by Shu Liu, Shan... at arxiv.org, 02-29-2024
https://arxiv.org/pdf/2401.02982.pdf