BIBench introduces a comprehensive benchmark to assess Large Language Models (LLMs) in the context of Business Intelligence (BI). The benchmark evaluates LLMs across three dimensions: BI foundational knowledge, BI knowledge application, and BI technical skills. It comprises 11 sub-tasks covering classification, extraction, and generation tasks. Additionally, a domain-specific dataset called BIChat with over a million data points is developed to fine-tune LLMs. The goal is to provide a measure for evaluating LLM abilities in data analysis within the BI domain and foster advancements in this field.
To Another Language
from source content
arxiv.org
ข้อมูลเชิงลึกที่สำคัญจาก
by Shu Liu,Shan... ที่ arxiv.org 02-29-2024
https://arxiv.org/pdf/2401.02982.pdfสอบถามเพิ่มเติม