BIBench is a comprehensive benchmark for assessing Large Language Models (LLMs) in the context of Business Intelligence (BI). It evaluates LLMs along three dimensions: BI foundational knowledge, BI knowledge application, and BI technical skills. It comprises 11 sub-tasks spanning classification, extraction, and generation. In addition, the authors develop BIChat, a domain-specific dataset with over a million data points, for fine-tuning LLMs. The goal is to provide a measure of LLM data-analysis ability in the BI domain and to foster advances in the field.
Key insights distilled from:
by Shu Liu, Shan... at arxiv.org, 02-29-2024
https://arxiv.org/pdf/2401.02982.pdf