BIBench is a comprehensive benchmark for assessing Large Language Models (LLMs) in the context of Business Intelligence (BI). It evaluates LLMs along three dimensions: BI foundational knowledge, BI knowledge application, and BI technical skills. It comprises 11 sub-tasks spanning classification, extraction, and generation. In addition, a domain-specific dataset called BIChat, with over a million data points, was developed for fine-tuning LLMs. The goal is to provide a measure of LLMs' data analysis abilities within the BI domain and to foster advances in this field.
Source: arxiv.org