Core Concept
Large language models can be effectively leveraged as "Language Data Scientists" that automate low-level data analysis tasks: the model generates a natural language action plan, which a low-level executor then carries out.
Summary
This study evaluates the performance of a "Language Data Scientist" (LDS) model that utilizes OpenAI's GPT-3.5 to analyze datasets and answer data science queries in a zero-shot manner. The key aspects of the methodology are:
- Gathering background information on the dataset using Pandas functions to provide context to the GPT-based Action Plan Generator (AcPG).
- The AcPG generates a natural language action plan to answer the given query, leveraging techniques like Chain-of-Thought and SayCan prompt engineering.
- The action plan is then executed by a low-level executor to produce the final answer (a rough sketch of this pipeline follows the list).
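The paper does not publish its implementation, but the steps above can be sketched in Python. This is a minimal illustration only: the helper names (`gather_context`, `generate_action_plan`) and the prompt wording are hypothetical, and it assumes the `openai` Python client with an API key set in the environment.

```python
import pandas as pd
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def gather_context(df: pd.DataFrame) -> str:
    """Collect lightweight background on the dataset with Pandas,
    mirroring the context-gathering step described above."""
    return (
        f"Columns and dtypes:\n{df.dtypes.to_string()}\n\n"
        f"Summary statistics:\n{df.describe(include='all').to_string()}\n\n"
        f"First rows:\n{df.head().to_string()}"
    )

def generate_action_plan(context: str, query: str) -> str:
    """Ask the model for a step-by-step natural language action plan.
    The Chain-of-Thought-style prompt here is illustrative, not the
    paper's actual AcPG prompt."""
    prompt = (
        "You are a data scientist. Given this dataset background:\n"
        f"{context}\n\n"
        f"Question: {query}\n"
        "Think step by step and write a numbered action plan of Pandas "
        "operations that answers the question."
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

In the actual LDS design, the generated plan would then be handed to the low-level executor rather than run directly; the sketch stops at plan generation.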
The LDS was tested on 15 benchmark datasets spanning three sizes (small, medium, large) and a range of question difficulty levels. Overall, it answered 32.89% of the queries correctly. Performance was relatively stable across dataset sizes, peaking at 36% on the large datasets.
The key challenges were instances of the GPT model generating incorrect code, and the token limits of the GPT API, which restricted how much context could be supplied for larger datasets. Future work will explore more advanced language models such as GPT-4 to address these limitations and further improve the LDS's performance.
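The paper does not say how context was trimmed to fit within the token limit. One plausible mitigation, sketched below under the assumption that the `tiktoken` tokenizer is available, is to cut the Pandas-generated background down to a fixed token budget before prompting; the 3,000-token budget here is an arbitrary example, not a figure from the paper.

```python
import tiktoken

def truncate_to_budget(context: str, budget: int = 3000,
                       model: str = "gpt-3.5-turbo") -> str:
    """Keep only as much dataset context as fits in `budget` tokens,
    leaving headroom in the model's context window for the query,
    the instructions, and the generated action plan."""
    enc = tiktoken.encoding_for_model(model)
    tokens = enc.encode(context)
    if len(tokens) <= budget:
        return context
    return enc.decode(tokens[:budget])
```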
Statistics
The LDS was able to correctly answer 27 out of 75 questions (36%) on the large benchmark datasets.
The LDS was able to correctly answer 22 out of 75 questions (29.33%) on the medium benchmark datasets.
The LDS was able to correctly answer 25 out of 75 questions (33.33%) on the small benchmark datasets.
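The overall 32.89% figure in the summary is the pooled accuracy over these three categories, as a quick check of the counts confirms:

```python
# Pooled accuracy over the three size categories reported above.
correct = 27 + 22 + 25   # large + medium + small
total = 3 * 75           # 75 questions per size category
print(f"{correct}/{total} = {correct / total:.2%}")  # 74/225 = 32.89%
```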
Quotes
"Significant work already exists in this field. Automatic Prompt Engineers (APEs) have proved useful in demonstrating the power of LLMs to extrapolate correlatory data; when given a set of inputs, APEs are able to identify the most likely "instruction" for the specific set of inputs."
"Though existing efforts into improving the accessibility of ML models for non-ML experts are generally well-supported, such efforts are rarely directed towards ameliorating direct, user-generated queries in the field of data science."