Evaluating the Capabilities of Data Science Agents: A Comprehensive Benchmark for Realistic Data Analysis and Modeling Tasks
Existing data science benchmarks fall short of capturing the complexity of real-world data science tasks. DSBench is a comprehensive benchmark that evaluates data science agents on realistic data analysis and modeling tasks sourced from Eloquence and Kaggle competitions.