Existing data science benchmarks fail to capture the complexity of real-world data science tasks. DSBench is a comprehensive benchmark introduced to evaluate data science agents on realistic data analysis and data modeling tasks sourced from Eloquence and Kaggle competitions.



Evaluating the Capabilities of Data Science Agents: A Comprehensive Benchmark for Realistic Data Analysis and Modeling Tasks