Existing data science benchmarks fall short in capturing the complexity of real-world data science tasks. DSBench, a comprehensive benchmark, is introduced to evaluate the performance of data science agents on realistic data analysis and modeling tasks sourced from Eloquence and Kaggle competitions.
Virtual reality can significantly improve navigation and comparison performance in computational notebooks compared to desktop environments.
cuDF, NVIDIA's GPU DataFrame library, can significantly accelerate pandas-based data processing and analysis by running it on GPUs.
Large Language Models can be effectively leveraged as "Language Data Scientists" to automate low-level data analysis tasks by generating natural language action plans and executing them through a low-level executor.
HiRA-Pro introduces a novel approach for high-resolution alignment of multimodal spatio-temporal data, enhancing machine learning predictive performance in smart manufacturing processes.
tLaSDI is a data-driven method that embeds thermodynamics in latent space dynamics identification.
DADIT, a demographic dataset of Italian Twitter users, enables improved gender and age prediction using text classifiers.
CommitBench is a new large-scale dataset for commit message generation that overcomes the limitations of existing datasets and improves data quality.
The Knowledge-to-SQL framework enhances text-to-SQL models by supplying expert knowledge for more accurate SQL generation.
Mobility data science presents challenges and opportunities in data collection, cleaning, analysis, and management for various applications.