Dataframes have become essential tools for data scientists, but current dataframe systems struggle to stay responsive even on moderately large datasets. The paper lays out a vision for making dataframes scalable without sacrificing usability. Dataframes depart from several assumptions of traditional relational algebra. They have flexible rather than rigid schemas, ordered rows, equivalence between rows and columns, and fluid boundaries between data and metadata. The authors' experience building Modin, a scalable drop-in replacement for pandas, highlights the research challenges these features raise, such as metadata management and query optimization. The paper also proposes a simple data model and algebra for dataframes as a foundation for future work in this area.
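To make the proposed formalism more concrete, here is a minimal sketch in Python. It assumes a dataframe can be viewed as a tuple of an array of string entries (A), ordered row labels (R), column labels (C), and optional per-column domains (D). The class `Dataframe` and the schema-induction helper `induce_domain` are hypothetical illustrations, not Modin or pandas APIs. The sketch shows two features discussed above: schemas induced lazily from the data rather than declared up front, and rows and columns being interchangeable under transpose.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


def induce_domain(column: List[str]) -> str:
    """Hypothetical schema induction: pick the narrowest domain
    (int, then float, else str) that parses every entry in the column."""
    def parses(cast: Callable[[str], object]) -> bool:
        try:
            for v in column:
                cast(v)
            return True
        except ValueError:
            return False

    if parses(int):
        return "int"
    if parses(float):
        return "float"
    return "str"


@dataclass
class Dataframe:
    values: List[List[str]]   # A: m x n array of string entries
    row_labels: List[str]     # R: one label per row; order is significant
    col_labels: List[str]     # C: one label per column
    domains: Optional[List[Optional[str]]] = None  # D: None means unspecified

    def column(self, j: int) -> List[str]:
        return [row[j] for row in self.values]

    def induce_schema(self) -> "Dataframe":
        """Return a copy with every unspecified domain induced from the data."""
        current = self.domains or [None] * len(self.col_labels)
        domains = [
            d if d is not None else induce_domain(self.column(j))
            for j, d in enumerate(current)
        ]
        return Dataframe(self.values, self.row_labels, self.col_labels, domains)

    def transpose(self) -> "Dataframe":
        """Swap rows with columns and row labels with column labels.
        Domains become unspecified, because each new column mixes
        entries drawn from different old columns."""
        values = [list(col) for col in zip(*self.values)]
        return Dataframe(values, list(self.col_labels), list(self.row_labels), None)


# Usage: column "a" holds integers; column "b" mixes a number and text.
df = Dataframe([["1", "2.5"], ["3", "x"]], ["r0", "r1"], ["a", "b"]).induce_schema()
t = df.transpose().induce_schema()
print(df.domains, t.domains)  # ['int', 'str'] ['float', 'str']
```

Keeping entries as uninterpreted strings and inducing domains only on demand is one way to model the flexible schemas described above. After a transpose, each new column must have its type re-derived from the data instead of inheriting a declared type.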
Key insights distilled from ar5iv.org, by ar5iv.labs.arxiv.org on 02-29-2024: https://ar5iv.labs.arxiv.org/html/2001.00888