Dataframes have become essential tools for data scientists, but current dataframe systems struggle with responsiveness even on moderately large datasets. The paper argues that dataframes must be made scalable without sacrificing usability. Dataframes discard traditional relational assumptions, such as rigid schemas and a strict distinction between rows and columns, and adopt new ones, such as ordered rows and columns, so scaling them means working under a different set of assumptions than relational systems do. The authors' experience building Modin, a pandas replacement, highlights research challenges such as metadata management and query optimization under these new assumptions. A proposed formalism for dataframes provides a foundation for future development in this area.
Key insights from ar5iv.labs.arxiv.org, 02-29-2024
https://ar5iv.labs.arxiv.org/html/2001.00888