Dataframes have become essential tools for data scientists, but current dataframe systems struggle to stay responsive even on moderately large datasets. The paper argues that dataframes must be made scalable without sacrificing usability. Dataframes discard traditional relational algebra assumptions, such as rigid schemas and distinct roles for rows and columns, and replace them with new ones: flexible schemas, ordered data, and interchangeable rows and columns. The authors' experience building Modin, a pandas replacement, highlights research challenges such as metadata management and query optimization under these new assumptions. A proposed formalism for dataframes provides a foundation for future development in this area.
Key insights drawn from ar5iv.labs.arxiv.org, 02-29-2024
https://ar5iv.labs.arxiv.org/html/2001.00888