Dataframes have become essential tools for data scientists, but current systems struggle to stay responsive even on moderately large datasets. The paper argues that dataframes need to scale without sacrificing usability. Dataframes discard traditional relational algebra assumptions, such as rigid schemas and a strict distinction between rows and columns, and the authors aim to address scalability under the relaxed assumptions that result. Their experience building Modin, a pandas replacement, highlights research challenges such as metadata management and query optimization under these assumptions. A proposed formalism for dataframes provides a foundation for future development in this area.
Source: ar5iv.org