Dataframes have become essential tools for data scientists, but current dataframe systems struggle to stay responsive even on moderately large datasets. The paper argues that dataframes must be made scalable without sacrificing usability. Unlike relational tables, dataframes discard traditional relational assumptions, such as rigid schemas and a strict distinction between rows and columns, and introduce new ones, such as row ordering. The authors examine what these differences mean for scalability. Their experience building Modin, a drop-in replacement for pandas, highlights the research challenges these assumptions raise, including metadata management and query optimization. A proposed formalism for dataframes provides a foundation for future work in this area.
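The dataframe-specific semantics described above can be seen in a few lines of pandas. The following is a minimal sketch using hypothetical toy data (the column names, row labels, and values are invented for illustration). It shows ordered rows, rows and columns swapping roles under a transpose, a schema that pandas infers from the values rather than declares up front, and row labels (metadata) turning into an ordinary data column. It illustrates pandas behavior only and is not taken from the paper or from Modin's implementation.

```python
import pandas as pd

# Hypothetical toy data: a string column and an integer column,
# with row labels stored as metadata in the index.
df = pd.DataFrame(
    {"name": ["ann", "bob", "cy"], "score": [3, 1, 2]},
    index=["r0", "r1", "r2"],
)

# Ordering: rows have a defined order, so positional access is meaningful
# and sorting yields a new row order rather than an unordered bag of tuples.
first = df.iloc[0]["name"]
by_score = df.sort_values("score")
order = list(by_score.index)

# Row/column equivalence: transposing swaps the axes, so the former
# row labels become column labels.
t = df.T

# Flexible schema: each transposed column mixes a str and an int,
# so pandas falls back to the generic object dtype for it.
t_dtypes = set(t.dtypes.astype(str))

# Data/metadata fluidity: reset_index() turns the row labels (metadata)
# into an ordinary data column named "index".
as_data = df.reset_index()

print(first, order, list(t.columns), t_dtypes, list(as_data.columns))
```

Because Modin aims to be a drop-in replacement, the same script should in principle run after changing the import to `import modin.pandas as pd`; this sketch has only been reasoned about against pandas itself.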