The unification of large language models (LLMs) and knowledge graphs (KGs) presents significant data management opportunities and challenges, including consistency, scalability, knowledge editing, privacy, fairness, explainability, and human-in-the-loop approaches.
This work develops a method to automatically estimate the proportions of pre-training data used to build large language models, enabling more effective data management and model optimization.
Foundation models show promising performance on a range of diverse tasks for which they were not explicitly trained, making them highly applicable to the data discovery and data exploration domain. When used carefully, they outperform task-specific models and even human experts on three representative tasks: table-class detection, column-type annotation, and join-column prediction.
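One common way to apply a foundation model to column-type annotation is to serialize a column into a zero-shot prompt. The sketch below shows a hypothetical prompt builder; the function name, prompt wording, and candidate types are illustrative assumptions, not the prompts used in the work summarized above.

```python
def column_type_prompt(column_name, sample_values, candidate_types):
    """Build a zero-shot column-type annotation prompt.

    Hypothetical format: the column header and a few sample values are
    serialized into text, and the model is asked to pick one semantic
    type from a closed candidate set.
    """
    values = ", ".join(repr(v) for v in sample_values[:5])  # cap the sample
    types = ", ".join(candidate_types)
    return (
        f"Column name: {column_name}\n"
        f"Sample values: {values}\n"
        f"Choose the best semantic type from: {types}.\n"
        "Answer with one type only."
    )

# Example: annotate a date-of-birth column against three candidate types.
prompt = column_type_prompt(
    "dob", ["1990-04-12", "1987-11-03", "2001-06-30"], ["date", "name", "city"]
)
```

The resulting string would then be sent to the model of choice; constraining the answer to a closed type set is what makes the output easy to parse and evaluate.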
DynaWarp introduces a novel probabilistic indexing structure for efficient log data retrieval, outperforming existing solutions in storage space, false-positive rate, and query throughput.
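To make the trade-off concrete, a probabilistic index answers "might this segment contain the term?" with no false negatives but occasional false positives, so a query only touches candidate segments. The sketch below uses a plain Bloom filter per log segment as a generic stand-in; it is not DynaWarp's actual structure, and all names here are illustrative.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: set membership with false positives,
    never false negatives. Illustrative stand-in for a probabilistic
    log index; not the DynaWarp structure itself."""

    def __init__(self, size=1024, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = [False] * size

    def _positions(self, item):
        # Derive `hashes` independent positions from salted SHA-256 digests.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = True

    def might_contain(self, item):
        # False means definitely absent; True may be a false positive.
        return all(self.bits[p] for p in self._positions(item))

# One filter per log segment; a query scans only candidate segments.
segments = {"seg-001": ["error", "timeout"], "seg-002": ["login", "ok"]}
index = {}
for seg, terms in segments.items():
    bf = BloomFilter()
    for term in terms:
        bf.add(term)
    index[seg] = bf

candidates = [seg for seg, bf in index.items() if bf.might_contain("timeout")]
```

Storage, false-positive rate, and throughput all hinge on the filter's size and hash count, which is exactly the design space a structure like DynaWarp optimizes over.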
Efficient data access paths for mixed vector-relational search require choosing carefully between scan-based and index-based approaches to optimize performance.
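The scan-versus-index choice can be framed as a tiny cost model: a full scan pays per row but evaluates the relational predicate inline, while an index probe is cheap per query but must fetch more candidates as the predicate grows more selective. The constants and formula below are hypothetical, a sketch of the reasoning rather than any system's actual planner.

```python
def choose_access_path(selectivity, n_rows,
                       scan_cost_per_row=1.0, index_probe_cost=50.0):
    """Toy cost model for mixed vector-relational search.

    selectivity: fraction of rows passing the relational predicate (0..1].
    A scan costs one unit per row; an index probe's cost is inflated by
    low selectivity, since more nearest-neighbor candidates must be
    fetched before enough survivors pass the predicate. All constants
    are illustrative assumptions.
    """
    scan_cost = n_rows * scan_cost_per_row
    index_cost = index_probe_cost / max(selectivity, 1e-9)
    return "scan" if scan_cost <= index_cost else "index"

# Highly selective predicate over a small table: scanning wins.
small_selective = choose_access_path(selectivity=0.001, n_rows=10_000)
# Permissive predicate over a large table: the index wins.
large_permissive = choose_access_path(selectivity=0.9, n_rows=1_000_000)
```

Even this toy model captures the paper's point: neither access path dominates, so the planner must weigh data volume against predicate selectivity per query.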
The concept of model lakes is introduced: repositories for managing large collections of diverse models, whose scale and heterogeneity call for new scientific solutions to model-management challenges.