Core Concepts
KGLiDS is a scalable platform that abstracts data science artifacts such as datasets and pipelines, captures their semantics, and enables automation for data discovery, cleaning, transformation, and AutoML.
Abstract
KGLiDS addresses the lack of systematic knowledge sharing in data science.
The platform employs machine learning and knowledge graph technologies.
It abstracts pipelines and datasets to capture their semantics efficiently.
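One way to abstract a pipeline is to analyse its code statically. As a minimal sketch, assuming pipelines are Python scripts, the standard `ast` module can extract which library functions a script calls, with which literal arguments, and which dataset files it reads. The toy script and the helpers `dotted_name` and `abstract_pipeline` are hypothetical illustrations, not the KGLiDS analyser.

```python
import ast

# Hypothetical toy pipeline script used only for illustration.
PIPELINE_SRC = """
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

df = pd.read_csv('titanic.csv')
df = df.dropna()
model = RandomForestClassifier(n_estimators=100)
model.fit(df.drop(columns=['Survived']), df['Survived'])
"""


def dotted_name(node):
    """Turn a Name/Attribute chain such as pd.read_csv into 'pd.read_csv'."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None  # e.g. calls on subscripts or on call results


def abstract_pipeline(src):
    """Return (line, qualified call, string args, constant kwargs) per library call."""
    tree = ast.parse(src)

    # Map local aliases to fully qualified names, e.g. 'pd' -> 'pandas'.
    aliases = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for a in node.names:
                aliases[a.asname or a.name] = a.name
        elif isinstance(node, ast.ImportFrom) and node.module:
            for a in node.names:
                aliases[a.asname or a.name] = f"{node.module}.{a.name}"

    calls = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        name = dotted_name(node.func)
        if name is None:
            continue
        head, _, rest = name.partition(".")
        if head not in aliases:
            continue  # method calls on local variables (df.dropna) are skipped
        qualified = aliases[head] + (f".{rest}" if rest else "")
        str_args = [a.value for a in node.args
                    if isinstance(a, ast.Constant) and isinstance(a.value, str)]
        kwargs = {k.arg: k.value.value for k in node.keywords
                  if k.arg and isinstance(k.value, ast.Constant)}
        calls.append((node.lineno, qualified, str_args, kwargs))
    return sorted(calls, key=lambda c: c[0])


for call in abstract_pipeline(PIPELINE_SRC):
    print(call)
# (5, 'pandas.read_csv', ['titanic.csv'], {})
# (7, 'sklearn.ensemble.RandomForestClassifier', [], {'n_estimators': 100})
```

Calls on local variables such as `df.dropna()` are skipped here because resolving them requires data-flow tracking, which this sketch deliberately omits.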
KGLiDS offers on-demand automation for data cleaning and transformation using graph neural network (GNN) models.
The LiDS graph is constructed by interlinking datasets with pipeline graphs.
KGLiDS interfaces expose pre-defined operations through which users interact with the system.
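To make the interlinking concrete, here is a minimal sketch assuming a toy in-memory triple store instead of a real RDF engine. Every prefix and predicate (`lids:isPartOf`, `lids:readsTable`, `lids:nextStatement`) and every helper name below is hypothetical and not taken from the KGLiDS ontology. The point is only that once a pipeline statement is linked to the table it reads, the question "which pipelines use this table?" becomes a two-hop graph lookup.

```python
from dataclasses import dataclass, field

# Hypothetical prefixes and predicates; NOT the real KGLiDS ontology.
IS_PART_OF = "lids:isPartOf"
READS_TABLE = "lids:readsTable"
NEXT_STATEMENT = "lids:nextStatement"
RDF_TYPE = "rdf:type"


@dataclass
class ToyLiDSGraph:
    """Minimal in-memory store of (subject, predicate, object) triples."""
    triples: set = field(default_factory=set)

    def add(self, s, p, o):
        self.triples.add((s, p, o))

    def subjects(self, p, o):
        return sorted(s for (s, p2, o2) in self.triples if p2 == p and o2 == o)

    def objects(self, s, p):
        return sorted(o for (s2, p2, o) in self.triples if s2 == s and p2 == p)


def add_dataset(g, dataset, table, columns):
    """Dataset side: dataset -> table -> columns, linked by isPartOf."""
    g.add(dataset, RDF_TYPE, "lids:Dataset")
    g.add(table, RDF_TYPE, "lids:Table")
    g.add(table, IS_PART_OF, dataset)
    for col in columns:
        g.add(col, IS_PART_OF, table)


def add_pipeline(g, pipeline, statements, reads):
    """Pipeline side: ordered statements plus the tables they read.

    reads maps a statement ID to the table it loads; these edges are what
    interlink the pipeline graph with the dataset graph.
    """
    g.add(pipeline, RDF_TYPE, "lids:Pipeline")
    for stmt in statements:
        g.add(stmt, IS_PART_OF, pipeline)
    for stmt, nxt in zip(statements, statements[1:]):
        g.add(stmt, NEXT_STATEMENT, nxt)
    for stmt, table in reads.items():
        g.add(stmt, READS_TABLE, table)


def pipelines_using(g, table):
    """Two hops: statement -readsTable-> table, statement -isPartOf-> pipeline."""
    stmts = g.subjects(READS_TABLE, table)
    return sorted({p for s in stmts for p in g.objects(s, IS_PART_OF)})


g = ToyLiDSGraph()
add_dataset(g, "ds:titanic", "ds:titanic/train.csv",
            ["ds:titanic/train.csv/Age", "ds:titanic/train.csv/Survived"])
add_pipeline(g, "pl:nb1", ["pl:nb1/s1", "pl:nb1/s2"],
             {"pl:nb1/s1": "ds:titanic/train.csv"})
add_pipeline(g, "pl:nb2", ["pl:nb2/s1"], {"pl:nb2/s1": "ds:titanic/train.csv"})
print(pipelines_using(g, "ds:titanic/train.csv"))  # ['pl:nb1', 'pl:nb2']
```

A real deployment would more likely keep such triples in an RDF store and answer the same question with a SPARQL query; the in-memory set here only keeps the example self-contained.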
Stats
KGLiDS runs significantly faster and uses less memory than existing systems while maintaining accuracy.
Quotes
"Data scientists primarily work in isolation without exchanging knowledge."
"KGLiDS combines dataset search and pipeline generation within a single framework."
"KGLiDS enables automatic learning and discovery on open data science."