Core Concepts
The Krebs cycle dataset is a standardized synthetic benchmark for evaluating causal learning methods on time series data simulated from a real-world biochemical process.
Summary
The Krebs cycle dataset is a simulated dataset based on the Krebs cycle, a fundamental pathway in biochemistry. It provides a standardized benchmark for evaluating causal learning methods on time series data.
The dataset consists of four scenarios that differ in the number and length of their time series. Every scenario has 16 features, each representing the concentration of a different reactant in the Krebs cycle. The data are generated by a simulator that models molecules moving and reacting in a virtual box, which introduces noise and nonlinearity into the time series.
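Because the scenarios trade the number of series against their length, methods that model time-lagged effects should build their lagged pairs within each series, never across the boundary between two series. The sketch below assumes a hypothetical in-memory layout of shape (n_series, n_steps, 16); the released files may be organized differently, and `lagged_pairs` is an illustrative helper, not part of the dataset's tooling. The toy input is random noise with KrebsS-like dimensions, not real data.

```python
import numpy as np

# Hypothetical layout: one array per scenario with shape
# (n_series, n_steps, n_features). The actual file format may differ.
N_FEATURES = 16


def lagged_pairs(data: np.ndarray, lag: int = 1):
    """Return (current, past) matrices built within each series.

    Pairs never straddle two series, so a series with T steps
    contributes T - lag rows.
    """
    n_series, n_steps, n_features = data.shape
    if not 0 < lag < n_steps:
        raise ValueError("lag must be between 1 and n_steps - 1")
    current = data[:, lag:, :].reshape(-1, n_features)  # steps lag .. T-1
    past = data[:, :-lag, :].reshape(-1, n_features)    # steps 0 .. T-1-lag
    return current, past


# Toy stand-in with KrebsS-like dimensions: 10,000 series of 5 steps.
rng = np.random.default_rng(0)
toy = rng.normal(size=(10_000, 5, N_FEATURES))
x_t, x_prev = lagged_pairs(toy)
print(x_t.shape, x_prev.shape)  # (40000, 16) (40000, 16)
```

With only 5 steps per series, each KrebsS series contributes 4 lag-1 pairs, so 10,000 series yield 40,000 rows.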
The key features of the Krebs cycle dataset are:
- It is based on a real-world biochemical process, providing a more realistic test case than many synthetic benchmarks.
- The ground-truth causal relationships are known, allowing for quantitative evaluation of causal learning methods.
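- The dataset is not R²-sortable: the causal order cannot be recovered simply by sorting variables by the coefficient of determination (R²) obtained when regressing each one on all the others, a shortcut that some causal learning algorithms can exploit on other synthetic benchmarks.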
- The dataset includes scenarios that differ in the length and number of time series, making it possible to assess the sample complexity and stability of causal learning methods.
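R²-sortability can be checked directly on data whose ground truth is known. The sketch below is a simplified, edge-level variant: it regresses each variable on all the others, computes each R², and reports the fraction of ground-truth edges i → j along which R² increases (ties count one half). The original criterion also counts pairs linked by longer directed paths. It assumes rows are samples (for example, time points pooled across series, which ignores temporal dependence) and that `adj[i, j] != 0` encodes an edge i → j; both conventions and the function names are hypothetical.

```python
import numpy as np


def r2_per_variable(x: np.ndarray) -> np.ndarray:
    """R^2 of an OLS regression (with intercept) of each column on all other columns."""
    n, d = x.shape
    r2 = np.empty(d)
    for j in range(d):
        y = x[:, j]
        others = np.delete(x, j, axis=1)
        design = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        r2[j] = 1.0 - resid.var() / y.var()
    return r2


def edge_r2_sortability(x: np.ndarray, adj: np.ndarray) -> float:
    """Fraction of ground-truth edges i -> j (i != j) with R^2_j > R^2_i.

    Simplified edge-level variant; ties count as 0.5 and self-loops are ignored.
    """
    r2 = r2_per_variable(x)
    src, dst = np.nonzero(adj)
    keep = src != dst
    src, dst = src[keep], dst[keep]
    if src.size == 0:
        raise ValueError("adjacency matrix has no off-diagonal edges")
    hits = (r2[dst] > r2[src]) + 0.5 * (r2[dst] == r2[src])
    return float(hits.mean())
```

A score near 0.5 means R² carries little information about edge direction; a score near 0 or 1 means a method could orient edges by sorting on R² alone, which is the shortcut a non-R²-sortable benchmark removes.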
The paper provides a baseline evaluation using the state-of-the-art DyNoTears method, demonstrating that the Krebs cycle dataset poses a significant challenge for current causal learning techniques. The dataset is made publicly available to encourage further research and development of more expressive causal learning models.
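Because the ground-truth graph is known, an estimated graph, such as a weighted adjacency matrix produced by DyNoTears, can be scored edge by edge. The source does not say which metrics the paper reports; precision, recall, F1, and the Hamming distance between adjacency matrices are common choices and are assumed here. The function name, the `threshold` parameter, and the convention that `est[i, j]` is the weight of edge i → j are illustrative assumptions.

```python
import numpy as np


def edge_metrics(est: np.ndarray, truth: np.ndarray, threshold: float = 0.0) -> dict:
    """Compare an estimated weighted adjacency matrix with a binary ground truth.

    An edge i -> j is predicted when |est[i, j]| > threshold.
    """
    pred = np.abs(est) > threshold
    true = truth.astype(bool)
    tp = int(np.sum(pred & true))
    fp = int(np.sum(pred & ~true))
    fn = int(np.sum(~pred & true))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Entry-wise Hamming distance: a reversed edge counts twice here,
    # whereas the usual structural Hamming distance counts it once.
    hamming = int(np.sum(pred != true))
    return {"precision": precision, "recall": recall, "f1": f1, "hamming": hamming}


# Toy example: true chain 0 -> 1 -> 2; the estimate finds 0 -> 1,
# misses the weak 1 -> 2 after thresholding, and adds a spurious 2 -> 0.
truth = np.zeros((3, 3), dtype=int)
truth[0, 1] = truth[1, 2] = 1
est = np.zeros((3, 3))
est[0, 1], est[1, 2], est[2, 0] = 0.8, 0.05, 0.4
print(edge_metrics(est, truth, threshold=0.1))
# {'precision': 0.5, 'recall': 0.5, 'f1': 0.5, 'hamming': 2}
```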
Statistics
The Krebs cycle dataset contains time series data with 16 features representing the concentrations of different reactants in the Krebs cycle.
The four scenarios in the dataset are:
- KrebsN: 100 time series of 500 time steps each, with normally distributed initial concentrations.
- Krebs3: 120 time series of 500 time steps each, with uniform priors for 3 selected reactants.
- KrebsL: 10 time series of 5,000 time steps each, with normally distributed initial concentrations.
- KrebsS: 10,000 time series of 5 time steps each, with normally distributed initial concentrations.
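Multiplying out the scenario sizes shows how they relate. The short script below uses only the counts listed for the four scenarios; `scenario_sizes` is an illustrative helper, and "lag-1 transitions" assumes consecutive-step pairs are formed only within a series.

```python
N_FEATURES = 16

# Scenario sizes as listed: (number of series, time steps per series).
SCENARIOS = {
    "KrebsN": (100, 500),
    "Krebs3": (120, 500),
    "KrebsL": (10, 5_000),
    "KrebsS": (10_000, 5),
}


def scenario_sizes(n_series: int, n_steps: int) -> tuple[int, int, int]:
    """Total time points, lag-1 transitions within series, and scalar values."""
    total = n_series * n_steps
    transitions = n_series * (n_steps - 1)  # pairs never cross series boundaries
    return total, transitions, total * N_FEATURES


for name, (n_series, n_steps) in SCENARIOS.items():
    total, transitions, values = scenario_sizes(n_series, n_steps)
    print(f"{name}: {total:,} time points, {transitions:,} lag-1 transitions, {values:,} values")
```

KrebsN, KrebsL, and KrebsS each contain 50,000 time points in total (Krebs3 has 60,000), so those three scenarios differ mainly in how the same amount of data is split into series. KrebsS, with 5 steps per series, loses the most lag-1 transitions to series boundaries (40,000 versus 49,900 for KrebsN and 49,990 for KrebsL).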