Core Concepts
In causal bandit problems with unknown causal graphs and potential confounders, discovering the full causal structure is unnecessary for regret minimization; instead, identifying a specific subset of latent confounders is sufficient to determine all possibly optimal intervention strategies.
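To make this concrete, here is a minimal, hypothetical sketch (the numbers and variable names are illustrative, not taken from the paper): a latent confounder U drives both an intervenable variable X and the binary reward Y, so the arm that looks best observationally differs from the arm that is best under intervention. This is why the confounders that affect the candidate arms must be detected before the set of possibly optimal interventions can be pinned down.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500_000

# Hypothetical toy model: U is a latent confounder of X and the binary reward Y.
U = rng.random(n) < 0.5                       # latent confounder
X = rng.random(n) < np.where(U, 0.9, 0.1)     # X tends to follow U
p_y = np.select(
    [U & X, U & ~X, ~U & X, ~U & ~X],
    [0.8,   0.9,    0.2,    0.3],
)                                             # setting X = 1 slightly *hurts* Y for either value of U
Y = rng.random(n) < p_y

# Observational comparison: E[Y | X = x], which is confounded by U.
obs = [Y[X == x].mean() for x in (0, 1)]
# Interventional comparison: E[Y | do(X = x)] = sum_u P(u) * E[Y | x, u], from the true model.
do = [0.5 * 0.9 + 0.5 * 0.3, 0.5 * 0.8 + 0.5 * 0.2]

print(f"observational:  E[Y|X=0] ~ {obs[0]:.2f}, E[Y|X=1] ~ {obs[1]:.2f}")      # favors X = 1
print(f"interventional: E[Y|do(X=0)] = {do[0]:.2f}, E[Y|do(X=1)] = {do[1]:.2f}")  # favors X = 0
```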
Stats
The paper assumes all observed variables are discrete with a domain of {1, 2, 3, ..., K}.
The reward variable (Y) is binary, with a domain of {0, 1}.
The paper assumes semi-Markovian causal models where every unobserved variable has no parents and exactly two children, both observed.
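A minimal sketch of how this model class could be represented (the class and field names are illustrative, not the paper's notation): directed edges among the observed discrete variables, and each latent confounder encoded as a bidirected edge between its two observed children, since an unobserved variable with no parents and exactly two observed children is fully described by that pair.

```python
from dataclasses import dataclass, field


@dataclass
class SemiMarkovianGraph:
    """Hypothetical container for a semi-Markovian causal graph."""
    K: int                                                          # domain size {1, ..., K} of each observed variable
    observed: set[str] = field(default_factory=set)                 # observed variables, including the binary reward Y
    directed: set[tuple[str, str]] = field(default_factory=set)     # (parent, child) edges between observed variables
    bidirected: set[frozenset[str]] = field(default_factory=set)    # one latent confounder per unordered pair

    def add_edge(self, parent: str, child: str) -> None:
        self.observed |= {parent, child}
        self.directed.add((parent, child))

    def add_confounder(self, a: str, b: str) -> None:
        # Each unobserved variable has no parents and exactly two observed
        # children, so it is recorded simply as the pair {a, b}.
        self.observed |= {a, b}
        self.bidirected.add(frozenset({a, b}))


# Example: X -> Y with a latent confounder between X and Y; Y is the binary reward.
g = SemiMarkovianGraph(K=2)
g.add_edge("X", "Y")
g.add_confounder("X", "Y")
```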
The algorithms use the parameters α and δ1, δ2, δ3, δ4 to control the probability of learning the true causal graph and the confidence level of the regret bound.
The paper uses constants ϵ and γ in Assumptions 2 and 3 to represent the minimum gaps in causal effects required for reliable detection of ancestral relations and latent confounders, respectively.
The constant η in Assumption 4 represents a lower bound on the probability of observing a specific realization of a variable under an intervention.
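As a hedged sketch, the parameters listed above could be gathered into a single configuration object (the grouping and field names are illustrative; the roles are only those described in the notes above):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CausalBanditConfig:
    """Hypothetical grouping of the paper's parameters and assumption constants."""
    alpha: float    # with delta1-delta4: controls the probability of learning the
    delta1: float   # true causal graph and the confidence level of the regret bound
    delta2: float
    delta3: float
    delta4: float
    epsilon: float  # Assumption 2: minimum causal-effect gap for detecting ancestral relations
    gamma: float    # Assumption 3: minimum gap for detecting latent confounders
    eta: float      # Assumption 4: lower bound on the probability of a realization under intervention
```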
Quotes
"For regret minimization, we identify that discovering the full causal structure is unnecessary; however, no existing work provides the necessary and sufficient components of the causal graph."
"We formally characterize the set of necessary and sufficient latent confounders one needs to detect or learn to ensure that all possibly optimal arms are identified correctly."
"We propose a randomized algorithm for learning the causal graph with a limited number of samples, providing a sample complexity guarantee for any desired confidence level."