Core Concepts
Active learning can be used to efficiently estimate non-parametric choice models, overcoming identifiability challenges that arise with offline data.
Abstract
The paper studies the problem of actively learning a non-parametric choice model based on consumers' decisions. It presents a negative result showing that such choice models may not be identifiable, even with active learning. To overcome this, the paper introduces a Directed Acyclic Graph (DAG) representation of the choice model, which provably encodes all the information that can be inferred from the available data.
The key contributions are:
Indistinguishability Result: The paper shows that even with active learning, two different non-parametric choice models can be information-theoretically indistinguishable from each other.
DAG Representation: To address the identifiability issue, the paper introduces a novel DAG representation of non-parametric choice models. This representation can always be uniquely identified, given enough samples from suitably chosen choice sets.
Computing Choice Probabilities from the DAG: The paper demonstrates that given a DAG representation of a non-parametric choice model, one can efficiently calculate the probability of selecting an item from a given set of items.
Constructing the DAG with Exact Choice Probabilities: The paper provides an efficient algorithm that can construct the DAG representation when given exact choice probabilities.
Active Learning of the DAG Representation: The paper's primary technical contribution is a method for actively learning the DAG representation from noisy choice frequency data. This method carefully manages error propagation across DAG levels, leading to accurate DAG estimates using only a polynomial number of active queries.
Empirical Evaluation: Experiments on synthetic and real-world data show that the active learning algorithm significantly outperforms non-active choice model estimation approaches, while using fewer queries.
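To make the indistinguishability result concrete, here is a minimal Python sketch. It assumes the standard ranking-based reading of a non-parametric choice model: each consumer type is a ranking and picks the highest-ranked item available in the offered set. The two four-item models below are a hypothetical illustration, not necessarily the construction used in the paper, and the names `choice_probs`, `model_1`, and `model_2` are invented for this example.

```python
from itertools import combinations

def choice_probs(model, choice_set):
    """Return P(item | choice_set), assuming each consumer type picks
    the highest-ranked item that is present in choice_set."""
    probs = {item: 0.0 for item in choice_set}
    for ranking, weight in model.items():
        # First item of the ranking that is actually on offer.
        top = next(item for item in ranking if item in choice_set)
        probs[top] += weight
    return probs

# Two different mixtures over rankings of a, b, c, d (hypothetical example).
model_1 = {("a", "b", "c", "d"): 0.5, ("b", "a", "d", "c"): 0.5}
model_2 = {("a", "b", "d", "c"): 0.5, ("b", "a", "c", "d"): 0.5}

items = ("a", "b", "c", "d")
# Every non-empty choice set a learner could possibly query.
all_sets = [set(s) for k in range(1, len(items) + 1)
            for s in combinations(items, k)]

# Compare the induced choice probabilities on every possible query.
indistinguishable = all(
    choice_probs(model_1, s) == choice_probs(model_2, s) for s in all_sets
)
print(indistinguishable)
```

In this example the two models disagree on which rankings exist, yet they induce identical choice probabilities on every choice set, so no query strategy, active or not, can separate them from choice data alone.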
Stats
The choice model has n items and T consumer types, where each type corresponds to a ranking of the items.
The goal is to learn the top n0 positions of all frequent rankings (where n0 = αn for some constant α ∈ (0,1)) and their corresponding probabilities within accuracy ε.
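The learning target above can be sketched in Python under several assumptions that the summary does not confirm: n0 is taken as floor(αn), rankings that share the same top-n0 prefix have their probabilities merged, and the "frequent" threshold is ignored. The helper names `top_prefix_targets` and `within_accuracy` are hypothetical.

```python
from collections import defaultdict
from math import floor

def top_prefix_targets(model, alpha):
    """Truncate each ranking to its top n0 = floor(alpha * n) positions.
    Assumption: rankings sharing a prefix have their probabilities summed."""
    n = len(next(iter(model)))  # number of items in each ranking
    n0 = floor(alpha * n)
    targets = defaultdict(float)
    for ranking, weight in model.items():
        targets[ranking[:n0]] += weight
    return dict(targets)

def within_accuracy(estimate, target, eps):
    """Check that every prefix probability is estimated within eps."""
    keys = set(estimate) | set(target)
    return all(
        abs(estimate.get(k, 0.0) - target.get(k, 0.0)) <= eps for k in keys
    )

# Hypothetical model with n = 4 items and T = 3 consumer types.
model = {
    ("a", "b", "c", "d"): 0.4,
    ("a", "b", "d", "c"): 0.3,
    ("c", "a", "b", "d"): 0.3,
}
targets = top_prefix_targets(model, alpha=0.5)  # n0 = 2

# A made-up estimate, checked against the targets at two accuracy levels.
estimate = {("a", "b"): 0.68, ("c", "a"): 0.32}
print(within_accuracy(estimate, targets, eps=0.05))
print(within_accuracy(estimate, targets, eps=0.01))
```

With α = 0.5 the first two rankings collapse to the same prefix (a, b), so the target has two prefixes rather than three rankings.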
Quotes
"We study the problem of actively learning a non-parametric choice model based on consumers' decisions."
"We present a negative result showing that such choice models may not be identifiable."
"To overcome the identifiability problem, we introduce a directed acyclic graph (DAG) representation of the choice model."