Key Concepts
The decision maker aims to learn a treatment assignment policy that maximizes the equilibrium policy value in the presence of strategic behavior and capacity constraints.
Summary
The content describes a dynamic model for capacity-constrained treatment assignment where human agents can respond strategically to the decision maker's policy. The key insights are:
Agents are heterogeneous in their raw covariates and ability to modify their covariates. They myopically best respond to the previous treatment assignment policy.
In the mean-field regime with an infinite population of agents, the threshold for receiving treatment under a given policy converges to the policy's mean-field equilibrium threshold. This result enables the development of a consistent estimator for the policy gradient.
In the finite regime, where the number of agents is large but finite, the system converges to the mean-field equilibrium through a stochastic version of fixed-point iteration.
The policy gradient can be estimated via a unit-level randomized experiment that applies symmetric, mean-zero perturbations to the policy parameters. This allows learning optimal policies in the presence of strategic behavior and capacity constraints.
The authors demonstrate the effectiveness of the policy gradient estimator in a semi-synthetic experiment using data from the National Education Longitudinal Study of 1988.
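The threshold dynamics described above can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' model: the one-dimensional linear score, the quadratic modification cost, the logistic score noise, and all names and parameter values (`best_response`, `simulate_thresholds`, `q`, `sigma`, the shift grid) are hypothetical choices made for illustration. Each round, agents best respond to the previous threshold, and the new threshold is set to the empirical quantile that treats a fraction `q` of agents, which gives a stochastic fixed-point iteration on the threshold.

```python
import numpy as np

def best_response(x, cost, beta, threshold, sigma, grid):
    """Myopic best response (illustrative assumption): each agent picks the
    covariate shift d >= 0 that maximises its probability of treatment under
    the previous threshold, minus a quadratic modification cost."""
    # scores[i, k]: agent i's noiseless score if it shifts by grid[k]
    scores = beta * (x[:, None] + grid[None, :])
    # Probability of clearing the threshold under logistic score noise
    p_treat = 1.0 / (1.0 + np.exp(-(scores - threshold) / sigma))
    utility = p_treat - 0.5 * cost[:, None] * grid[None, :] ** 2
    return grid[np.argmax(utility, axis=1)]

def simulate_thresholds(n=2000, beta=1.0, q=0.3, sigma=0.5, rounds=30, seed=0):
    """Iterate the capacity-constrained threshold for a fixed policy beta > 0."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                 # heterogeneous raw covariates
    cost = rng.uniform(1.0, 5.0, size=n)   # heterogeneous modification costs
    grid = np.linspace(0.0, 2.0, 41)       # candidate covariate shifts
    threshold = np.quantile(beta * x, 1.0 - q)  # starting point: no gaming
    history = [threshold]
    for _ in range(rounds):
        # Agents respond to the threshold from the previous round
        shift = best_response(x, cost, beta, threshold, sigma, grid)
        scores = beta * (x + shift) + rng.logistic(scale=sigma, size=n)
        # Capacity constraint: only the top q fraction of scores is treated
        threshold = np.quantile(scores, 1.0 - q)
        history.append(threshold)
    return np.array(history), scores

history, scores = simulate_thresholds()
print("first thresholds:", history[:3])
print("last thresholds: ", history[-3:])
```

Because the noise is redrawn every round, the threshold in a finite population keeps fluctuating slightly from round to round instead of settling exactly. Increasing `n` in this sketch shrinks those fluctuations, which mirrors the relationship between the finite regime and the mean-field limit summarized above.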
Statistics
The content does not provide any specific numerical data or statistics. It focuses on the theoretical model and estimation framework.
Quotes
The content does not contain any striking quotes.