Optimal Experimentation with Disentangled Exploration and Exploitation
When exploration and exploitation are disentangled, the optimal policy features complete learning asymptotically and exhibits substantial persistence, but it cannot be identified by an index à la Gittins.