Recovering latent action information from videos enables the training of powerful policies and world models without the need for action labels.