The authors propose Video ReCap, a recursive video captioning model that efficiently processes videos of varying lengths and generates captions at multiple hierarchy levels.