Vision-language models can convert webpage screenshots to HTML code efficiently with the WebSight dataset.