Ferret-UI: A Multimodal Large Language Model for Comprehensive Mobile UI Understanding
Ferret-UI is a multimodal large language model designed to improve understanding of, and interaction with, mobile user interface (UI) screens through stronger referring, grounding, and reasoning capabilities.