Multimodal Large Language Model for Instructional Plan Guidance and Execution
MM-PlanLLM is a multimodal architecture that enables large language models to comprehend complex procedural plans and guide users through them by leveraging both textual and visual information.