A unified template-filling framework that connects the textual and visual modalities through natural language prompts to address the event argument extraction task.
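To make the idea of template filling concrete, here is a minimal sketch of how a prompt template with role slots might be filled with extracted arguments. It is an illustration only, not the framework's implementation: the template wording, the role names (`attacker`, `target`, `instrument`, `place`), the `image_region_3` identifier, and the helper functions are all hypothetical, and the step that actually predicts arguments from text and images is assumed to exist elsewhere.

```python
import re

# Hypothetical prompt template for an "Attack" event.
# The role names inside <...> are illustrative assumptions, not taken from the source.
TEMPLATE = "<attacker> attacked <target> using <instrument> at <place>"

# Matches a slot such as <attacker> and captures the role name.
SLOT = re.compile(r"<(\w+)>")


def slot_roles(template):
    """Return the argument roles named by the template's slots, in order."""
    return SLOT.findall(template)


def fill_template(template, arguments):
    """Replace each <role> slot with its argument; leave the slot as-is if none was found."""
    return SLOT.sub(lambda m: arguments.get(m.group(1), m.group(0)), template)


# Arguments could come from either modality: a text span or a detected image region.
# These values are made up for the example.
arguments = {
    "attacker": "protesters",        # e.g. a span from the sentence
    "target": "the police line",     # e.g. a span from the sentence
    "instrument": "image_region_3",  # e.g. a hypothetical visual object identifier
}

filled = fill_template(TEMPLATE, arguments)
print(slot_roles(TEMPLATE))
print(filled)
```

Because both textual spans and visual regions end up in the same slots, a single template can describe one event across modalities; any slot left unfilled (here `<place>`) simply marks a role for which no argument was found.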