We propose Sweet&Sour, a novel reflection technique that leverages both positive and negative experiences to improve the performance of large language model (LLM)-based agents in text-based game environments.
When LLM-based agents perform self-reflection, leveraging successful experiences in addition to the conventional learning from failures promises more effective learning and improved performance.
This paper introduces CAAP, a novel LLM-based agent that solves computer tasks by mimicking human behavior, relying solely on visual input from the screen and executing actions through keyboard and mouse commands.
Hierarchical LLM agents trained with a novel Environment Preference Optimization (EPO) method, which leverages multimodal environment feedback to generate reward and preference signals, achieve state-of-the-art performance on long-horizon decision-making tasks in the ALFRED benchmark.