Core Concepts
POKÉLLMON is the first LLM-based agent to achieve human-parity performance in tactical Pokémon battles. It does so through in-context reinforcement learning, knowledge-augmented generation, and consistent action generation.
Abstract
The paper introduces POKÉLLMON, the first LLM-based agent that achieves human-parity performance in tactical Pokémon battles. The key strategies employed by POKÉLLMON include:
In-Context Reinforcement Learning (ICRL): POKÉLLMON uses text-based feedback derived from battles to iteratively refine its action generation policy, without the need for explicit training.
Knowledge-Augmented Generation (KAG): POKÉLLMON retrieves external knowledge, such as type advantage relationships and move/ability effects, to combat hallucination and enable timely and proper decision-making.
Consistent Action Generation: POKÉLLMON generates multiple actions and selects the most consistent one, mitigating the "panic switching" phenomenon observed when the agent faces powerful opponents.
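The "generate multiple actions and select the most consistent one" step can be read as a majority vote over independently sampled actions. The following is a minimal sketch under that assumption, not the authors' implementation: `most_consistent_action`, the `sample_action` callable, the default `k = 3`, and the action strings are all hypothetical names chosen for illustration.

```python
from collections import Counter
from typing import Callable, List


def most_consistent_action(sample_action: Callable[[], str], k: int = 3) -> str:
    """Sample k candidate actions and return the one that occurs most often."""
    if k < 1:
        raise ValueError("k must be at least 1")
    candidates: List[str] = [sample_action() for _ in range(k)]
    # Counter.most_common keeps first-seen order among equal counts,
    # so a tie falls back to the earliest sampled action.
    return Counter(candidates).most_common(1)[0][0]


if __name__ == "__main__":
    # Stand-in for k independent LLM calls on the same battle state (hypothetical data).
    canned = iter(["switch:entei", "move:attack", "move:attack"])
    print(most_consistent_action(lambda: next(canned), k=3))  # move:attack
```

In this reading, a single panicked "switch" sample is outvoted when the other samples agree on attacking, which is the effect the summary attributes to consistent action generation.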
The paper presents a detailed evaluation of existing LLMs, including GPT-3.5, GPT-4, and LLaMA-2, in Pokémon battles against a heuristic bot. The results show that these LLMs suffer from hallucination, often choosing ineffective or even detrimental actions.
To address these challenges, the authors implement POKÉLLMON, which demonstrates human-competitive battle abilities, achieving a 49% win rate in Ladder competitions and a 56% win rate in invited battles against experienced human players. The paper also reveals POKÉLLMON's vulnerabilities to human players' attrition strategies and deceptive tricks, which the authors identify as directions for future work.
Stats
The agent repeatedly uses the same attack move, which has zero effect on the opposing Pokémon due to its ability "Dry Skin."
In turn 3, the agent uses "Psyshock", which causes zero damage to the opposing Pokémon.
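Zero-effect moves like these are the kind of mistake that retrieved knowledge about types and abilities is meant to prevent. The sketch below shows one way such a check could look. It assumes a small hand-written rule table rather than the paper's actual knowledge base, and `zero_effect_reason` and both tables are hypothetical names. The source does not state which move hit the "Dry Skin" Pokémon or what type the "Psyshock" target was, so the entries are standard game rules chosen only for illustration.

```python
from typing import Dict, Iterable, Optional, Set

# Illustrative excerpt of standard game rules, not the paper's knowledge base.
TYPE_IMMUNITIES: Dict[str, Set[str]] = {
    "Psychic": {"Dark"},     # Psychic-type moves do not affect Dark types
    "Electric": {"Ground"},  # Electric-type moves do not affect Ground types
    "Normal": {"Ghost"},     # Normal-type moves do not affect Ghost types
}
ABILITY_IMMUNITIES: Dict[str, Set[str]] = {
    "Dry Skin": {"Water"},   # Dry Skin absorbs Water-type moves
    "Levitate": {"Ground"},  # Levitate avoids Ground-type moves
}


def zero_effect_reason(
    move_type: str,
    target_types: Iterable[str],
    target_ability: Optional[str] = None,
) -> Optional[str]:
    """Return a short explanation if the move cannot damage the target, else None."""
    if target_ability is not None and move_type in ABILITY_IMMUNITIES.get(target_ability, set()):
        return f"ability {target_ability} makes the target immune to {move_type} moves"
    blocked = TYPE_IMMUNITIES.get(move_type, set())
    for target_type in target_types:
        if target_type in blocked:
            return f"{move_type} moves have no effect on {target_type} types"
    return None


if __name__ == "__main__":
    print(zero_effect_reason("Water", ["Poison", "Fighting"], "Dry Skin"))
    print(zero_effect_reason("Psychic", ["Dark"]))
    print(zero_effect_reason("Fire", ["Grass"]))
```

A non-empty reason could be added to the prompt as text so that the model sees why a move would fail before choosing it.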
Quotes
"Drapion has boosted its attack to two times, posing a significant threat that could potentially knock out Doublade with a single hit. Since Doublade is slower and likely to be knocked out, I need to switch to Entei because..."