SmartPlay is a challenging benchmark for evaluating the capabilities of large language models (LLMs) as intelligent agents.