Key Concept
Language models (LMs) struggle with open-domain planning because they make both syntactic and semantic errors when predicting action definitions.
Abstract
PROC2PDDL is a dataset that pairs procedural texts with PDDL (Planning Domain Definition Language) representations for evaluating action modeling. Low success rates show that LMs struggle both to generate domain-specific programs and to reason about events. In particular, the evaluation finds that models have difficulty predicting the preconditions and effects of actions. The dataset aims to bridge the gap between language models and formal planning, and the results point to the need for improved methods.
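To make "preconditions and effects of actions" concrete, here is a minimal Python sketch of a STRIPS-style action, the kind of definition a PDDL domain encodes. The `Action` class, the `boil-water` action, and all fact names are hypothetical illustrations, not taken from the dataset, and real PDDL also supports typed parameters that this sketch leaves out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    preconditions: frozenset  # facts that must hold before the action
    add_effects: frozenset    # facts the action makes true
    del_effects: frozenset    # facts the action makes false


def applicable(action, state):
    """An action is applicable when all of its preconditions hold in the state."""
    return action.preconditions <= state


def apply(action, state):
    """Return the successor state, or raise if the preconditions are not met."""
    if not applicable(action, state):
        raise ValueError(f"{action.name}: preconditions not satisfied")
    return (state - action.del_effects) | action.add_effects


# Hypothetical action for illustration only.
boil_water = Action(
    name="boil-water",
    preconditions=frozenset({"has-water", "has-fire"}),
    add_effects=frozenset({"has-boiled-water"}),
    del_effects=frozenset({"has-water"}),
)

state = frozenset({"has-water", "has-fire"})
new_state = apply(boil_water, state)
```

A model asked to predict this action must get both sets right: a missing precondition lets the action fire in states where it should not, and a missing effect can leave the goal unreachable.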
Statistics
GPT-3.5's success rate is close to 0%.
GPT-4's success rate is around 35%.
GPT-4 generates exactly matching domain files (DFs) only 16% of the time and solvable DFs only 33% of the time.
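The gap between the two GPT-4 figures reflects two different criteria. The Python sketch below illustrates the distinction under the assumption that "solvable" means a planner can reach the goal of an associated problem using the predicted definitions. The toy domains, the fact names, and the breadth-first `solvable` helper are all hypothetical and are not the paper's evaluation code.

```python
from collections import deque

# Each action maps to (preconditions, add effects, delete effects).
# All actions and facts are hypothetical.
GOLD = {
    "collect-water": (frozenset(), frozenset({"has-water"}), frozenset()),
    "boil-water": (frozenset({"has-water", "has-fire"}),
                   frozenset({"has-boiled-water"}),
                   frozenset({"has-water"})),
}

# Prediction that drops one precondition: not an exact match, yet still solvable.
PREDICTED = dict(GOLD)
PREDICTED["boil-water"] = (frozenset({"has-water"}),
                           frozenset({"has-boiled-water"}),
                           frozenset({"has-water"}))

# Prediction that drops the key effect: no plan can reach the goal.
BROKEN = dict(GOLD)
BROKEN["boil-water"] = (frozenset({"has-water", "has-fire"}),
                        frozenset(),
                        frozenset({"has-water"}))


def solvable(domain, init, goal):
    """Breadth-first search over states; True if some plan reaches the goal."""
    start = frozenset(init)
    seen, queue = {start}, deque([start])
    while queue:
        state = queue.popleft()
        if goal <= state:
            return True
        for pre, add, delete in domain.values():
            if pre <= state:
                nxt = (state - delete) | add
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return False


INIT = frozenset({"has-fire"})
GOAL = frozenset({"has-boiled-water"})

exact_match = PREDICTED == GOLD
predicted_solvable = solvable(PREDICTED, INIT, GOAL)
broken_solvable = solvable(BROKEN, INIT, GOAL)
```

Here `PREDICTED` fails the exact-match check but still admits a plan, while `BROKEN` fails both, which is one way a solvable rate can exceed an exact-match rate.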
Quotes
"Linguistic models' deficiency in both generating domain-specific programs and reasoning about events."
"Models make both syntactic and semantic errors when predicting action definitions."