Evaluating Compositional and Conditional Reasoning Capabilities of Language Models in a Flight Booking Task
Contemporary large language models struggle with the complex conditional and compositional reasoning required to align detailed user preferences with available flight options.
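To make the two kinds of reasoning concrete, the following is a minimal illustrative sketch, not the task definition or evaluation code used in this work. It assumes a hypothetical `Flight` record and hypothetical helper names (`all_of`, `implies`), and encodes one made-up preference that is both compositional (several constraints must hold together) and conditional (one constraint applies only when another condition is true).

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical flight record; the field names are assumptions for illustration.
@dataclass(frozen=True)
class Flight:
    airline: str
    price: float
    stops: int
    depart_hour: int  # 0-23, local time

Preference = Callable[[Flight], bool]

def all_of(*prefs: Preference) -> Preference:
    """Compositional: every sub-preference must hold."""
    return lambda f: all(p(f) for p in prefs)

def implies(cond: Preference, then: Preference) -> Preference:
    """Conditional: `then` is only required when `cond` holds."""
    return lambda f: (not cond(f)) or then(f)

# Made-up preference: "Under $400, and if the flight has a stop,
# it must depart before noon."
pref = all_of(
    lambda f: f.price < 400,
    implies(lambda f: f.stops > 0, lambda f: f.depart_hour < 12),
)

flights: List[Flight] = [
    Flight("A", 350, 0, 15),  # nonstop, so the conditional is vacuously met
    Flight("B", 300, 1, 14),  # one stop, afternoon departure: fails conditional
    Flight("C", 380, 1, 9),   # one stop, morning departure: satisfies both
    Flight("D", 450, 0, 8),   # over budget: fails the price constraint
]

matches = [f.airline for f in flights if pref(f)]
print(matches)
```

Flight B is the instructive case: it is the cheapest option, so a reader (or model) that tracks only the price constraint and ignores the conditional clause would select it incorrectly.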