CausalBench: A Comprehensive Benchmark for Evaluating the Causal Learning Capabilities of Large Language Models
CausalBench is a comprehensive benchmark designed to evaluate the causal learning capabilities of large language models (LLMs) across diverse datasets, tasks, and prompt formats.