A Diverse Benchmark for Evaluating Scientific Natural Language Inference
This paper introduces MSCINLI, a diverse benchmark for evaluating scientific natural language inference (NLI) that spans multiple scientific domains, in contrast to the existing SCINLI dataset, which is limited to computational linguistics. The authors establish strong baselines with both pre-trained language models and large language models, and show that MSCINLI is a challenging dataset for evaluating the complex reasoning capabilities of NLP models.
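To make the baseline setup concrete, the sketch below shows how a pre-trained language model can be applied to a scientific NLI pair as a sentence-pair classifier. This is a minimal illustration, not the authors' exact configuration: the choice of `roberta-base`, the four-way label set (following the original SCINLI task), and the example premise/hypothesis are all assumptions.

```python
# Minimal sketch of a PLM baseline for scientific NLI (assumptions noted below).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Four-way label set following the SciNLI task formulation (assumed here).
LABELS = ["entailment", "contrasting", "reasoning", "neutral"]

# roberta-base is an illustrative choice, not necessarily the paper's model.
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=len(LABELS)
)

# Encode the premise/hypothesis pair as one sequence, as is standard for NLI.
premise = "We pre-train the encoder on in-domain scientific text."      # hypothetical example
hypothesis = "Therefore, the encoder adapts to the target domain."       # hypothetical example
inputs = tokenizer(premise, hypothesis, truncation=True, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
# Note: without fine-tuning on the benchmark, this prediction is arbitrary.
print(LABELS[logits.argmax(dim=-1).item()])
```

In practice the classification head would be fine-tuned on the benchmark's training split before evaluation; the snippet only demonstrates the input format and model interface.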