Large language models can automatically generate and evaluate multiple-choice reading comprehension test items of acceptable quality, particularly for languages with limited available data.
Pre-trained language models can effectively control and evaluate the difficulty levels of multiple-choice (MC) cloze tests.