CPSDBench is a specialized evaluation benchmark tailored to the Chinese public security domain. It integrates public-security datasets collected from real-world scenarios and assesses LLMs on four task types: text classification, information extraction, question answering, and text generation. It also introduces evaluation metrics designed to quantify LLM performance on these tasks more accurately. The study aims to improve understanding of how existing models perform on public security problems and to guide the development of more accurate models.
Key insights distilled from: Xin Tong, Bo ..., arxiv.org, 03-05-2024
https://arxiv.org/pdf/2402.07234.pdf