CPSDBench is an evaluation benchmark built for the Chinese public security domain. It integrates public security datasets collected from real-world scenarios and assesses large language models (LLMs) on four task types: text classification, information extraction, question answering, and text generation. The benchmark also introduces new evaluation metrics intended to quantify LLM performance on these tasks more accurately. The study aims to clarify how well existing models handle public security problems and to guide the development of more accurate models.
arxiv.org