Key Concepts
The performance of GPT models (GPT-3.5 and GPT-4) in cross-lingual legal question-answering (QA) scenarios.
Statistics
The dataset comprises five yearly instances: H29, H30, R01, R02, and R03 (Japanese era years Heisei 29-30 and Reiwa 1-3, i.e., 2017-2021).
English context length ranges from 525 characters (H30) to 703 characters (R03).
Japanese context length ranges from 110 characters (H30) to 213 characters (R03).
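The per-year length statistics above could be computed with a simple sketch like the following. The year labels follow the source; the context strings themselves are hypothetical placeholders standing in for the actual dataset text.

```python
# Hypothetical sketch: per-year, per-language context lengths for a bilingual
# legal QA dataset. Placeholder strings are sized to match the figures above.
contexts = {
    "H30": {"en": "x" * 525, "ja": "y" * 110},
    "R03": {"en": "x" * 703, "ja": "y" * 213},
}

def context_lengths(data):
    """Return {year: {lang: character count}} for each yearly instance."""
    return {
        year: {lang: len(text) for lang, text in langs.items()}
        for year, langs in data.items()
    }

lengths = context_lengths(contexts)
print(lengths["H30"]["en"])  # 525
print(lengths["R03"]["ja"])  # 213
```

Character count (rather than token count) is used here because that is the unit reported in the statistics above.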
Quotations
"The GPT-4 model consistently outperforms the GPT-3.5 model across all independent yearly instances in both monolingual and cross-lingual settings."
"Monolingual settings generally yield higher accuracy scores than cross-lingual settings for both models."