The paper describes the authors' submission to SemEval-2024 Task 5, the Legal Argument Reasoning Task in Civil Procedure. The task evaluates the ability of large language models (LLMs) to interpret legal principles and laws and apply them to specific case questions.
The authors first established a baseline by fine-tuning various BERT models on the task dataset. These models disproportionately favored the 0 label (incorrect answer) because of the dataset's label imbalance. To address this, the authors explored data augmentation with the CaseHOLD corpus, but this did not improve performance.
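As a rough illustration of this kind of baseline (not the authors' exact code), the sketch below fine-tunes a BERT-family classifier with Hugging Face Transformers and uses a class-weighted loss as one common way to counter the skew toward label 0; the file names, the "question"/"answer"/"label" fields, the weights, and the hyperparameters are all assumptions.

```python
# Hypothetical sketch of a BERT baseline for the task; data schema, weights,
# and hyperparameters are assumptions, not the authors' reported setup.
import torch
import torch.nn as nn
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

MODEL_NAME = "bert-base-uncased"  # any BERT-family checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Assumes the task data has been exported to JSON with "question", "answer",
# and binary "label" fields; adjust to the real schema.
raw = load_dataset("json", data_files={"train": "train.json", "dev": "dev.json"})

def encode(example):
    # Pair-encode the question with the candidate answer; BERT truncates at 512 tokens.
    return tokenizer(example["question"], example["answer"],
                     truncation=True, max_length=512)

encoded = raw.map(encode)

class WeightedTrainer(Trainer):
    """Trainer with a class-weighted loss to counter the imbalance toward label 0."""
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        # Up-weight the minority class (1 = correct answer); the 3.0 is illustrative.
        weights = torch.tensor([1.0, 3.0], device=outputs.logits.device)
        loss = nn.CrossEntropyLoss(weight=weights)(outputs.logits, labels)
        return (loss, outputs) if return_outputs else loss

args = TrainingArguments(output_dir="bert-baseline", num_train_epochs=3,
                         per_device_train_batch_size=16, learning_rate=2e-5)

trainer = WeightedTrainer(model=model, args=args,
                          train_dataset=encoded["train"],
                          eval_dataset=encoded["dev"],
                          tokenizer=tokenizer)
trainer.train()
```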
Next, the authors experimented with fine-tuning a Longformer model, which can handle much longer input sequences than BERT. However, this approach did not outperform the fine-tuned BERT models.
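A minimal sketch of how the Longformer variant might differ from the BERT baseline, assuming the allenai/longformer-base-4096 checkpoint and an "introduction" field for the case background (both assumptions); the key change is the longer input window.

```python
# Sketch of the Longformer variant (assumed configuration): the main change
# from the BERT baseline is the 4096-token input window.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "allenai/longformer-base-4096"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

def encode(example):
    # The full case introduction can be kept alongside the question because
    # Longformer accepts up to 4096 tokens instead of BERT's 512.
    text = example["introduction"] + "\n" + example["question"]
    return tokenizer(text, example["answer"], truncation=True, max_length=4096)
```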
The authors then turned to few-shot prompting with GPT-3.5 and GPT-4. They found that reformulating the task as a multiple-choice QA problem, rather than binary classification, significantly improved the performance of the GPT models. The authors also applied a rule-based algorithm to further enhance the results.
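The sketch below shows one way the multiple-choice reformulation could be sent to GPT-3.5 or GPT-4 through the OpenAI chat API; the prompt wording, the letter-based answer parsing, and the build_prompt/answer helpers are hypothetical, and the authors' rule-based post-processing step is not reproduced here. Presenting all candidate answers in one prompt lets the model compare options directly instead of judging each answer in isolation, which is the intuition behind the reformulation.

```python
# Hypothetical sketch of few-shot multiple-choice prompting (not the authors'
# exact prompt, few-shot examples, or post-processing rules).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def build_prompt(introduction, question, candidates):
    """Present all candidate answers at once instead of one binary pair."""
    letters = "ABCDEFGH"  # assumes a small, fixed number of candidates
    options = "\n".join(f"{letters[i]}. {c}" for i, c in enumerate(candidates))
    return (
        "You are answering a U.S. civil procedure exam question.\n\n"
        f"Case background:\n{introduction}\n\n"
        f"Question: {question}\n\n"
        f"Options:\n{options}\n\n"
        "Reply with the letter of the single best answer."
    )

def answer(introduction, question, candidates, few_shot_examples=()):
    messages = [{"role": "system",
                 "content": "You are an expert in U.S. civil procedure."}]
    # Few-shot examples are prepended as prior user/assistant turns.
    for ex_prompt, ex_letter in few_shot_examples:
        messages.append({"role": "user", "content": ex_prompt})
        messages.append({"role": "assistant", "content": ex_letter})
    messages.append({"role": "user",
                     "content": build_prompt(introduction, question, candidates)})

    response = client.chat.completions.create(model="gpt-4",
                                               messages=messages,
                                               temperature=0)
    reply = response.choices[0].message.content.strip()
    return reply[0]  # naive parse: take the leading option letter
```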
The authors' best submission, a fine-tuned BERT model combined with the rule-based algorithm, ranked 7th out of 20 on the competition leaderboard. Their overall best results came from few-shot prompting of the GPT models using the multiple-choice QA format, combined with the rule-based algorithm.
The paper concludes with insights and future directions, including the potential benefit of integrating the specific laws or precedents relevant to a case into the analysis to further enhance the capabilities of LLMs in legal reasoning tasks.