Large language models (LLMs) have recently obtained strong performance on complex reasoning tasks. However, their capabilities in specialized domains like law remain relatively unexplored. We present CLUEDO, a system to tackle a novel legal reasoning task that involves determining if a provided answer correctly addresses a legal question derived from U.S. civil procedure cases. CLUEDO utilizes multiple collaborator models that are trained using multiple-choice prompting to choose the right label and generate explanations. These collaborators are overseen by a final "detective" model that identifies the most accurate answer in a zero-shot manner. Our approach achieves an F1 macro score of 0.74 on the development set and 0.76 on the test set, outperforming individual models. Unlike the powerful GPT-4, CLUEDO provides more stable predictions thanks to the ensemble approach. Our results showcase the promise of tailored frameworks to enhance legal reasoning capabilities in LLMs.
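The collaborator/detective arrangement and the macro F1 metric mentioned in the abstract can be illustrated with a minimal sketch. Everything named here (`Opinion`, `Collaborator`, `Detective`, `run_cluedo`, `majority_detective`, `macro_f1`) is hypothetical and not taken from the paper. In particular, the detective below is a plain majority vote standing in for the zero-shot LLM detective, so the example runs without any model.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Opinion:
    """One collaborator's verdict on a (question, candidate answer) pair."""
    label: int          # 1 = the answer is correct, 0 = it is not
    explanation: str    # free-text justification from the collaborator


# Hypothetical interfaces: in a real system each would wrap an LLM call.
Collaborator = Callable[[str, str], Opinion]
Detective = Callable[[str, str, List[Opinion]], int]


def run_cluedo(question: str, answer: str,
               collaborators: Sequence[Collaborator],
               detective: Detective) -> int:
    """Collect every collaborator's opinion, then let the detective decide."""
    opinions = [c(question, answer) for c in collaborators]
    return detective(question, answer, opinions)


def majority_detective(question: str, answer: str,
                       opinions: List[Opinion]) -> int:
    """Stand-in detective: majority vote over labels (ties go to 0).

    This is NOT the zero-shot LLM detective described in the abstract;
    it only keeps the sketch runnable without a model.
    """
    positives = sum(o.label for o in opinions)
    return 1 if positives * 2 > len(opinions) else 0


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for cls in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    # With no labels at all there is nothing to average.
    return sum(scores) / len(scores) if scores else 0.0
```

With stub collaborators that return fixed opinions, `run_cluedo` reduces to the vote of the stand-in detective. Swapping in a model-backed `Detective` that also reads each `explanation` would bring the sketch closer to the design the abstract describes.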
MAINDZ at SemEval-2024 Task 5: CLUEDO-Choosing Legal oUtcome by Explaining Decision through Oversight / Benedetto, Irene; Koudounas, Alkis; Vaiani, Lorenzo; Pastor, Eliana; Cagliero, Luca; Tarasconi, Francesco. - (2024), pp. 997-1005. (Paper presented at SemEval-2024 (Workshop of ACL), held in Mexico City (MEX), 20-21 June 2024) [10.18653/v1/2023.semeval-1.144].
MAINDZ at SemEval-2024 Task 5: CLUEDO-Choosing Legal oUtcome by Explaining Decision through Oversight
Irene Benedetto; Alkis Koudounas; Lorenzo Vaiani; Eliana Pastor; Luca Cagliero; Francesco Tarasconi
2024
File | Access | Type | License | Size | Format
---|---|---|---|---|---
2024.semeval-1.144.pdf | Open access | 2a Post-print, publisher's version / Version of Record | Creative Commons | 174.31 kB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/2990376