TY - GEN
T1 - SHROOM-INDElab at SemEval-2024 Task 6
T2 - 18th International Workshop on Semantic Evaluation, SemEval 2024, co-located with the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics, NAACL 2024
AU - Allen, Bradley P.
AU - Polat, Fina
AU - Groth, Paul
N1 - Publisher Copyright:
© 2024 Association for Computational Linguistics.
PY - 2024
Y1 - 2024
N2 - We describe the University of Amsterdam Intelligent Data Engineering Lab team's entry for the SemEval-2024 Task 6 competition. The SHROOM-INDElab system builds on previous work on using prompt programming and in-context learning with large language models (LLMs) to build classifiers for hallucination detection, and extends that work through the incorporation of context-specific definitions of task, role, and target concept, and automated generation of examples for use in a few-shot prompting approach. The resulting system achieved fourth-best and sixth-best performance in the model-agnostic and model-aware tracks for Task 6, respectively, and evaluation using the validation sets showed that the system's classification decisions were consistent with those of the crowd-sourced human labellers. We further found that a zero-shot approach provided better accuracy than a few-shot approach using automatically generated examples. Code for the system described in this paper is available on GitHub.
AB - We describe the University of Amsterdam Intelligent Data Engineering Lab team's entry for the SemEval-2024 Task 6 competition. The SHROOM-INDElab system builds on previous work on using prompt programming and in-context learning with large language models (LLMs) to build classifiers for hallucination detection, and extends that work through the incorporation of context-specific definitions of task, role, and target concept, and automated generation of examples for use in a few-shot prompting approach. The resulting system achieved fourth-best and sixth-best performance in the model-agnostic and model-aware tracks for Task 6, respectively, and evaluation using the validation sets showed that the system's classification decisions were consistent with those of the crowd-sourced human labellers. We further found that a zero-shot approach provided better accuracy than a few-shot approach using automatically generated examples. Code for the system described in this paper is available on GitHub.
UR - https://www.scopus.com/pages/publications/85215527455
U2 - 10.18653/v1/2024.semeval-1.120
DO - 10.18653/v1/2024.semeval-1.120
M3 - Conference contribution
AN - SCOPUS:85215527455
T3 - SemEval 2024 - 18th International Workshop on Semantic Evaluation, Proceedings of the Workshop
SP - 839
EP - 844
BT - SemEval 2024 - 18th International Workshop on Semantic Evaluation, Proceedings of the Workshop
A2 - Ojha, Atul Kr.
A2 - Doğruöz, A. Seza
A2 - Madabushi, Harish Tayyar
A2 - Da San Martino, Giovanni
A2 - Rosenthal, Sara
A2 - Rosá, Aiala
PB - Association for Computational Linguistics (ACL)
Y2 - 20 June 2024 through 21 June 2024
ER -