TY - GEN
T1 - Stress Test Evaluation of Biomedical Word Embeddings
AU - Araujo, Vladimir
AU - Carvallo, Andrés
AU - Aspillaga, Carlos
AU - Thorne, Camilo
AU - Parra, Denis
N1 - Publisher Copyright:
© 2021 Association for Computational Linguistics
PY - 2021
Y1 - 2021
AB - The success of pretrained word embeddings has motivated their use in the biomedical domain, with contextualized embeddings yielding remarkable results in several biomedical NLP tasks. However, there is a lack of research on quantifying their behavior under severe “stress” scenarios. In this work, we systematically evaluate three language models with adversarial examples – automatically constructed tests that allow us to examine how robust the models are. We propose two types of stress scenarios focused on the biomedical named entity recognition (NER) task, one inspired by spelling errors and another based on the use of synonyms for medical terms. Our experiments with three benchmarks show that the performance of the original models decreases considerably, while also revealing their weaknesses and strengths. Finally, we show that adversarial training causes the models to improve their robustness and even to exceed the original performance in some cases.
UR - http://www.scopus.com/inward/record.url?scp=85111173817&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85111173817
T3 - Proceedings of the 20th Workshop on Biomedical Language Processing, BioNLP 2021
SP - 119
EP - 125
BT - Proceedings of the 20th Workshop on Biomedical Language Processing, BioNLP 2021
A2 - Demner-Fushman, Dina
A2 - Cohen, Kevin Bretonnel
A2 - Ananiadou, Sophia
A2 - Tsujii, Junichi
PB - Association for Computational Linguistics (ACL)
T2 - 20th Workshop on Biomedical Language Processing, BioNLP 2021
Y2 - 11 June 2021
ER -