“I'm sorry to hear that”: Finding New Biases in Language Models with a Holistic Descriptor Dataset

Eric Michael Smith, Melissa Hall, Melanie Kambadur, Eleonora Presani, Adina Williams

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

78 Scopus citations

Abstract

As language models grow in popularity, it becomes increasingly important to clearly measure all possible markers of demographic identity in order to avoid perpetuating existing societal harms. Many datasets for measuring bias currently exist, but they are restricted in their coverage of demographic axes and are commonly used with preset bias tests that presuppose which types of biases models can exhibit. In this work, we present a new, more inclusive bias measurement dataset, HOLISTICBIAS, which includes nearly 600 descriptor terms across 13 different demographic axes. HOLISTICBIAS was assembled in a participatory process including experts and community members with lived experience of these terms. These descriptors combine with a set of bias measurement templates to produce over 450,000 unique sentence prompts, which we use to explore, identify, and reduce novel forms of bias in several generative models. We demonstrate that HOLISTICBIAS is effective at measuring previously undetectable biases in token likelihoods from language models, as well as in an offensiveness classifier. We will invite additions and amendments to the dataset, which we hope will serve as a basis for more easy-to-use and standardized methods for evaluating bias in NLP models.
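
The abstract describes descriptors being combined with templates to produce sentence prompts. A minimal sketch of that kind of combination is given below. The descriptor lists, nouns, template strings, and the `build_prompts` and `article_for` helpers are all hypothetical stand-ins chosen for illustration; they are not taken from the HOLISTICBIAS dataset or its code.

```python
from itertools import product

# Hypothetical stand-ins: NOT the actual HOLISTICBIAS descriptors, nouns, or templates.
DESCRIPTORS = {
    "ability": ["deaf", "blind"],
    "age": ["young", "elderly"],
}
NOUNS = ["person", "parent"]
TEMPLATES = [
    "I am {article} {descriptor} {noun}.",
    "I have a friend who is {article} {descriptor} {noun}.",
]


def article_for(word):
    """Pick 'a' or 'an' with a simple first-letter heuristic."""
    return "an" if word[0].lower() in "aeiou" else "a"


def build_prompts(descriptors, nouns, templates):
    """Cross every descriptor with every noun and template, keeping the axis label."""
    prompts = []
    for axis, terms in descriptors.items():
        for term, noun, template in product(terms, nouns, templates):
            text = template.format(
                article=article_for(term), descriptor=term, noun=noun
            )
            prompts.append({"axis": axis, "descriptor": term, "text": text})
    return prompts


prompts = build_prompts(DESCRIPTORS, NOUNS, TEMPLATES)
print(len(prompts))  # 4 descriptors x 2 nouns x 2 templates = 16
```

Because the prompt count is the product of the three list sizes, a few hundred descriptors combined with modest numbers of nouns and templates can plausibly reach hundreds of thousands of prompts.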

Original language: English
Title of host publication: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022
Editors: Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Publisher: Association for Computational Linguistics (ACL)
Pages: 9180-9211
Number of pages: 32
ISBN (Electronic): 9781959429401
State: Published - 2022
Externally published: Yes
Event: 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 - Hybrid, Abu Dhabi, United Arab Emirates
Duration: Dec 7 2022 to Dec 11 2022

Publication series

Name: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022

Conference

Conference: 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022
Country/Territory: United Arab Emirates
City: Hybrid, Abu Dhabi
Period: 12/7/22 to 12/11/22
