Extending Label Aggregation Models with a Gaussian Process to Denoise Crowdsourcing Labels

Dan Li, Maarten De Rijke

Research output: Contribution to conference, paper, peer-reviewed

Abstract

Label aggregation (LA) is the task of inferring a high-quality label
for an example from multiple noisy labels generated by either human annotators or model predictions. Existing work on LA assumes
a label generation process and designs a probabilistic graphical
model (PGM) to learn latent true labels from observed crowd labels. However, the performance of PGM-based LA models is easily
degraded by noise in the crowd labels. As a consequence, the performance of LA models varies across datasets, and no single LA
model outperforms the others on all datasets.
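
To make the LA task concrete, here is a minimal sketch of the simplest aggregation rule, majority voting. It is a common baseline rather than one of the PGM-based models discussed in this abstract, and the function name `majority_vote` and the input layout are assumptions made for illustration only.

```python
from collections import Counter


def majority_vote(crowd_labels):
    """Aggregate noisy labels by picking the most frequent label per example.

    crowd_labels: dict mapping an example id to the list of labels it
    received from annotators or models (hypothetical layout).
    Ties are resolved in favour of the label that appears first in the list.
    """
    return {
        example: Counter(labels).most_common(1)[0][0]
        for example, labels in crowd_labels.items()
    }


# Three annotators label two examples; one annotator disagrees on each.
print(majority_vote({"d1": [1, 1, 0], "d2": [0, 0, 1]}))  # {'d1': 1, 'd2': 0}
```

Majority voting treats every annotator as equally reliable and uses nothing but the crowd labels, which is the limitation that both PGM-based models and the GP extension described next try to address.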
We extend PGM-based LA models by integrating a Gaussian
process (GP) prior on the true labels. The advantage of LA models
extended with a GP prior is that they can take as input crowd
labels, example features, and existing pre-trained label prediction
models to infer the true labels, while the original LA can only
leverage crowd labels. Experimental results on both synthetic and
real datasets show that any LA model extended with a GP prior
and a suitable mean function achieves better performance than the
underlying LA model, demonstrating the effectiveness of using a
GP prior.
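
As a rough illustration of how a GP prior lets example features and a pre-trained model's predictions inform the inferred labels, the sketch below smooths per-example crowd vote fractions with plain GP regression. This is a simplification and not the inference procedure of the paper: the RBF kernel, the Gaussian noise model, and the names `gp_denoise`, `prior_mean`, `noise`, and `length_scale` are all assumptions chosen for illustration. The mean function is represented by `prior_mean`, standing in for the scores of a pre-trained label prediction model.

```python
import math


def rbf(a, b, length_scale=1.0):
    """RBF kernel between two feature vectors (assumed kernel choice)."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-0.5 * sq_dist / length_scale ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def gp_denoise(features, vote_fraction, prior_mean, noise=0.25, length_scale=1.0):
    """Posterior mean of a GP with mean function `prior_mean`.

    features:      one feature vector per example
    vote_fraction: fraction of crowd votes for the positive class per example
    prior_mean:    scores from a (hypothetical) pre-trained model
    noise:         assumed variance of the crowd-label noise
    Computes m + K (K + noise * I)^{-1} (y - m).
    """
    n = len(features)
    K = [[rbf(features[i], features[j], length_scale) for j in range(n)]
         for i in range(n)]
    A = [[K[i][j] + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    residual = [y - m for y, m in zip(vote_fraction, prior_mean)]
    alpha = solve(A, residual)
    return [prior_mean[i] + sum(K[i][j] * alpha[j] for j in range(n))
            for i in range(n)]
```

In this sketch, examples with similar features pull each other's scores together, and a larger `noise` value makes the result lean more on `prior_mean` and less on the crowd votes. Thresholding the returned scores (for example at 0.5) would give hard labels.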

Original language: American English
State: Published - 2023
Externally published: Yes
Event: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’23)
Duration: Jul 23 2023 to Jul 27 2023

