Estimating the F1 Score for Learning from Positive and Unlabeled Examples

Seyed Amin Tabatabaei, Jan Klein, Mark Hoogendoorn

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Semi-supervised learning can be applied to datasets that contain both labeled and unlabeled instances and, when only limited labeled data is available, can yield more accurate predictions than fully supervised or unsupervised learning. A subclass of problems, called Positive-Unlabeled (PU) learning, focuses on cases in which the labeled instances contain only positive examples. Given the lack of negatively labeled data, estimating the general performance is difficult. In this paper, we propose a new approach to approximate the F1 score for PU learning. It requires an estimate of the fraction of all positive instances that is present in the labeled set. We derive theoretical properties of the approach and apply it to several datasets to study its empirical behavior and to compare it to the most well-known score in the field, the LL score. Results show that even when the estimate is quite far from the real fraction of positive labels, the approximation of the F1 score is significantly better than the LL score.
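
The abstract does not give the estimator itself, so the following is only a minimal sketch of how an F1 approximation could be built from PU data once the labeled fraction is assumed known. It is not the authors' formula. It assumes that labeled positives are a random sample of all positives, so that recall measured on them is representative, and it takes the LL score to be the criterion commonly written as recall² / Pr(f(x) = 1). The function names `estimate_f1_pu` and `ll_score` and the parameter `labeled_fraction` are hypothetical.

```python
def ll_score(recall, frac_predicted_positive):
    """LL-style criterion as commonly written: recall^2 / Pr(f(x) = 1)."""
    if frac_predicted_positive == 0:
        return 0.0
    return recall ** 2 / frac_predicted_positive


def estimate_f1_pu(pred_labeled, pred_unlabeled, labeled_fraction):
    """Plug-in F1 estimate from PU data (illustrative, not the paper's method).

    pred_labeled     -- 0/1 predictions on the labeled positive instances
    pred_unlabeled   -- 0/1 predictions on the unlabeled instances
    labeled_fraction -- assumed fraction of all positives that are labeled
    """
    n_labeled = len(pred_labeled)
    if n_labeled == 0 or not 0 < labeled_fraction <= 1:
        raise ValueError("need labeled positives and 0 < labeled_fraction <= 1")

    # Recall measured on the labeled positives (assumed representative).
    recall = sum(pred_labeled) / n_labeled
    # Estimated number of positives in the whole dataset.
    total_pos = n_labeled / labeled_fraction
    # Estimated true positives over the whole dataset.
    true_pos = recall * total_pos
    # Everything the classifier flags as positive, labeled or not.
    predicted_pos = sum(pred_labeled) + sum(pred_unlabeled)

    # F1 = 2*TP / (actual positives + predicted positives).
    denom = total_pos + predicted_pos
    return 2 * true_pos / denom if denom else 0.0
```

For example, `estimate_f1_pu([1, 1, 1, 0], [1, 1, 1, 0, 0, 0], 0.5)` uses a recall of 0.75, an estimated 8 positives in total, and 6 predicted positives. Varying `labeled_fraction` around its assumed value shows how sensitive such an estimate is to a misjudged fraction.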

Original language: English
Title of host publication: Machine Learning, Optimization, and Data Science - 6th International Conference, LOD 2020, Revised Selected Papers
Editors: Giuseppe Nicosia, Varun Ojha, Emanuele La Malfa, Giorgio Jansen, Vincenzo Sciacca, Panos Pardalos, Giovanni Giuffrida, Renato Umeton
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 150-161
Number of pages: 12
ISBN (Print): 9783030645823
State: Published - 2020
Externally published: Yes
Event: 6th International Conference on Machine Learning, Optimization, and Data Science, LOD 2020 - Siena, Italy
Duration: Jul 19 2020 → Jul 23 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12565 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 6th International Conference on Machine Learning, Optimization, and Data Science, LOD 2020
Country/Territory: Italy
City: Siena
Period: 07/19/20 → 07/23/20
