TY - JOUR
T1 - Focused Contrastive Loss for Classification With Pre-Trained Language Models
AU - He, Jiayuan
AU - Li, Yuan
AU - Zhai, Zenan
AU - Fang, Biaoyan
AU - Thorne, Camilo
AU - Druckenbrodt, Christian
AU - Akhondi, Saber
AU - Verspoor, Karin
PY - 2023/11/3
Y1 - 2023/11/3
AB - Contrastive learning, which learns data representations by contrasting similar and dissimilar instances, has achieved great success in various domains including natural language processing (NLP). Recently, it has been demonstrated that incorporating class labels into contrastive learning, i.e., supervised contrastive learning (SCL), can further enhance the quality of the learned data representations. Although several works have shown empirically that incorporating SCL into classification models leads to better performance, the mechanism of how SCL works for classification is less studied. In this paper, we first investigate how SCL facilitates classifier learning, showing that the contrastive region, i.e., the data instances involved in each contrasting operation, has a crucial link to the mechanism of SCL. We reveal that vanilla SCL is suboptimal since its behavior can be altered by variances in class distributions. Based on this finding, we propose a Focused Contrastive Loss (FoCL) for classification. Compared with SCL, FoCL defines a finer contrastive region, focusing on the data instances surrounding decision boundaries. We conduct extensive experiments on three NLP tasks: text classification, named entity recognition, and relation extraction. Experimental results show consistent and significant improvements of FoCL over strong baselines on various benchmark datasets, especially in few-shot scenarios.
UR - https://ieeexplore.ieee.org/abstract/document/10306323
DO - 10.1109/TKDE.2023.3327777
M3 - Article
SN - 1041-4347
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
ER -