TY - JOUR
T1 - Is your document novel? Let attention guide you. An attention-based model for document-level novelty detection
AU - Ghosal, Tirthankar
AU - Edithal, Vignesh
AU - Ekbal, Asif
AU - Bhattacharyya, Pushpak
AU - Chivukula, Srinivasa Satya Sameer Kumar
AU - Tsatsaronis, George
N1 - Publisher Copyright:
© 2021 Cambridge University Press. All rights reserved.
PY - 2021/7
Y1 - 2021/7
N2 - Detecting whether a document contains sufficient new information to be deemed novel is of immense significance in this age of data duplication. Existing techniques for document-level novelty detection mostly perform at the lexical level and are unable to address semantic-level redundancy. These techniques usually rely on handcrafted features extracted from the documents in a rule-based or traditional feature-based machine learning setup. Here, we present an effective approach based on a neural attention mechanism to detect document-level novelty without any manual feature engineering. We contend that the simple alignment of texts between the source and target document(s) could identify the state of novelty of a target document. Our deep neural architecture elicits inference knowledge from a large-scale natural language inference dataset, which proves crucial to the novelty detection task. Our approach is effective and outperforms the standard baselines and recent work on document-level novelty detection by a margin of 3% in terms of accuracy.
AB - Detecting whether a document contains sufficient new information to be deemed novel is of immense significance in this age of data duplication. Existing techniques for document-level novelty detection mostly perform at the lexical level and are unable to address semantic-level redundancy. These techniques usually rely on handcrafted features extracted from the documents in a rule-based or traditional feature-based machine learning setup. Here, we present an effective approach based on a neural attention mechanism to detect document-level novelty without any manual feature engineering. We contend that the simple alignment of texts between the source and target document(s) could identify the state of novelty of a target document. Our deep neural architecture elicits inference knowledge from a large-scale natural language inference dataset, which proves crucial to the novelty detection task. Our approach is effective and outperforms the standard baselines and recent work on document-level novelty detection by a margin of 3% in terms of accuracy.
KW - Decomposable Attention
KW - Document Classification
KW - Document-Level Novelty Detection
KW - Natural Language Inference
UR - http://www.scopus.com/inward/record.url?scp=85084288173&partnerID=8YFLogxK
U2 - 10.1017/S1351324920000194
DO - 10.1017/S1351324920000194
M3 - Article
AN - SCOPUS:85084288173
SN - 1351-3249
VL - 27
SP - 427
EP - 454
JO - Natural Language Engineering
JF - Natural Language Engineering
IS - 4
ER -