TY - JOUR
T1 - Identifying disease trajectories with predicate information from a knowledge graph
AU - Vlietstra, Wytze J.
AU - Vos, Rein
AU - Van Den Akker, Marjan
AU - Van Mulligen, Erik M.
AU - Kors, Jan A.
N1 - Publisher Copyright: © 2020 The Author(s).
PY - 2020/8/20
Y1 - 2020/8/20
AB - Background: Knowledge graphs can represent the contents of biomedical literature and databases as subject-predicate-object triples, thereby enabling comprehensive analyses that identify e.g. relationships between diseases. Some diseases are often diagnosed in patients in specific temporal sequences, which are referred to as disease trajectories. Here, we determine whether a sequence of two diseases forms a trajectory by leveraging the predicate information from paths between (disease) proteins in a knowledge graph. Furthermore, we determine the added value of directional information of predicates for this task. To do so, we create four feature sets, based on two methods for representing indirect paths, and both with and without directional information of predicates (i.e., which protein is considered subject and which object). The added value of the directional information of predicates is quantified by comparing the classification performance of the feature sets that include or exclude it. Results: Our method achieved a maximum area under the ROC curve of 89.8% and 74.5% when evaluated with two different reference sets. Use of directional information of predicates significantly improved performance by 6.5 and 2.0 percentage points respectively. Conclusions: Our work demonstrates that predicates between proteins can be used to identify disease trajectories. Using the directional information of predicates significantly improved performance over not using this information.
KW - Directionality of predicates
KW - Disease trajectories
KW - Knowledge graph
KW - Predicates
KW - Protein-protein interactions
KW - Temporal relationships
UR - http://www.scopus.com/inward/record.url?scp=85089769199&partnerID=8YFLogxK
U2 - 10.1186/s13326-020-00228-8
DO - 10.1186/s13326-020-00228-8
M3 - Article
C2 - 32819419
AN - SCOPUS:85089769199
SN - 2041-1480
VL - 11
JO - Journal of Biomedical Semantics
JF - Journal of Biomedical Semantics
IS - 1
M1 - 9
ER -