TY - JOUR
T1 - Combining user reputation and provenance analysis for trust assessment
AU - Ceolin, Davide
AU - Groth, Paul
AU - Maccatrozzo, Valentina
AU - Fokkink, Wan
AU - Van Hage, Willem Robert
AU - Nottamkandath, Archana
N1 - Publisher Copyright: © 2016 ACM.
PY - 2016/1
Y1 - 2016/1
N2 - Trust is a broad concept that in many systems is often reduced to user reputation alone. However, user reputation is just one way to determine trust. The estimation of trust can be tackled from other perspectives as well, including by looking at provenance. Here, we present a complete pipeline for estimating the trustworthiness of artifacts given their provenance and a set of sample evaluations. The pipeline is composed of a series of algorithms for (1) extracting relevant provenance features, (2) generating stereotypes of user behavior from provenance features, (3) estimating the reputation of both stereotypes and users, (4) using a combination of user and stereotype reputations to estimate the trustworthiness of artifacts and (5) selecting sets of artifacts to trust. These algorithms rely on the W3C PROV recommendations for provenance and on evidential reasoning by means of subjective logic. We evaluate the pipeline over two tagging datasets: tags and evaluations from the Netherlands Institute for Sound and Vision's Waisda? video tagging platform, as well as crowdsourced annotations from the Steve.Museum project. The approach achieves up to 85% precision when predicting tag trustworthiness. Perhaps more importantly, the pipeline provides satisfactory results using relatively little evidence through the use of provenance.
AB - Trust is a broad concept that in many systems is often reduced to user reputation alone. However, user reputation is just one way to determine trust. The estimation of trust can be tackled from other perspectives as well, including by looking at provenance. Here, we present a complete pipeline for estimating the trustworthiness of artifacts given their provenance and a set of sample evaluations. The pipeline is composed of a series of algorithms for (1) extracting relevant provenance features, (2) generating stereotypes of user behavior from provenance features, (3) estimating the reputation of both stereotypes and users, (4) using a combination of user and stereotype reputations to estimate the trustworthiness of artifacts and (5) selecting sets of artifacts to trust. These algorithms rely on the W3C PROV recommendations for provenance and on evidential reasoning by means of subjective logic. We evaluate the pipeline over two tagging datasets: tags and evaluations from the Netherlands Institute for Sound and Vision's Waisda? video tagging platform, as well as crowdsourced annotations from the Steve.Museum project. The approach achieves up to 85% precision when predicting tag trustworthiness. Perhaps more importantly, the pipeline provides satisfactory results using relatively little evidence through the use of provenance.
KW - Machine learning
KW - Provenance
KW - Subjective logic
KW - Tags
KW - Trust
KW - Uncertainty reasoning
UR - http://www.scopus.com/inward/record.url?scp=84957083633&partnerID=8YFLogxK
U2 - 10.1145/2818382
DO - 10.1145/2818382
M3 - Article
AN - SCOPUS:84957083633
SN - 1936-1955
VL - 7
JO - Journal of Data and Information Quality
JF - Journal of Data and Information Quality
IS - 1-2
M1 - 6
ER -