Linked cancer genome atlas database

Muhammad Saleem, Shanmukha S. Padmanabhuni, Axel Cyrille Ngonga Ngomo, Jonas S. Almeida, Stefan Decker, Helena F. Deus

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

20 Scopus citations

Abstract

The Cancer Genome Atlas (TCGA) is a multidisciplinary, multi-institutional pilot project to create an atlas of genetic mutations responsible for cancer. One of the aims of this project is to develop an infrastructure for making cancer-related data publicly accessible, enabling cancer researchers around the world to make and validate important discoveries. However, data in TCGA are organized as text archives in a set of directories. Devising bioinformatics applications to analyse such data remains challenging, as it requires downloading very large archives and parsing the relevant text files to collect the critical covariates necessary for analysis. Furthermore, the various types of experimental results are not connected biologically: to truly exploit the data in the genome-wide context in which the TCGA project was devised, the data need to be converted into a structured representation and made publicly available for remote querying and virtual integration. In this work, we address these issues by RDFizing data from TCGA and linking its elements to the Linked Open Data (LOD) Cloud. The outcome is, to the best of our knowledge, the largest LOD data source, comprising over 30 billion triples. This data source can be exploited through publicly available SPARQL endpoints, thus providing an easy-to-use, time-efficient, and scalable solution for accessing TCGA. We also describe showcases enabled by the new linked data representation of TCGA presented in this paper.

Original language: English
Title of host publication: Proceedings of the 9th International Conference on Semantic Systems, I-SEMANTICS 2013
Pages: 129-134
Number of pages: 6
State: Published - 2013
Event: 9th International Conference on Semantic Systems, I-SEMANTICS 2013 - Graz, Austria
Duration: Sep 4 2013 to Sep 6 2013

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 9th International Conference on Semantic Systems, I-SEMANTICS 2013
Country/Territory: Austria
City: Graz
Period: 09/4/13 to 09/6/13

Keywords

  • LOD
  • SPARQL
  • TCGA
