TY - GEN
T1 - TCGA toolbox
T2 - 2013 4th ACM Conference on Bioinformatics, Computational Biology and Biomedical Informatics, ACM-BCB 2013
AU - Robbins, David E.
AU - Grüneberg, Alexander
AU - Deus, Helena F.
AU - Tanik, Murat M.
AU - Almeida, Jonas
PY - 2013
Y1 - 2013
N2 - The diversity and volume of data generated by The Cancer Genome Atlas (TCGA) have been increasing exponentially, with the number of data files hosted by the NIH, currently three-quarters of a million, doubling every 7 months since January 2010. The authors have recently developed a browser-based, self-updating mechanism to catalog this dynamic big data repository. In this report, that foundation is extended into a web app framework that distributes TCGA analytical pipelines in a fully reproducible manner, without the usual requirement for a pre-installed specialized computational statistics environment. The solution relies exclusively on sandboxed code injection (JavaScript) and on access permissions configured through the browser's app store. The framework was designed with an open architecture so that third-party analyses, ideally hosted with web-facing version control in a repository such as GitHub, SourceForge, Bitbucket, or Google Code, can be distributed to the toolbox. This openness is reflected specifically in enabling the user to invoke a third-party analysis simply by entering the corresponding URL. Similarly, the toolbox enables the user to distribute the result of the analysis as a reproducible procedure, likewise invoked entirely through a Uniform Resource Locator (URL).
KW - Big data
KW - Genomics
KW - The cancer genome atlas
UR - http://www.scopus.com/inward/record.url?scp=84888192399&partnerID=8YFLogxK
U2 - 10.1145/2506583.2506595
DO - 10.1145/2506583.2506595
M3 - Conference contribution
AN - SCOPUS:84888192399
SN - 9781450324342
T3 - 2013 ACM Conference on Bioinformatics, Computational Biology and Biomedical Informatics, ACM-BCB 2013
SP - 62
EP - 67
BT - 2013 ACM Conference on Bioinformatics, Computational Biology and Biomedical Informatics, ACM-BCB 2013
Y2 - 22 September 2013 through 25 September 2013
ER -