TY - JOUR
T1 - A Curated Target Gene Pool Assisting Early Disease Prediction and Patient-Specific Treatment for Small Cell Lung Cancer
AU - Dong, Yan
AU - Cao, Hongbao
AU - Liang, Zhigang
N1 - Publisher Copyright: © 2018, Mary Ann Liebert, Inc.
PY - 2018/6
Y1 - 2018/6
AB - Hundreds of genes have been linked to small cell lung cancer (SCLC), presenting multiple levels of connections with the disease. The question is whether these genes are sufficient as genetic biomarkers for the early diagnosis and personalized treatment of SCLC. An SCLC genetic database was developed through comprehensive ResNet relationship data analysis, where 557 SCLC target genes were curated. Multiple levels of associations between these genes and SCLC were studied. Then, a sparse representation-based variable selection (SRVS) was employed for gene selection for four SCLC gene expression data sets, followed by a case-control classification. Results were compared with those of analysis of variance (ANOVA)-based gene selection approaches. Using SRVS, a gene vector was selected for each data set, leading to significantly higher classification accuracy compared with randomly selected genes (100%, 77.12%, 100%, and 100%; permutation p values: 0.017, 0.00060, 0.012, and 0.0066). The SRVS method outperformed ANOVA in terms of classification ratio. The genes were selected within the 557 SCLC gene pool, showing data set and method specificity. Our results suggested that for a given SCLC patient group, there might exist a gene vector in the 557 curated SCLC genes that possesses significant prediction power. SRVS is effective for identifying the optimum gene subset targeting personalized treatment.
KW - ResNet database
KW - small cell lung cancer
KW - sparse representation
KW - variable selection
UR - http://www.scopus.com/inward/record.url?scp=85048587461&partnerID=8YFLogxK
U2 - 10.1089/cmb.2017.0071
DO - 10.1089/cmb.2017.0071
M3 - Article
C2 - 29741913
AN - SCOPUS:85048587461
SN - 1066-5277
VL - 25
SP - 576
EP - 585
JO - Journal of Computational Biology
JF - Journal of Computational Biology
IS - 6
ER -