Abstract
Objective: To investigate whether previously curated chronic lymphocytic leukemia (CLL) risk genes could be leveraged in gene marker selection for the diagnosis and prediction of CLL. Methods: A CLL genetic database (CLL_042017) was developed through a comprehensive CLL-gene relation data analysis, in which 753 CLL target genes were curated. Expression values for these genes were used for case-control classification of four CLL datasets, with a sparse representation-based variable selection (SRVS) approach employed for feature (gene) selection. Results were compared with those obtained using analysis of variance (ANOVA)-based gene selection. Results: For each of the four datasets, SRVS selected a subset of the 753 CLL target genes that yielded significantly higher classification accuracy (100%, 100%, 93.94%, and 89.39%) than randomly selected genes. The SRVS method also outperformed ANOVA in terms of classification accuracy. Conclusion: Gene markers selected from the 753 CLL genes could enable significantly greater accuracy in the prediction of CLL. SRVS provides an effective method for gene marker selection.
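To make the ANOVA-based comparison method concrete, the following is a minimal sketch of ranking genes by a one-way ANOVA F statistic between cases and controls and keeping the top-scoring ones. It is an illustration only, not the authors' implementation, and it does not implement SRVS. The function names (`anova_f`, `select_genes`), the gene names, and the expression values are all hypothetical.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of expression values."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the samples around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_genes(expr, labels, top_k):
    """Return the top_k genes ranked by F statistic (case = 1, control = 0).

    expr maps a gene name to its per-sample expression values.
    """
    scores = {}
    for gene, values in expr.items():
        cases = [v for v, y in zip(values, labels) if y == 1]
        controls = [v for v, y in zip(values, labels) if y == 0]
        scores[gene] = anova_f([cases, controls])
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Toy data, invented for illustration: three cases followed by three controls.
labels = [1, 1, 1, 0, 0, 0]
expr = {
    "GENE_A": [5.1, 4.9, 5.0, 1.0, 1.2, 0.9],
    "GENE_B": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],
    "GENE_C": [3.0, 3.5, 2.8, 2.0, 2.4, 1.9],
}
selected = select_genes(expr, labels, top_k=2)
print(selected)
```

In a comparison like the one described above, the selected genes would then be passed to a classifier and the resulting accuracy set against that of a randomly drawn gene subset of the same size.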
Original language | English |
---|---|
Pages (from-to) | 3358-3364 |
Number of pages | 7 |
Journal | Journal of International Medical Research |
Volume | 46 |
Issue number | 8 |
State | Published - Aug 1 2018 |
Externally published | Yes |
Keywords
- case-control classification
- Chronic lymphocytic leukemia (CLL)
- disease prediction
- gene markers
- genetic databases
- sparse representation
- variable selection