Abstract
Integrated analysis of multiple types of genomic data has received increasing attention in recent years, owing to the rapid development of new genetic techniques and the strong demand for improving their reliability. In this work, we propose a sparse representation based clustering (SRC) method for joint analysis of gene expression and copy number data, with the aim of selecting significant genes/variables in order to identify genes susceptible to a disease. Unlike traditional gene selection methods, the proposed SRC model uses information from multiple features to cluster the data into multiple groups, and then selects significant genes/variables in a particular group. By using joint features extracted from both types of data, the proposed SRC method provides an efficient way to integrate different types of genomic measurements for comprehensive analysis. The method was tested on data from both breast cancer cell lines and breast tumors. In addition, simulated data sets were used to test the robustness of the method to several factors, such as noise, data size and data type. Experiments showed that the proposed method can effectively identify genes/variables with interesting characteristics, e.g., genes/variables with large variations across all genes, and genes/variables that are statistically significant in both measurements with strong correlations. The proposed method is applicable to a wide variety of biological problems in which joint analysis of biological measurements is a common challenge.
Original language | English |
---|---|
Pages (from-to) | 131-144 |
Number of pages | 14 |
Journal | International Journal of Computers and their Applications |
Volume | 19 |
Issue number | 2 |
State | Published - Jun 2012 |
Externally published | Yes |
Keywords
- Gene selection
- Clustering
- Gene copy number variation
- Gene expression
- Sparse representations