Sparse representation based clustering for integrated analysis of gene copy number variation and gene expression data

Hongbao Cao, Junbo Duan, Dongdong Lin, Yu Ping Wang

Research output: Contribution to journal > Article > peer-review

6 Scopus citations

Abstract

Integrated analysis of multiple types of genomic data has received increasing attention in recent years, due to the rapid development of new genetic techniques and the strong demand for improving the reliability of these techniques. In this work, we propose a sparse representation based clustering (SRC) method for joint analysis of gene expression and copy number data, with the aim of selecting significant genes/variables to identify genes susceptible to a disease. Unlike traditional gene selection methods, the proposed SRC model uses information from multiple features to cluster the data into multiple groups, and then selects significant genes/variables within a particular group. By using joint features extracted from both types of data, the proposed SRC method provides an efficient way to integrate different types of genomic measurements for comprehensive analysis. Our method has been tested on data from both breast cancer cell lines and breast tumors. In addition, simulated data sets were used to test the robustness of the method to several factors, such as noise, data size, and data type. Experiments showed that the proposed method can effectively identify genes/variables with interesting characteristics, e.g., genes/variables with large variations across all genes, and genes/variables that are statistically significant in both measurements with strong correlations. The proposed method is applicable to a wide variety of biological problems where joint analysis of biological measurements is a common challenge.

Original language: English
Pages (from-to): 131-144
Number of pages: 14
Journal: International Journal of Computers and their Applications
Volume: 19
Issue number: 2
State: Published - Jun 2012
Externally published: Yes

Keywords

  • Gene selection
  • Clustering
  • Gene copy number variation
  • Gene expression
  • Sparse representations
