Topic Introduction
With current microarray and RNA sequencing technologies, two-sample genome-wide expression data are increasingly collected in biological and medical studies. Differential expression analysis and gene set enrichment analysis are routinely conducted on such data, and the related statistical software in R is widely used. When multiple data sets are available, an integrative analysis can be conducted. In practice, concordant and discordant molecular behaviors across a series of data sets can be of biological and clinical interest, yet statistical methods and software for these types of integrative analysis are still lacking.
We have proposed a mixture-model-based approach to the integrative analysis of multiple large-scale two-sample expression data sets. Because the mixture model is built on z-scores transformed from the differential expression test P-values, it is applicable to expression data generated by either microarray or RNA sequencing platforms. The model is simple: for each data set, three normal components represent down-regulation, up-regulation, and no differential expression. However, because the components of different data sets combine with one another, the model parameter space grows exponentially with the number of data sets (3^K component combinations for K data sets). To achieve concordant and discordant integrative analyses of a series of data sets, we have introduced two model reduction strategies. The related statistical computing has been implemented in R.
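To make the single-data-set building block and the 3^K growth concrete, here is a minimal sketch in Python using only the standard library. It is not the authors' R implementation, and the two model reduction strategies are not shown. Assumptions: P-values are one-sided (small p means up-regulated) and transformed as z = Φ^{-1}(1 − p); the EM starting values and the SD floor are arbitrary choices; the names `p_to_z`, `fit_three_component_mixture` and `joint_component_labels` are hypothetical.

```python
import math
import random
from itertools import product
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the P-value -> z transform


def p_to_z(p):
    """Assumed transform for a one-sided P-value (small p = up-regulated):
    z = Phi^{-1}(1 - p). p is clipped so that z stays finite."""
    p = min(max(p, 1e-15), 1 - 1e-15)
    return STD_NORMAL.inv_cdf(1 - p)


def fit_three_component_mixture(z, n_iter=100, min_sd=0.1):
    """EM for a normal mixture with components 0 = down, 1 = no DE, 2 = up.
    Returns (weights, means, sds, responsibilities)."""
    weights = [0.1, 0.8, 0.1]  # hypothetical starting values
    means = [-2.0, 0.0, 2.0]
    sds = [1.0, 1.0, 1.0]
    n = len(z)
    resp = []
    for _ in range(n_iter):
        dists = [NormalDist(m, s) for m, s in zip(means, sds)]
        # E-step: posterior probability that each gene belongs to each component
        resp = []
        for x in z:
            dens = [w * d.pdf(x) for w, d in zip(weights, dists)]
            total = sum(dens) or 1e-300  # guard against underflow
            resp.append([v / total for v in dens])
        # M-step: weighted updates of mixing proportions, means and SDs
        for k in range(3):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # leave an empty component unchanged
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, z)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, z)) / nk
            sds[k] = max(math.sqrt(var), min_sd)  # SD floor (assumption)
    return weights, means, sds, resp


def joint_component_labels(k):
    """Every combination of per-data-set states for k data sets: 3**k in total."""
    return list(product(("down", "null", "up"), repeat=k))


# Usage on simulated data: 800 null genes, 100 down-regulated, 100 up-regulated.
rng = random.Random(1)
true_z = ([rng.gauss(0, 1) for _ in range(800)]
          + [rng.gauss(-3, 1) for _ in range(100)]
          + [rng.gauss(3, 1) for _ in range(100)])
p_values = [1 - STD_NORMAL.cdf(x) for x in true_z]  # one-sided P-values
z_scores = [p_to_z(p) for p in p_values]
weights, means, sds, resp = fit_three_component_mixture(z_scores)
calls = [max(range(3), key=lambda k: r[k]) for r in resp]  # 0/1/2 per gene
print([round(m, 2) for m in means], [round(w, 2) for w in weights])
print([len(joint_component_labels(k)) for k in range(1, 6)])  # [3, 9, 27, 81, 243]
```

The last line shows why a full joint model quickly becomes unwieldy: each added data set triples the number of component combinations, which is the growth the model reduction strategies are meant to control.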
We demonstrate our methods on recent TCGA RNA sequencing data. To illustrate a concordant integrative analysis, we apply our method to a series of data sets collected for studying two closely related types of cancer. To illustrate a discordant integrative analysis, we apply it to a series of data sets collected for studying different types of cancer. Our integrative analysis approach can detect interesting disease-related pathways.