Survival analysis of high-dimensional gene expression data from cancer diseases

Room 6.4.30, FCUL, Lisbon

By Eunice Carrasquinha (INESC-ID, Instituto Superior Técnico, Universidade de Lisboa).

One of the challenges that arises when dealing with patients' omics data is the high-dimensionality problem. In this type of data, the number of covariates (p) is often much larger than the number of observations (n), i.e., p ≫ n. In this context, the usual statistical techniques for parameter estimation cannot be applied, because the underlying inverse problem is ill-posed. To tackle this problem, we evaluate the introduction of graph centrality measures into classical sparse survival models such as the elastic net. We explore the use of network information, obtained both from external knowledge about the features and from the data themselves, as part of the regularization applied to the inverse problem. Furthermore, the detection of outliers, whether experimental errors or interesting abnormal clinical cases, has gained great importance, because identifying long- or short-term survivors may lead to the detection of new prognostic factors. We propose to address this problem through an ensemble technique, the Rank Product test. Results for different types of cancer from The Cancer Genome Atlas (TCGA) are presented.
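
As a rough illustration of the ensemble idea behind the Rank Product, the sketch below combines the outlyingness rankings that several models assign to each patient. It is not the speaker's implementation: the function name `rank_product`, the input layout (one row per observation, one column per model, larger score meaning more outlying) and the example scores are assumptions made for illustration, and the significance test itself (obtaining p-values for the statistic) is not covered.

```python
import numpy as np

def rank_product(scores):
    """Geometric mean of per-model ranks for each observation.

    scores: array of shape (n_obs, n_models); a larger value is assumed
    to mean "more outlying" under that model (e.g. an absolute residual).
    Returns an array of length n_obs; smaller values indicate observations
    that are consistently ranked as outlying across models.
    """
    scores = np.asarray(scores, dtype=float)
    n_obs, n_models = scores.shape
    # Sort each column in descending order so that rank 1 = most outlying.
    order = np.argsort(-scores, axis=0, kind="stable")
    ranks = np.empty((n_obs, n_models), dtype=float)
    for j in range(n_models):
        ranks[order[:, j], j] = np.arange(1, n_obs + 1)
    # Geometric mean of the ranks, computed in log space for stability.
    return np.exp(np.log(ranks).mean(axis=1))

# Hypothetical outlyingness scores for 4 patients under 3 survival models.
example_scores = [
    [5.0, 4.0, 6.0],
    [1.0, 2.0, 1.0],
    [3.0, 3.0, 2.0],
    [2.0, 1.0, 3.0],
]
rp = rank_product(example_scores)
most_outlying = int(np.argmin(rp))  # index of the most consistent outlier
```

Ties are broken here by input order, which is a simplification; an actual analysis would need an explicit tie rule and a null distribution for the statistic before declaring any observation an outlier.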

CEAUL - Centro de Estatística e Aplicações da Universidade de Lisboa