30 May 2023 to 2 June 2023 Ciencias Naturales, Exactas y Técnicas
Facultad de Matemática y Computación
America/Havana timezone

RESTRICTED CUR MATRIX DECOMPOSITION IN CANCER DNA MICROARRAY DATA CLASSIFICATION PROBLEMS

Not scheduled
20m
Facultad de Matemática y Computación

Speaker

Prof. Yunier Emilio Tejeda Rodríguez (Sociedad Cubana de Matemática y Computación)

Description

Cancer DNA microarray datasets arise from the use of cDNA microarray technology in the classification of cancer tumors. They pose a very complex classification problem for both supervised and unsupervised data modeling. Because the number of columns (features) is very high, analyzing cancer DNA microarray data leads to a Column Subset Selection Problem, which is closely related to the restricted CUR matrix decomposition. In this work we treat the Frobenius-norm relative error $\theta=\frac{\|A-C \cdot X\|_F}{\|A\|_F}$ as a sequence indexed by the number of selected columns, for an $n \times p$ matrix $A$ with $n \ll p$. We prove that this sequence is infinitesimal, that is, it converges to zero. Based on this result, we propose two algorithms that select a feature subset so that $\theta$ is as small as possible. We applied the proposed algorithms to six cancer DNA microarray datasets. The subsets selected by each algorithm are used as predictors to train C4.5, naive Bayes (NB) and SVM classifiers. Balanced accuracy is estimated for the three classifiers using 5-fold cross-validation. Finally, the results are compared with those reported in the literature using non-parametric hypothesis tests for paired samples, and we conclude that the results are similar.
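To make the error measure concrete, the following is a minimal sketch in Python with NumPy. It is not one of the two algorithms proposed in the talk, which the abstract does not describe. It assumes that $C$ holds the selected columns of $A$ and that $X$ is the least-squares optimum $X = C^{+}A$; the function names `relative_error` and `greedy_columns`, the greedy selection rule, and the synthetic matrix are all hypothetical choices made for illustration.

```python
import numpy as np


def relative_error(A, cols):
    """Return theta = ||A - C X||_F / ||A||_F for the chosen column indices."""
    C = A[:, cols]
    # Assumption: X is the least-squares optimum, X = pinv(C) @ A.
    X = np.linalg.pinv(C) @ A
    return np.linalg.norm(A - C @ X, "fro") / np.linalg.norm(A, "fro")


def greedy_columns(A, k):
    """Hypothetical greedy rule: at each step add the column that lowers theta most."""
    selected, errors = [], []
    for _ in range(k):
        candidates = [j for j in range(A.shape[1]) if j not in selected]
        best = min(candidates, key=lambda j: relative_error(A, selected + [j]))
        selected.append(best)
        errors.append(relative_error(A, selected))
    return selected, errors


# Synthetic stand-in for a microarray matrix: n samples, p features, n << p.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 40))

selected, errors = greedy_columns(A, 8)
for k, err in enumerate(errors, start=1):
    print(f"columns selected: {k}, theta: {err:.3e}")
```

Under these assumptions, adding a column can only enlarge the span of $C$, so $\theta$ never increases as more columns are selected, and it reaches zero up to rounding once the selected columns span the column space of $A$.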

Primary author

Prof. Yunier Emilio Tejeda Rodríguez (Sociedad Cubana de Matemática y Computación)
