Correlation based Feature Selection impact on the classification of breast cancer patients response to neoadjuvant chemotherapy / Rosati, S.; Gianfreda, C. M.; Balestra, G.; Martincich, L.; Giannini, V.; Regge, D. - ELECTRONIC. - (2018), pp. 1-5. (Paper presented at the MeMeA 2018 conference, held in Rome (Italy), 11-13 June 2018) [10.1109/MeMeA.2018.8438698].

Correlation based Feature Selection impact on the classification of breast cancer patients response to neoadjuvant chemotherapy

Rosati, S.; Balestra, G.; Giannini, V.
2018

Abstract

The availability of a large number of variables is not always associated with better classification performance, since some of them can be redundant, irrelevant, or a source of noise. For this reason, a Feature Selection (FS) step is often applied to high-dimensional datasets. Correlation-based FS relies on the idea that “good feature subsets contain features highly correlated with the class yet uncorrelated with each other”. However, the main problem with this kind of approach is defining the threshold above which two variables are considered correlated. In this study, we evaluated the impact of different thresholds on the performance of two classifiers trained to predict the response to neoadjuvant chemotherapy (graded from 1 to 5) of 44 patients with breast cancer. First, 27 texture features were computed on the largest slices of the segmented tumor in the pretreatment dynamic contrast-enhanced MRI. Then, we applied an FS algorithm that identifies the pairs of variables whose linear correlation coefficient has an absolute value above a given threshold and removes, for each pair, the variable less correlated with the response to neoadjuvant chemotherapy. We tested correlation thresholds ranging from 1 to 0.8 in steps of 0.01, and we used each resulting subset to construct a Decision Tree (DT) classifier and a Linear Regression Model (LRM). Our results showed that removing highly correlated variables (absolute value of the correlation coefficient >0.97) reduced the DT performance by about 10%. Although the LRM did not reach acceptable results in predicting chemotherapy response (accuracy = 40.9%), its intrinsic linearity made it more stable under the removal of linear redundancy.
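The pairwise pruning step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `correlation_filter` and `sweep_thresholds` are hypothetical, the linear correlation coefficient is taken to be Pearson's r, and the response grade is treated as a numeric vector. The abstract does not say in which order pairs are processed or how ties are broken, so the sketch assumes that pairs are visited in column order, that features already removed are skipped, and that on a tie the second feature of the pair is dropped.

```python
import numpy as np


def correlation_filter(X, y, threshold):
    """Return the column indices of X kept after pairwise correlation pruning.

    X: (n_samples, n_features) feature matrix; y: (n_samples,) numeric response.
    Assumption: pairs are visited in column order and removed features are skipped.
    """
    n_features = X.shape[1]
    # Absolute Pearson correlation between every pair of features.
    feat_corr = np.abs(np.corrcoef(X, rowvar=False))
    # Absolute Pearson correlation between each feature and the response.
    resp_corr = np.array(
        [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_features)]
    )
    keep = np.ones(n_features, dtype=bool)
    for i in range(n_features):
        for j in range(i + 1, n_features):
            if keep[i] and keep[j] and feat_corr[i, j] > threshold:
                # Drop the feature less correlated with the response
                # (assumption: on a tie, the second feature is dropped).
                drop = i if resp_corr[i] < resp_corr[j] else j
                keep[drop] = False
    return np.flatnonzero(keep)


def sweep_thresholds(X, y, thresholds=np.linspace(1.0, 0.8, 21)):
    """Map each threshold (1.00, 0.99, ..., 0.80) to the retained column indices."""
    return {round(float(t), 2): correlation_filter(X, y, t) for t in thresholds}
```

Each subset returned by `sweep_thresholds` could then be passed to a classifier of choice. With a threshold of 1 no pair can exceed the threshold, so every feature is retained and that setting serves as the baseline without selection.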
ISBN: 978-1-5386-3392-2
Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2712192