Computer-Aided Detection of Clinically Significant Prostate Cancer using Bi-Parametric Magnetic Resonance Imaging / Zhang, Yanhua. - (2025).
Computer-Aided Detection of Clinically Significant Prostate Cancer using Bi-Parametric Magnetic Resonance Imaging
Yanhua Zhang
2025
Abstract
Prostate cancer (PCa) is the most common cancer among men in nearly two-thirds of countries worldwide (118 out of 195). In 2022, PCa accounted for around 1.5 million new cases and 397,000 deaths globally, making it the second most frequent cancer and the fifth leading cause of cancer-related death among men. A major challenge in diagnosing PCa is distinguishing benign tumors that remain non-progressive from clinically significant PCa (csPCa), which can rapidly progress to metastasis and death. Magnetic resonance imaging (MRI) has recently shown high accuracy in PCa detection and characterization. However, because csPCa closely resembles numerous nonmalignant conditions, manually characterizing focal prostate lesions in MRI sequences is time-consuming and demands a high level of expertise. Moreover, the subjective criteria used for grading can result in low inter-reader agreement. These issues motivate the development of computer-aided diagnosis (CAD) systems to support radiologists in the automatic detection of PCa on MRI.

With the growing availability of large datasets, convolutional neural networks (CNNs) have been applied extensively to PCa detection, but the inherent locality of convolutional kernels limits their ability to learn long-range relationships. The Transformer, notable for its global context modeling, has recently shown promising improvements over CNNs in medical image processing, yet it remains rarely explored for PCa detection. Furthermore, the standard self-attention mechanism underlying the Transformer is memory- and computationally inefficient, which hurts its performance and limits its application in actual clinical practice.

In this study, we propose a hybrid segmentation network that effectively combines a CNN and a Transformer as the core component of our CAD system for PCa detection. We first introduce a memory- and computation-efficient self-attention module designed to facilitate reasoning on high-resolution features, improving the efficiency of learning global information while effectively capturing fine feature detail. The encoder, consisting of three independent residual-block-based branches, extracts modality-specific features from the T2W, ADC, and DWI sequences and fuses them at multiple scales. The decoder then uses a top-down path to progressively restore the spatial detail of low-resolution feature maps and integrate multi-level encoder features into the highest-resolution feature map. On top of the decoder's topmost feature map, a Transformer branch built on the efficient self-attention learns dense global context information. Finally, we ensemble the 2D and 3D versions of the proposed network to learn both intra-plane and inter-slice representations and to alleviate the problem of anisotropic voxel spacing.

A large cohort of 1339 cases collected from 11 centers is used to train and evaluate the proposed algorithm. On a test set of 269 cases, the ensemble model combining all 2D and 3D networks (10 in total), trained with 5-fold cross-validation, achieves balanced lesion-level detection performance with 77.86% sensitivity and 77.31% precision, as well as favorable patient-level detection performance with 88.89% sensitivity and 90.21% specificity. At the patient-level operating point determined by validation experiments, it reaches a pixel-level segmentation accuracy of 69.3% mIoU and 74.88% mDSC.
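The abstract does not spell out the formulation of the efficient self-attention module, so the following is only a minimal PyTorch sketch of one common memory-efficient variant, spatial-reduction attention, in which keys and values are computed from a downsampled copy of the feature map so the attention matrix shrinks from (HW x HW) to (HW x HW/r^2). All module and parameter names below are hypothetical, not taken from the thesis.

```python
import torch
import torch.nn as nn

class EfficientSelfAttention(nn.Module):
    """Sketch of spatial-reduction self-attention over a 2D feature map.
    Assumes dim is divisible by num_heads and H, W by `reduction`."""

    def __init__(self, dim: int, num_heads: int = 4, reduction: int = 4):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, 2 * dim)
        # Strided convolution shrinks the spatial grid seen by keys/values.
        self.reduce = nn.Conv2d(dim, dim, kernel_size=reduction, stride=reduction)
        self.norm = nn.LayerNorm(dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) high-resolution feature map from the decoder.
        b, c, h, w = x.shape
        q = self.q(x.flatten(2).transpose(1, 2))            # (B, HW, C)
        xr = self.reduce(x).flatten(2).transpose(1, 2)      # (B, HW/r^2, C)
        k, v = self.kv(self.norm(xr)).chunk(2, dim=-1)
        # Split into heads: (B, heads, tokens, head_dim).
        q = q.view(b, -1, self.num_heads, self.head_dim).transpose(1, 2)
        k = k.view(b, -1, self.num_heads, self.head_dim).transpose(1, 2)
        v = v.view(b, -1, self.num_heads, self.head_dim).transpose(1, 2)
        attn = (q @ k.transpose(-2, -1)) * self.scale       # (B, heads, HW, HW/r^2)
        out = attn.softmax(dim=-1) @ v                      # (B, heads, HW, head_dim)
        out = out.transpose(1, 2).reshape(b, h * w, c)
        out = self.proj(out)
        return out.transpose(1, 2).view(b, c, h, w)
```

Because only the query set stays at full resolution, the quadratic term in memory and compute drops by a factor of r^2, which is what makes attention on high-resolution decoder features affordable.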
Compared with the patient-level performance of radiology readings in routine practice, the proposed CAD system performs comparably to radiologists at the PI-RADS 3 or higher operating point: its specificity is 3.28% higher than that of radiologists at the same sensitivity, while its sensitivity is 0.58% higher at the same specificity. At the PI-RADS 4 or higher operating point, its specificity (83.22%) is significantly higher than that of radiologists, by 9.2%, at the matched sensitivity of 92% (P-value: 0.001). At the PI-RADS 5 or higher operating point, the specificity and sensitivity of the ensemble model are lower than those of radiologists by 0.18% and 1.03%, respectively, with no statistically significant difference (P-values: 0.907 and 0.772). We also conduct systematic experiments to provide a comprehensive understanding of operating-point selection, the comparison of the 2D, 3D, and ensemble networks, and the network architecture design.
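As an illustration of how a patient-level operating point can be matched to a radiologist reference, the sketch below picks, on validation predictions, the highest probability threshold that still reaches a target sensitivity (e.g., the radiologists' sensitivity at a given PI-RADS cutoff); the threshold is then frozen before testing. This is only a plausible reading of the procedure named in the abstract, and all function and variable names are illustrative.

```python
import numpy as np

def pick_operating_point(probs: np.ndarray, labels: np.ndarray,
                         target_sensitivity: float) -> float:
    """Return the highest threshold on validation scores whose sensitivity
    still meets the target; csPCa-positive cases carry label 1."""
    n_pos = (labels == 1).sum()
    # Sweep candidate thresholds from the most to the least confident score.
    for t in np.sort(np.unique(probs))[::-1]:
        tp = np.sum((probs >= t) & (labels == 1))
        if tp / n_pos >= target_sensitivity:
            return float(t)
    return 0.0  # fall back to flagging every case positive

# Hypothetical usage: average patient-level scores over the 10 fold models,
# then match the radiologists' sensitivity at the chosen PI-RADS cutoff.
# val_probs = np.mean(fold_scores, axis=0)
# thr = pick_operating_point(val_probs, val_labels, target_sensitivity=0.92)
```

Fixing the threshold on validation data before touching the test set is what allows the sensitivity/specificity pairs above to be compared fairly against the radiologists' PI-RADS operating points.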
https://hdl.handle.net/11583/3000683