A Machine Learning-Based Analysis of Prostate Cancer Cases in Delta State, Nigeria
Nwabenu Dominic Christian*
Department of Mathematics and Statistics, Delta State Polytechnic, Ogwashi-uku, Nigeria.
Omonode Ejiro
Sports Department, Delta State Polytechnic, Ogwashi-uku, Nigeria.
*Author to whom correspondence should be addressed.
Abstract
Prostate cancer is the leading cause of cancer-related mortality among Nigerian males, with most diagnosed cases presenting at an advanced, incurable stage. The incidence rate of prostate cancer in Nigeria is 32.8 per 100,000, while the mortality rate is 16.3 per 100,000. Despite advances in early-stage detection and the screening programs available in Delta State, late-stage diagnosis and limited awareness of the disease persist. This study applied four supervised machine learning classifiers (Logistic Regression, Decision Tree, Random Forest, and Support Vector Machine) to retrospective patient records from six healthcare institutions across Delta State, with the aim of identifying the clinical risk factors most strongly associated with advanced-stage diagnosis and building a predictive framework capable of distinguishing early- from late-stage disease. Secondary data were collected on 60 confirmed prostate cancer cases diagnosed between January 2015 and December 2023 at three tertiary referral centers and three general hospitals. The models were trained on 80% of the pooled dataset and evaluated on the remaining 20% using accuracy, sensitivity, specificity, F1-score, and AUC-ROC. Results showed that 68.3% of cases were at Stage III or IV at the time of diagnosis. The mean age at presentation was 64.2 years, and three-quarters of patients had PSA levels above 10 ng/mL. The four strongest predictors of advanced-stage disease were PSA level (OR = 5.82), Gleason score 8–10 (OR = 4.37), age 65 years or above (OR = 3.14), and positive family history (OR = 2.41). Random Forest outperformed all three competing models, achieving 91.3% accuracy, 89.6% sensitivity, 92.8% specificity, and an AUC-ROC of 0.94. These findings show that supervised machine learning can effectively predict prostate cancer stage at diagnosis in Delta State using routinely collected clinical data, with Random Forest achieving the strongest classification performance.
The results have direct implications for early detection policy and clinical triage in the region, and future research should prioritise prospective data collection, external model validation, and the development of deployable decision-support tools for primary healthcare settings.
Keywords: Prostate cancer, machine learning, random forest, Delta State Nigeria, PSA, risk factors, early detection, predictive model
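As an illustration of the evaluation measures named in the abstract, the following minimal Python sketch computes accuracy, sensitivity, specificity, and F1-score from the four cells of a confusion matrix, and an odds ratio from a 2x2 exposure table. The function names and all counts are hypothetical and chosen only for illustration; they are not the study's data, and the sketch does not reproduce the authors' modelling pipeline. It assumes "advanced stage" is the positive class and that no denominator is zero.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard metrics from confusion-matrix counts.

    tp/fn: advanced-stage cases predicted correctly/incorrectly.
    tn/fp: early-stage cases predicted correctly/incorrectly.
    Assumes every denominator below is non-zero.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)          # recall for the positive class
    specificity = tn / (tn + fp)          # recall for the negative class
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


def odds_ratio(exposed_cases, exposed_controls,
               unexposed_cases, unexposed_controls):
    """Unadjusted odds ratio from a 2x2 table (cross-product ratio)."""
    return (exposed_cases * unexposed_controls) / (
        exposed_controls * unexposed_cases
    )


# Hypothetical counts, for illustration only (not taken from the study).
metrics = classification_metrics(tp=8, fp=1, tn=9, fn=2)
example_or = odds_ratio(30, 10, 11, 9)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print(f"odds ratio: {example_or:.2f}")
```

The odds ratio here is the crude cross-product ratio; odds ratios reported from a fitted logistic regression are adjusted for the other covariates and will generally differ from this unadjusted value.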