Unveiling Cancer: A Data-Driven Approach for Early Identification and Prediction Using F-RUS-RF Model
2024 (English). In: International journal of imaging systems and technology (Print), ISSN 0899-9457, E-ISSN 1098-1098, Vol. 34, no 6, article id e23221. Article in journal (Refereed), Published
Abstract [en]
Globally, cancer is the second-leading cause of death after cardiovascular disease. To improve survival rates, risk factors and cancer predictors must be identified early. From the literature, researchers have developed several kinds of machine learning-based diagnostic systems for early cancer prediction. This study presented a diagnostic system that can identify the risk factors linked to the onset of cancer in order to anticipate cancer early. The newly constructed diagnostic system consists of two modules: the first module relies on a statistical F-score method to rank the variables in the dataset, and the second module deploys the random forest (RF) model for classification. Using a genetic algorithm, the hyperparameters of the RF model were optimized for improved accuracy. A dataset including 10 765 samples with 74 variables per sample was gathered from the Swedish National Study on Aging and Care (SNAC). The acquired dataset has a bias issue due to the extreme imbalance between the classes. In order to address this issue and prevent bias in the newly constructed model, we balanced the classes using a random undersampling strategy. The model's components are integrated into a single unit called F-RUS-RF. With a sensitivity of 92.25% and a specificity of 85.14%, the F-RUS-RF model achieved the highest accuracy of 86.15%, utilizing only six highly ranked variables according to the statistical F-score approach. We can lower the incidence of cancer in the aging population by addressing the risk factors for cancer that the F-RUS-RF model found. © 2024 The Author(s). International Journal of Imaging Systems and Technology published by Wiley Periodicals LLC.
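The class balancing and variable ranking steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not state which F-score formula was used, so the ratio of between-class to within-class variation below is an assumption. Running the undersampling before the ranking is also an assumption, as are the toy data and all function names. The random forest classifier and its genetic-algorithm tuning are left out to keep the example dependency-free.

```python
import random
from statistics import mean, variance


def f_score(values, labels):
    """Assumed F-score: between-class separation over within-class variance."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    overall = mean(values)
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    within = variance(pos) + variance(neg)  # sample variances (n - 1)
    return between / within if within > 0 else 0.0


def rank_features(rows, labels):
    """Return feature indices ordered from highest to lowest F-score."""
    n_features = len(rows[0])
    scores = [f_score([r[j] for r in rows], labels) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def random_undersample(rows, labels, seed=0):
    """Randomly drop samples so every class matches the smallest class size."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())
    keep = []
    for idx in by_class.values():
        keep.extend(rng.sample(idx, n_min))
    keep.sort()
    return [rows[i] for i in keep], [labels[i] for i in keep]


# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
rows = [[0.10, 5.0], [0.20, 1.0], [0.15, 3.0], [0.90, 2.0],
        [1.00, 4.0], [0.12, 2.5], [0.18, 4.5], [0.11, 1.5]]
labels = [0, 0, 0, 1, 1, 0, 0, 0]  # imbalanced: six negatives, two positives

bal_rows, bal_labels = random_undersample(rows, labels)
ranking = rank_features(bal_rows, bal_labels)
print(ranking)
```

In a full pipeline, the top-ranked columns of the balanced data would then be passed to a classifier such as a random forest. The abstract reports that six variables were retained at that stage.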
Place, publisher, year, edition, pages
John Wiley & Sons, 2024. Vol. 34, no 6, article id e23221
Keywords [en]
artificial intelligence, cancer, convolutional neural network, deep learning, medical imaging, Deep neural networks, Diseases, Cardiovascular disease, Causes of death, Data-driven approach, Diagnostic systems, F-score, Random forest modeling, Risk factors, Convolutional neural networks
National Category
Cancer and Oncology; Computer Sciences
Identifiers
URN: urn:nbn:se:bth-27223; DOI: 10.1002/ima.23221; ISI: 001370225400001; Scopus ID: 2-s2.0-85209990620; OAI: oai:DiVA.org:bth-27223; DiVA, id: diva2:1919946
Projects
SNAC
2024-12-10, 2024-12-10, 2024-12-17. Bibliographically approved