Prediction of breast cancer using machine learning algorithms on different datasets
Department of Management Information Systems. Karadeniz Technical University.
email: omercagriyavuz@gmail.com
Department of Management Information Systems. Ankara Hacı Bayram Veli University.
email: hanefi.calp@hbv.edu.tr
Department of Management Information Systems. Karadeniz Technical University.
email: hazelceren@ktu.edu.tr
Breast cancer is an increasingly common disease that causes emotional and behavioral reactions and can be fatal if it is not detected early. Traditional methods are insufficient, especially for early diagnosis. This study therefore aimed to predict breast cancer by applying machine learning (ML) algorithms to different datasets and to demonstrate the applicability of these algorithms. Algorithm performance was compared on balanced and unbalanced datasets using the performance metrics obtained on each dataset. In addition, a model based on the Borda voting method was developed that combines the results of four algorithms: Naive Bayes (NB), k-nearest neighbors (KNN), decision tree (DT), and random forest (RF). The prediction values obtained from each algorithm were written to separate columns of the same Excel file, and the most frequent value was accepted as the final result. The developed model was tested on real data consisting of 60 records, and the results were analyzed. The results showed that the proposed RF model achieved higher performance than similar studies in the literature. Finally, the predictions obtained with the developed model demonstrated the applicability of ML algorithms to the diagnosis of breast cancer.
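The voting step described in the abstract (one prediction column per algorithm, with the most frequent value taken as the final result) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The record values, the label names, and the `tie_break` parameter are hypothetical, and the abstract does not say how a 2-2 tie among the four algorithms is resolved. For a two-class problem, a Borda count over each algorithm's ranking reduces to this kind of plurality vote.

```python
from collections import Counter

# Column order assumed from the abstract: NB, KNN, DT, RF.
ALGORITHMS = ("NB", "KNN", "DT", "RF")


def majority_vote(predictions, tie_break=None):
    """Return the most frequent label among one record's predictions.

    tie_break is a hypothetical parameter: the abstract does not state
    how ties are resolved, so the caller may name a preferred label.
    """
    counts = Counter(predictions)
    top = max(counts.values())
    winners = [label for label, n in counts.items() if n == top]
    if len(winners) == 1:
        return winners[0]
    if tie_break is not None and tie_break in winners:
        return tie_break
    # Fallback assumption: keep the tied label that was seen first.
    return winners[0]


# Hypothetical per-record predictions, one column per algorithm.
rows = [
    {"NB": "malignant", "KNN": "malignant", "DT": "benign", "RF": "malignant"},
    {"NB": "benign", "KNN": "benign", "DT": "benign", "RF": "malignant"},
    {"NB": "benign", "KNN": "malignant", "DT": "benign", "RF": "malignant"},
]

final = [
    majority_vote([row[name] for name in ALGORITHMS], tie_break="malignant")
    for row in rows
]
print(final)
```

Preferring the malignant label on a tie is one conservative choice for a screening setting; it is an assumption of this sketch rather than something stated in the study.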
Copyright (c) 2023 Ingeniería Solidaria
This work is licensed under a Creative Commons Attribution 4.0 International License.