Breast cancer prediction using machine learning algorithms on different datasets
Department of Management Information Systems. Karadeniz Technical University.
email: omercagriyavuz@gmail.com
Department of Management Information Systems. Ankara Hacı Bayram Veli University.
email: hanefi.calp@hbv.edu.tr
Department of Management Information Systems. Karadeniz Technical University.
email: hazelceren@ktu.edu.tr
Introduction: The research study “Breast cancer prediction using machine learning algorithms on different datasets” was carried out at Karadeniz Technical University in 2022.
Problem: Breast cancer is an increasingly common disease that provokes emotional and behavioral reactions and can be fatal if it is not detected in time. Traditional methods fall short here, particularly for early diagnosis. This study aims to predict breast cancer by applying machine learning (ML) algorithms to different datasets and to demonstrate the applicability of these algorithms.
Methodology: The performance of the algorithms was compared on balanced and unbalanced datasets, using the performance metrics obtained on each dataset. In addition, a model based on the Borda Voting method was developed by combining the results of four different algorithms: Naive Bayes (NB), k-nearest neighbors (KNN), decision tree (DT) and random forest (RF).
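As an illustration of why the comparison distinguishes balanced from unbalanced data, the following minimal sketch computes common performance metrics from scratch. The metric set (accuracy, precision, recall, F1), the function name `binary_metrics` and the toy labels are assumptions made for this example; the abstract does not list the metrics or data actually used.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard against division by zero when a class is never predicted.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical unbalanced sample: 9 benign (0) and 1 malignant (1) record.
y_true = [0] * 9 + [1]
# A degenerate classifier that labels every record as benign.
y_pred = [0] * 10

m = binary_metrics(y_true, y_pred)
print(m)
```

On this toy sample the accuracy is high even though the single malignant case is missed (recall is zero), which is why accuracy alone can be misleading on unbalanced data.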
Originality and limitations of the research: The model developed in this study combines the outputs of different algorithms (NB, KNN, DT and RF). This combination, which is based on the Borda Voting method, is intended to increase the performance of the model.
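A generic Borda count can be sketched as follows. This is not the authors' implementation: the function name `borda_vote`, the example rankings and the tie-breaking rule are assumptions. Each classifier ranks the class labels from most to least likely; with k labels, the first place earns k - 1 points, the second k - 2, and so on, and the label with the highest total wins.

```python
from collections import defaultdict


def borda_vote(rankings):
    """Combine per-classifier rankings of class labels with a Borda count.

    rankings: list of lists, each ordering the labels from most to
    least likely according to one classifier.
    Returns (winning_label, total_scores).
    """
    scores = defaultdict(int)
    for ranking in rankings:
        k = len(ranking)
        for position, label in enumerate(ranking):
            scores[label] += k - 1 - position
    # Assumption: ties are broken by label sort order, since the source
    # does not say how ties are handled.
    winner = max(sorted(scores), key=lambda label: scores[label])
    return winner, dict(scores)


# Hypothetical rankings for one record from NB, KNN, DT and RF.
rankings = [
    ["malignant", "benign"],  # NB
    ["malignant", "benign"],  # KNN
    ["benign", "malignant"],  # DT
    ["malignant", "benign"],  # RF
]

winner, scores = borda_vote(rankings)
print(winner, scores)
```

With only two classes, each classifier gives one point to its preferred label and none to the other, so the Borda count reduces to a simple majority vote.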
Results: The predictions obtained from each algorithm were written to separate columns of the same spreadsheet, and the most frequently occurring value was accepted as the final result. The model was tested on real data consisting of 60 records, and the results were analyzed.
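The spreadsheet step described above can be sketched as a vote over the four prediction columns of each record. The function name `majority_vote`, the sample rows and the choice to flag ties with `None` are assumptions for illustration; the abstract does not say how a 2-2 split between four classifiers was resolved.

```python
from collections import Counter


def majority_vote(row):
    """Return the most frequent prediction in a row, or None on a tie."""
    counts = Counter(row).most_common()
    best_count = counts[0][1]
    tied = [label for label, count in counts if count == best_count]
    # Assumption: a tie is reported as None rather than resolved silently.
    return tied[0] if len(tied) == 1 else None


# Hypothetical predictions per record, in column order (NB, KNN, DT, RF);
# 1 = malignant, 0 = benign.
predictions = [
    (1, 1, 0, 1),
    (0, 0, 0, 1),
    (1, 0, 1, 0),
]

final = [majority_vote(row) for row in predictions]
print(final)
```

The third record shows why an even number of voters needs an explicit tie rule: two classifiers predict each class, so no value is the most frequent.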
Conclusion: The results showed that the proposed RF model achieved higher performance than similar studies in the literature.
Copyright 2023 Ingeniería Solidaria
This work is licensed under a Creative Commons Attribution 4.0 International license.