Prediction of breast cancer using machine learning algorithms on different datasets

Ömer Çağrı Yavuz

Karadeniz Technical University

M. Hanefi Calp

Ankara Hacı Bayram Veli University

Hazel Ceren Erkengel

Karadeniz Technical University

Breast cancer is an increasingly common disease that causes emotional and behavioral reactions and can be fatal if not detected early. Traditional methods are insufficient, especially for early diagnosis. This study therefore aimed to predict breast cancer by applying machine learning (ML) algorithms to different datasets and to demonstrate the applicability of these algorithms. Algorithm performance was compared on balanced and unbalanced datasets using the performance metrics obtained on each dataset. In addition, a model based on the Borda Voting method was developed by combining the results of four algorithms: Naive Bayes (NB), k-nearest neighbors (KNN), decision tree (DT), and random forest (RF). The predictions of each algorithm were written to separate columns of the same Excel file, and the most frequently occurring value was accepted as the final result. The developed model was tested on real data consisting of 60 records and the results were analyzed. The proposed RF model achieved higher performance than similar studies in the literature. Finally, the prediction results obtained with the developed model demonstrated the applicability of ML algorithms in the diagnosis of breast cancer.
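The combination step described above, in which the most frequently occurring prediction across the four algorithms becomes the final result, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name majority_vote, the labels "M" (malignant) and "B" (benign), and the sample rows are hypothetical. The abstract does not say how a 2-2 tie among the four algorithms is resolved, so the sketch assumes the tie goes to the label that appears first in the NB, KNN, DT, RF column order. It implements only the "most repeated value" rule stated in the abstract, not a rank-based Borda count.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent label in one row of per-algorithm predictions.

    Assumption (not stated in the source): on a tie, the label that
    appears first in the row wins.
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


# Hypothetical predictions for three records; columns are NB, KNN, DT, RF.
rows = [
    ("M", "M", "B", "M"),  # clear majority for "M"
    ("B", "B", "B", "M"),  # clear majority for "B"
    ("M", "B", "B", "M"),  # 2-2 tie, resolved by column order
]

final = [majority_vote(row) for row in rows]
print(final)
```

With four voters and two classes, ties are possible, so any real use of this rule needs an explicit tie-breaking policy; the column-order rule here is only one option.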

Keywords: breast cancer, classification algorithms, machine learning, unbalanced dataset
Published: 2023-01-22
https://plu.mx/plum/a/?doi=10.16925/2357-6014.2023.01.08