Comparative Analysis of K-NN, Naïve Bayes, and Logistic Regression for Credit Card Fraud Detection

Dr. Kavita Arora

Manav Rachna International Institute of Research and Studies

Dr. Sonal Pathak

Manav Rachna International Institute of Research and Studies

Nguyen Thi Dieu Linh

Hanoi University of Industry

Introduction: This paper presents the outcome of a comparative study of machine learning algorithms, namely K-NN, Naïve Bayes, and Logistic Regression, for credit card fraud detection. The study was carried out in 2022-23 at Manav Rachna International Institute of Research and Studies, using a dataset taken from UCI.com.

Problem: Credit card fraud remains widespread, and its methods are increasingly varied. Fraud cases frequently cause banks and financial institutions damage that cannot be compensated financially. To prevent credit card scams, we must be able to identify the methods that fraudsters commonly use. The proposed scheme uses machine learning techniques to give such banks and financial institutions complete and appropriate information, not only about the methods that fraudsters often use but also about ways to protect against such fraud.

Objective: The paper discusses machine learning models based on classification and regression, namely K-Nearest Neighbors, Naïve Bayes, and Logistic Regression. Logistic Regression achieves a classification accuracy of 80%, with a precision of 78%, a recall of 100%, and an F1-score of 88% for fraudulent credit card transactions.
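The reported F1-score can be checked against the reported precision and recall, since F1 is the harmonic mean of the two. The short sketch below does only that arithmetic; the function name f1_from_precision_recall is an illustrative choice and is not taken from the paper.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the F1-score, the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # convention when both metrics are zero
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for Logistic Regression.
reported_precision = 0.78
reported_recall = 1.00

f1 = f1_from_precision_recall(reported_precision, reported_recall)
print(f"F1 = {f1:.2f}")  # prints "F1 = 0.88"
```

The result, 1.56 / 1.78 ≈ 0.876, rounds to the 88% stated above, so the three figures are mutually consistent.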

Methodology: The comparative analysis demonstrates that, in terms of precision, recall, and accuracy, K-Nearest Neighbors is a better approach for detecting fraudulent transactions than Logistic Regression and Naïve Bayes.
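A comparison of this kind could be set up roughly as in the sketch below, which uses scikit-learn. Everything specific here is an assumption made for illustration: the abstract does not state the features, preprocessing, train/test split, or hyperparameters, so the sketch uses synthetic imbalanced data from make_classification, feature scaling, a 70/30 stratified split, and k = 5. Its numbers are therefore not the paper's results.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_models(random_state: int = 0) -> dict:
    """Fit three classifiers on synthetic data and collect their test metrics."""
    # Synthetic stand-in for a fraud dataset (assumption): about 5% positives.
    X, y = make_classification(
        n_samples=2000, n_features=10, n_informative=5,
        weights=[0.95, 0.05], random_state=random_state,
    )
    # Stratify so that both splits keep the same class ratio.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=random_state
    )
    models = {
        # K-NN and Logistic Regression are scale-sensitive, so scale first.
        "K-Nearest Neighbors": make_pipeline(
            StandardScaler(), KNeighborsClassifier(n_neighbors=5)
        ),
        "Naive Bayes": GaussianNB(),
        "Logistic Regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, y_pred),
            # zero_division=0 avoids a warning if a model predicts no positives.
            "precision": precision_score(y_test, y_pred, zero_division=0),
            "recall": recall_score(y_test, y_pred, zero_division=0),
        }
    return results


if __name__ == "__main__":
    for name, scores in compare_models().items():
        print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

Reporting precision and recall alongside accuracy for every model keeps the comparison meaningful when the positive class is rare.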

Results: Accuracy is marginally higher for Logistic Regression, but the false positives show that this metric does not account for the imbalance in the data and therefore masks the model's actual performance; K-Nearest Neighbors is deemed the better fit for such cases.
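Why accuracy alone can mislead on imbalanced data is easiest to see with a small worked example. The sketch below uses made-up counts (1000 transactions, 20 of them fraudulent), not the paper's data, and a hypothetical baseline that labels every transaction as valid.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives, with 1 = fraud and 0 = valid."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        elif actual == 0 and predicted == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def summarize(y_true, y_pred):
    """Compute accuracy, precision, and recall from the confusion counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Illustrative data (assumption): 20 fraudulent and 980 valid transactions.
y_true = [1] * 20 + [0] * 980
# Baseline that never flags fraud.
always_valid = [0] * len(y_true)

summary = summarize(y_true, always_valid)
print(summary)  # accuracy is 0.98 although recall is 0.0
```

The baseline reaches 98% accuracy while detecting no fraud at all, which is why the confusion matrix, precision, and recall need to be read together with accuracy.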

Conclusion: The proposed scheme is an automated fraud classification system that uses machine learning techniques, namely K-Nearest Neighbors, Logistic Regression, and Naïve Bayes, to produce a model that can distinguish valid from invalid credit card transactions.

Originality: This research uses the most relevant features of the dataset, visualizes accuracy with the confusion matrix, and derives the accuracy calculations from the dataset used.

Limitations: Deep learning techniques could have been used to obtain even better results.

Keywords: Naïve Bayes, K Nearest Neighbor, fraud detection, logistic regression, machine learning
Published: 2023-09-22
DOI: 10.16925/2357-6014.2023.03.05