Credit Card Fraud Detection & Machine Learning


Fraud detection is a major concern for credit card companies. With credit cards so prevalent in our society, card issuers must be able to prevent fraud and protect their customers. Machine learning techniques provide a powerful and effective way of detecting credit card fraud. In this blog post, we will discuss the machine learning techniques that data scientists can use to design credit card fraud detection solutions, including algorithms such as Bayesian networks, support vector machines, neural networks, and decision trees.

What are different types of credit card fraud?

The following are different types of credit card fraud:

  • Counterfeit credit cards: Cards are created fraudulently by copying the card number and other information from a legitimate credit card.
  • New credit card account fraud: A credit card account is opened in someone else's name.
  • Credit card account stealing: An individual steals the identity of an existing cardholder to take over their credit accounts. The thief changes the billing address on file for these cards, making the fraud difficult to track.
  • In-store credit card fraud via skimming: Card data is skimmed from the magnetic stripe when a customer swipes his or her card at a merchant's terminal. The skimmed data is then used for fraudulent purposes.
  • Online credit card fraud: The number of online credit card transactions has been increasing rapidly, which gives fraudsters more opportunities to misuse stolen card details.
  • Card not present (CNP) fraud: The credit or debit card details are used for a transaction without the physical card being presented.
  • Stolen credit card numbers: Card numbers are stolen and then resold to criminals, who use them to make purchases online or over the phone.
  • Canceled credit cards: Cards that have been canceled are used by criminals to make purchases online or over the phone.

What are different machine learning use cases for credit card fraud detection?

There are different aspects related to training machine learning models for credit card fraud detection:

  • Challenges, including imbalanced datasets
  • Some popular feature sets
  • Machine learning use cases / algorithms
  • Machine learning model evaluation

What are some of the challenges in relation to training ML models for credit card fraud detection?

Before getting into the machine learning use cases and related techniques, let's understand one common challenge in training machine learning models for credit card fraud detection.

An imbalanced dataset is one of the key challenges in training models to detect fraudulent credit card transactions, because fraudulent transactions make up only a small fraction of all transactions. One way to tackle the imbalance is SMOTE (Synthetic Minority Oversampling Technique), a sampling technique in which the minority class, i.e., fraudulent transactions, is oversampled to balance it against the legitimate transactions. SMOTE generates new synthetic fraud examples by interpolating between a minority-class example and its nearest minority-class neighbors. Another technique for creating a balanced dataset combines K-means clustering with a genetic algorithm to create new data samples for minority clusters.
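
The interpolation idea behind SMOTE can be sketched in a few lines of plain Python. This is a minimal illustration rather than a production implementation: the function name `smote`, the toy `[amount, hour_of_day]` features, and the parameter defaults are all assumptions made for this example. In practice, a maintained library implementation would normally be used instead.

```python
import math
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic samples by interpolating between a minority
    sample and one of its k nearest minority-class neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbors of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # random position on the segment base -> neighbor
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic

# Hypothetical toy data: [amount, hour_of_day] of known fraudulent transactions
fraud = [[900.0, 2.0], [950.0, 3.0], [1200.0, 1.0], [1100.0, 4.0]]
new_fraud = smote(fraud, n_new=6, k=2)
print(len(new_fraud))  # 6 synthetic fraud samples
```

Because every synthetic sample lies on a line segment between two real fraud samples, the new points stay inside the region already occupied by the minority class instead of being exact duplicates.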

Other challenges include the ever-changing nature of fraud, which means machine learning solutions need to be updated at regular intervals, and the high number of false alarms, which makes it necessary to determine the most appropriate machine learning models for different classes of datasets.

What are some of the features which can be used for training the models?

The following are a few of the important features which can be used for training models:

  • Uncommon purchase made by the cardholder
  • Sudden identical purchases on the same credit card
  • Purchases with overnight shipping
  • Purchases with international shipment
  • Multiple card shipments to a single address
  • Multiple transactions on a card in a short time
  • Geolocation of the transaction compared with the cardholder's registered location
  • Usage of a single IP address for multiple credit cards
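
Two of the features above, the number of transactions on a card in a short time and the number of cards seen from a single IP address, can be derived from raw transaction records as sketched below. The record layout `(card_id, timestamp, ip_address)`, the function names, and the 60-second window are hypothetical choices for this example only.

```python
from collections import defaultdict

# Hypothetical records: (card_id, unix_timestamp_seconds, ip_address)
transactions = [
    ("card_A", 1000, "10.0.0.1"),
    ("card_A", 1030, "10.0.0.1"),
    ("card_A", 1050, "10.0.0.1"),
    ("card_B", 2000, "10.0.0.1"),
    ("card_C", 5000, "10.0.0.9"),
]

def recent_txn_counts(txns, window_seconds=60):
    """For each transaction, count earlier transactions on the same card
    that happened within the last `window_seconds`."""
    seen = defaultdict(list)  # card_id -> timestamps processed so far
    counts = []
    for card, ts, _ip in sorted(txns, key=lambda t: t[1]):
        recent = [t for t in seen[card] if ts - t <= window_seconds]
        counts.append((card, ts, len(recent)))
        seen[card].append(ts)
    return counts

def cards_per_ip(txns):
    """Number of distinct cards used from each IP address."""
    cards = defaultdict(set)
    for card, _ts, ip in txns:
        cards[ip].add(card)
    return {ip: len(card_set) for ip, card_set in cards.items()}

velocity = recent_txn_counts(transactions)
ip_usage = cards_per_ip(transactions)
print(velocity[2])           # ('card_A', 1050, 2)
print(ip_usage["10.0.0.1"])  # 2
```

Values such as these would then be attached to each transaction as extra columns before a model is trained.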

What are different machine learning solutions for credit card fraud detection?

The following are different use cases for credit card fraud detection which can be addressed with machine learning models / solutions:

  • Predict/classify whether credit card transactions are fraudulent or not, in batch as well as in real time. Fraudulent transaction classification has been tackled with different machine learning and deep learning algorithms such as logistic regression, support vector machines (SVM), genetic algorithms, deep neural networks, random forests, Bayesian network classifiers, Bayesian belief networks, K-nearest neighbors, and Hidden Markov Models (HMMs). HMMs can be used to model a cardholder's behavior based on spending habits, where a state in the model is the type of purchase. Recall that HMMs rely on the Markov property, which states that the next state depends only on the current state and not on the states before it. Deep neural networks and related architectures such as autoencoders, long short-term memory (LSTM) networks, and convolutional neural networks (CNNs) can also be used to classify whether transactions are fraudulent or not. Among these algorithms, deep neural networks have been reported to perform better, with high precision. Between Naive Bayes and Bayesian network classifiers, the Bayesian network classifier has been found to perform better. The K-nearest neighbors algorithm can be used with different distance measures, such as Euclidean and cosine distance, to find similarities between past fraudulent transactions and new transactions. More recently, transfer learning models have also been explored for credit card fraud detection.
  • Predict/classify whether credit cards are being used by their owner or not, in batch as well as in real time. Profiling cardholders to predict the usage patterns of different users has been tackled with different machine learning and deep learning algorithms.
  • Adaptive stress testing with reinforcement learning agents: Recently, solutions such as reinforcement learning agents have been explored as a way to attack credit card fraud detection models and probe how well they find cards that are being used fraudulently.
  • Outlier detection: Outliers are data points which differ considerably from the other credit card transactions, and they can be used to flag possible fraud. Various outlier detection techniques, such as distance-based, density-based, and distribution-based methods, have been explored for credit card fraud detection. Distance-based methods compare each transaction with the other transactions in the dataset using a measure of similarity or dissimilarity (for example, Euclidean distance). Density-based methods compare how densely populated the neighborhood of a transaction is with the density around similar transactions. Distribution-based methods identify outliers using one or more statistical distributions estimated from the dataset, typically assuming that new observations are generated according to those same distributions.
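
A distance-based outlier score can be sketched as the average distance from each transaction to its k nearest neighbors, with the distance measure (Euclidean or cosine, as mentioned above) passed in as a parameter. The function names, the toy features, and the choice of k are assumptions made for illustration. The features are also left unscaled here to keep the example short, whereas real features would normally be scaled first.

```python
import math

def euclidean(a, b):
    return math.dist(a, b)

def cosine_distance(a, b):
    # Assumes neither vector is all zeros
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms

def knn_outlier_scores(points, k=2, distance=euclidean):
    """Score each point by its mean distance to its k nearest neighbors.
    A larger score means the point is more isolated from the rest."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(distance(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

# Hypothetical toy data: [amount, transactions_in_last_hour]
points = [[20.0, 1.0], [25.0, 1.0], [22.0, 2.0], [24.0, 2.0], [900.0, 9.0]]
scores = knn_outlier_scores(points, k=2)
most_unusual = max(range(len(points)), key=scores.__getitem__)
print(most_unusual)  # 4, the large-amount transaction
```

Passing `distance=cosine_distance` switches the measure without changing the scoring logic. Comparing every pair of transactions is quadratic in the dataset size, so this brute-force form is only suitable for small samples.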

What are different model evaluation techniques which can be used?

The performance of machine learning models for credit card fraud detection is evaluated using a combination of accuracy, recall, and the area under the precision-recall curve (PR AUC). Accuracy alone is not used because of the class imbalance in credit card fraud data: a model that labels every transaction as legitimate would still score a very high accuracy. Recall is the ratio of correctly identified fraudulent transactions to the actual number of fraudulent transactions, so a high recall means that few frauds go undetected.
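
The effect of class imbalance on accuracy can be checked with a small worked example. The counts below (1,000 transactions, 10 of them fraudulent) and the two hypothetical models are made up for illustration.

```python
def evaluate(y_true, y_pred):
    """Return accuracy, precision, and recall for binary labels (1 = fraud)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

# Hypothetical dataset: 990 legitimate (0) and 10 fraudulent (1) transactions
y_true = [0] * 990 + [1] * 10

# Model 1 labels everything as legitimate
always_legit = [0] * 1000
# Model 2 raises 4 false alarms and catches 8 of the 10 frauds
detector = [1] * 4 + [0] * 986 + [1] * 8 + [0] * 2

acc1, prec1, rec1 = evaluate(y_true, always_legit)
acc2, prec2, rec2 = evaluate(y_true, detector)
print(acc1, rec1)  # 0.99 0.0
print(acc2, rec2)  # 0.994 0.8
```

The two models differ by less than half a percentage point in accuracy, yet the first one catches no fraud at all, which is why recall and precision-based measures are reported alongside accuracy.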

Machine learning techniques can be used for credit card fraud detection, which is a challenging problem in the machine learning domain, and they help data scientists design appropriate fraud detection solutions. In this blog post, we covered different aspects of machine learning solution design, including algorithms such as logistic regression, support vector machines (SVM), genetic algorithms, and deep neural networks and related architectures such as LSTMs and CNNs, which are being explored to classify credit card transactions as fraudulent or not. Outlier detection methods should also be considered when designing a credit card fraud detection system, because they can flag fraud by identifying transactions whose patterns are unusual compared with the normal transactions in the dataset.

Ajitesh Kumar
