Fraud Detection using Analytics in R

📅 May 23-24, 2019 (9am-4.30pm)
🌍 English

Course description

The Association of Certified Fraud Examiners estimates that fraud costs organizations worldwide $3.7 trillion a year and that a typical company loses five percent of its annual revenue to fraud. Fraud attempts are expected to increase further in the future, which makes fraud detection essential in most industries. This course shows how fraud patterns learned from historical data can be used to fight fraud. Combining theoretical and practical insights, it covers both predictive analytics (using a labeled dataset) and descriptive analytics (using an unlabeled dataset).

Building a supervised fraud detection tool poses two main challenges: the data are highly imbalanced (skewed), because fraud cases are typically a small minority, and different types of misclassification carry different costs. We present methodologies to address both issues. Moreover, we present techniques from robust statistics and digit analysis to detect unusual observations that are likely to be associated with fraud.
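Digit analysis is commonly based on Benford's law, which states that in many naturally occurring collections of amounts the leading digit d appears with probability log10(1 + 1/d), so about 30.1% of amounts start with a 1. The sketch below is one possible base R implementation of a first-digit test, not necessarily the approach taken in the course. The helper names `first_digit()` and `benford_check()` are hypothetical, and the simulated `natural` and `made_up` amounts stand in for real transaction data.

```r
# Hypothetical helpers for a Benford's law first-digit test (base R only).

# Expected share of leading digit d under Benford's law: log10(1 + 1/d)
benford_expected <- log10(1 + 1 / (1:9))

# Leading (first significant) digit of each non-zero, non-missing amount.
# Scientific notation puts that digit first, e.g. 0.0456 -> "4.56...e-02";
# 14 decimals keep values such as 9.9999999 from being rounded up to 10.
first_digit <- function(x) {
  x <- abs(x[!is.na(x) & x != 0])
  as.integer(substr(sprintf("%.14e", x), 1, 1))
}

# Compare observed leading-digit frequencies with Benford's law
# using a chi-square goodness-of-fit test
benford_check <- function(amounts) {
  observed <- tabulate(first_digit(amounts), nbins = 9)
  test <- chisq.test(observed, p = benford_expected, rescale.p = TRUE)
  list(
    table = data.frame(
      digit    = 1:9,
      observed = round(observed / sum(observed), 3),
      expected = round(benford_expected, 3)
    ),
    p_value = test$p.value
  )
}

# Simulated example data (not real transactions)
set.seed(1)
natural <- 10^runif(5000, 0, 5)   # log-uniform amounts follow Benford's law
made_up <- runif(5000, 100, 999)  # uniform amounts do not

benford_check(natural)$table
benford_check(made_up)$p_value    # tiny: strong deviation from Benford
```

A small p-value only flags a deviation worth investigating. Many legitimate datasets, such as amounts with fixed prices or a narrow range, do not follow Benford's law in the first place.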

The discussed techniques can be applied across a wide variety of fraud applications, such as insurance fraud, credit card fraud, anti-money laundering, healthcare fraud, telecommunications fraud, click fraud, tax evasion, and counterfeiting. Various real-life case studies and examples, worked out in the R programming language, illustrate the methodologies at work and show how and why these machine learning tools complement traditional expert-based fraud detection approaches.

Course outline

Chapter 1: Introduction and motivation

Importance of fraud detection
Defining fraud
Types of fraud
Fraud detection challenges
Fraud analytics process model

Chapter 2: Data preprocessing

Types of variables
Visual data exploration
Missing values
Standardizing and transforming data
Coarse classification

Chapter 3: Featurization and Social Network Analysis

Traditional features for fraud detection
Creating informative features based on
- Time
- Frequency
- Recency
Social Network Analysis
- Social network components and characteristics
- Is fraud a social phenomenon?
- Social network metrics
- Adding features based on social networks

Chapter 4: Dealing with imbalanced datasets

Random oversampling (ROS) of the minority class and random undersampling (RUS) of the majority class
Synthetic Minority Over-sampling Techniques (SMOTE & MWMOTE)
Diversified Sensitivity-based Under-Sampling (DSUS) or cluster-based under-sampling (CLUS)
Combining over- and under-sampling
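As a rough illustration of the two simplest techniques in this chapter, the sketch below implements random oversampling and random undersampling in base R, plus a combination of both. The helper names `random_oversample()` and `random_undersample()`, their arguments and the toy `train` data are hypothetical and not taken from the course materials. `ratio` is assumed to be the desired minority-to-majority ratio after resampling. SMOTE, MWMOTE, DSUS and CLUS are not shown.

```r
# Hypothetical helpers (base R only). `label` names the class column,
# `minority` is the value marking fraud, and `ratio` is the desired
# minority:majority ratio after resampling.

# Random oversampling (ROS): duplicate randomly drawn minority rows
random_oversample <- function(data, label, minority, ratio = 1) {
  is_min  <- data[[label]] == minority
  min_idx <- which(is_min)
  stopifnot(length(min_idx) > 0)
  # Extra minority rows needed so that n_min / n_maj reaches `ratio`
  extra <- max(round(ratio * sum(!is_min)) - length(min_idx), 0)
  # Index via sample.int() to avoid sample()'s behavior on length-1 input
  dup <- min_idx[sample.int(length(min_idx), extra, replace = TRUE)]
  rbind(data, data[dup, , drop = FALSE])
}

# Random undersampling (RUS): keep a random subset of majority rows
random_undersample <- function(data, label, minority, ratio = 1) {
  is_min  <- data[[label]] == minority
  maj_idx <- which(!is_min)
  # Majority rows to keep so that n_min / n_maj_kept reaches `ratio`
  keep <- min(length(maj_idx), round(sum(is_min) / ratio))
  kept <- maj_idx[sample.int(length(maj_idx), keep)]
  data[sort(c(which(is_min), kept)), , drop = FALSE]
}

# Toy training set: 990 legitimate (0) and 10 fraudulent (1) transactions
set.seed(42)
train <- data.frame(amount = runif(1000, 10, 500),
                    fraud  = c(rep(0L, 990), rep(1L, 10)))

ros  <- random_oversample(train, "fraud", minority = 1L)
rus  <- random_undersample(train, "fraud", minority = 1L)
# Combined: undersample to 1:10, then oversample to 1:2
both <- random_oversample(
  random_undersample(train, "fraud", minority = 1L, ratio = 0.1),
  "fraud", minority = 1L, ratio = 0.5)

table(ros$fraud)   # 990 legitimate, 990 fraud
table(rus$fraud)   # 10 legitimate, 10 fraud
table(both$fraud)  # 100 legitimate, 50 fraud
```

Resampling is normally applied to the training set only, so that the test set keeps the original class distribution. ROS adds no new information and can encourage overfitting to duplicated fraud cases, whereas RUS throws away majority-class data; combining the two, as in the last call, balances these drawbacks.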

Chapter 5: Supervised techniques for fraud detection