Bart Baesens

Fraud Detection using Analytics in R

📅 May 23-24, 2019 (9am-4.30pm)
🌍 English

Course description

The Association of Certified Fraud Examiners estimates that fraud costs organizations worldwide $3.7 trillion a year and that a typical company loses five percent of its annual revenue to fraud. Fraud attempts are expected to increase further in the future, making fraud detection necessary in most industries. This course shows how fraud patterns learned from historical data can be used to fight fraud. Combining theoretical and practical insights, it covers both predictive analytics (using a labeled dataset) and descriptive analytics (using an unlabeled dataset).

Two main challenges when building a supervised tool for fraud detection are the imbalance, or skewness, of the data and the differing costs of the various types of misclassification. We present methodologies to address both issues. Moreover, we present techniques from robust statistics and digit analysis to detect unusual observations that are likely to be associated with fraud.

The discussed techniques can be applied across a wide variety of fraud applications, such as insurance fraud, credit card fraud, anti-money laundering, healthcare fraud, telecommunications fraud, click fraud, tax evasion, and counterfeiting. Various real-life case studies and examples are presented to illustrate the methodologies at work (using the R programming language) and to show how and why these machine learning tools complement traditional expert-based fraud-detection approaches.

Course outline

Chapter 1: Introduction and motivation

  • Importance of fraud detection
  • Defining fraud
  • Types of fraud
  • Fraud detection challenges
  • Fraud analytics process model

Chapter 2: Data preprocessing

  • Types of variables
  • Visual data exploration
  • Missing values
  • Standardizing and transforming data
  • Coarse classification

Chapter 3: Featurization and Social Network Analysis

  • Traditional features for fraud detection
  • Creating interesting features based on
    • Time
    • Frequency
    • Recency
  • Social network analysis
    • Social network components and characteristics
    • Is fraud a social phenomenon?
    • Social network metrics
    • Adding features based on social networks

Chapter 4: Dealing with imbalanced datasets

  • Random oversampling (ROS) of minority class and random undersampling (RUS) of majority class
  • Synthetic Minority Over-sampling Techniques (SMOTE & MWMOTE)
  • Diversified Sensitivity-based Under-Sampling (DSUS) or cluster-based under-sampling (CLUS)
  • Combining over- and under-sampling

Chapter 5: Supervised techniques for fraud detection

  • Linear and logistic regression
  • Decision trees and ensemble methods
  • Neural networks
  • Evaluating fraud detection models

Chapter 6: Unsupervised techniques for fraud detection

  • Digit analysis using Benford’s Law
  • Multivariate outlier detection using robust statistics
  • Clustering approaches
  • Dimension reduction techniques

👩‍🏫 Lecturers

Prof. dr. Tim Verdonck
Professor at University of Antwerp

Prof. dr. Bart Baesens
Professor at KU Leuven

🏢 Location

Van der Valk Hotel Brussels Airport (Belgium)

Culliganlaan 4b
1831 Diegem
Belgium
hotelbrusselsairport.com

🏫 Organizer

Bart Baesens

💼 Register

This course has already taken place; registration is no longer possible.