Fraud Detection using Analytics in R

📅 September 17-18, 2020 (9 am to 5 pm)
🌍 English

Course description

The Association of Certified Fraud Examiners estimates that fraud costs organizations worldwide $3.7 trillion a year and that a typical company loses five percent of its annual revenue to fraud. Fraud attempts are expected to increase even further in the future, making fraud detection essential in most industries. This course shows how fraud patterns learned from historical data can be used to fight fraud. Combining theoretical and practical insights, it discusses both predictive analytics (using a labeled dataset) and descriptive analytics (using an unlabeled dataset).

Two main challenges when building a supervised fraud detection tool are the imbalance (skewness) of the data and the different costs associated with different types of misclassification. We present methodologies to address these issues. Moreover, we present techniques from robust statistics and digit analysis to detect unusual observations that are likely to be associated with fraud.

The discussed techniques can be applied across a wide variety of fraud applications, such as insurance fraud, credit card fraud, anti-money laundering, healthcare fraud, telecommunications fraud, click fraud, tax evasion, and counterfeiting. Various real-life case studies and examples are presented to illustrate the methodologies at work (using the R programming language) and to show how and why these machine learning tools complement traditional expert-based fraud-detection approaches.

Course outline

Chapter 1: Introduction and motivation

  • Importance of fraud detection
  • Defining fraud
  • Types of fraud
  • Fraud detection challenges
  • Fraud analytics process model

Chapter 2: Data preprocessing

  • Types of variables
  • Visual data exploration
  • Missing values
  • Standardizing and transforming data
  • Coarse classification

Chapter 3: Featurization and social network analysis

  • Traditional features for fraud detection
  • Creating interesting features based on
    • Time
    • Frequency
    • Recency
  • Social network analysis
    • Social network components and characteristics
    • Is fraud a social phenomenon?
    • Social network metrics
    • Adding features based on social networks

Chapter 4: Dealing with imbalanced datasets

  • Random oversampling (ROS) of the minority class
  • Random undersampling (RUS) of the majority class
  • Synthetic minority oversampling techniques (SMOTE and MWMOTE)
  • Diversified sensitivity-based undersampling (DSUS) and cluster-based undersampling (CLUS)
  • Combining over- and under-sampling

Chapter 5: Supervised techniques for fraud detection

  • Linear and logistic regression
  • Decision trees and ensemble methods
  • Neural networks
  • Evaluating fraud detection models

Chapter 6: Unsupervised techniques for fraud detection

  • Digit analysis using Benford’s Law
  • Multivariate outlier detection using robust statistics
  • Clustering approaches
  • Dimension reduction techniques

👩‍🏫 Lecturers

Prof. dr. Tim Verdonck
Professor at the University of Antwerp

Prof. dr. Bart Baesens
Professor at KU Leuven

🏢 Location

Van der Valk Hotel Brussels Airport (Belgium)

Culliganlaan 4b
1831 Diegem

🏫 Organizer

Bart Baesens

💼 Register

The price is 1,000 euros (excluding VAT) for both days. This includes:

  • a copy of the course material (including R code and toy fraud datasets)
  • lunches and coffees

Please register through the link below. After your payment has been processed, you will receive an e-mail confirming your registration.

Online registration