Bart Baesens

Machine Learning Essentials

📅 April 16th-17th, 2020
🌍 English

About this course

In this course, participants learn the essentials of machine learning. We start with an introduction to machine learning and its applications. We then discuss data preprocessing and feature engineering, both essential steps in building high-performing machine learning models. This is followed by the basic concepts of regression and classification, after which we discuss how to measure the performance of predictive analytics techniques. Next, we zoom in on association rules, sequence rules and clustering. We then elaborate on advanced machine learning techniques such as neural networks and ensemble methods. The next section reviews variable selection, followed by an extensive discussion of machine learning model interpretation and deployment. The course concludes by highlighting some machine learning pitfalls. The course provides a sound mix of theoretical and technical insights as well as practical implementation details, illustrated by several real-life case studies and examples. It also features code examples in both R and Python. Throughout the course, the instructor draws extensively on his research and industry experience.

Requirements

Before enrolling in this course, you should have a basic understanding of descriptive statistics (e.g., mean, median, standard deviation, histograms, scatter plots) and statistical inference (e.g., confidence intervals, hypothesis testing). Previous R and Python experience is helpful but not required.

Outline

Day 1

Introduction

  • Instructor team
  • Our Machine Learning Publications
  • Software
  • R/Python tutorials
  • Data sets
  • Disclaimer

Introduction to Machine Learning

  • Machine Learning
  • Machine Learning Examples
  • Machine Learning Process Model
  • Types of Machine Learning
  • Quiz

Data Preprocessing

  • Motivation
  • Types of data
  • Types of variables
  • Denormalizing data
  • Sampling
    • Sampling in R
    • Sampling in Python
  • Visual data exploration
    • Visual data exploration in R
    • Visual data exploration in Python
  • Descriptive statistics
  • Missing values
    • Missing values in R
    • Missing values in Python
  • Outliers
    • Outliers in R
    • Outliers in Python
  • Categorization
    • Categorization in R
    • Categorization in Python
  • WOE and IV
    • WOE and IV in R
    • WOE and IV in Python
  • Quiz

Feature Engineering

  • Feature Engineering Defined
  • RFM features
  • Trend features
  • t-SNE
  • UMAP
  • Quiz

Regression

  • Linear Regression
    • Linear Regression in R
    • Linear Regression in Python
  • High Dimensional Data
  • Ridge Regression
    • Ridge Regression in R
    • Ridge Regression in Python
  • LASSO Regression
    • LASSO Regression in R
    • LASSO Regression in Python
  • Elastic Net
    • Elastic net in R
    • Elastic net in Python
  • Principal Component Regression
  • Partial Least Squares (PLS) regression
  • Generalized Linear Models (GLMs)
  • Generalized Additive Models (GAMs)

Classification

  • Linear Regression
  • Logistic Regression
    • Logistic Regression in R
    • Logistic Regression in Python
  • Decision trees
    • Decision trees in R
    • Decision trees in Python
  • Multiclass classification
  • One versus One coding
  • One versus All coding
  • Multiclass decision trees

Measuring the performance of predictive analytics techniques

  • Performance measurement
  • Split sample method
  • Cross-validation
  • Single sample method
  • Performance measures for classification
  • Confusion matrix (classification accuracy, classification error, sensitivity, specificity)
  • ROC curve and area under ROC curve
    • ROC curve in R
    • ROC curve in Python
  • CAP curve and Accuracy Ratio
  • Lift curve
  • Performance measures for regression
  • Quiz

Day 2

Association and Sequence Rules

  • Association Rules
  • Support and Confidence
  • Association rule mining
    • Association rule mining in R
    • Association rule mining in Python
  • Lift
  • Association rule extensions
  • Post-Processing Association Rules
  • Association rules applications
  • Sequence rules
  • Quiz

Clustering Techniques

  • Hierarchical clustering
    • Hierarchical clustering in R
    • Hierarchical clustering in Python
  • K-means clustering
    • K-means clustering in R
    • K-means clustering in Python
  • DBSCAN
    • DBSCAN in R
    • DBSCAN in Python
  • Evaluating clustering solutions
  • Quiz

Neural Networks

  • Neural Networks
    • Neural Networks in R
    • Neural Networks in Python
  • Deep Learning Neural Networks
  • Opening Neural Network Black Box
  • Variable Selection
  • Rule Extraction
  • Quality of Extracted Rule Set
  • Rule Extraction Example
  • Two-Stage Model
  • Quiz

Ensemble Methods

  • Ensemble methods
  • Bootstrapping
  • Bagging
    • Bagging in Python
  • Boosting
    • AdaBoost in Python
  • Random Forests
    • Random Forests in R
    • Random Forests in Python
  • XGBoost
    • XGBoost in R
    • XGBoost in Python
  • Quiz

Variable Selection

  • Variable selection
  • Filter methods (gain, Cramer’s V, Fisher score)
    • Cramer’s V in R
    • Cramer’s V in Python
    • Information Value in R
    • Information Value in Python
  • Forward/Backward/Stepwise regression
    • Forward/Backward/Stepwise in R
  • BART: Backward Regression Trimming
    • BART variable selection in R
  • Criteria for variable selection
  • Quiz

Model interpretation

  • Model interpretation
  • Feature Importance
  • Permutation based feature importance
  • Partial dependence plots
    • Partial dependence plots in Python
  • Individual conditional expectation (ICE) plots
    • ICE plots in Python
  • Visual analytics
  • Decision tables
  • LIME
    • LIME in Python
  • Shapley values
    • Shapley values in Python

Model deployment

  • Model deployment
  • Model governance
  • Model ethics
  • Model documentation
  • Model backtesting
  • Model benchmarking
  • Model stress testing
  • Privacy and security
  • Quiz

Machine Learning Pitfalls

  • Sample bias
  • Model risk
  • Deep everything
  • Leader versus follower
  • Complexity versus trust
  • Statistical myopia
  • Profit Driven Machine Learning
  • Quiz

👩‍🏫 Lecturers

Prof. dr. Seppe vanden Broucke
Assistant professor at UGent

🏢 Location

Van der Valk Hotel Brussels Airport (Belgium)

Culliganlaan 4b
1831 Diegem
Belgium
hotelbrusselsairport.com

🏫 Organizer

Bart Baesens

💼 Register

The price is 1000 Euro (VAT Exclusive) and includes:

  • A digital copy of the course notes
  • R and Python tutorials
  • R and Python code scripts
  • Hands-on Python notebooks
  • Data sets to experiment with
  • Lunches and coffee

Online registration


Please register through the link below. After your payment has been processed, you will receive a confirmation e-mail.

Online registration