Robust Analytics for Business Applications using R

📅 June 12th, 2019
🌍 English

Aim

Most analytical tools in business are very sensitive to contamination in the data. This has led to the emergence of the field of robust statistics, which offers methods and procedures that perform well when the ideal assumptions are satisfied, yet remain reliable when the data deviate from those assumptions. Robust high-breakdown methods have been developed that can reliably estimate the parameters of the postulated model and provide valid predictions when a minority (i.e. less than 50%) of the data deviates from the assumed model. Note that the proportion of such outliers is not known in advance. An important benefit of a robust analysis is the automatic detection of anomalies in the data.

Classical nonrobust methods such as least squares or maximum likelihood try to fit the model optimally to all observations, regular ones and outliers alike. As a result, these methods are heavily influenced by contamination in the data. Because the nonrobust fit is pulled toward the outliers, these observations no longer appear outlying, an effect known as masking. In the worst case, the effect of outliers on a nonrobust fit can be so large that regular observations appear to be outlying, an effect called swamping. Robust methods, by contrast, resist the effect of contamination and therefore make it possible to detect anomalies as the observations that deviate substantially from the robust fit.

It is important to note that the detected anomalies are not necessarily errors in the data. The presence of outliers may reveal that the data is more heterogeneous than previously assumed and cannot be handled by the original statistical model. Outliers can be isolated or may come in clusters, indicating the existence of subgroups in the population that behave differently. A robust analysis can thus provide deeper and more reliable insight into the structure of the data and reveal structures that would remain hidden in a traditional nonrobust analysis.
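
The masking effect described above can be illustrated with a small univariate sketch. It is written in Python using only the standard library, whereas the course itself works in R. The data values, the function names and the cutoff of 2.5 are assumptions chosen for illustration and are not taken from the course material. Classical z-scores use the mean and standard deviation, while the robust z-scores use the median and the median absolute deviation (MAD), scaled by 1.4826 so that it is consistent with the standard deviation for normally distributed data.

```python
import statistics

# Hypothetical data: eight regular values near 10 and two outliers.
data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 50.0, 55.0]


def classical_z_scores(values):
    """z-scores based on the sample mean and standard deviation."""
    center = statistics.mean(values)
    spread = statistics.stdev(values)
    return [(v - center) / spread for v in values]


def robust_z_scores(values):
    """z-scores based on the median and the scaled MAD."""
    center = statistics.median(values)
    mad = statistics.median([abs(v - center) for v in values])
    spread = 1.4826 * mad  # consistency factor for normal data
    return [(v - center) / spread for v in values]


def flag_anomalies(scores, cutoff=2.5):
    """Return the indices whose absolute score exceeds the cutoff."""
    return [i for i, z in enumerate(scores) if abs(z) > cutoff]


# The cutoff of 2.5 is an assumed, commonly used convention.
classical_flags = flag_anomalies(classical_z_scores(data))
robust_flags = flag_anomalies(robust_z_scores(data))

print("classical:", classical_flags)
print("robust:   ", robust_flags)
```

On this toy data the classical z-scores flag nothing, because the two large values inflate both the mean and the standard deviation (masking). The robust z-scores flag exactly the last two observations.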

Course Objectives

The course Robust Analytics for Business: Techniques and Applications provides actionable guidance on selecting and using the most appropriate robust method for popular data analytical models in business. Combining theoretical and practical insight into robust estimators and their fast numerical algorithms, this course acts as an up-to-date manual for practitioners seeking to study, apply, manage and develop advanced robust tools in their data analysis. High-level and practical explanations and demonstrations of the different robust methodologies lead to a clear, intuitive understanding of the various input parameters, which is necessary to obtain a correct analysis and conclusion. Real-life business examples illustrate the different tools in the robust analytics toolbox at work and show the gain in performance and reliability obtained with this robust methodology. These examples also show how contamination affects the classical procedures and the resulting decision-making process, whereas the robust counterparts detect the anomalies correctly. With step-by-step instruction on data handling, analytical fine-tuning, and interpreting results, this course provides invaluable guidance for practitioners seeking to reap the advantages of robust business analytics.

Course outline

Chapter 1: Introduction

- Introduction to robust statistics

o Motivation

o Toy example

o Classical versus robust estimators

- Anomaly detection using robust statistics

o Definition of anomalies

o Masking and swamping effects

- Measures of robustness

o Breakdown value

o Sensitivity curve and influence function

Chapter 2: Robust methods for univariate data

- Classical estimators and anomaly detection

- Robust location and scale estimators

- Anomaly detection using robust statistics

o Robust z-scores

o Boxplot and adjusted boxplot

Chapter 3: Robust methods for multivariate location and scatter

- Classical estimators and anomaly detection

o Sample mean and covariance matrix

o Mahalanobis distances and tolerance ellipsoid

- Robust alternative: Minimum Covariance Determinant (MCD) estimator

o Definition and properties

o Computational issues and fast algorithms

o Extensions for high-dimensional data

- Anomaly detection using robust statistics

o Bagplot

o Robust Mahalanobis distances

o Distance-distance plot

Chapter 4: Robust methods for clustering

- Classical estimators

o Agglomerative versus divisive methods

o K-means clustering

- Robust alternatives

Chapter 5: Robust methods for linear regression

- Classical estimators

o Least squares estimator

o Different types of outliers

o Traditional outlier diagnostics

- Robust alternative: Least Trimmed Squares (LTS) estimator

o Definition and properties

o Fast algorithm

- Anomaly detection using robust statistics

o Outlier map

👩‍🏫 Lecturers

Prof. dr. Tim Verdonck
Professor at University of Antwerp

Prof. dr. Bart Baesens
Professor at KU Leuven

🏢 Location

Van der Valk Hotel Brussels Airport (Belgium)

Culliganlaan 4b
1831 Diegem
Belgium
hotelbrusselsairport.com

🏫 Organizer

Bart Baesens

💼 Register

This course has already taken place; registration is no longer possible.
