Detecting Anomalous Data Cells

Rotterdam Seminars Econometric Institute

Speaker(s): Peter Rousseeuw (KU Leuven, Belgium)
Date: Thursday, May 12, 2016
Location: Rotterdam

A multivariate dataset consists of n cases in d dimensions and is often stored in an n by d data matrix. It is well known that real data may contain outliers. Depending on the circumstances, outliers may be (a) undesirable errors that can adversely affect the data analysis, or (b) valuable nuggets of unexpected information. In statistics and data analysis, the word outlier usually refers to a row of the data matrix, and the methods to detect such outliers only work when at most 50% of the rows are contaminated. But often only one or a few cell values in a row are outlying, and they may not be found by looking at each variable (column) separately. We propose the first method to detect cellwise outliers in the data that takes the correlations between the variables into account. It places no restriction on the number of contaminated rows and can deal with high dimensions. A further advantage is that it provides estimates of the ‘expected’ values of the outlying cells while imputing missing values at the same time. We illustrate the method on several real data sets, where it uncovers more structure than purely columnwise or purely rowwise methods.

This is joint work with Wannes Van den Bossche of the KU Leuven.
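To make the cellwise idea concrete, here is a minimal Python sketch under stated assumptions. It is not the speakers' method, whose details the abstract does not give; it only illustrates the two ingredients the abstract contrasts. The first is a purely columnwise check on robust z-scores. The second predicts each cell from the other columns of its row, so that a value which looks ordinary within its own column but is inconsistent with correlated variables can still be flagged. The helper names (`robust_scale`, `theil_sen`, `flag_cells`), the Theil-Sen line fits, the median over per-column predictions, the cutoff 2.576 and the scale floor are all illustrative assumptions. Missing cells (`None`) are skipped when fitting and filled with their predicted values, echoing the imputation mentioned in the abstract.

```python
from itertools import combinations
from statistics import median
from typing import Dict, List, Optional, Set, Tuple

Cell = Optional[float]  # None marks a missing cell


def robust_scale(values: List[float], floor: float = 1e-8) -> Tuple[float, float]:
    """Return (median, 1.4826 * MAD), keeping the scale away from zero."""
    m = median(values)
    mad = median([abs(v - m) for v in values])
    return m, max(1.4826 * mad, floor)


def theil_sen(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Robust line y ~ a + b*x: median pairwise slope, then median intercept."""
    slopes = [(ys[l] - ys[i]) / (xs[l] - xs[i])
              for i, l in combinations(range(len(xs)), 2) if xs[l] != xs[i]]
    b = median(slopes) if slopes else 0.0
    a = median([y - b * x for x, y in zip(xs, ys)])
    return a, b


def flag_cells(data: List[List[Cell]], cutoff: float = 2.576):
    """Hypothetical cellwise detector; returns (flagged, predicted, imputed).

    The default cutoff (an assumption) is about the 99.5% standard-normal quantile.
    """
    n, d = len(data), len(data[0])
    flagged: Set[Tuple[int, int]] = set()

    # Step 1: purely columnwise check on robust z-scores (ignores correlations).
    for j in range(d):
        obs = [(i, data[i][j]) for i in range(n) if data[i][j] is not None]
        if obs:
            m, s = robust_scale([v for _, v in obs])
            flagged |= {(i, j) for i, v in obs if abs(v - m) / s > cutoff}

    # Step 2: robust line fit of column j on each other column k,
    # using only the rows where both cells are observed.
    fits: Dict[Tuple[int, int], Tuple[float, float]] = {}
    for j in range(d):
        for k in range(d):
            rows = [i for i in range(n)
                    if data[i][j] is not None and data[i][k] is not None]
            if j != k and rows:
                fits[j, k] = theil_sen([data[i][k] for i in rows],
                                       [data[i][j] for i in rows])

    # Step 3: predict every cell (observed or missing) as the median of the
    # predictions made from the observed cells in the other columns of its row.
    predicted: List[List[Cell]] = [[None] * d for _ in range(n)]
    for i in range(n):
        for j in range(d):
            preds = [fits[j, k][0] + fits[j, k][1] * data[i][k]
                     for k in range(d)
                     if k != j and data[i][k] is not None and (j, k) in fits]
            if preds:
                predicted[i][j] = median(preds)

    # Step 4: flag observed cells whose residual, standardized by the column's
    # robust residual scale, exceeds the cutoff.
    for j in range(d):
        res = [(i, data[i][j] - predicted[i][j]) for i in range(n)
               if data[i][j] is not None and predicted[i][j] is not None]
        if res:
            m, s = robust_scale([r for _, r in res])
            flagged |= {(i, j) for i, r in res if abs(r - m) / s > cutoff}

    # Imputed copy of the data: missing cells replaced by their predictions.
    imputed = [[predicted[i][j] if data[i][j] is None else data[i][j]
                for j in range(d)] for i in range(n)]
    return flagged, predicted, imputed
```

For instance, take four identical columns running from -5 to 5 and change the second cell of the row where all columns equal 3 to -3. Every column's robust z-scores then stay below the cutoff, so the columnwise step finds nothing, yet the correlation-aware step flags that single cell and predicts 3.0 for it. Taking the median of several per-column predictions, rather than relying on a single regression, limits the influence of other outlying cells in the same row.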
