Ruiz-Gazen, Anne, Thomas-Agnan, Christine, Laurent, Thibault and Mondon, Camille (2022) Detecting outliers in compositional data using invariant coordinate selection. TSE Working Paper, n. 22-1320, Toulouse.



Invariant Coordinate Selection (ICS) is a multivariate statistical method introduced by Tyler et al. (2009) and based on the simultaneous diagonalization of two scatter matrices. A model-based approach to ICS, called Invariant Coordinate Analysis, has already been adapted for compositional data in Muehlmann et al. (2021). In a model-free context, ICS is also helpful for identifying outliers (Nordhausen and Ruiz-Gazen, 2022). We propose to develop a version of ICS for outlier detection in compositional data. This version is first introduced in coordinate space for a specific choice of ilr coordinate system associated with a contrast matrix and follows the outlier detection procedure proposed by Archimbaud et al. (2018a). We then show that the procedure is independent of the choice of contrast matrix and can be defined directly in the simplex. To do so, we first establish some properties of the set of matrices satisfying the zero-sum property and introduce a simplex definition of the Mahalanobis distance and of the class of one-step M-estimators of scatter matrices. We also need to define the family of elliptical distributions in the simplex. We then show how to interpret the results directly in the simplex using two artificial datasets and a real dataset of market shares in the automobile industry.
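As a rough illustration of the idea summarized in the abstract, the following Python sketch runs ICS on ilr coordinates of simulated compositions. It is not the authors' implementation. Several choices are assumptions made for the example: the scatter pair is the sample covariance together with the fourth-moment scatter COV4 (one common one-step M-estimator), the contrast matrix is of Helmert type, the data are synthetic, and all function names (`ilr_contrast`, `ilr`, `cov4`, `ics`) are hypothetical. The component selection and cutoff steps of Archimbaud et al. (2018a) are not reproduced; the sketch simply looks at the first invariant component.

```python
import numpy as np
from scipy.linalg import eigh


def ilr_contrast(D):
    """Helmert-type contrast matrix (D x (D-1)): orthonormal, zero-sum columns."""
    V = np.zeros((D, D - 1))
    for j in range(1, D):
        V[:j, j - 1] = 1.0 / j
        V[j, j - 1] = -1.0
        V[:, j - 1] *= np.sqrt(j / (j + 1.0))
    return V


def ilr(X, V):
    """ilr coordinates of the compositions in the rows of X for contrast matrix V."""
    logX = np.log(X)
    clr = logX - logX.mean(axis=1, keepdims=True)
    return clr @ V


def cov4(Z):
    """Fourth-moment scatter: covariance reweighted by squared Mahalanobis distances."""
    n, p = Z.shape
    Zc = Z - Z.mean(axis=0)
    S1 = np.cov(Z, rowvar=False)
    d2 = np.einsum("ij,jk,ik->i", Zc, np.linalg.inv(S1), Zc)
    return (Zc * d2[:, None]).T @ Zc / (n * (p + 2))


def ics(Z):
    """Simultaneously diagonalize COV and COV4; components sorted by decreasing eigenvalue."""
    S1 = np.cov(Z, rowvar=False)
    S2 = cov4(Z)
    lam, B = eigh(S2, S1)  # solves S2 b = lam S1 b, normalized so that B.T @ S1 @ B = I
    order = np.argsort(lam)[::-1]
    lam, B = lam[order], B[:, order]
    scores = (Z - Z.mean(axis=0)) @ B
    return lam, B, scores


# Synthetic 4-part compositions; the first 5 rows are shifted to act as outliers.
rng = np.random.default_rng(0)
D, n = 4, 200
V = ilr_contrast(D)
Z0 = 0.3 * rng.normal(size=(n, D - 1))
Z0[:5, 0] += 3.0
X = np.exp(Z0 @ V.T)
X /= X.sum(axis=1, keepdims=True)

lam, B, scores = ics(ilr(X, V))
ics_dist = scores[:, 0] ** 2  # squared score on the first invariant component

# Same analysis with another contrast matrix (an orthogonal rotation of V).
Q, _ = np.linalg.qr(rng.normal(size=(D - 1, D - 1)))
V2 = V @ Q
lam2, B2, scores2 = ics(ilr(X, V2))
```

In this sketch the eigenvalues and the squared scores on the first component agree for the two contrast matrices up to numerical error, which is a small numerical echo of the invariance claim in the abstract. Large values of `ics_dist` point to candidate outliers, but deciding how many components to keep and where to place a cutoff requires the procedure described in the paper.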

Item Type: Monograph (Working Paper)
Language: English
Date: March 2022
Place of Publication: Toulouse.
Divisions: TSE-R (Toulouse)
Institution: Université Toulouse 1 Capitole.
Site: UT1
Date Deposited: 21 Mar 2022 10:57
Last Modified: 31 Aug 2023 08:07