RT Monograph
SR 00
A1 Gadat, Sébastien
A1 Villeneuve, Stéphane
T1 Parsimonious Wasserstein Text-mining
YR 2023
FD 2023-09
VO 23-1471
SP 20
K1 Natural Language Processing
K1 Textual Analysis
K1 Wasserstein distance
K1 clustering
AB This document introduces a novel, parsimonious method of processing textual data based on NMF factorization and on supervised clustering with Wasserstein barycenters to reduce the dimension of the model. This dual treatment of textual data allows a text to be represented as a probability distribution on the space of profiles, which accounts for both uncertainty and semantic interpretability through the Wasserstein distance. The full textual information of a given period is represented as a random probability measure. This opens the door to a statistical inference method that seeks to predict financial data using the information generated by the texts of a given period.
T2 TSE Working Paper
PB TSE Working Paper
PP Toulouse
AV Published
LK https://publications.ut-capitole.fr/id/eprint/48255/
UL http://tse-fr.eu/pub/128497