Gadat, Sébastien and Villeneuve, Stéphane (2023) Parsimonious Wasserstein Text-mining. TSE Working Paper, n. 23-1471, Toulouse

Abstract

This document introduces a novel, parsimonious method for processing textual data, based on non-negative matrix factorization (NMF) and on supervised clustering with Wasserstein barycenters to reduce the dimension of the model. This dual treatment of textual data represents a text as a probability distribution on the space of profiles, which accounts for both uncertainty and semantic interpretability through the Wasserstein distance. The full textual information of a given period is represented as a random probability measure. This opens the door to a statistical inference method that seeks to predict financial data using the information generated by the texts of a given period.
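
As a rough illustration of the first ingredient named in the abstract, the sketch below factorizes a toy document-term matrix with NMF and normalizes each document's weights into a probability distribution over profiles. It is a minimal sketch under stated assumptions, not the authors' implementation: the toy matrix `V`, the helper names `nmf` and `to_distribution`, the choice of two profiles, and the use of Lee-Seung multiplicative updates are all assumptions made for illustration. The Wasserstein barycenter clustering step is not shown.

```python
import numpy as np


def nmf(V, k, n_iter=500, seed=0, eps=1e-9):
    """Approximate a non-negative matrix V (documents x terms) as W @ H.

    Uses Lee-Seung multiplicative updates for the Frobenius objective.
    This is an illustrative choice; the paper may use a different solver.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_terms = V.shape
    W = rng.random((n_docs, k)) + eps   # document-to-profile weights
    H = rng.random((k, n_terms)) + eps  # profile-to-term weights
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def to_distribution(W):
    """Normalize each row of W so that it sums to one."""
    row_sums = W.sum(axis=1, keepdims=True)
    return W / np.where(row_sums > 0, row_sums, 1.0)


# Hypothetical toy corpus: 4 documents, 6 terms, two groups of vocabulary.
V = np.array(
    [
        [2, 1, 3, 0, 0, 0],
        [4, 2, 6, 0, 0, 0],
        [0, 0, 0, 1, 2, 2],
        [0, 0, 0, 3, 6, 6],
    ],
    dtype=float,
)

W, H = nmf(V, k=2)
P = to_distribution(W)  # each row: one document as a distribution over profiles
rel_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)

print(np.round(P, 3))
print("relative reconstruction error:", rel_error)
```

Each row of `P` can be read as one text expressed as a probability distribution on the profiles. Comparing such distributions with a Wasserstein distance would additionally require a ground cost between profiles, which this sketch does not attempt to specify.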

Item Type: Monograph (Working Paper)
Language: English
Date: September 2023
Place of Publication: Toulouse
Uncontrolled Keywords: Natural Language Processing, Textual Analysis, Wasserstein distance, clustering
Subjects: B - Economics and Finance
Divisions: TSE-R (Toulouse)
Institution: Université Toulouse Capitole
Site: UT1
Date Deposited: 25 Sep 2023 08:45
Last Modified: 25 Sep 2023 08:45
OAI Identifier: oai:tse-fr.eu:128497
URI: https://publications.ut-capitole.fr/id/eprint/48255