eprintid: 48255
rev_number: 11
eprint_status: archive
userid: 1482
importid: 105
dir: disk0/00/04/82/55
datestamp: 2023-09-25 08:45:47
lastmod: 2024-11-04 12:18:51
status_changed: 2024-11-04 12:18:51
type: monograph
metadata_visibility: show
creators_name: Gadat, Sébastien
creators_name: Villeneuve, Stéphane
creators_id: sebastien.gadat@tse-fr.eu
creators_id: stephane.villeneuve@tse-fr.eu
creators_idrefppn: 080889433
creators_idrefppn: 061507105
creators_affiliation: Toulouse School of Economics;Institut Universitaire de France
creators_affiliation: Toulouse School of Economics
creators_halaffid: 1002422
creators_halaffid: 1002422
title: Parsimonious Wasserstein Text-mining
ispublished: pub
subjects: subjects_ECO
abstract: This document introduces a novel, parsimonious method for processing textual data, based on non-negative matrix factorization (NMF) and on supervised clustering with Wasserstein barycenters to reduce the dimension of the model. This dual treatment of textual data allows a text to be represented as a probability distribution on the space of profiles, which accounts for both uncertainty and semantic interpretability through the Wasserstein distance. The full textual information of a given period is represented as a random probability measure. This opens the door to a statistical inference method that seeks to predict financial data using the information generated by the texts of a given period.
date: 2023-09
date_type: published
publisher: TSE Working Paper
official_url: http://tse-fr.eu/pub/128497
faculty: tse
divisions: tse
keywords: Natural Language Processing
keywords: Textual Analysis
keywords: Wasserstein distance
keywords: clustering
language: en
has_fulltext: TRUE
view_date_year: 2023
full_text_status: public
monograph_type: working_paper
series: TSE Working Paper
volume: 23-1471
place_of_pub: Toulouse
pages: 20
institution: Université Toulouse Capitole
department: Toulouse School of Economics
book_title: TSE Working Paper
oai_identifier: oai:tse-fr.eu:128497
harvester_local_overwrite: department
harvester_local_overwrite: pending
harvester_local_overwrite: creators_idrefppn
harvester_local_overwrite: creators_halaffid
harvester_local_overwrite: abstract
harvester_local_overwrite: place_of_pub
harvester_local_overwrite: creators_affiliation
harvester_local_overwrite: institution
harvester_local_overwrite: pages
harvester_local_overwrite: creators_id
oai_lastmod: 2024-04-29T08:30:06Z
oai_set: tse
site: ut1
citation: Gadat, Sébastien and Villeneuve, Stéphane (2023) Parsimonious Wasserstein Text-mining. TSE Working Paper, n. 23-1471, Toulouse
document_url: https://publications.ut-capitole.fr/id/eprint/48255/1/wp_tse_1471.pdf