Brée, Sandra (ORCID: https://orcid.org/0000-0002-2802-5563), Gay, Victor (ORCID: https://orcid.org/0000-0001-9912-3841), Leturcq, Marion (ORCID: https://orcid.org/0000-0003-2243-1760), Doignon, Yoann (ORCID: https://orcid.org/0000-0002-7383-3009) and Coulmont, Baptiste (2026) POPP. An OCR-Generated Database of the Population Censuses of Paris (1926–1936). Historical Life Course Studies, vol. 16, pp. 3-28.

Licence: Creative Commons Attribution.

Identification Number (DOI): 10.52024/hlcs18627

Abstract

Empirical research in historical demography is usually time-consuming and labour-intensive. Recent developments in machine learning make it possible to build very large databases in less time and at lower cost, although these methods also raise new challenges. This article describes the construction of the POPP database, a data collection project based on the nominative lists of the Parisian population censuses of 1926, 1931, and 1936. The database provides information on almost 9 million individuals: first name and surname, year and place of birth, nationality, relationship to the household head, and occupation. The article discusses the digitisation of the archival sources (several hundred thousand handwritten pages), their transformation into a database by computer scientists using machine learning techniques, and the work required of social scientists to correct and adapt the resulting data for statistical purposes. Beyond its methodological contribution, the article also discusses the various ways in which the POPP database will improve our knowledge of the economic, social, and demographic evolution of an important European urban population.

Item Type: Article
Language: English
Date: February 2026
Refereed: Yes
Place of Publication: Amsterdam
Uncontrolled Keywords: Database, Census, Machine learning, Artificial Intelligence, Paris, France, Interwar
JEL Classification: C82 - Methodology for Collecting, Estimating, and Organizing Macroeconomic Data
H22 - Incidence
N01 - Development of the Discipline - Historiographical; Sources and Methods
N43 - Europe - Pre-1913
Subjects: B - Economics and Finance
Divisions: TSE-R (Toulouse)
Doctoral School: Toulouse School of Economics (Toulouse)
Site: UT1
Date Deposited: 17 Feb 2026 10:09
Last Modified: 18 Feb 2026 10:35
OAI Identifier: oai:tse-fr.eu:131444
URI: https://publications.ut-capitole.fr/id/eprint/52094