Brée, Sandra (ORCID: https://orcid.org/0000-0002-2802-5563),
Gay, Victor (ORCID: https://orcid.org/0000-0001-9912-3841),
Leturcq, Marion (ORCID: https://orcid.org/0000-0003-2243-1760),
Doignon, Yoann (ORCID: https://orcid.org/0000-0002-7383-3009)
and Coulmont, Baptiste
(2026)
POPP. An OCR-Generated Database of the Population Censuses of Paris (1926–1936).
Historical Life Course Studies, vol. 16.
pp. 3-28.
Text (2MB). Licence: Creative Commons Attribution.
Abstract
Empirical research in historical demography is usually time-consuming and labour-intensive. Recent developments in machine learning offer new possibilities for building very large databases with reduced time and costs, though these new methods raise new challenges as well. This article describes the process of constructing the POPP database, a data collection project based on the exploitation of the nominative lists of the Parisian population censuses of 1926, 1931, and 1936. This database provides a host of information for almost 9 million individuals: their name and surname, year and location of birth, nationality, relation to the household head, and occupation. The article discusses the digitisation of archival sources — several hundred thousand handwritten pages — their transformation into a database by computer scientists using machine learning techniques, and the work required on the part of social scientists to correct and adapt the resulting data for statistical purposes. Beyond its methodological contribution, this article also discusses the various ways in which the POPP database will improve our knowledge of the economic, social, and demographic evolution of an important European urban population.
| Item Type: | Article |
|---|---|
| Language: | English |
| Date: | February 2026 |
| Refereed: | Yes |
| Place of Publication: | Amsterdam |
| Uncontrolled Keywords: | Database, Census, Machine learning, Artificial Intelligence, Paris, France, Interwar |
| JEL Classification: | C82 - Methodology for Collecting, Estimating, and Organizing Macroeconomic Data / H22 - Incidence / N01 - Development of the Discipline - Historiographical; Sources and Methods / N43 - Europe - Pre-1913 |
| Subjects: | B - Economics and Finance |
| Divisions: | TSE-R (Toulouse) |
| Doctoral School: | Toulouse School of Economics (Toulouse) |
| Site: | UT1 |
| Date Deposited: | 17 Feb 2026 10:09 |
| Last Modified: | 18 Feb 2026 10:35 |
| OAI Identifier: | oai:tse-fr.eu:131444 |
| URI: | https://publications.ut-capitole.fr/id/eprint/52094 |

