DECIDON — NER on parliamentary debates
Research internship (ANR project DECIDON): named-entity recognition on Third Republic parliamentary debates.
Objective
Research internship (May – December 2026) on the ANR project DECIDON (EPITA · ENC · EHESS). The project is a computational study of parliamentary life under the Third Republic (1870-1940), based on the verbatim records of parliamentary debates. It examines how public problems circulate between the press and Parliament, and the mechanisms of agenda-setting. My contribution, in work package WP3, focuses on named-entity recognition (NER) applied to the digitized, OCR-processed debates.
Annotated entities
Speakers, functions (offices and titles), mentioned persons, section titles, and stage directions (notes recording reactions such as applause or interruptions).
Current state & next steps
- Designing the annotation guide and schema, and coordinating the annotation effort in Label Studio.
- Ground-truth corpus annotated; several NER pipelines compared (regex rules, spaCy-CNN, BERT-based models, GLiNER2, GLiNER-bi-V2, LLM). No final choice yet.
- OCR sources: PERO OCR, plus a vision-language model (Chandra) currently being integrated.
- Next: scaling up to the full corpus, then detecting events and geographic entities.
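The metric used to compare the pipelines is not specified here. One common option is strict entity-level scoring, where a predicted entity counts only if its character offsets and its label both match a gold entity exactly. The minimal sketch below assumes that setup. The `Span` type, the label names (SPEAKER, FUNCTION, PERSON, STAGE_DIRECTION) and the helpers `strict_prf` / `per_label` are hypothetical and are not taken from decidon-ner.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, NamedTuple


class Span(NamedTuple):
    """An entity as character offsets plus a label (hypothetical label set)."""
    start: int
    end: int
    label: str


def strict_prf(gold: Iterable[Span], pred: Iterable[Span]) -> dict[str, float]:
    """Micro-averaged precision/recall/F1: a prediction is a true positive only
    if start, end and label all match a gold span exactly."""
    gold_c, pred_c = Counter(gold), Counter(pred)
    tp = sum((gold_c & pred_c).values())  # multiset intersection = exact matches
    n_pred, n_gold = sum(pred_c.values()), sum(gold_c.values())
    p = tp / n_pred if n_pred else 0.0
    r = tp / n_gold if n_gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return {"precision": p, "recall": r, "f1": f1}


def per_label(gold: Iterable[Span], pred: Iterable[Span]) -> dict[str, dict[str, float]]:
    """Strict scores computed separately for each label seen in gold or pred."""
    gold, pred = list(gold), list(pred)
    labels = {s.label for s in gold} | {s.label for s in pred}
    return {
        lab: strict_prf([s for s in gold if s.label == lab],
                        [s for s in pred if s.label == lab])
        for lab in sorted(labels)
    }


# Hypothetical toy example: one exact match on SPEAKER and one on
# STAGE_DIRECTION, a FUNCTION span with a wrong end offset, and a spurious PERSON.
gold = [Span(0, 13, "SPEAKER"), Span(15, 40, "FUNCTION"), Span(60, 75, "STAGE_DIRECTION")]
pred = [Span(0, 13, "SPEAKER"), Span(15, 38, "FUNCTION"),
        Span(60, 75, "STAGE_DIRECTION"), Span(80, 90, "PERSON")]
print(strict_prf(gold, pred))
print(per_label(gold, pred))
```

Strict matching penalizes boundary errors twice: the FUNCTION span that is off by two characters counts as both a false positive and a false negative. This can weigh heavily on OCR-noisy text, so an overlap-based relaxed variant is often reported alongside it.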
Resources
- Initial public repository of the project: decidon-ner.
Progress log
- 05/2026: internship begins; annotation schema designed and annotation resumed.
- 06/2026: benchmark of several NER approaches (rules, spaCy, BERT, GLiNER, LLM).