The HTRomance project belongs to the field of handwritten text recognition (HTR). Specifically, it aims to evaluate and improve the capabilities of this technology when applied to literary manuscripts and to public and private archives, written in Latin and the Romance languages between the 11th and the 19th century and held at the French National Library (BnF).
The main objective of the project is to produce training data and transcription models that are robust to variation in handwriting and language. It also aims to produce language models applicable to documents in historical languages or in earlier stages of a language. The creation of the training corpora will be accompanied and consolidated by the design and implementation of a novel process for evaluating the readability of the output texts and the cost of producing new training data for HTR. HTRomance is complementary to editing and text-mining projects: the models it produces can be used to obtain the textual data those projects need.
Clérice, T., Chagué, A., Gille-Levenson, M., Brisville-Fertin, O., Pinche, A., Camps, J., Fischer, F., Boschetti, F., Guadagnini, E., Guilhem Couffignal, G., Canteaut, O., Romary, L., Reboul, M., Perreaux, N., Poibeau, T., Smith, M., Norindr, J., Glaise, A., Navas Farré, M., Bordier, J., Leroy, N., Alba, R., & Rubin, G. HTRomance [Data set]. https://htromance-project.github.io/
@misc{Clerice_HTRomance,
  author = {Clérice, Thibault and Chagué, Alix and Gille-Levenson, Matthias and Brisville-Fertin, Olivier and Pinche, Ariane and Camps, Jean-Baptiste and Fischer, Franz and Boschetti, Federico and Guadagnini, Elisa and Guilhem Couffignal, Gilles and Canteaut, Olivier and Romary, Laurent and Reboul, Marianne and Perreaux, Nicolas and Poibeau, Thierry and Smith, Marc and Norindr, Jade and Glaise, Anthony and Navas Farré, Marina and Bordier, Julie and Leroy, Noé and Alba, Rachele and Rubin, Giorgia},
  title = {HTRomance},
  url = {https://htromance-project.github.io/}
}
This project was funded by the Bibliothèque nationale de France through the BnF DataLab's 2022 call for projects, for the year 2023.
This project relied on the CREMMA infrastructure.
➡️ Medieval Latin
➡️ Medieval Italian
➡️ Medieval French
➡️ Medieval Occitan
➡️ Middle Ages in Spain
➡️ Modern Romance Languages