Transforming text into discovery: OCR enrichment of digital collections in the University of Galway Library
Dereza, Oksana ; Rouget, Marie-Louise ; Egwu, Chidi ; Joy, Cillian
Publication Date
2025-06-09
Type
conference paper
Citation
Dereza, Oksana; Rouget, Marie-Louise; Egwu, Chidi; Joy, Cillian. (2025). Transforming Text into Discovery: OCR Enrichment of Digital Collections in the University of Galway Library. In Proceedings of the 4th Conference on Digital Preservation and processing technology of Written Heritage (DPWH), IEEE Congress on Information Science and Technology (CiSt), Marrakesh, Morocco, 04-10 October. IEEE. [in print]
Abstract
The University of Galway Library has been developing an Optical Character Recognition (OCR) pipeline to transform scanned archival materials into machine-readable text at scale. The resulting text significantly enhances the accessibility and searchability of the University’s digitised heritage collections, supporting diverse areas of research interest and fostering deeper engagement with the Library’s holdings.
The paper discusses key aspects of building an OCR pipeline, including the performance of available OCR software on heritage data, pre-processing of digitised images, quality assurance, and conversion of OCR engine outputs for DAMS upload and seamless IIIF integration. The pipeline aims to balance automation with quality control for the successful extraction of printed, typewritten and handwritten text. We believe that our experience may help other GLAM institutions that are considering incorporating automatic text extraction into their digital collections workflow.
Publisher
IEEE
Rights
CC BY-NC-ND