Read and extract text and other content from PDFs in C# (port of PDFBox)
OCR engine for all languages
Document Layout Analysis resources for development with PdfPig
Conversions between various OCR formats
An OCR evaluation tool
ALTO XML schema - latest and all former versions
Text Overlay plugin for Mirador 3
Python tools for performing various operations on ALTO XML files
Kitodo.Presentation is a feature-rich framework for building a METS- or ...