hOCR-to-ALTO

Convert between Tesseract hOCR and ALTO XML 2.0/2.1/3/4 using XSL stylesheets

The stylesheets use XSLT 2.0 features, so an XSLT 2.0-capable transformer such as Saxon is required.

To run a conversion with the Saxon-HE command line, for example converting ALTO to hOCR:

 > java -jar saxon-he.jar -s:input-alto.xml -xsl:alto__hocr.xsl -o:output-hocr.xml
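
For scripted use, the same invocation can be assembled from Python. This is a minimal sketch, not part of this project: the names `build_saxon_command` and `convert` and the default jar path are assumptions, and only the `-s:`, `-xsl:` and `-o:` options shown above are used.

```python
import subprocess


def build_saxon_command(source, stylesheet, output, jar="saxon-he.jar"):
    """Return the Saxon-HE argument list for one transformation.

    Mirrors the command line above: -s: source, -xsl: stylesheet, -o: output.
    The function name and the default jar path are assumptions of this sketch.
    """
    return [
        "java", "-jar", str(jar),
        f"-s:{source}",
        f"-xsl:{stylesheet}",
        f"-o:{output}",
    ]


def convert(source, stylesheet, output, jar="saxon-he.jar"):
    """Run the transformation; raises CalledProcessError if Saxon exits non-zero."""
    subprocess.run(build_saxon_command(source, stylesheet, output, jar), check=True)


if __name__ == "__main__":
    # Print the assembled command without running it.
    print(" ".join(build_saxon_command("input-alto.xml", "alto__hocr.xsl", "output-hocr.xml")))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when file names contain spaces.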

See ocr-fileformat for an interface to these stylesheets.

hOCR spec: https://github.com/kba/hocr-spec

File naming scheme: sourceFormatVersion__targetFormatVersion.xsl
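
The naming scheme can be handled programmatically, for example when picking a stylesheet for a given source and target format. The sketch below assumes that the two parts are separated by a double underscore, as in `alto__hocr.xsl` above. The helper names are hypothetical, and `hocr__alto2.1.xsl` is used only to illustrate a versioned name; check the repository for the files that actually exist.

```python
def stylesheet_name(source_format, target_format):
    """Build a file name following sourceFormatVersion__targetFormatVersion.xsl."""
    return f"{source_format}__{target_format}.xsl"


def parse_stylesheet_name(filename):
    """Split a stylesheet file name into (source, target).

    Only the trailing ".xsl" is stripped, so version numbers that
    contain dots (e.g. "alto2.1") are preserved.
    """
    suffix = ".xsl"
    if not filename.endswith(suffix):
        raise ValueError(f"not an .xsl file: {filename!r}")
    stem = filename[: -len(suffix)]
    source, separator, target = stem.partition("__")
    if not separator or not source or not target:
        raise ValueError(f"name does not follow source__target.xsl: {filename!r}")
    return source, target


if __name__ == "__main__":
    print(stylesheet_name("alto", "hocr"))
    # "hocr__alto2.1.xsl" is an illustrative name, not a confirmed file.
    print(parse_stylesheet_name("hocr__alto2.1.xsl"))
```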

README source: filak/hOCR-to-ALTO
License: MIT