Project Description
hOcr2Pdf.NET is a .NET library, written in C#, that creates searchable PDFs or converts the hOCR HTML produced by Tesseract or Cuneiform into highly compressed, searchable PDFs. It uses HtmlAgilityPack to parse the hOCR, Jbig2 to compress scanned images, and iTextSharp to write the PDF.
Features
Simple design: create PDF files with PDFDoc.Create() or edit existing ones with PDFDoc.Open()
Add new pages from scanned images
OCR new or existing PDFs
Use different images for OCR and display
Optionally define the fonts used for the OCR text layer, so the hidden text lines up precisely beneath the page image
Compress PDFs with JBIG2
Utility methods for searching, rotating, bookmarking, and setting document attributes such as title and author
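A searchable PDF places each recognized word as hidden text under the matching region of the scanned image. hOCR records that region as a "bbox x0 y0 x1 y1" property in an element's title attribute, in image pixels measured from the top-left corner. PDF user space, by contrast, uses 72 units per inch measured from the bottom-left corner. The sketch below shows that coordinate mapping under stated assumptions: the scan resolution (DPI) is known, and the names HocrGeometry, ParseBbox, ToPdf and PdfRect are hypothetical illustrations, not part of hOcr2Pdf.NET's API.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper types for illustration; not part of hOcr2Pdf.NET.
public readonly struct PdfRect
{
    public PdfRect(double left, double bottom, double right, double top)
    {
        Left = left; Bottom = bottom; Right = right; Top = top;
    }

    public double Left { get; }
    public double Bottom { get; }
    public double Right { get; }
    public double Top { get; }
}

public static class HocrGeometry
{
    // Matches the hOCR "bbox x0 y0 x1 y1" property inside a title attribute,
    // e.g. title="bbox 300 600 900 660; x_wconf 95". [0-9] keeps digits ASCII-only.
    private static readonly Regex BboxPattern =
        new Regex(@"\bbbox\s+([0-9]+)\s+([0-9]+)\s+([0-9]+)\s+([0-9]+)", RegexOptions.CultureInvariant);

    // Returns the box in image pixels, origin at the top-left corner.
    public static (int X0, int Y0, int X1, int Y1) ParseBbox(string title)
    {
        Match m = BboxPattern.Match(title);
        if (!m.Success)
            throw new FormatException("No bbox property in title: " + title);

        int ReadInt(int i) => int.Parse(m.Groups[i].Value, CultureInfo.InvariantCulture);
        return (ReadInt(1), ReadInt(2), ReadInt(3), ReadInt(4));
    }

    // Maps a pixel box to PDF user space (72 units per inch, origin at the
    // bottom-left): scale by 72/dpi, then flip y against the page height.
    public static PdfRect ToPdf((int X0, int Y0, int X1, int Y1) box, int imageHeightPx, double dpi)
    {
        double scale = 72.0 / dpi;
        double pageHeight = imageHeightPx * scale;
        return new PdfRect(
            left: box.X0 * scale,
            bottom: pageHeight - box.Y1 * scale,
            right: box.X1 * scale,
            top: pageHeight - box.Y0 * scale);
    }
}
```

For example, a US Letter page scanned at 300 dpi is 3300 pixels tall and maps to a 792-point PDF page, so "bbox 300 600 900 660" becomes a rectangle with left 72, bottom 633.6, right 216 and top 648 (up to floating-point rounding). The text-layer font size would then be chosen so each word fills that rectangle, which is where a well-matched font helps.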