hOcr2Pdf.NET is a .NET library for creating highly compressed, searchable PDFs from within your applications.

Requirements:
.NET 4.0 or higher
Tesseract 3.0 with the ability to produce hOCR files, or Cuneiform for Linux
JBig2.exe (included), placed in the same directory as the DLL

Major Classes:
PDFDoc (obtain an instance with PDFDoc.Open() or PDFDoc.Create())

Example Usage

Compress a PDF to JBIG2

    PDFDoc doc = PDFDoc.Open(file);
    doc.CompressJBig2();

Get a page image (JBIG2 and JPEG 2000 pages require Ghostscript to be installed)

    PDFDoc doc = PDFDoc.Open(file);
    doc.GetPageImage(1);

OCR a PDF

    PDFDoc doc = PDFDoc.Open(file);
    doc.Ocr(Utils.OcrMode.Tesseract, "eng", WriteTextMode.Word, null);
 

Create a new PDF

    PDFDoc doc = PDFDoc.Create(file);
    doc.AddPage(img, PageSize.Letter);
    doc.Rotate(...);
    doc.Save();        // write changes to disk before OCR
    doc.Ocr(...);
    doc.Compress(...);
    doc.Save();        // write the OCR and compression changes to disk

Get the object graph of an hOCR document

    hDocument d = OcrController.CreateHOCR(OcrMode.Tesseract, "eng", img);
    foreach (var p in d.Pages)
        foreach (var para in p.Paragraphs)
            foreach (var l in para.Lines)
                foreach (var w in l.Words)
                    Console.WriteLine(w.Text);

Tips

Be sure to call Save() on the PDF before any operation that needs Ghostscript to extract page images. For example,
if you compress a PDF to JBIG2 and then try to OCR it without calling Save() first, the result is unpredictable. Save() writes any pending changes to disk so that Ghostscript can read the changed pages for image extraction.
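
A minimal sketch of the ordering this tip describes, using only the calls shown in the examples above. It assumes `file` is the path to an existing PDF; the final Save() is also an assumption, made because Save() is what writes changes to disk.

    // Compress, save, then OCR. `file` is an assumed path to an existing PDF.
    PDFDoc doc = PDFDoc.Open(file);
    doc.CompressJBig2();
    doc.Save();   // write the JBIG2 pages to disk so Ghostscript can read them
    doc.Ocr(Utils.OcrMode.Tesseract, "eng", WriteTextMode.Word, null);
    doc.Save();   // assumed: write the OCR changes to disk as well
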