Batch OCR with Acrobat

From MagnetoWiki

Notes on using Adobe Acrobat 8 Professional to batch OCR scanned documents, such as papers or reprints.

Steps

  1. Go to Advanced > Document Processing > Batch Processing...
  2. In the Batch Sequence Dialog
    1. Create a New Sequence with the name "OCR to Searchable Exact Image"
    2. In the Edit Batch Sequence Dialog
      1. Select the "Recognize Text using OCR" command, and edit the Command to output "Searchable Image (Exact)".
      2. Under Output Options, the Output Format should be "Adobe PDF Files", with "Fast Web View" checked, but "PDF Optimizer" unchecked. If desired, set a prefix or suffix for output file names.
    3. Now select the new sequence, and press "Run Sequence" and follow the directions. Be sure not to overwrite the original scanned PDFs, in case there is a problem.
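
Since the last step warns against overwriting the original scans, it can help to copy them somewhere safe before running the sequence. The following is a minimal sketch of that precaution in Python, not part of Acrobat itself; the function name backup_pdfs and the folder names in the usage note are hypothetical.

```python
import shutil
from pathlib import Path


def backup_pdfs(source_dir, backup_dir):
    """Copy every PDF in source_dir into backup_dir and return the new paths.

    Hypothetical helper: the directory layout is an assumption, not
    something Acrobat requires.
    """
    source = Path(source_dir)
    backup = Path(backup_dir)
    backup.mkdir(parents=True, exist_ok=True)

    copied = []
    for pdf in sorted(source.iterdir()):
        # Match .pdf and .PDF alike; skip everything else.
        if not pdf.is_file() or pdf.suffix.lower() != ".pdf":
            continue
        target = backup / pdf.name
        if target.exists():
            # Never overwrite an earlier backup of the same file.
            continue
        shutil.copy2(pdf, target)  # copy2 also preserves timestamps
        copied.append(target)
    return copied
```

For example, backup_pdfs("scans", "scans_backup") would copy the PDFs from a folder named "scans" before the batch sequence is pointed at it. Files already present in the backup folder are skipped rather than replaced, so running it twice is harmless.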

Known Problems

  • Acrobat's OCR will crash when it encounters a rotated page, so make sure all pages are upright before running the batch.
  • Manually performing OCR with the "Searchable Image (Exact)" Output Style and then saving the PDF will double the size of the PDF document. This does not happen if the PDF is saved with "Save As..." under a different name.
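
To spot files affected by the size problem above, the output sizes can be compared against the originals. This is a rough sketch under stated assumptions: the function name find_bloated_outputs is hypothetical, the "_ocr" suffix stands in for whatever prefix or suffix was set under Output Options, and the 1.8 threshold is an arbitrary cut-off chosen to catch files that have roughly doubled.

```python
from pathlib import Path


def find_bloated_outputs(original_dir, output_dir, suffix="_ocr", ratio=1.8):
    """List OCR outputs that are at least `ratio` times larger than their source.

    Assumes each output is named <original stem><suffix>.pdf; both the
    suffix and the ratio are illustrative defaults, not Acrobat settings.
    Returns a list of (output_name, original_bytes, output_bytes) tuples.
    """
    bloated = []
    for original in sorted(Path(original_dir).iterdir()):
        if original.suffix.lower() != ".pdf":
            continue
        output = Path(output_dir) / f"{original.stem}{suffix}{original.suffix}"
        if not output.exists():
            # No matching output yet, so there is nothing to compare.
            continue
        before = original.stat().st_size
        after = output.stat().st_size
        if before > 0 and after >= ratio * before:
            bloated.append((output.name, before, after))
    return bloated
```

Some growth is expected because OCR adds a hidden text layer, so the threshold may need adjusting for a given set of scans.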