Using OmniPage Ultimate on Windows 7.
I have a directory tree with a few hundred PDF files that contain images, fairly deeply nested in subdirectories. (This is a one-time project, not a recurring issue, if that matters.) No file is particularly large (2 to 100 pages each), but there are many files. I would like OmniPage to batch process the whole set, creating a searchable PDF file for each input file and preserving the directory structure. I don't need to make any manual corrections while the batch is running (no manual proofreading).
I have been working with the "Scanned Document to Searchable PDF" workflow with some success processing small collections. I have not yet tried feeding it the whole set.
First problem: I name an output folder and it gets *every* file; the original subdirectory structure is discarded. The software handles the (inevitable) name collisions gracefully by adding a suffix, but I would much prefer to keep the tree. Is there a way?
Second problem: the workflow feature seems to load EVERY PAGE into memory. I'm sure this will fail on the big jobs that I need to run because the machine will run out of memory and/or start swapping. Is there a way to tell the workflow it should process one file at a time, and when it's done, move on cleanly to the next file, dropping the pages from memory?
Third problem: the manual proofread-and-correct dialog. I don't want to do any of that. Can I remove that step from the workflow entirely?
Please tell me, do I have to give up on the canned workflow and start writing custom code?
Thanks in advance for all hints, tips, suggestions.
Here are my notes on exactly what I'm doing now; maybe someone can point out where I could do things a bit smarter:
1. Click the “1-2-3” button so it drops down, then pick the workflow named “Scanned Document to Searchable PDF”.
2. Right-click the Load Files step button and pick Step properties. Here the task is to select the input folder that is the topmost level of the data set.
a. Uncheck the box at the top, “Select files for loading each time..”.
b. Below “Load automatically from..”, click the Browse button.
c. In the Load Files dialog that appears, navigate to the folder so it’s selected in the window.
d. Click the Advanced button at the bottom so the screen expands.
e. Click Add Selected With Subfolders. This should populate the Files/Folders box.
f. In the Files/Folders box, remove all but the “*\*.pdf” rule. On that PDF rule, make sure the tiny Subfolders checkbox is checked.
g. Click Finish to close the Load Files step properties.
3. Right-click the Recognize step button and pick Step properties.
a. Select the radio button Optimize the OCR process for Accuracy.
b. Be sure the English language is checked.
c. Check the boxes for professional dictionaries as needed.
d. Click Finish to close the Recognize step properties.
4. Ignore the Correct Recognition Results button.
5. Right-click the Save step button and pick Step Properties.
a. Select the radio button Save As Multiple.
b. For Naming options, pick Use input file names.
c. Uncheck the box under Prompting (“Prompt for file saving name and location”).
d. Under Save automatically with a specific name and location, click the Specify Location button and browse to a folder, then click OK.
e. Click Finish to close the Save step properties.
6. Finally! Run the workflow.
7. At a certain point it will halt, waiting for OCR Correction. Click the tiny button on the toolbar with two arrows pointing to the right (fast forward). Then it will proceed to save.
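If the workflow itself cannot be made to keep the tree, one possible workaround that stays outside OmniPage is to prepare the mirrored folder layout yourself and then run the workflow one subfolder at a time, pointing the Load Files step at a source folder and the Save step at the matching destination folder. The Python sketch below is not part of OmniPage; `plan_mirrored_outputs` and `ensure_output_dirs` are hypothetical helper names. It only builds the source-to-destination list and creates the empty destination folders; it does no OCR. It assumes PDF inputs can be identified by a `.pdf` extension in any letter case.

```python
from pathlib import Path


def plan_mirrored_outputs(input_root, output_root, suffix=".pdf"):
    """Return (source, destination) pairs, one per PDF under input_root.

    Each destination keeps the file's path relative to input_root, so the
    subfolder layout is mirrored under output_root. No OCR happens here.
    """
    input_root = Path(input_root)
    output_root = Path(output_root)
    pairs = []
    # rglob("*") walks every subdirectory; sorting gives a stable, repeatable order.
    for src in sorted(input_root.rglob("*")):
        # Compare extensions case-insensitively so "X.PDF" is also picked up.
        if src.is_file() and src.suffix.lower() == suffix:
            relative = src.relative_to(input_root)
            pairs.append((src, output_root / relative))
    return pairs


def ensure_output_dirs(pairs):
    """Create every destination folder so a later save step has somewhere to write."""
    for _, dest in pairs:
        dest.parent.mkdir(parents=True, exist_ok=True)
```

Running one subfolder per batch also keeps each job small, which may help with the memory concern in the second problem, although this script cannot guarantee how OmniPage itself manages memory between runs.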
I've searched high and low without success: what is the upgrade path and cost from OmniPage 14 Pro to OmniPage 18 Pro?
The only info I've found says the upgrade from 16 and 17 is $199.99, but I'm not willing to assume it will accept 14, especially since I will have to switch from a Win2K Pro machine to a WinXP Pro SP3 machine. (I still have the full retail upgrade package for OP Pro 14.)
I have purchased a new laptop and am trying to install OmniPage 16 on it. When trying to activate it, it tells me that I have too many versions on too many computers. I have uninstalled the version on my old laptop. What do I do to get it to recognize that I only have one installation?
I'm using Windows 7, and at the end of OCR proofreading the program freezes. I can't get out of the OCR or close the program normally; I have to use Task Manager to close it. I can only use the scanned material after restarting OmniPage and opening the workload.
Sorry if this has already been posted (SIAP); I've looked for a previous thread and got nowhere.
I'd like to capture the stats in the document manager, and put them into a database. I can select them for copy, but can't paste them anywhere.
Is there a way to get this data other than to look at it and key it? The stats are quite helpful to me.
Do the batches I put through OmniPage for OCR need to consist only of PDF files, or will the program skip files of other types (like videos, Excel spreadsheets, Word documents, etc.)? In other words, if I have a large drive of files and only some of them are PDFs, do I need to sort them into folders containing only the PDF files first, or can I send my files through "as-is" for OCR? I have about 2 TB of files that I need to make searchable, but not all of them are PDFs.
Also, will the output retain the folder/subfolder directory structure? This is very important. I looked on the discussion forum and saw a couple of posts on this topic; one poster found a way to get the subfolders created by using certain settings, but another did not.
Thank you for your assistance & advice.
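Whether OmniPage silently skips non-PDF files is not settled in this thread, so a cautious first step is simply to measure what is on the drive before deciding whether to pre-sort it. The Python sketch below (the name `inventory_by_extension` is a hypothetical helper, not an OmniPage feature) counts files and total bytes per lowercase extension. It reads only file metadata, not file contents, so it should be reasonably quick even on a large drive.

```python
from collections import Counter
from pathlib import Path


def inventory_by_extension(root):
    """Count files and total bytes under root, grouped by lowercase extension.

    Files without an extension are grouped under the empty string "".
    Returns (counts, sizes), two Counters keyed by extension such as ".pdf".
    """
    counts = Counter()
    sizes = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            ext = path.suffix.lower()
            counts[ext] += 1
            sizes[ext] += path.stat().st_size  # size in bytes, from metadata only
    return counts, sizes
```

For example, `counts[".pdf"]` and `sizes[".pdf"]` show how much of the 2 TB is actually PDF, and `sizes.most_common(5)` lists the five extensions taking the most space.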
I have a membership card for my shop that has a picture and text details on the side, kind of like an ID card. The problem is that when I run OCR on it to change some details on older cards, only the text is shown (and editable) afterwards; the picture and background are nowhere to be seen. All I need is to change some details. Any help?
We sporadically have problems with the RecAPI.kRecRecognize procedure throwing a "BAR_ERR" error when attempting to recognize barcodes. When it fails, the page rarely has a barcode on it. The same page also fails consistently every time it is resubmitted. If we resize the page or alter it in some way, the recognizer no longer fails.
On each page, we do these same initial steps: kRecLoadImg, kRecDetectBlankPage, kRecGetImgInfo, kRecInsertZone, kRecRecognize.
The kRecInsertZone inserts a zone using the full page dimensions (from the kRecGetImgInfo call), with recognition module RM_BAR and filling method FM_BARCODE.
We process thousands of pages without issue; this error has happened on fewer than 10 pages in total.
Does anybody have any suggestions? The error results from the CSDK are completely unhelpful. Just a RECERR.BAR_ERR value is returned, and calling RecAPI.kRecGetLastErrorEx only returns this:
<error code="0x8004cb01" sym="BAR_ERR"/>
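Assuming the CSDK reports this value in the standard Windows HRESULT layout (the 0x8004 prefix is consistent with that), the number can be split into its fields to see whether it carries anything beyond "BAR_ERR". The Python sketch below (`decode_hresult` is a hypothetical helper name) extracts the severity bit, the 11-bit facility, and the 16-bit code. For 0x8004cb01 the failure bit is set and the facility is 4, which is FACILITY_ITF: codes in that facility are defined by the component itself, so the low 16 bits (0xCB01) only have meaning in the SDK's own error list.

```python
FACILITY_ITF = 4  # Windows facility for interface/component-defined codes


def decode_hresult(hr):
    """Split a 32-bit HRESULT into its standard fields."""
    hr &= 0xFFFFFFFF  # accept the signed (negative) form as well
    return {
        "failure": bool(hr >> 31),       # severity bit: 1 means error
        "facility": (hr >> 16) & 0x7FF,  # 11-bit facility code
        "code": hr & 0xFFFF,             # 16-bit component-specific code
    }


print(decode_hresult(0x8004CB01))
# {'failure': True, 'facility': 4, 'code': 51969}
```

In other words, the decoded value confirms there is no extra diagnostic detail hidden in the number; 51969 is simply 0xCB01, and its meaning has to come from the CSDK's error documentation.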
Any assistance or troubleshooting ideas would be greatly appreciated.
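Since resizing or otherwise altering a failing page reportedly makes the error go away, one defensive pattern until the root cause is known is to catch the barcode failure and retry once on an altered copy of the page, logging which pages needed it. The Python sketch below shows only that control flow. `recognize`, `alter`, `on_retry` and `BarcodeRecognitionError` are hypothetical placeholders for your own wrapper around the per-page CSDK call sequence and for however your binding surfaces RECERR.BAR_ERR; no real CSDK calls appear here.

```python
class BarcodeRecognitionError(Exception):
    """Hypothetical stand-in for however your binding reports RECERR.BAR_ERR."""


def recognize_with_fallback(page, recognize, alter, retries=1, on_retry=None):
    """Run recognize(page); if it raises BarcodeRecognitionError, call
    alter(page) and try again, up to `retries` extra attempts.

    recognize, alter and on_retry are caller-supplied placeholders.
    """
    try:
        return recognize(page)
    except BarcodeRecognitionError as err:
        if retries <= 0:
            raise  # out of retries: surface the original error
        if on_retry is not None:
            on_retry(page, err)  # e.g. record the page so it can be reported later
        return recognize_with_fallback(alter(page), recognize, alter, retries - 1, on_retry)
```

This works around the symptom rather than fixing it; the pages collected through `on_retry` would still be the ones worth sending to support.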
I bought OmniPage Ultimate with PaperPort Pro, and for a first test I installed only OmniPage, because scanning and OCR are what I need (first). The problem is that it does not work as it should, and even the versioning is very strange: the About box shows "Version: 19.0" with (c) 1995-2013, but according to <http://www.nuance.de/for-individuals/by-product/omnipage/index.htm> the current version is 18: "OmniPage 18"!?
Most important problems:
1. German spell checking marks many completely correct and very common words as wrong, even though the same words are not marked on other pages. And some words are only partly marked, even if I select all the text and set it all to the same font, same size, style "Regular", automatic colours, no sub- or superscript, etc. (see screen shot).
2. Only "PDF searchable image" is usable; all other PDF output is unusable (missing text, different lines "overstriking" each other, etc.; see screen shot).
3. Ordinary text editing often leads to strange changes in the layout.
Each installer complains that the other application must be installed first, and the installation stops.