SmartOCR auto rotating documents? Trying to read vertical text on edge of page

Adrian Pernice posted this 11 July 2022


I'm setting up a workflow that needs to read form-identifying text oriented vertically along the side of a form. I am using the SmartOCR module to read the form type via regex, but it never matches the vertical text, even if I rotate the original before import or use the image processing module to rotate it.

I have tried TIF and PDF originals, both in their original orientation and in landscape (so that the vertical text reads correctly along the top of the page, with the main form text sideways). Each time, it appears to search the document and only match the regex if the text also appears elsewhere in the main body of the form.

The only time I can get it to match the desired text is when I crop out the rest of the form so that only the text I want to match remains, but this will not work for my application.

I hope you can assist.


luigi.zurolo posted this 5 weeks ago

Hi Adrian,

This is normal behavior for the Smart OCR module: the rules are evaluated against the full text of the OCR result in order to locate the patterns and the position of each requested rule.

The vertical text might be missing from the full OCR result, or it might be found with an unusual position scheme (because it is vertical), and therefore not match the rules provided in the Smart OCR module.

You can check whether the sentence appears in the OCRTEXT variable or in an S-PDF. This is not exactly the same OCR result, but it gives an indication of what the OCR engine extracts at document level.
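
If it helps, the check above can be scripted outside the workflow. The following is only a rough sketch in plain Python, not a Smart OCR API: it assumes you have copied the OCRTEXT value (or the text layer of the S-PDF) into a string. The function name `find_pattern`, the sample text, and the pattern `FORM-\d{3}` are all hypothetical, and the one-character-per-line layout of the vertical text is an assumption about how an engine might emit it, not confirmed behavior.

```python
import re


def find_pattern(ocr_text: str, pattern: str) -> dict:
    """Report whether `pattern` matches the OCR text as-is, or only
    after all whitespace has been removed from the text."""
    direct = re.search(pattern, ocr_text)
    # Assumption: vertical text may come out fragmented, e.g. one
    # character per line. Collapsing whitespace shows whether the
    # characters are present in the OCR result at all.
    collapsed_text = re.sub(r"\s+", "", ocr_text)
    collapsed = re.search(pattern, collapsed_text)
    return {
        "direct": direct.group(0) if direct else None,
        "collapsed": collapsed.group(0) if collapsed else None,
    }


# Hypothetical OCR dump: main body text, then a vertical label that
# was extracted one character per line.
sample = "Claim form\nName: J. Smith\nF\nO\nR\nM\n-\n1\n2\n3\n"
result = find_pattern(sample, r"FORM-\d{3}")
print(result)
```

If "direct" is empty but "collapsed" finds the label, the characters were extracted but in a fragmented layout, so the rule cannot match them as written. If both are empty, the vertical text is most likely missing from the full OCR result. Note that a pattern containing spaces will not match the collapsed text.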

As for rotation, there is no specific instruction for it in Smart OCR, but the OCR auto-rotates the page to the natural human reading direction in order to match as many rules as possible. That is why you get portrait even if you manually rotate the page to landscape.