Is it possible to run OCR on a specific region so I can parse some numbers in an image?

Hi there!
I ran into a weird case where some “strings” in a PDF file are not really text, but images or something similar.

(cross posted from Slack by admin)

Yes, you can do selective OCR using a preprocessor, but only at the page level, not for a specific region within a page. Still, if you’re only OCRing one page, performance should be fine. Here’s an example.
Let’s say your image containing the numbers is preceded by a heading, “CURRENT POLICY LIMITS”, on the same page as the image. You could write the following to OCR the page on which that heading occurs:

{
  "preprocessors": [
    {
      "type": "ocr",
      "match": {
        "type": "equals",
        "text": "CURRENT POLICY LIMITS",
        "isCaseSensitive": true
      },
      "pageOffset": 0
    }
  ],
  "fields": []
}
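
If you build configs from a script rather than by hand, the same structure can be generated with a few lines of Python. This is only a rough sketch: the helper names `ocr_preprocessor` and `build_config` are made up for illustration, and it uses only the keys shown in the example above, so check any other options against the product docs before relying on them.

```python
import json


def ocr_preprocessor(heading: str, page_offset: int = 0,
                     case_sensitive: bool = True) -> dict:
    """Build one OCR preprocessor entry (hypothetical helper).

    Only the keys from the example config above are used here.
    """
    return {
        "type": "ocr",
        "match": {
            "type": "equals",
            "text": heading,
            "isCaseSensitive": case_sensitive,
        },
        "pageOffset": page_offset,
    }


def build_config(heading: str) -> dict:
    """Wrap the preprocessor in a config with an empty fields list."""
    return {
        "preprocessors": [ocr_preprocessor(heading)],
        "fields": [],
    }


config = build_config("CURRENT POLICY LIMITS")
# Serialize to JSON text that can be pasted into the config editor.
print(json.dumps(config, indent=2))
```

You would still add your own entries to `fields` to actually pull the numbers out of the OCR'd page.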