I am working on some loss run templates, and in one example we have, the fields seem to insert random spaces, which causes claims to not be extracted.
For example, say a claim number in the rendered PDF looks like:
But Sensible extracts this as:
U1 23 4
Is there a way to collapse spaces:
- in an anchor
- within an extracted field, via preprocessing (“merge fields”?) or something like that?
(cross-posted to the community by an admin)
It sounds like the extra whitespace you’re seeing is output by the OCR engine you selected for the document type in the Sensible app. You can try configuring a different OCR engine (amazon, google, or microsoft) in the app, or try the following approaches:
You can use the whiteSpaceFilter parameter in your method object to account for unpredictable whitespace output. The docs on the method object include a description of whiteSpaceFilter.
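To make the effect concrete, here is a minimal Python sketch of what a whitespace filter conceptually does to an extracted value. It is not Sensible's implementation or its configuration syntax: collapse_whitespace is a hypothetical helper name, and the sketch assumes the filter strips every whitespace character.

```python
import re


def collapse_whitespace(text: str) -> str:
    """Remove every run of whitespace (hypothetical helper, not a Sensible API)."""
    return re.sub(r"\s+", "", text)


# The value from the question, as the OCR engine returned it.
print(collapse_whitespace("U1 23 4"))  # prints U1234
```

If you can't configure this in the extraction itself, the same one-liner works as a cleanup step on the extracted output.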
There’s no whiteSpaceFilter for anchors, but you could design a regex pattern that accounts for extra whitespace. For text anchor matches, there’s also a parameter called editDistance that accounts for poor-quality scans with noisy OCR output. This parameter takes an integer value that sets the number of edits allowed for a match. The docs on the match object include descriptions of editDistance and of regex matching.
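As a rough illustration of the regex approach, the Python sketch below builds a pattern that allows optional whitespace between every character of the anchor text. The anchor text "Claim Number" and the helper name whitespace_tolerant_pattern are assumptions for the example, and the regex flavor Sensible accepts may differ from Python's, so check the generated pattern against the match object docs before using it.

```python
import re


def whitespace_tolerant_pattern(literal: str) -> str:
    """Build a regex matching `literal` with optional whitespace between its characters.

    Hypothetical helper for illustration; not part of Sensible.
    """
    # Drop whitespace from the literal itself, escape each remaining character,
    # then allow any amount of whitespace between neighbors.
    chars = [re.escape(ch) for ch in literal if not ch.isspace()]
    return r"\s*".join(chars)


# "Claim Number" is an assumed anchor text, not taken from the original template.
pattern = whitespace_tolerant_pattern("Claim Number")
print(pattern)

# A line with stray spaces, similar to the OCR output described in the question.
print(bool(re.search(pattern, "Cl aim Num ber: U1 23 4")))
```

The generated pattern is verbose, but you can paste it into an anchor as a plain string, and it still matches the clean text when the OCR output has no stray spaces.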