Hi, we have some issues with the expiration date in this ACORD 25 document:
We receive the expiration date as 1/6/2022 instead of 11/6/22.
Here’s how we’re capturing expiration dates:
{
  "id": "coverages.policy_expiration_date_1",
  "type": "date",
  "method": {
    "id": "box",
    "position": "right",
    "offsetBoxes": {
      "direction": "below",
      "number": 1
    },
    "tiebreaker": ">"
  },
  "anchor": {
    "start": {
      "text": "coverages",
      "type": "startsWith"
    },
    "match": {
      "text": "policy exp",
      "type": "startsWith"
    },
    "end": {
      "text": "certificate holder",
      "type": "startsWith"
    }
  }
}
It looks like the images of the text are somewhat pixelated. As a result, the OCR output is probably imperfect and is inserting a gap in the date. Regardless of the cause, the gap between the two 1s in the 11 is small but large enough that our Date type doesn't recognize the first 1 as part of the month.
You can solve this by using the whitespaceFilter parameter to ignore the gap in the OCR output. I recommend adding it to all the date-type fields for this document. For example, adding "whitespaceFilter": "spaces" to your expiration date field looks like this:
{
  "id": "coverages.policy_expiration_date_1",
  "type": "date",
  "method": {
    "id": "box",
    "position": "right",
    "offsetBoxes": {
      "direction": "below",
      "number": 1
    },
    "tiebreaker": ">",
    "whitespaceFilter": "spaces"
  },
  "anchor": {
    "start": {
      "text": "coverages",
      "type": "startsWith"
    },
    "match": {
      "text": "policy exp",
      "type": "startsWith"
    },
    "end": {
      "text": "certificate holder",
      "type": "startsWith"
    }
  }
}
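If it helps to see why a single stray space changes the result, here is a rough Python sketch of the idea. It is not how our Date type or the whitespace filter is actually implemented. The regex, the helper names parse_date and strip_spaces, and the sample OCR string "1 1/6/2022" are all assumptions made up for illustration.

```python
import re
from datetime import datetime

# Hypothetical stand-in for a date recognizer (NOT the real Date type).
# It looks for the first M/D/YYYY pattern in the text.
DATE_RE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")


def parse_date(text):
    """Return the first M/D/YYYY date found in text, or None."""
    match = DATE_RE.search(text)
    if match is None:
        return None
    month, day, year = (int(part) for part in match.groups())
    return datetime(year, month, day).date()


def strip_spaces(text):
    """Hypothetical spaces filter: drop space characters before parsing."""
    return text.replace(" ", "")


# Assumed OCR output: a gap was inserted between the two 1s of the month.
ocr_text = "1 1/6/2022"

print(parse_date(ocr_text))                # 2022-01-06 (first 1 is lost)
print(parse_date(strip_spaces(ocr_text)))  # 2022-11-06 (month read as 11)
```

The point of the sketch is only that the first 1 is cut off from the rest of the date by the gap, so a pattern matcher starts the date at the second 1. Removing spaces before matching rejoins the digits.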