I am trying to extract data from studies with Sensible, but I have run into a problem. The studies use a two-column layout: the first half of each page's text is on the left side of the page and the second half is on the right side. You are supposed to read all the paragraphs on the left from top to bottom, and then all the paragraphs on the right from top to bottom. A strip of blank white space runs down the middle of the page, from top to bottom, separating the two halves. However, when I try to extract a specific paragraph with Sensible, the app reads the text from top to bottom straight across that white strip. As a result, the text of paragraphs that sit side by side (one on the left of the page and one on the right) gets merged together and becomes unreadable.
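To make the reading order I am after concrete, here is a minimal Python sketch. It is only an illustration of the idea: the word boxes, the page width, and the function name `two_column_reading_order` are all made up for this example, and it assumes the blank strip sits exactly at the middle of the page. It is not Sensible's API and not a claim about how Sensible works internally.

```python
from typing import List, Tuple

# Hypothetical word box: (x, y, text), measured from the top-left corner
# of the page. These are made-up inputs, not output from Sensible.
Word = Tuple[float, float, str]


def two_column_reading_order(words: List[Word], page_width: float) -> List[str]:
    """Return the text in left-column-then-right-column reading order."""
    # Assumption: the blank strip runs down the exact middle of the page.
    gutter_x = page_width / 2
    left = [w for w in words if w[0] < gutter_x]
    right = [w for w in words if w[0] >= gutter_x]

    # Within a column, read top to bottom, then left to right.
    def by_position(w: Word) -> Tuple[float, float]:
        return (w[1], w[0])

    ordered = sorted(left, key=by_position) + sorted(right, key=by_position)
    return [w[2] for w in ordered]


words = [
    (50, 100, "Left-1"), (350, 100, "Right-1"),
    (50, 120, "Left-2"), (350, 120, "Right-2"),
]

# Reading straight across the page, which is the behavior I am seeing.
naive = [w[2] for w in sorted(words, key=lambda w: (w[1], w[0]))]
# Reading the left column first, then the right column, which is what I want.
fixed = two_column_reading_order(words, page_width=600)

print(naive)  # ['Left-1', 'Right-1', 'Left-2', 'Right-2']
print(fixed)  # ['Left-1', 'Left-2', 'Right-1', 'Right-2']
```

The first list is the mixed-up order I get now, and the second is the order I need.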
How do I stop this from happening? Is there a way to keep paragraphs on the left from being mixed together with paragraphs on the right? Alternatively, is there a quick way to modify the PDFs so that they no longer have the two-column formatting, have the single-column formatting of a Google Doc instead, and can still be uploaded to Sensible?
Here is the config I used to extract the data, in case that is helpful:
{
  "fields": [
    {
      "id": "rent_topic_paragraphs",
      "anchor": {
        "match": {
          "type": "first"
        }
      },
      "method": {
        "id": "topic",
        "numParagraphs": 1,
        "terms": [
          "pay",
          "lessee",
          "rent",
          "dollars"
        ]
      }
    }
  ]
}