Hi! We don’t have a purpose-built method to accomplish exactly what you’re describing, but there’s probably a way to do it using our existing methods.
It looks to me like your ‘word’ headings (aback, abalienate, etc.) are in a larger font than their accompanying text. So you could use the Sections method to segment each record, starting a new section at each tall line, with something like this:
{
  "fields": [],
  /* each section is a word + definitions/data */
  "sections": [
    {
      "id": "dictionary_entry",
      "range": {
        "anchor": {
          /* each section starts with a large font */
          "match": {
            "type": "regex",
            "pattern": ".+",
            /* in Sensible app, click on lines to see their height */
            "minimumHeight": 0.18
          }
        }
      },
      "fields": [
        /* the word being defined (1st line in section) */
        {
          "id": "defined_word",
          "anchor": {
            "match": {
              "type": "first"
            }
          },
          "method": {
            "id": "passthrough"
          }
        },
        /* grab everything to the right
           of the defined word (1st line in section) */
        {
          "id": "ccm",
          "anchor": {
            "match": {
              "type": "first"
            }
          },
          "method": {
            "id": "row"
          }
        },
        /* grab definitions: each is a paragraph starting with # */
        {
          "id": "definitions",
          "match": "all",
          "anchor": {
            "match": {
              "type": "regex",
              "pattern": "^[0-9]"
            }
          },
          "method": {
            "id": "paragraph"
          }
        },
        /* fallback field if there's only one un-numbered definition */
        {
          "id": "definitions",
          "anchor": {
            "match": {
              "type": "regex",
              "pattern": "^.+"
            }
          },
          "method": {
            "id": "paragraph"
          }
        },
        /* for troubleshooting/to illustrate section range, output all text in this section */
        {
          "id": "_everything_in_this_section",
          "method": {
            "id": "documentRange",
            "includeAnchor": true
          },
          "anchor": {
            "match": {
              "type": "first"
            }
          }
        }
      ]
    }
  ]
}
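In case it helps to see the idea behind that range anchor outside of SenseML, here is a rough Python sketch of the segmentation logic. It is only a mental model, not how Sensible implements sections: the `Line` class, the `split_into_sections` function, and the sample heights are all made up for illustration, and the 0.18 threshold is just the value from the config above.

```python
from dataclasses import dataclass


@dataclass
class Line:
    """Hypothetical stand-in for one line of text and its height."""
    text: str
    height: float


def split_into_sections(lines, minimum_height=0.18):
    """Start a new section at every line at least `minimum_height` tall."""
    sections = []
    for line in lines:
        if line.height >= minimum_height:
            # A tall line is treated as a word heading: open a new section.
            sections.append([line.text])
        elif sections:
            # Shorter lines belong to the most recent heading.
            sections[-1].append(line.text)
        # Lines before the first heading are skipped.
    return sections


# Made-up sample data: headings are taller than definition lines.
lines = [
    Line("abalienated", 0.20),
    Line("1 (obsolete) caused mental aberration", 0.12),
    Line("2 in civil law transferred land title", 0.12),
    Line("another word", 0.20),
    Line("1 def 1", 0.12),
]
sections = split_into_sections(lines)
```

If your headings and body text are closer in height than in this sample, the threshold is the number to tune.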
Once you’ve reconfigured it for your specific situation (font size, etc.), it should give you output like the following for each record. Let me know if that works for you!
{
  "dictionary_entry": [
    {
      "defined_word": {
        "type": "string",
        "value": "abalienated"
      },
      "ccm": null,
      "definitions": [
        {
          "type": "string",
          "value": "1 (obsolete) caused mental aberration"
        },
        {
          "type": "string",
          "value": "2 in civil law transferred land title"
        }
      ],
      "_everything_in_this_section": {
        "type": "string",
        "value": "abalienated 1 (obsolete) caused mental aberration 2 in civil law transferred land title"
      }
    },
    {
      "defined_word": {
        "type": "string",
        "value": "another word"
      },
      "ccm": "blah",
      "definitions": [
        {
          "type": "string",
          "value": "1 def 1"
        },
        {
          "type": "string",
          "value": "2 def 2"
        }
      ],
      "_everything_in_this_section": {
        "type": "string",
        "value": "blah blah blah"
      }
    }
  ]
}
To output to Excel, there are a couple of things you can do. You can take advantage of Sensible’s native Excel output (for more info, see Quickstart PDF to Excel and SenseML to spreadsheet reference). To get your columns just right, you may need to use a computed field or advanced computed field method.
Or you can use a Zapier integration.
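A third route, if you would rather shape the columns yourself, is to post-process the JSON output. Below is a minimal Python sketch that assumes the output has the shape shown earlier in this reply and writes one row per definition. The function names and the column headers (`word`, `ccm`, `definition`) are my own placeholders, not anything Sensible provides, and it uses only the standard library to produce CSV text that Excel can open.

```python
import csv
import io


def value_of(field):
    """Return a field's text whether it is null, a bare string, or a {type, value} object."""
    if field is None:
        return ""
    if isinstance(field, dict):
        return field.get("value", "")
    return str(field)


def entries_to_rows(parsed):
    """Flatten the output into [word, ccm, definition] rows, one per definition."""
    rows = []
    for entry in parsed.get("dictionary_entry", []):
        word = value_of(entry.get("defined_word"))
        ccm = value_of(entry.get("ccm"))
        for definition in entry.get("definitions") or []:
            rows.append([word, ccm, value_of(definition)])
    return rows


def rows_to_csv(rows):
    """Render the rows as CSV text with a header line (placeholder column names)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["word", "ccm", "definition"])
    writer.writerows(rows)
    return buffer.getvalue()


# Sample data copied from the example output above (troubleshooting field omitted).
sample = {
    "dictionary_entry": [
        {
            "defined_word": {"type": "string", "value": "abalienated"},
            "ccm": None,
            "definitions": [
                {"type": "string", "value": "1 (obsolete) caused mental aberration"},
                {"type": "string", "value": "2 in civil law transferred land title"},
            ],
        },
        {
            "defined_word": {"type": "string", "value": "another word"},
            "ccm": "blah",
            "definitions": [
                {"type": "string", "value": "1 def 1"},
                {"type": "string", "value": "2 def 2"},
            ],
        },
    ]
}

rows = entries_to_rows(sample)
csv_text = rows_to_csv(rows)
```

One row per definition keeps the sheet easy to filter by word. If you would rather have one row per word, you could join the definitions into a single cell instead.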