1. 11
  2. 3

    I find the idea intriguing, since I try to do something similar for my scanned documents. It would be nice if there were an example workflow with “screenshots” so I could see what results can be generated from what input. Maybe I will take a look at it if I can find out how to set the whole thing up without docker (even though I’m not a fan of node.js).

    1. 2

      Yeah, that would be a good idea. There is a description of the output in /docs: https://github.com/axa-group/Parsr/blob/master/docs/json-output.md, and there is an install guide if you do not want to use docker. I tried it with docker, though, and it worked well.

      What is cool is that it unifies outputs from OCRs and you can swap them on the fly.
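
      A hedged sketch of what that swap looks like: in Parsr's JSON configuration the OCR engine is a single field under `extractor`, so switching engines is a one-line change (the exact field names and supported values here follow my reading of the docs and may differ in detail):

```json
{
  "version": 0.9,
  "extractor": {
    "pdf": "pdfminer",
    "ocr": "tesseract",
    "language": ["eng"]
  }
}
```

      Changing `"tesseract"` to another supported provider re-runs the same pipeline with a different engine, while the unified JSON output format stays the same.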

      1. 2

        > What is cool is that it unifies outputs from OCRs and you can swap them on the fly.

        Yeah, in my hack I try to compare the outputs to find the “correct” transcriptions, but it’s mostly an ugly hack. My workflow is like the ocropy workflow, but the resulting snippets get transcribed with different engines. In my case I found I get the best results with tesseract, so the other engines are mostly white noise, but it could be because of my preparation steps… Oh, and I can’t extract tables yet.
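
        A minimal sketch of the comparison idea described above, assuming the engines already agree on line segmentation (the function name and input shape are hypothetical, not from any of the tools mentioned):

```python
from collections import Counter

def consensus(transcriptions):
    """Majority vote per line across several OCR engine outputs.

    transcriptions: one list of line strings per engine, all the
    same length (an assumption; real outputs need alignment first).
    """
    result = []
    for lines in zip(*transcriptions):
        # Pick the reading most engines agree on; on a tie, the
        # first engine listed wins (Counter.most_common is stable).
        best, _count = Counter(lines).most_common(1)[0]
        result.append(best)
    return result

# Toy example: three engines, one making the classic rn/m confusion.
engines = [
    ["Lorem ipsum", "dolor sit amet"],
    ["Lorem ipsum", "dolor sit arnet"],
    ["Lorem ipsum", "dolor sit amet"],
]
print(consensus(engines))  # → ['Lorem ipsum', 'dolor sit amet']
```

        Putting the most trusted engine (tesseract, in the case above) first makes it the tie-breaker, so the voting only overrides it when the other engines actually agree on something different.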