Tesseract[0] is the classic example. There's a bunch of advice for improving your accuracy with it, like making your images larger (literally just scale them up 2x or 4x).
It would be interesting to see the benchmark from the article repeated with different scaling options (or other preprocessing, depending on platform).
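For anyone who wants to try the scaling idea, here is a rough sketch of what "just scale it up" means at the pixel level. It is only an illustration: the image is assumed to be a plain list of grayscale rows, and `upscale` and `scaled_variants` are made-up helper names, not part of Tesseract. A real run would more likely use an imaging library with proper interpolation.

```python
from typing import Dict, List, Sequence

# Assumption: a grayscale image is a list of rows of 0-255 integers.
Image = List[List[int]]


def upscale(pixels: Image, factor: int) -> Image:
    """Enlarge an image by an integer factor using pixel replication."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    out: Image = []
    for row in pixels:
        # Repeat every pixel `factor` times horizontally...
        wide = [value for value in row for _ in range(factor)]
        # ...then repeat the widened row `factor` times vertically.
        out.extend(list(wide) for _ in range(factor))
    return out


def scaled_variants(pixels: Image, factors: Sequence[int] = (2, 4)) -> Dict[int, Image]:
    """Build one enlarged copy per factor, e.g. to feed each one to an OCR engine."""
    return {factor: upscale(pixels, factor) for factor in factors}
```

Running each variant through the same OCR engine and comparing the output would be one way to repeat the benchmark with different scaling options.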
Running tesseract (4.0.0 using the LSTM engine) on the same images leaves a lot to be desired for handwriting, but does well on the (non-handwriting) website image (the source images are linked in the "OCR Image Processing Results" section).
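In case it helps anyone reproduce this, below is a minimal sketch of calling the Tesseract 4 command line from Python with the LSTM engine selected. It assumes the `tesseract` binary is on your PATH; `tesseract_cmd` and `ocr` are hypothetical helper names, and the language code is just a placeholder default.

```python
import shutil
import subprocess
from typing import List, Optional


def tesseract_cmd(image_path: str, lang: str = "eng") -> List[str]:
    """Build the command line for an LSTM-only OCR run."""
    # "stdout" as the output base makes tesseract print the text
    # instead of writing a file; --oem 1 selects the LSTM engine.
    return ["tesseract", image_path, "stdout", "--oem", "1", "-l", lang]


def ocr(image_path: str, lang: str = "eng") -> Optional[str]:
    """Return the recognized text, or None if tesseract is not installed."""
    if shutil.which("tesseract") is None:
        return None
    result = subprocess.run(
        tesseract_cmd(image_path, lang),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Something like `ocr("page.png")` would then return the recognized text for a local image, where `page.png` is a placeholder file name.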
Tesseract works really well for plain-text OCR, but I haven't had much luck with more typical documents (ones with tables and such). Has anything changed lately?
Since October there is a new version, v4: “Added a new OCR engine that uses neural network system based on LSTMs, with major accuracy gains. [..] Added trained data that includes LSTM models to 123 languages.”
No idea how well it does for structured content like tables.
There seems to be a recent v2 of a JavaScript port (i.e. Tesseract v4 compiled to wasm), if anyone wants to do OCR in their browser:
[0]: https://github.com/tesseract-ocr/tesseract