Tesseract[0] is the classic example. There's a bunch of advice for improving you...

jordoh · on July 18, 2019

Running tesseract (4.0.0 using the LSTM engine) on the same images leaves a lot to be desired for handwriting, but does well on the (non-handwriting) website image (the source images are linked in the "OCR Image Processing Results" section).
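If you want to reproduce this locally, here is a minimal sketch that restricts Tesseract 4.x to the LSTM engine (`--oem 1`) and has it print the recognized text to stdout. It assumes the `tesseract` binary is installed and on PATH. The helper names and the `website.png` file name are hypothetical.

```python
import shutil
import subprocess


def build_tesseract_cmd(image_path: str, lang: str = "eng") -> list[str]:
    """Build a Tesseract 4.x command line that uses only the LSTM engine.

    --oem 1 selects the LSTM neural-net engine. Passing "stdout" as the
    output base makes Tesseract print the text instead of writing a .txt file.
    """
    return ["tesseract", image_path, "stdout", "--oem", "1", "-l", lang]


def ocr_image(image_path: str, lang: str = "eng") -> str:
    """Run Tesseract on one image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract binary not found on PATH")
    result = subprocess.run(
        build_tesseract_cmd(image_path, lang),
        capture_output=True,
        text=True,
        check=True,  # raise if Tesseract exits with an error
    )
    return result.stdout
```

Usage would be something like `print(ocr_image("website.png"))`. Expect decent results on printed text like the website screenshot and much weaker results on handwriting.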

ocrcustomserver · on July 20, 2019

From the Tesseract FAQ:

"Can I use Tesseract for handwriting recognition?

You can, but it won’t work very well, as Tesseract is designed for printed text. Look for projects focused on handwriting recognition."

https://github.com/tesseract-ocr/tesseract/wiki/FAQ#can-i-us...

syntaxing · on July 18, 2019

Tesseract works really well for literal OCR, but I haven't had much luck with more common work (like documents with tables and such). Has anything changed lately?

jacobolus · on July 19, 2019

Since October there is a new version, v4: “Added a new OCR engine that uses neural network system based on LSTMs, with major accuracy gains. [..] Added trained data that includes LSTM models to 123 languages.”

No idea how well it does for structured content like tables.

There seems to be a recent v2 of a JavaScript port (i.e. Tesseract v4 compiled to wasm), if anyone wants to do OCR in their browser:

https://observablehq.com/@tmcw/tesseract-js-v2-alpha

https://github.com/naptha/tesseract.js

ocrcustomserver · on July 20, 2019

  (literally just scale it up x2 or x4).

Just to clarify, the input image should be 300dpi.
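The "x2 or x4" rule of thumb follows from the target resolution. The scale factor is simply 300 divided by the source dpi. A 150 dpi scan needs 2x, a 75 dpi image needs 4x, and a typical 72 dpi screen capture needs about 4.17x. The sketch below assumes you know or can estimate the source dpi. It only computes the target size; the actual resize (e.g. with Pillow's `Image.resize`) is left out, and the function names are hypothetical.

```python
TARGET_DPI = 300  # resolution the Tesseract docs recommend for input images


def upscale_factor(source_dpi: float, target_dpi: float = TARGET_DPI) -> float:
    """Factor needed to bring an image from source_dpi up to target_dpi.

    Images already at or above the target are left alone (factor 1.0).
    """
    if source_dpi <= 0:
        raise ValueError("source_dpi must be positive")
    return max(1.0, target_dpi / source_dpi)


def target_size(width_px: int, height_px: int, source_dpi: float) -> tuple[int, int]:
    """Pixel dimensions after rescaling to TARGET_DPI, rounded to whole pixels."""
    f = upscale_factor(source_dpi)
    return round(width_px * f), round(height_px * f)
```

For example, an 800x600 image scanned at 150 dpi would need to become 1600x1200 before going to Tesseract.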