It seems like with the current progress in ML models, doing OCR should be an easy task. After all, recognizing handwritten numbers was one of the prime benchmarks for image recognition (MNIST was released in 1994).
Yet, when I try to OCR any of my handwritten notes all I ever get is a jumbled mess of nonsense. Am I missing something, is my handwriting really that atrocious or is it the models?
Here’s a quick example, a random passage from a scientific article:
I tried EasyOCR, Tesseract, PPOCR and a few online tools. Only PPOCR was able to correctly identify the numbers and the words “J.” and “Chem.”. The rest is just a random mess of characters.
Edit: thank you all for shitting on my handwriting. That was not asked for, and also not helpful. That sample was intentionally “not nice” but is how I would write a note for myself. (You should see what my notes look like when I don’t need to read them again, lol)
ChatGPT can transcribe it perfectly, and it also works on a slightly larger sample. DeepSeek works ok-ish but made some mistakes, and Gemini is apparently not available in my country atm. I guess the context awareness is what makes those models better at transcription, and also why I can read it back without problems.
I just asked ChatGPT to transcribe it and it said
There was a post on HN recently about using LLMs for OCR. https://news.ycombinator.com/item?id=42952605
That’s perfect. Now I’m just wondering why ChatGPT is apparently much better at OCR than a dedicated OCR model like EasyOCR or Tesseract.
Btw, DeepSeek did a good job but not a perfect one. I also fed ChatGPT a full page of notes and the transcription to markdown worked quite well, although not perfectly. However, if I supply the same note as part of a larger PDF, it refuses to transcribe it, stating that it’s unreadable.
Because LLMs can fill in gaps where the recognition fails.
Which can be problematic. If it makes a mistake and isn’t obviously wrong, that could go unnoticed.
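To make the gap-filling point and its downside concrete, here is a rough toy sketch. It is not how an LLM or any of the OCR tools mentioned here actually work: it just snaps noisy tokens to the nearest word in a small vocabulary using Python's standard `difflib`. The vocabulary, the `fill_gaps` helper and the sample tokens are all made up for illustration.

```python
import difflib

# Hypothetical vocabulary standing in for the language knowledge a model has.
VOCAB = ["journal", "chem", "phys", "reaction", "rate", "constant", "temperature"]


def fill_gaps(tokens, vocab=VOCAB, cutoff=0.6):
    """Replace each unknown token with its closest vocabulary word, if any.

    Returns (corrected_tokens, changes). `changes` lists every
    (original, replacement) pair so a human can review the substitutions.
    """
    corrected, changes = [], []
    for tok in tokens:
        if tok.lower() in vocab:
            # Already a known word, keep it untouched.
            corrected.append(tok)
            continue
        # Best single match above the similarity cutoff, or an empty list.
        match = difflib.get_close_matches(tok.lower(), vocab, n=1, cutoff=cutoff)
        if match:
            corrected.append(match[0])
            changes.append((tok, match[0]))
        else:
            # Nothing similar enough (e.g. numbers), so leave the token as is.
            corrected.append(tok)
    return corrected, changes


if __name__ == "__main__":
    noisy = ["reactlon", "rcte", "constnt", "1994", "phos"]
    fixed, changed = fill_gaps(noisy)
    print(" ".join(fixed))
    for before, after in changed:
        print(f"review: {before!r} was replaced by {after!r}")
```

With this vocabulary, "reactlon", "rcte" and "constnt" get repaired, but "phos" is quietly turned into "phys", which looks plausible and may well be wrong. Keeping the list of changes around for review is one way to catch that kind of mistake.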
100% agreed. But it doesn’t change the answer of why they are apparently better than OCR.
Yep
Try Gemini 2, it seems to be pretty good at that as well.