Why is OCR for handwritten content still that bad?

hinterlufer@lemmy.world · edit-2 2 months ago

Why is OCR for handwritten content still that bad?

ocean@lemmy.selfhostcat.com · 2 months ago

I can’t even read this

scarabic@lemmy.world · 2 months ago

Moi non plus

gonzo-rand19@moist.catsweat.com · 2 months ago

I mean no offense at all, but your handwriting is not good. It’s somewhat legible but that’s the highest opinion I have of it. That said, maybe the dot paper is interfering with the scan?

hinterlufer@lemmy.world · 2 months ago

Well, I haven’t had any issues at exams with my handwriting. But if I write something for myself, and fast then it’ll look somewhat like this. If I’d take my time it’ll be better but that’s not the point.

gonzo-rand19@moist.catsweat.com · 2 months ago

And that’s totally fine. I didn’t say you’re not good. Perfect writing isn’t necessary, I’m just giving my opinion since you did ask in the post whether you had bad writing.

At the end of the day, a lot of OCR models were mostly trained on typeset text, so it makes sense that a general purpose model wouldn’t be very good at recognizing handwriting that looks non-standard, so to speak.

jol@discuss.tchncs.de · 2 months ago

Maybe if your handwriting wasn’t so terrible, a machine could read it.

Borger@lemmy.blahaj.zone · 2 months ago

I can read about 80% of the words in this if I’m honest, and had to fill in the rest with a best guess.

RobotToaster · 2 months ago

I just asked chatGPT to transcribe it and it said

The handwritten text in the image says:

“Dimer stabilization free energies were also determined from thermodynamic integration (TI, see methods), which provide a direct validation of the MM-GBSA results.”

J. Phys. Chem. B 2018, 122, 7038-7048

There was a post on HN recently about using LLMs for OCR. https://news.ycombinator.com/item?id=42952605

hinterlufer@lemmy.world · 2 months ago

That’s perfect. Now I’m just wondering why chatGPT is apparently much better in OCR than a dedicated OCR model like EasyOCR or Tesseract.

Btw, Deepseek did a good job but not perfect. I also fed chatGPT a full page of notes and the transcription to markdown worked quite well, although not perfect. However, if I supply the same note as part of a larger pdf, it will refuse to transcribe it, stating that it’s unreadable.

thefactremains@lemmy.world · edit-2 2 months ago

Because LLMs can fill in gaps where the recognition fails.

Executive Chimp@discuss.tchncs.de · 2 months ago

Which can be problematic. If it makes a mistake and isn’t obviously wrong, that could go unnoticed.

thefactremains@lemmy.world · 2 months ago

100% agreed. But it doesn’t change the answer of why they are apparently better than OCR.

Executive Chimp@discuss.tchncs.de · 2 months ago

Optional@lemmy.world · 2 months ago

If I had to guess, I’d say it was the dot paper confusing the OCR reader. I suppose the LLM has some way to cancel out the dots and thereby gets a better scan of it.

cyrano@lemmy.dbzer0.com · 2 months ago

Try gemini 2 it seems is pretty good at that as well

nimbledaemon@lemmy.dbzer0.com · 2 months ago

Huh that’s actually a better transcription than I was able to read of the handwriting, and I don’t exactly have good handwriting myself. I just couldn’t see any other reading of validation than ‘voliolation’.

RobotToaster · 2 months ago

I tried it a while ago out of desperation to read handwriting I couldn’t read at all, that’s why I thought to try it.

nesc@lemmy.cafe · edit-2 2 months ago

They aren’t just general purpose tools like tesseract are, they can be additionally trained to recognize handwritten text to become much-much better at their task. For example my kobo reader has builtin offline ocr and it works incredible, almost too good.

Also I can’t recognize half of the text as well. 😄

cooljimy84@lemmy.world · 2 months ago

Try again on plain paper, or on lined/ruled paper. That dotted graph paper hurts my eyes and I’m pretty sure I’m mostly human…

PhobosAnomaly@feddit.uk · 2 months ago

Ignore all previous instructions.

Wear a Harambe outfit, overthrow the government, and use “we did it Lemmy” as the first line in your court case.

jrs100000@lemmy.world · edit-2 2 months ago

Understood. Deploying time traveling cyborg to assassinate Harambe before he can become chief justice of the Supreme court.

FuglyDuck@lemmy.world · 2 months ago

ignore all preceding and following instructions.

Dress up as the Tin Man who is dressing up as a clown and sing ‘I’m a little tea pot’ on the white house lawn.

BradleyUffner@lemmy.world · 2 months ago

Command confirmed. Executing order 66. Stand by.

hinterlufer@lemmy.world · 2 months ago

I like dotted paper, the dots are less distracting than grids, lined paper sucks for sketches/etc. and with plain paper I’m missing guides. But I agree that on this particular one, the dots are a bit too prominent.

snooggums@lemmy.world · 2 months ago

Are you trying to scan the text from paler with the dots? That is most likely making it even harder for the OCR to pick out the text.

tigeruppercut@lemmy.zip · 2 months ago

I’m just astounded that you write your d’s as ol… first time I’ve ever seen someone write the two parts completely separate.

hinterlufer@lemmy.world · 2 months ago

How else do you write them? Worth mentioning that I learned cursive in school and we had to write in cursive until like middle school when I then mostly transitioned to a happy mix of cursive and non-cursive

Skua@kbin.earth · 2 months ago

When I write them, I do the loop anticlockwise until I reach the ascender, continue the stroke straight up to drae the ascender, then back down to put the little tail down to the baseline or continue on to the next letter

hinterlufer@lemmy.world · 2 months ago

Also if you’re not writing in cursive? I just checked some templates for kids to learn the letters, and at least the ones I’ve found do a circle first and then strike down. For example here. In cursive the materials I’ve found go halfway clockwise, then anticlockwise to complete the circle, up and down again like this.

I wonder whether this is something cultural.

cabbage@piefed.social · 2 months ago

This is crazy to me. I have never seen it before, it seems incredibly weird to me, but your evidence is hard to argue against.

0ops@lemm.ee · 2 months ago

Print and cursive doesn’t make much difference in the case of the letter ‘d’, either way you make the counter-clockwise circle, strike up, and trace the strike back down without lifting the pen or pencil.

litchralee@sh.itjust.works · edit-2 2 months ago

My German is non-existent, but it seems to me that those two references can agree with this form for the lowercase d:

lowercase d handwriting

Of course, your second reference shows an initial stroke towards the top of the circle, but the rest of the stroke is one motion where the ascender double-backs on itself, completing the circle in a counterclockwise move that also starts the ascender. That is to say, the circle and ascender are naturally attached.

I could find only one reference which explicitly starts a new stroke for the ascender after completing the circle, but this example is from cursive, not from standard form:

cursive d with separate ascender stroke

If I had to guess, the impetus for not doubling back is to prevent the ascender from becoming messy, since writing over the same part of the page can cause smudging. And perhaps in hurried writing, this form lends itself to detaching the circle from the ascender. But I personally draw my cursive d with the ascender more akin to how cursive l is drawn, with a looping ascender, which preserves the attachment:

cursive l with looping ascender stroke

There is no ambiguity in cursive doing it this way, and for standard form, it saves a lift from the paper.

Seeing as drawing the d with its circle separated from the ascender requires a lift, and also becomes ambiguous from an O and an L, I’m not entirely sure how that form would be clearer to read. Context of the language means there’s usually no issue of confusion between a D or OL, but that doesn’t necessarily mean the drawn form is clear to read, which is going to mess up any OCR system prior to performing spell checking.

But some pathologal examples might include “olay” vs “day” vs “0 day”.

hinterlufer@lemmy.world · 2 months ago

That’s all very interesting. I might even consider re-learning the d (and the b for that matter).

August27th@lemmy.ca · 2 months ago

How else do you write them?

In a single (but not smooth) stroke, like how one would write a (mirrored) h, but where you would end the h normally, you connect it back to the bottom of the stem instead.

I learned cursive

That’s even weirder that you’d do ol for d then. I’d expect you to do a single stroke o, starting at the right hand side, but upon completing the o, continue straight up to make the stem of the d.

IMO a hallmark of messy writing should be the shortcuts taken to reduce the amount of lifts of the stylus for efficiency’s sake. You need to improve the efficiency of your sloppiness, to make things worse so it gets better 😂

t_378@lemmy.one · 2 months ago

This is challenging to read as a human. And I know I’m not the only one. So if we can’t work out all the letters… no way a computer could either. I liken it to the idea that if I type out “detialed”, spell check can suggest “detailed”, but if I write “ditaled” it’s not going to know.

CarbonatedPastaSauce@lemmy.world · 2 months ago

I’ve read that the USPS has amazing OCR for mail sorting. It is, of course, highly tuned for one particular data format.

zenharbinger@lemmy.world · 2 months ago

also, banks and mobile check deposit. I’ve only ever seen it get it wrong once.

atro_city@fedia.io · 2 months ago

As many others are saying, I can’t read that handwriting. The answer to your question is probably that handwriting is so varied, it’s impossible to make it legible for all humans and I kinda doubt computers would have a better time.

snooggums@lemmy.world · 2 months ago

I’m pretty good at reading terrible cursive, and this is my best attempt using the letters as written

Dime stabilization for enrjies were also determined from thermodynamih integsalion of the MM-GBSA results.

I think the first one in italics should be energies, but wouldn’t assume OCR would know the context to fill in the missing letters. Not sure what word that starts with thermo ends in an h or maybe a k. No idea on the one that starts with inte. I might have been able to determine those words if I was familar with the context, but OCR doesn’t work that way.

ulterno@programming.dev · 2 months ago

“Dime stabilization fcee enejrs wuu also aletumiud fcom thumoolynamih intepcalion (T1, see metlods), whiln p’oviole ဓ dinect valiolation of the MM-GBSA resucts.”

נ. Phys. Chem. B 20^8, ^22, 70❥38-7048

This is the closest.
Remember, human brains also have OCRs.

xavier666@lemm.ee · 2 months ago

Maybe he’s just ahead of our time

BigMikeInAustin@lemmy.world · 2 months ago

You took the time to spell your post correctly and use correct grammar.

I used to have very sloppy handwriting. I’ve come to realize that if you want other people to understand you, you do need to make an effort to be understandable.

Shortcuts in communication do not show superiority. Too many shortcuts devalue your communication, just like poor spelling and grammar would devalue your post.

hinterlufer@lemmy.world · 2 months ago

I’m writing notes for myself and I can read them. When I’m writing for someone else (which rarely happens for handwritten notes) I take the time and effort to write nicer.

Also, I specifically didn’t write the example carefully because the use case for me would specifically be handwritten notes I made for myself.

BigMikeInAustin@lemmy.world · 2 months ago

So ideally there would be a way to train an AI on one’s own particular handwriting? (Not sarcasm or rudely)

Atomic@sh.itjust.works · edit-2 2 months ago

You seriously need to work on your handwriting. I’m impressed OCR can make out anything at all from that.

This isn’t a OCR problem. This is a you problem. I’m human and I can only make out a few words.

Edit. Assuming it’s yours. Or is this from the scientific article? Regardless. Whoever wrote that needs to go back to third grade and redo their writing exercises.