Don’t use AI to summarize documents — it’s worse than humans in every way

David Gerard@awful.systems · 7 months ago


Lvxferre [he/him] · 7 months ago

All these chatbots do is guess.

To call it “guessing” overestimates its abilities. It’s doing something even dumber: picking words and throwing them together into a grammatically consistent whole, with barely any regard for meaning.

But sometimes dumb shit is still useful. Just like my bash scripts.

I’m just saying a researcher might as well cut out the hallucinating middleman.

That’s why I called it a “nirvana fallacy”: it does hallucinate, so it is not perfect, nor will it ever be (contrary to what tech bros want you to have “faith”, i.e. brainlessness, in). But by “cutting out the hallucinating middleman” you’re probably ignoring a lot of articles that might be useful for your research. Yes, ignoring them: if you’re going by title alone, you won’t read them, because there’s too much stuff to bother with.

I’ll give you a practical example. Suppose for a moment that you need info on the evolution of Hittite and other Anatolian languages. Based on the title alone (as you proposed in an earlier comment), would this article be useful? Probably not: the title doesn’t mention “Anatolian”, “Hittite”, “Luwian”, or anything like that. And the article is 30 pages long, so you might as well skip it.

Now throw “Summarise https://allegatifac.unipv.it/silvialuraghi/Gender FoL.pdf into three paragraphs.” into Gemini.

Here's the output:

The PDF you linked discusses the origin of the gender system in Proto-Indo-European (PIE), the ancestral language to many European and Asian languages. In the early 20th century, all known Indo-European languages seemed to have a three-gender system, including feminine. However, the discovery of Hittite, a language with only two genders and no clear feminine pronouns, challenged this theory.

The paper explores two opposing viewpoints that emerged among Hittitologists and Indo-Europeanists. The first, called the Schwundhypothese, suggests that Hittite lost its feminine gender over time. The other, the Herkunfthypothese, proposes that PIE itself only had a two-gender system, and the feminine arose later in some descendant languages.

The debate continues, with the unearthed evidence from Hittite sparking more questions than answers. The paper delves into the typological considerations of these gender systems, but a definitive explanation for the origin of the Proto-Indo-European gender system remains elusive.

Can you trust the claims within that output? Fuck no, Gemini is likely losing its marbles. (For example, it implies that early PIE had masculine vs. neuter; not quite.) But it mentions Hittite enough times for you to correctly conclude that it’s a topic the article discusses, and that perhaps you should be reading those thirty pages instead of ignoring them.

[Sorry for the fairly specific example. I wanted something that I could talk about, so I used an article that I know by heart about a topic that I know something about.]

queermunist she/her@lemmy.ml · 7 months ago

It just so happened to work out in your very specific example.

Do you think it will always give a summary that is useful to researchers? It obviously won’t! It will often work just fine. When sorting through hundreds of documents, though, it will produce sorting errors, and those will cause researchers to dismiss important documents by accident.

Maybe that’s fine. Maybe having the occasional error in the summary, which causes the researcher to dismiss the paper erroneously, is better than just guessing based on the title.

But that’s a lot of power to put in the hands of a dumbass chatbot.

Lvxferre [he/him] · 7 months ago

It just so happened to work out in your very specific example.

Emphasis mine. This shit is trivial to reproduce with other texts, so trivial that you’re only making a fool of yourself with this hand-waving.

I’ll give you further examples showing that no, it didn’t “just so happen” to work in this “very specific” example. This time I’m using content linked in this comm instead of sci articles, so you can’t assume (or lie about) cherry-picking later on.

The link in the OP

[Input] Summarise https://pivot-to-ai.com/2024/09/04/dont-use-ai-to-summarize-documents-its-worse-than-humans-in-every-way/ into a single paragraph.

[Output] The article titled “Don’t use AI to summarize documents - it’s worse than humans in every way” argues that Large Language Models (LLMs) are not effective at summarizing documents. The author, Marcelo Soares, believes LLMs are best suited for generating generic text that isn’t particularly interesting and that their attempts at summarization are flawed.

AI and the Silicon Valley Hype Machine - NYT

[Input] Summarise https://www.nytimes.com/2024/05/15/opinion/artificial-intelligence-ai-openai-chatgpt-overrated-hype.html into three paragraphs.

[Output] The New York Times opinion piece titled “Will A.I. Ever Live Up to Its Hype?” by Julia Angwin argues that Artificial Intelligence, particularly OpenAI’s ChatGPT, is falling short of its grand promises.

Just a year ago, researchers worried AI might become too powerful. Now, the concern is whether it’s powerful enough. Angwin criticizes OpenAI’s recent announcement of a faster, cheaper ChatGPT as underwhelming, suggesting it’s not the “magic” improvement that was hyped. She compares AI to an unreliable intern whose mistakes make it easier to do things yourself.

Angwin believes acknowledging AI’s limitations is crucial. It would allow for more efficient resource allocation and a shift towards more realistic solutions. We shouldn’t be fooled by Silicon Valley’s hype machine, she argues.

It shows what I said: you can’t use it for the claims themselves, but it’s a good enough way to learn the topic of an article a bit beyond the title.

Do you think it will always give a summary that is useful to researchers? [emphasis in the original]

You’re still pretending that something either needs to be perfect or it’s useless (nirvana fallacy).

Maybe that’s fine. Maybe having the occasional error in the summary, which causes the researcher to dismiss the paper erroneously, is better than just guessing based on the title.

What you’re proposing (guessing based on the title) leads to more papers being dismissed erroneously. Ignoring the tool makes the problem worse than using it with all its flaws.

And it is not just sci articles. Every bloody time you have more text than you can reasonably read, those “AI shortened versions” get you to pick up something to read that you otherwise wouldn’t have.

Since we’re both clearly repeating our arguments, I’m going to end the discussion on my part here. I’ll still read any reply, but I won’t respond further.