Don’t use AI to summarize documents — it’s worse than humans in every way

David Gerard@awful.systems · 8 months ago

Lvxferre [he/him] · edit-2 8 months ago

You could use them to know what the text is about, and if it’s worth your reading time. In this situation, it’s fine if the AI makes shit up, as you aren’t reading its output for the information itself anyway; and the distinction between summary and shortened version becomes moot.

However, here’s the catch. If the text is long enough to warrant the question “should I spend my time reading this?”, it should contain an introduction for that very purpose. In other words, if the text is well-written you don’t need this sort of “Gemini/ChatGPT, tell me what this text is about” in the first place.

EDIT: I’m not addressing documents in this. My bad, I know. [In my defence I’m reading shit on a screen the size of an ant.]

queermunist she/her@lemmy.ml · edit-2 8 months ago

ChatGPT gives you a bad summary full of hallucinations and, as a result, you choose not to read it based on that summary.

Lvxferre [he/him] · 7 months ago

(For clarity I’ll re-emphasise that my top comment is the result of skipping over the word “documents”, so I’m speaking on general grounds about AI “summaries”, not just about AI “summaries” of documents.)

The key here is that the LLM is likely to hallucinate the claims of the text being shortened, but not the topic. So provided that you care about the latter but not the former, in order to decide if you’re going to read the whole thing, it’s good enough.

And that is useful in a few situations. For example, if you have a metaphorical pile of a hundred or so scientific papers, and you only need the ones about a specific topic (like “Indo-European urheimat” or “Argiope spiders” or “banana bonds”).

That backtracks to the OP. The issue with using AI summaries for documents is that you typically know the topic at hand, and you want the content instead. That’s bad because then the hallucinations won’t be “harmless”.

queermunist she/her@lemmy.ml · 7 months ago

But the claims of the text are often why you read it in the first place! If you have a hundred scientific papers you’re going to read the ones that make claims either supporting or contradicting your research.

You might as well just skim the titles and guess.

Lvxferre [he/him] · 7 months ago

But the claims of the text are often why you read it in the first place!

By “not caring about the former” [claims], I mean in the LLM output, because you know that the LLM will fuck them up. But it’ll still somewhat accurately represent the topic of the text, and you can use this to your advantage.

You might as well just skim the titles and guess.

Nirvana fallacy.

self@awful.systems · 7 months ago

not reading the fucking sidebar and thinking this is high school debate club fallacy

Lvxferre [he/him] · 7 months ago

not reading the fucking sidebar

Yeah, I get that this is a place to vent. And I get why to vent about this. LLMs and other A"I" systems (with quotation marks because this shite is not intelligent!) are being shoved down every bloody where, regardless of actual usefulness, safety, or user desire. Telling you to put glue on your pizza, to eat poisonous mushrooms, that “cherish” has five letters, that Latin had no [w], that the Chinese are inferior to Westerners.

While a crowd of irrationals tell you “it is intelligent, you can’t prove otherwise! CHRUST IT YOU DIRTY SCEPTIC/INFIDEL/LUDDITE REEEE! LALALA I’M PRETENDING TO NOT SEE THE HALLUCINATION LALALA”.

I also get the privacy nightmare that this shit is. And the whole deal behind “we’re using your content as training data, and then selling the result back to you”. Or that it’s eating electricity like there’s no tomorrow, in a planet where global warming is a present issue.

I get it. I get it all. That’s why I’m here. And if you (or anyone else) think that I’m here for any other reason, by all means, check my profile - you’ll find plenty of pieces of criticism against those stupid corporate AI takes from vulture capital. (And plenty of instances of me calling HN “Redditors LARPing as Hax0rz”.)

However. Pretending that there’s no use case ever for LLMs is the wrong way to go.

and thinking this is high school debate club fallacy

If calling it “nirvana fallacy” rubs you the wrong way, here’s an alternative: “this argument is fucking stupid, in a very specific way: it pretends that either something is perfect or it’s useless, with no middle ground.”

The other user, however, does not deserve the unnecessary abrasiveness, so I’ll simply keep calling it “nirvana fallacy”.

self@awful.systems · 7 months ago

holy shit, imagine getting a second chance to not be a fucking debatelord and doubling down this hard

off you fuck

David Gerard@awful.systems · 7 months ago

phallusy fallacy: posting like a cock

blakestacey@awful.systems · 7 months ago

People just out here acting like a fundamentally, inextricably unreliable and unethical technology has a “use case”

smdh

Lvxferre [he/him] · 7 months ago

Holy shit, imagine crying “waah, debatelord!” because someone is trying to follow some non-fallacious = non-fucking-stupid reasoning here. It’s almost like you want a circlejerk on the exact same level as those muppets burning effigies, just with the opposite discourse.

froztbyte@awful.systems · edit-2 7 months ago

this argument

I agree, you’re quite right, and I thank you for taking the time and putting in the effort on such a wonderfully thorough portrayal of why your argument is total horseshit

Lvxferre [he/him] · edit-2 7 months ago

I agree, you’re quite right, and I think [thank] you for taking the time and putting in the effort on such a wonderfully thorough portrayal of why your argument is total horseshit

Me: “yeah, I get why you’re venting, those people saying that 2+2=5 are stupid. However we shouldn’t be claiming that 2+2=3 either, it’s also stupid. 2+2=4.”

You: “think u for proving that ppl who think that 2+2 is not 3, like you’re self, are wrong! lol lmao haha”

If this isn’t clear enough, I’m saying that both sides are wrong - both the people who are hyping AI and shoving it everywhere, and the people trying to dismiss it as completely useless. Is this clear now?

Stop throwing the baby out with the dirty bathtub water dammit.

queermunist she/her@lemmy.ml · 7 months ago

Unless it doesn’t accurately represent the topic, which happens, and then a researcher chooses not to read the text based on the chatbot’s summary.

Nirvana fallacy.

All these chatbots do is guess. I’m just saying a researcher might as well cut out the hallucinating middleman.

Lvxferre [he/him] · 7 months ago

All these chatbots do is guess.

To call it “guessing” overestimates its abilities. It’s doing something even dumber - picking words and throwing them into a grammatically consistent whole, with barely any regard for meaning.

But sometimes dumb shit is still useful. Just like my bash scripts.

I’m just saying a researcher might as well cut out the hallucinating middleman.

That’s why I called it a “nirvana fallacy” - it does hallucinate, so it is not perfect, nor will it ever be (contrary to what tech bros want you to have “faith” = brainlessness towards). But by “cutting out the hallucinating middleman” you’re probably ignoring a lot of articles that might be useful for your research. Yes, ignoring them - because if you’re going by title alone you won’t read them, as it’s too much stuff to bother with.

I’ll give you a practical example. Suppose for a moment that you need info on the evolution of Hittite and other Anatolian languages. Based on the title alone (as you proposed in an earlier comment), would this article be useful? Probably not - the title doesn’t mention “Anatolian”, “Hittite”, “Luwian”, anything like that. And the article is 30 pages long so might as well skip it.

Now throw “Summarise https://allegatifac.unipv.it/silvialuraghi/Gender FoL.pdf into three paragraphs.” into Gemini.

Here's the output:

The PDF you linked discusses the origin of the gender system in Proto-Indo-European (PIE), the ancestral language to many European and Asian languages. In the early 20th century, all known Indo-European languages seemed to have a three-gender system, including feminine. However, the discovery of Hittite, a language with only two genders and no clear feminine pronouns, challenged this theory.

The paper explores two opposing viewpoints that emerged among Hittitologists and Indo-Europeanists. The first, called the Schwundhypothese, suggests that Hittite lost its feminine gender over time. The other, the Herkunfthypothese, proposes that PIE itself only had a two-gender system, and the feminine arose later in some descendant languages.

The debate continues, with the unearthed evidence from Hittite sparking more questions than answers. The paper delves into the typological considerations of these gender systems, but a definitive explanation for the origin of the Proto-Indo-European gender system remains elusive.

Can you trust the claims within that output? Fuck no, Gemini is likely losing its marbles. (For example, it implies that early PIE had masculine vs. neuter - not quite.) But it mentions Hittite enough times to make you correctly conclude that it’s a topic that the article talks about, and perhaps you should be reading those thirty pages instead of ignoring them.

[Sorry for the fairly specific example. I wanted something that I could talk about, so I used an article that I know by heart about a topic that I know something about.]
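A minimal sketch of the triage idea above, assuming a plain keyword match is good enough to decide "is this text about my topic?". The function names and the placeholder title are hypothetical, not from the article; the summary excerpt is one sentence copied from the Gemini output quoted above. It only checks whether the topic words show up, and it deliberately makes no use of the claims in the summary.

```python
import re


def mentions_topic(text: str, keywords: list[str]) -> bool:
    """Return True if any keyword occurs in text as a whole word (case-insensitive)."""
    return any(
        re.search(rf"\b{re.escape(keyword)}\b", text, re.IGNORECASE)
        for keyword in keywords
    )


def triage(title: str, summary: str, keywords: list[str]) -> str:
    """Decide whether a text is worth opening, checking the title first.

    The summary is used only as a topic signal; its claims are never trusted.
    """
    if mentions_topic(title, keywords):
        return "read: title"
    if mentions_topic(summary, keywords):
        return "read: summary"
    return "skip"


# Topic words taken from the example in the comment above.
KEYWORDS = ["Anatolian", "Hittite", "Luwian"]

# Hypothetical placeholder: stands in for a title that does not name the topic.
PLACEHOLDER_TITLE = "A study of grammatical gender"

# One sentence from the Gemini output quoted above.
SUMMARY_EXCERPT = (
    "However, the discovery of Hittite, a language with only two genders "
    "and no clear feminine pronouns, challenged this theory."
)

if __name__ == "__main__":
    print(triage(PLACEHOLDER_TITLE, SUMMARY_EXCERPT, KEYWORDS))
```

With the title alone the text would be skipped; the summary is what flags it. If the summary gets the topic wrong, this check fails in exactly the same way, so it is a filter for what to open, not a replacement for reading.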

queermunist she/her@lemmy.ml · 7 months ago

It just so happened to work out in your very specific example.

Do you think it will always give a summary that is useful to researchers? It obviously won’t! It will, often, work just fine. When sorting through hundreds of documents, though, it will produce sorting errors and that will cause researchers to dismiss important documents by accident.

Maybe that’s fine. Maybe having the occasional error in the summary, which causes the researcher to dismiss the paper erroneously, is better than just guessing based on the title.

But that’s a lot of power to put in the hands of a dumbass chatbot.

Lvxferre [he/him] · 7 months ago

It just so happened to work out in your very specific example.

Emphasis mine. This shit is so trivial to reproduce with other texts, but so trivial, that you’re only making a fool of yourself with this hand-waving.

I’ll provide you further examples, showing that no, it didn’t “just so” happen to work in this “very specific” example. Now using content linked in this comm instead of sci articles, so you don’t assume (or lie about) cherry-picking later on.

The link in the OP

[Input] Summarise https://pivot-to-ai.com/2024/09/04/dont-use-ai-to-summarize-documents-its-worse-than-humans-in-every-way/ into a single paragraph.

[Output] The article titled “Don’t use AI to summarize documents - it’s worse than humans in every way” argues that Large Language Models (LLMs) are not effective at summarizing documents. The author, Marcelo Soares, believes LLMs are best suited for generating generic text that isn’t particularly interesting and that their attempts at summarization are flawed.

AI and the Silicon Valley Hype Machine - NYT

[Input] Summarise https://www.nytimes.com/2024/05/15/opinion/artificial-intelligence-ai-openai-chatgpt-overrated-hype.html into three paragraphs.

[Output] The New York Times opinion piece titled “Will A.I. Ever Live Up to Its Hype?” by Julia Angwin argues that Artificial Intelligence, particularly OpenAI’s ChatGPT, is falling short of its grand promises.

Just a year ago, researchers worried AI might become too powerful. Now, the concern is whether it’s powerful enough. Angwin criticizes OpenAI’s recent announcement of a faster, cheaper ChatGPT as underwhelming, suggesting it’s not the “magic” improvement that was hyped. She compares AI to an unreliable intern whose mistakes make it easier to do things yourself.

Angwin believes acknowledging AI’s limitations is crucial. It would allow for more efficient resource allocation and a shift towards more realistic solutions. We shouldn’t be fooled by Silicon Valley’s hype machine, she argues.

It shows what I said - you can’t use it for the claims themselves, but it’s a good enough way to know the topic of the article a bit past the title.

Do you think it will always give a summary that is useful to researchers? [emphasis in the original]

Still pretending that either something needs to be perfect or it’s useless (nirvana fallacy).

Maybe that’s fine. Maybe having the occasional error in the summary, which causes the researcher to dismiss the paper erroneously, is better than just guessing based on the title.

What you’re proposing (to guess based on title) leads to more papers being dismissed erroneously. You make the problem worse by ignoring the tool than you would by using it with all its flaws.

And it is not just sci articles. Every bloody time that you have more text than you can reasonably read, those “AI shortened versions” make you pick up something to read that you would not have otherwise.

Since both of us are clearly repeating arguments I’m going to end the discussion from my part here. I’ll still read any potential reply, but I’m not going to reply further myself.

David Gerard@awful.systems · 7 months ago

Both the use cases here are government documents. I’m baffled at the idea of it being “fine if the AI makes shit up”.

Lvxferre [he/him] · 7 months ago

As I mentioned in another comment, I simply skipped the word “document”.

I’m baffled at the idea of it being “fine if the AI makes shit up”.

It’s fine as long as you know that it’ll make shit up, and you aren’t giving its claims an ounce of faith. Still useful to know what a text is about.

The fact that it’ll make shit up is a problem on another level: those systems are being marketed as if they were able to provide you reliable answers, when it is clearly not the case. It will tell you to put glue on your pizza, and yet you’re expected to “trust” it… yeah nah.

V0ldek@awful.systems · 7 months ago

if the text is well-written you don’t need this sort of “Gemini/ChatGPT, tell me what this text is about” in the first place.

And if it’s badly written then the LLM will shit itself.

Now let’s ask ourselves how much of the text in the world is “well-written”?

Or even better, you could apply this to Copilot. How much code in the world is good code? The answer is fucking none, mate.

Ball is in parking lot@mastodon.me.uk · 8 months ago

@lvxferre @dgerard have you bumped your head?

Lvxferre [he/him] · 8 months ago

No, it’s just rambling. My bad.

I focused too much on using AI to summarise and ended up not talking about it summarising documents, even if the text is about the latter.

And… well, the latter is such a dumb idea that I don’t feel like telling people “the text is right, don’t do that”, it’s obvious.

David Gerard@awful.systems · 7 months ago

You’d think so, but guess what precise use case LLMs are being pushed hard for.

Lvxferre [he/him] · 7 months ago

No need to guess - I’ve seen it, you’re right about what they’re being pushed hard for: they’re being pushed as intelligent and able to understand language, when neither is true. The expectation that they should be able to output accurate summaries is a consequence of both.