Foggyminds Social

Dan Sugalski

3 weeks ago • •

Dan Sugalski
3 weeks ago • •

Turns out that LLM summaries are actually useful.

Not for *summarizing* text -- they're horrible for that. They're weighted statistical models and by their very nature they'll drop the least common or most unusual bits of things. Y'know, the parts of a message that are actually important.

No, where they're great is as a writing check. If an LLM summary of your work is accurate that indicates what you wrote doesn't really have much interesting information in it and maybe you should try harder.

This entry was edited (3 weeks ago)

reshared this

in reply to Dan Sugalski

Shiri Bailem

in reply to Dan Sugalski • 3 weeks ago •

@Dan Sugalski I disagree, they're great for summarizing. It's akin to high powered skimming, so yeah it's not good for little unusual details, but if you're using an AI to summarize text and have any sense you're not looking at text where you're worried about those little exceptions.

Prime example that I've leaned on often is that Amazon now provides an AI summarization at the top of reviews. Odd exceptions are just that with reviews, but it'll definitely highlight the general patterns and let you know if you should dig deeper.

Also I've honestly used it like above once or twice with the intent of just seeing if my point comes across... most conversations aren't really about the fine details or unusual bits?

@Dan Sugalski

like this

in reply to Shiri Bailem

Dan Sugalski

in reply to Shiri Bailem • 3 weeks ago • •

@shiri This is where they're deceptive and you're setting yourself up for trouble.

If the backing model has seen "this product is useful" 100x more often than "this product is not useful", when summarizing text containing "this product is not useful" or its longwinded analog you are *far* more likely to get "this product is useful" as a result of the summarization, which inverts the meaning. That is probably not what you want.

@Shiri Bailem

in reply to Dan Sugalski

Shiri Bailem

in reply to Dan Sugalski • 3 weeks ago •

@Dan Sugalski not at all how it works and maybe you should be checking your bias on some of this?

What it does is highlight the points being brought up, some examples:

A dehumidifier I recently purchased with good reviews, basically boiling down to nothing of note (it's a very small dehumidifier and I wasn't worried about water capacity because it has a drain hose and I planned to put it on the counter next to the sink):

Customers are satisfied with the dehumidifier's performance, quiet operation, and small size. They find it effective at removing moisture from the air without disturbing sleep. Many appreciate its ease of use and emptying process. However, opinions differ on its value for money and water capacity.

Now a different dehumidifier with worse reviews, if I was considering this one I'd check in deeper about those complaints:

Customers appreciate the dehumidifier's ability to remove moisture from basements and windows. They find it quiet and effective in controlling humidity levels. However, some customers report durability issues with the product breaking down after a short period of use. There are also complaints about error codes and vibrations. Opinions differ on functionality, noise level, pump performance, recharging time, and energy efficiency.

Despite what you said before, these are not weighted statistical models, it's drastically different from that and does show debatable levels of comprehension of text (debatable only because it's an argument for philosophers where the line of comprehension is).

I'm not advocating for the abuses of AI (ranging from ecological impact of poor tuning and overuse of high end models to efforts by companies to replace artists), but this technology really does have meaning and value when used properly. Text summarization for large bulk non-sensitive text is absolutely one of those cases.

@Dan Sugalski

like this

in reply to Shiri Bailem

Roufnichan LeGauf 🌈

in reply to Shiri Bailem • 3 weeks ago • •

@shiri

these are not weighted statistical models

So what would you say is an LLM if not a weighted statistical model of language?

@Shiri Bailem

in reply to Roufnichan LeGauf 🌈

Shiri Bailem

in reply to Roufnichan LeGauf 🌈 • 3 weeks ago •

@Robert Wire 🌈 ... they are a Large Language Model... There's a reason they're called LLMs and not WSMs...

They are their own thing. A weighted statistical model is far too simplistic for how an LLM works and is more along the lines of an old 90s spam filter.

LLMs are very complex systems which go far beyond my scope of patience to explain. Might be good to just review the wikipedia article on it: en.wikipedia.org/wiki/Large_la…

@Roufnichan LeGauf 🌈

in reply to Shiri Bailem

Roufnichan LeGauf 🌈

in reply to Shiri Bailem • 3 weeks ago • •

@shiri I checked the Wikipedia article and it confirms what Dan said. Not sure what your point is.

@Shiri Bailem

in reply to Shiri Bailem

wizzwizz4

in reply to Shiri Bailem • 3 weeks ago • •

@shiri @barubary I don't think you've developed an intuitive understanding of the universal approximation theorem. en.wikipedia.org/wiki/Universa…

I'm happy to explain further, if you don't understand how a weighted statistical model could demonstrate some behaviours we associate with understanding, while remaining wholly incapable of others.

theorem that a feed-forward network with a single hidden layer can approximate continuous functions

^{Contributors to Wikimedia projects (Wikimedia Foundation, Inc.)}

@Shiri Bailem @Roufnichan LeGauf 🌈

in reply to Dan Sugalski

adaddinsane (Steve Turnbull)

in reply to Dan Sugalski • 3 weeks ago • •

@shiri

How about an actual study...

crikey.com.au/2024/09/03/ai-wo…

tl;dr: they're shit at summarising. But it's obvious anyway - to summarise requires the ability to accurately identify the important points, which requires understanding.

LLMs have no understanding therefore they cannot summarise.

And if you need proof they can't reason:

garymarcus.substack.com/p/llms…

LLMs don’t do formal reasoning - and that is a HUGE problem

Important new study from Apple

^{Gary Marcus (Marcus on AI)}

@Shiri Bailem

in reply to adaddinsane (Steve Turnbull)

Shiri Bailem

in reply to adaddinsane (Steve Turnbull) • 3 weeks ago •

@adaddinsane (Steve Turnbull) @Dan Sugalski yeah... fantastic evidence showing bias and lack of reasoning in your response:

Government review? It wasn't asked to do general summaries, it was asked to reference page numbers, citations, etc all of which are precision tasks, especially precision numerical tasks. This doesn't mean it's bad at summarizing, it means it's bad at the task given.
The "no formal reasoning" argument being cited has been going around for a while now and has nothing to do with reasoning, AI is bad at math and handles data in a similar sense to a quick skim, and a quick skim of a math problem with gotchas in it is going to catch just about everyone.

If you're going to argue they're bad at summarizing then you need to actually show summarizing not compiling paperwork, not citations, not math.

God, I feel like half this anti-AI drivel itself could be AI generated.

There are real problems with AI, but people like y'all smoke screen the shit out of them to the point where I sometimes wonder if corporations are paying for anti-AI sock puppets to just hide the real issues.

@adaddinsane (Steve Turnbull) @Dan Sugalski

like this

in reply to Shiri Bailem

ben_tinc

in reply to Shiri Bailem • 3 weeks ago • •

@shiri @adaddinsane
I mean, sure. If you only count one extremely specific and restricted meaning of "summarizing" as "real summarizing", LLMs might be adequate for it. But this is far from a consensual meaning.

In this sense, your argument resembles the "no true Scotsman" fallacy (unless you _actually_ want everyone to lower their expectations of what an adequate summary should be, which I do not want to assume).

@Shiri Bailem @adaddinsane (Steve Turnbull)

in reply to ben_tinc

Shiri Bailem

in reply to ben_tinc • 3 weeks ago •

@ben_tinc @adaddinsane (Steve Turnbull) @Dan Sugalski from Meriam-Webster: Summarize means to make a summary.

Summary:
(adjective) Covering the main points succinctly
(noun) an abstract, abridgment, or compendium especially of a preceding discourse

Summaries are not citations and definitely aren't math, they're basically just paraphrasing down text.

Digging a little further into the government study being cited, here's a great bit of text from the study itself:

It is important to note that the results should not be extrapolated more widely as:
• the timeframe allowed to optimise the model was limited to one week as this was as short duration PoC; and
• the PoC’s point-in-time results relate to the use of certain prompts using a specific LLM and selected for one specific use-case. This limits the generalisability of the findings to other use cases and LLMs.

In other words the authors themselves said this isn't an indication that it's bad at summarizing, just that this solution doesn't look promising for their specific use case.

I get where you're feeling the no-true-scotsman vibe, but it comes from the fact that I'm pointing out additional requirements on top of summarizing as not being reasons for it being bad at summarizing. One was a government office testing for a specific special use case rather than a general testing of summarization abilities, and the other was entirely about it's ability to work math problems.

@adaddinsane (Steve Turnbull) @Dan Sugalski @ben_tinc

in reply to Shiri Bailem

ben_tinc

in reply to Shiri Bailem • 3 weeks ago • •

@shiri @adaddinsane
I think it is fair to say that LLMs might be adequate for certain tasks which go by the label summary, as your own positive experiences show very clearly.

And it is certainly proper of the study authors to not promote over-generalization of their results. However, they also say that while the model performed _especially_ bad wtt identifying references to ASIC, humans scored better across all metrics.

@Shiri Bailem @adaddinsane (Steve Turnbull)

in reply to ben_tinc

ben_tinc

in reply to ben_tinc • 3 weeks ago • •

@shiri @adaddinsane
[...] and this includes what I would call the absolute minimum: highlighting the central point of the summarized text in the 'general summary' portion of the test (section 5.1 in the full report).

@Shiri Bailem @adaddinsane (Steve Turnbull)

in reply to adaddinsane (Steve Turnbull)

Ben Aveling

in reply to adaddinsane (Steve Turnbull) • 3 weeks ago • •

@adaddinsane No fair using facts.

@wordshaper @shiri

@Shiri Bailem @adaddinsane (Steve Turnbull) @Dan Sugalski

in reply to Ben Aveling

Shiri Bailem

in reply to Ben Aveling • 3 weeks ago •

@Ben Aveling @adaddinsane (Steve Turnbull) @Dan Sugalski unrelated facts that are easily debunkable as not proving the point trying to be made.

@adaddinsane (Steve Turnbull) @Dan Sugalski @Ben Aveling

eanopolsky likes this.

in reply to Shiri Bailem

Ben Aveling

in reply to Shiri Bailem • 3 weeks ago • •

@adaddinsane @shiri go ahead

@Shiri Bailem @adaddinsane (Steve Turnbull)

in reply to Ben Aveling

Shiri Bailem

in reply to Ben Aveling • 3 weeks ago •

@Ben Aveling @adaddinsane (Steve Turnbull) @Dan Sugalski already did: foggyminds.com/display/c6ef095…

Shiri Bailem

2024-10-23 15:29:13

@adaddinsane (Steve Turnbull) @Dan Sugalski yeah... fantastic evidence showing bias and lack of reasoning in your response:
Government review? It wasn't asked to do general summaries, it was asked to reference page numbers, citations, etc all of which are precision tasks, especially precision numerical tasks. This doesn't mean it's bad at summarizing, it means it's bad at the task given.
The "no formal reasoning" argument being cited has been going around for a while now and has nothing to do with reasoning, AI is bad at math and handles data in a similar sense to a quick skim, and a quick skim of a math problem with gotchas in it is going to catch just about everyone.
If you're going to argue they're bad at summarizing then you need to actually show summarizing not compiling paperwork, not citations, not math.
God, I feel like half this anti-AI drivel itself could be AI generated.
There are real problems with AI, but people like y'all smoke screen the shit out of them to the point where I sometimes wonder if corporations are paying for anti-AI sock puppets to just hide the real issues.

@adaddinsane (Steve Turnbull) @Dan Sugalski @Ben Aveling

Leon Bambrick likes this.

in reply to Dan Sugalski

ClickyMcTicker

in reply to Dan Sugalski • 3 weeks ago • •

@shiri It would appear your anti-AI bias has tainted your ability to judge them fairly.

Amazon’s product AI, environmental and other issues aside, provides valid high level summarization of product reviews. At present time, the model has not been tainted or tampered with to provide positive feedback, but rather provides an accurate representation of the typical review feedback. It tends to covers likes, dislikes, concerns, and overall views

@Shiri Bailem

in reply to ClickyMcTicker

Ted Mielczarek

in reply to ClickyMcTicker • 3 weeks ago • •

@ClickyMcTicker @shiri I'm sorry, but how exactly would you back up those claims you're making? "Amazon said so" isn't very compelling.

@Shiri Bailem @ClickyMcTicker

in reply to Ted Mielczarek

Shiri Bailem

in reply to Ted Mielczarek • 3 weeks ago •

@Ted Mielczarek @ClickyMcTicker @Dan Sugalski where did I say "Amazon said so"? I just cited Amazon's use case as a great example from personal experience.

I'm not digging around constantly for some article somewhere that says "AI has been tested to get the general gist right in some arbitrary scientific study", I'm doubtful such a thing exists given how fuzzy it is. But I can sure as hell poke holes in the bullshit y'all are posting.

If y'all are ever interested in the real problems of AI hit me up, but until then maybe enjoy screaming about the evils of how cars are killing the horse drawn carriage business and worse than horses at steering themselves while ignoring all other issues around them.

@Ted Mielczarek @Dan Sugalski @ClickyMcTicker

in reply to Shiri Bailem

Ted Mielczarek

in reply to Shiri Bailem • 3 weeks ago • •

@shiri @ClickyMcTicker this is in response to the post I directly replied to from "Clicky McTicker".

@Shiri Bailem @ClickyMcTicker

in reply to Ted Mielczarek

Shiri Bailem

in reply to Ted Mielczarek • 3 weeks ago •

@Ted Mielczarek @ClickyMcTicker @Dan Sugalski Clicky's comment honestly just reads like an AI summary of mine so you can understand the confusion lol

@Ted Mielczarek @Dan Sugalski @ClickyMcTicker

Ted Mielczarek likes this.

in reply to Shiri Bailem

Ted Mielczarek

in reply to Shiri Bailem • 3 weeks ago • •

@shiri No worries. It's possible to disagree on topics like this and still attempt to communicate effectively!

@Shiri Bailem

in reply to Shiri Bailem

unexpectedteapot

in reply to Shiri Bailem • 3 weeks ago • •

@shiri
>most conversations aren't really about the fine details or unusual bits

I suspect this captures the crux of what the root post was pointing out when C/VP levels were mentioned: a criticism of conversations [between higher level employees and lower level] that are devoid of helpful/new information.

@Shiri Bailem

in reply to Shiri Bailem

Clare Hooley

in reply to Shiri Bailem • 3 weeks ago • •

@shiri Amazon=case study of why ai summaries are useless. On books, just far too generic (readers say the pacing was good; readers say the characterization was good). Drops all the bits I’m most interested in (plot, themes, why the pacing was good). On other items, just lifts phrases out of reviews with no idea of the context (on the negative side, some users report this item boils water very fast - this on a camping stove).

@Shiri Bailem

in reply to Clare Hooley

Shiri Bailem

in reply to Clare Hooley • 3 weeks ago •

@Clare Hooley @Dan Sugalski so your argument that it's useless as a whole is that it's bad in areas where summarizations aren't so useful in general?

Books where a general summary of the comments of everyone about the book... in what world would this be useful?

And on a camping stove you're citing the fact that people often said redundant things about it and that those things showed up in the summary as evidence that it's bad? You think a good summarization would edit our things it thinks unnecessary, and you think that there's any version in which that would be an improvement?

@Dan Sugalski @Clare Hooley

in reply to Shiri Bailem

Clare Hooley

in reply to Shiri Bailem • 3 weeks ago • •

a good summary would provide me information to allow me to assess what readers/users really thought. These do not (in this case of the stove, yes it’s obvious it’s made a mistake with the ‘on the negative side’ but it’s not always obvious to pick out things that are completely wrong but statistically likely).

This entry was edited (3 weeks ago)

in reply to Clare Hooley

Shiri Bailem

in reply to Clare Hooley • 3 weeks ago •

@Clare Hooley @Dan Sugalski The problem is that is what users really thought (aside from as you said it's listed as a negative which is an honest goof), it's saying that because a lot of people apparently thought that was good information to include in the review.

The summary isn't trying to tell you whether or not it's a good product, and it's definitely not trying to extrapolate underlying intents, it's just trying to summarize what's talked about in the various reviews so you can have an idea if there's something you might want to look into before you buy it.

I gave a prime example earlier quoting two summaries from dehumidifiers, the first is one I bought myself and the second was me digging for one that was less than 4 stars.

@Dan Sugalski @Clare Hooley

in reply to Shiri Bailem

Clare Hooley

in reply to Shiri Bailem • 3 weeks ago • •

@shiri 🙂 The only positive thing I can say for the summaries is at least it has picked up ‘we thank blah for an ARC in exchange for an honest review.’ Isn’t useful.

@Shiri Bailem

in reply to Dan Sugalski

Tristan Harward

in reply to Dan Sugalski • 3 weeks ago • •

this aligns to my theory about LLM investment and potential progress toward generalized intelligence; which is that further development of generative AI will just create models that are better and better at being thoroughly and completely mediocre.

in reply to Dan Sugalski

Maho Pacheco 🦝🍻

in reply to Dan Sugalski • 3 weeks ago • •

I could not agree more. I am just experimenting with converting from a blog post to a thread using LLMS.

Check the thread: hachyderm.io/@mapache/11335991…

And the blog post: maho.dev/2024/09/my-coffee-his…

It got rid of the parts I mostly like. A shame.

Maho Pacheco 🦝🍻

2024-10-24 01:56:02

My relationship with coffee probably began in my mother's womb. She may deny it, but I'm pretty sure she drank more coffee than advised or allowed in the 1980s.
#coffee #story

This entry was edited (3 weeks ago)

in reply to Maho Pacheco 🦝🍻

Shiri Bailem

in reply to Maho Pacheco 🦝🍻 • 3 weeks ago •

@Maho Pacheco 🦝🍻 @Dan Sugalski did it get the points and just miss parts you like, or did it miss major highlights of the post?

@Dan Sugalski @Maho Pacheco 🦝🍻

in reply to Shiri Bailem

Maho Pacheco 🦝🍻

in reply to Shiri Bailem • 3 weeks ago • •

it actually did a good job on summarizing and get the points, it did not miss the big parts. But as @shiri mentions, the points that make the post special (nuanced, human) got "washed".

For example, the blog post talks about coffee as a character: "Coffee in my childhood was always a supporting character in the main story."

Also, some of my fav phrases got removed (e.g. "as we say in Latin America, to be young and not revolutionary is almost a biological contradiction.")

@Shiri Bailem

This entry was edited (3 weeks ago)

in reply to Maho Pacheco 🦝🍻

Shiri Bailem

in reply to Maho Pacheco 🦝🍻 • 3 weeks ago •

@Maho Pacheco 🦝🍻 @Dan Sugalski that absolutely tracks

@Dan Sugalski @Maho Pacheco 🦝🍻

in reply to Dan Sugalski

tqwhite

in reply to Dan Sugalski • 3 weeks ago (Received 2 weeks ago) • •

@Walrus word!! I have a standard prompt that I use. It gives great results.

@Walrus 🏴󠁧󠁢󠁷󠁬󠁳󠁿

in reply to tqwhite

GLC

in reply to tqwhite • 3 weeks ago (Received 2 weeks ago) • •

@tqwhite @Walrus

Is the misprint that makes one of your key instructions meaningless also part of your standard prompt?

(I have no real idea, but the layout suggests that it is.)

@Walrus 🏴󠁧󠁢󠁷󠁬󠁳󠁿 @tqwhite

in reply to GLC

tqwhite

in reply to GLC • 3 weeks ago (Received 2 weeks ago) • •

@glc @Walrus I have no idea what you are talking about. However, I use this frequently and it works great.

@Walrus 🏴󠁧󠁢󠁷󠁬󠁳󠁿 @GLC

in reply to tqwhite

flere-imsaho

in reply to tqwhite • 2 weeks ago • •

@tqwhite @glc @Walrus he's talking about your prompt not doing what you expect it is doing, and yourself not noticing it.

@Walrus 🏴󠁧󠁢󠁷󠁬󠁳󠁿 @tqwhite @GLC

in reply to flere-imsaho

tqwhite

in reply to flere-imsaho • 2 weeks ago • •

@mawhrin @glc @Walrus that’s weird since I actually use it all the time and it definitely does what I want it to do. Not theory, frequently observed fact.

@flere-imsaho @Walrus 🏴󠁧󠁢󠁷󠁬󠁳󠁿 @GLC

⇧