Turns out that LLM summaries are actually useful.
Not for *summarizing* text -- they're horrible for that. They're weighted statistical models and by their very nature they'll drop the least common or most unusual bits of things. Y'know, the parts of a message that are actually important.
No, where they're great is as a writing check. If an LLM summary of your work is accurate, that indicates what you wrote doesn't really have much interesting information in it, and maybe you should try harder.
Shiri Bailem
in reply to Dan Sugalski
@Dan Sugalski I disagree, they're great for summarizing. It's akin to high-powered skimming, so yeah, it's not good for little unusual details, but if you're using an AI to summarize text and have any sense, you're not looking at text where you're worried about those little exceptions.
A prime example I've leaned on often: Amazon now provides an AI summary at the top of reviews. Odd exceptions are just that with reviews, but it'll definitely highlight the general patterns and let you know if you should dig deeper.
Also I've honestly used it like above once or twice with the intent of just seeing if my point comes across... most conversations aren't really about the fine details or unusual bits?
Dan Sugalski
in reply to Shiri Bailem
@shiri This is where they're deceptive and you're setting yourself up for trouble.
If the backing model has seen "this product is useful" 100x more often than "this product is not useful", then when summarizing text containing "this product is not useful" or its long-winded analog, you are *far* more likely to get "this product is useful" out of the summarization, which inverts the meaning. That is probably not what you want.
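A toy sketch of the failure mode Dan is describing (the corpus, the 100:1 ratio, and the bigram model here are all invented for illustration; production LLMs are vastly more complex, which is part of the dispute below). A purely frequency-weighted model, decoded greedily, drops the rare "not":

```python
from collections import Counter, defaultdict

# Toy corpus: the positive phrasing outnumbers the negative one 100:1,
# mimicking Dan's hypothetical (all numbers here are made up).
corpus = ["this product is useful"] * 100 + ["this product is not useful"]

# Count bigrams: a purely frequency-weighted model of "what word comes next".
bigrams = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for cur, nxt in zip(words, words[1:]):
        bigrams[cur][nxt] += 1

def complete(prefix: str, steps: int = 2) -> str:
    """Greedily extend the prefix with the most frequent next word."""
    words = prefix.split()
    for _ in range(steps):
        candidates = bigrams[words[-1]].most_common(1)
        if not candidates:
            break
        words.append(candidates[0][0])
    return " ".join(words)

# Even when the source text was the negative review, the weight of the
# corpus pulls the completion to the majority phrasing, dropping "not":
print(complete("this product is"))  # -> "this product is useful"
```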
Shiri Bailem
in reply to Dan Sugalski
@Dan Sugalski that's not at all how it works, and maybe you should be checking your bias on some of this?
What it does is highlight the points being brought up, some examples:
A dehumidifier I recently purchased with good reviews, basically boiling down to nothing of note (it's a very small dehumidifier and I wasn't worried about water capacity because it has a drain hose and I planned to put it on the counter next to the sink):
Now a different dehumidifier with worse reviews, if I was considering this one I'd check in deeper about those complaints:
Despite what you said before, these are not weighted statistical models; they're drastically different from that, and they do show debatable levels of comprehension of text (debatable only because where the line of comprehension lies is an argument for philosophers).
I'm not advocating for the abuses of AI (ranging from the ecological impact of poor tuning and overuse of high-end models to efforts by companies to replace artists), but this technology really does have meaning and value when used properly. Summarization of large volumes of non-sensitive text is absolutely one of those cases.
Roufnichan LeGauf π
in reply to Shiri Bailem
@shiri
So what would you say is an LLM if not a weighted statistical model of language?
Shiri Bailem
in reply to Roufnichan LeGauf π
@Robert Wire π ... they are a Large Language Model... There's a reason they're called LLMs and not WSMs...
They are their own thing. "Weighted statistical model" is far too simplistic a description of how an LLM works; that label is more along the lines of an old 90s spam filter.
LLMs are very complex systems which go far beyond the scope of my patience to explain. Might be good to just review the Wikipedia article on it: en.wikipedia.org/wiki/Large_la…
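For concreteness on that comparison: the kind of simple weighted model a 90s spam filter used is essentially a naive Bayes classifier over word counts. A minimal sketch (the training messages are hypothetical, invented for this example):

```python
import math
from collections import Counter

# Hypothetical training messages for a 90s-style filter (made-up examples).
spam = ["buy cheap pills now", "cheap pills cheap deals"]
ham = ["meeting notes attached", "lunch plans for friday"]

def word_counts(docs):
    counts = Counter()
    for doc in docs:
        counts.update(doc.split())
    return counts

spam_counts, ham_counts = word_counts(spam), word_counts(ham)
spam_total, ham_total = sum(spam_counts.values()), sum(ham_counts.values())
vocab = set(spam_counts) | set(ham_counts)

def log_score(msg, counts, total):
    # Laplace-smoothed log P(words | class): the per-word weights
    # are literally the entire model.
    return sum(math.log((counts[w] + 1) / (total + len(vocab)))
               for w in msg.split())

def is_spam(msg):
    return log_score(msg, spam_counts, spam_total) > \
           log_score(msg, ham_counts, ham_total)

print(is_spam("cheap pills"))        # True
print(is_spam("meeting on friday"))  # False
```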
Roufnichan LeGauf π
in reply to Shiri Bailem
wizzwizz4
in reply to Shiri Bailem
@shiri @barubary I don't think you've developed an intuitive understanding of the universal approximation theorem. en.wikipedia.org/wiki/Universa…
I'm happy to explain further, if you don't understand how a weighted statistical model could demonstrate some behaviours we associate with understanding, while remaining wholly incapable of others.
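For reference, the linked theorem can be stated roughly as follows (the classical single-hidden-layer form, with $K \subset \mathbb{R}^n$ compact and $\sigma$ a fixed non-polynomial activation such as a sigmoid):

$$\forall f \in C(K),\; \forall \varepsilon > 0,\; \exists N,\ a_i, b_i \in \mathbb{R},\ w_i \in \mathbb{R}^n : \quad \sup_{x \in K} \Bigl|\, f(x) - \sum_{i=1}^{N} a_i\, \sigma(w_i^{\top} x + b_i) \Bigr| < \varepsilon$$

That is, a big enough weighted sum of simple units can imitate any continuous input-output behaviour to arbitrary precision, which is the sense in which "just a weighted model" doesn't bound what behaviours the model can exhibit.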
adaddinsane (Steve Turnbull)
in reply to Dan Sugalski
@shiri
How about an actual study...
crikey.com.au/2024/09/03/ai-wo…
tl;dr: they're shit at summarising. But it's obvious anyway - to summarise requires the ability to accurately identify the important points, which requires understanding.
LLMs have no understanding therefore they cannot summarise.
And if you need proof they can't reason:
garymarcus.substack.com/p/llms…
LLMs don't do formal reasoning - and that is a HUGE problem
Shiri Bailem
in reply to adaddinsane (Steve Turnbull)
@adaddinsane (Steve Turnbull) @Dan Sugalski yeah... fantastic evidence showing bias and lack of reasoning in your response:
If you're going to argue they're bad at summarizing, then you need to actually show summarizing: not compiling paperwork, not citations, not math.
God, I feel like half this anti-AI drivel itself could be AI generated.
There are real problems with AI, but people like y'all smoke screen the shit out of them to the point where I sometimes wonder if corporations are paying for anti-AI sock puppets to just hide the real issues.
ben_tinc
in reply to Shiri Bailem
@shiri @adaddinsane
I mean, sure. If you only count one extremely specific and restricted meaning of "summarizing" as "real summarizing", LLMs might be adequate for it. But that is far from the consensus meaning.
In this sense, your argument resembles the "no true Scotsman" fallacy (unless you _actually_ want everyone to lower their expectations of what an adequate summary should be, which I do not want to assume).
Shiri Bailem
in reply to ben_tinc
@ben_tinc @adaddinsane (Steve Turnbull) @Dan Sugalski from Merriam-Webster: Summarize means to make a summary.
Summary:
(adjective) Covering the main points succinctly
(noun) an abstract, abridgment, or compendium especially of a preceding discourse
Summaries are not citations and definitely aren't math, they're basically just paraphrasing down text.
Digging a little further into the government study being cited, here's a great bit of text from the study itself:
In other words, the authors themselves said this isn't an indication that it's bad at summarizing, just that this solution doesn't look promising for their specific use case.
I get where you're feeling the no-true-Scotsman vibe, but it comes from the fact that I'm pointing out that additional requirements layered on top of summarizing aren't reasons for calling it bad at summarizing. One was a government office testing a specific, specialized use case rather than general summarization ability, and the other was entirely about its ability to work math problems.
ben_tinc
in reply to Shiri Bailem
@shiri @adaddinsane
I think it is fair to say that LLMs might be adequate for certain tasks which go by the label summary, as your own positive experiences show very clearly.
And it is certainly proper of the study authors not to promote over-generalization of their results. However, they also say that while the model performed _especially_ badly at identifying references to ASIC, humans scored better across all metrics.
ben_tinc
in reply to ben_tinc
[...] and this includes what I would call the absolute minimum: highlighting the central point of the summarized text in the 'general summary' portion of the test (section 5.1 in the full report).
Ben Aveling
in reply to adaddinsane (Steve Turnbull)
@adaddinsane No fair using facts.
@wordshaper @shiri
Shiri Bailem
in reply to Ben Aveling
Ben Aveling
in reply to Shiri Bailem
Shiri Bailem
in reply to Ben Aveling
@Ben Aveling @adaddinsane (Steve Turnbull) @Dan Sugalski already did: foggyminds.com/display/c6ef095…
ClickyMcTicker
in reply to Dan Sugalski
@shiri It would appear your anti-AI bias has tainted your ability to judge them fairly.
Amazon's product AI, environmental and other issues aside, provides valid high-level summarization of product reviews. At present, the model has not been tainted or tampered with to provide positive feedback, but rather provides an accurate representation of the typical review feedback. It tends to cover likes, dislikes, concerns, and overall views.
Ted Mielczarek
in reply to ClickyMcTicker
Shiri Bailem
in reply to Ted Mielczarek
@Ted Mielczarek @ClickyMcTicker @Dan Sugalski where did I say "Amazon said so"? I just cited Amazon's use case as a great example from personal experience.
I'm not digging around constantly for some article somewhere that says "AI has been tested to get the general gist right in some arbitrary scientific study"; I'm doubtful such a thing exists given how fuzzy it is. But I can sure as hell poke holes in the bullshit y'all are posting.
If y'all are ever interested in the real problems of AI, hit me up. But until then, maybe enjoy screaming about the evils of cars killing the horse-drawn carriage business and being worse than horses at steering themselves, while ignoring all the other issues around them.
Ted Mielczarek
in reply to Shiri Bailem
Shiri Bailem
in reply to Ted Mielczarek
Ted Mielczarek
in reply to Shiri Bailem
unexpectedteapot
in reply to Shiri Bailem
@shiri
>most conversations aren't really about the fine details or unusual bits
I suspect this captures the crux of what the root post was pointing out when C/VP levels were mentioned: a criticism of conversations [between higher level employees and lower level] that are devoid of helpful/new information.
Clare Hooley
in reply to Shiri Bailem
Shiri Bailem
in reply to Clare Hooley
@Clare Hooley @Dan Sugalski so your argument that it's useless as a whole is that it's bad in areas where summaries aren't so useful in general?
Books, where a general summary of everyone's comments about the book... in what world would this be useful?
And on the camping stove, you're citing the fact that people often said redundant things about it, and that those things showed up in the summary, as evidence that it's bad? You think a good summarization would edit out things it deems unnecessary, and you think there's any version in which that would be an improvement?
Clare Hooley
in reply to Shiri Bailem
Shiri Bailem
in reply to Clare Hooley
@Clare Hooley @Dan Sugalski The problem is that that is what users really thought (aside from, as you said, it being listed as a negative, which is an honest goof); it's saying that because a lot of people apparently thought it was good information to include in the review.
The summary isn't trying to tell you whether or not it's a good product, and it's definitely not trying to extrapolate underlying intents, it's just trying to summarize what's talked about in the various reviews so you can have an idea if there's something you might want to look into before you buy it.
I gave a prime example earlier quoting summaries from two dehumidifiers: the first is one I bought myself, and the second was me digging for one rated under 4 stars.
Clare Hooley
in reply to Shiri Bailem
Tristan Harward
in reply to Dan Sugalski
Maho Pacheco π¦π»
in reply to Dan Sugalski
I could not agree more. I am just experimenting with converting a blog post to a thread using LLMs.
Check the thread: hachyderm.io/@mapache/11335991…
And the blog post: maho.dev/2024/09/my-coffee-his…
It got rid of the parts I like most. A shame.
Shiri Bailem
in reply to Maho Pacheco π¦π»
Maho Pacheco π¦π»
in reply to Shiri Bailem
It actually did a good job of summarizing and getting the points; it did not miss the big parts. But as @shiri mentions, the points that make the post special (nuanced, human) got "washed".
For example, the blog post talks about coffee as a character: "Coffee in my childhood was always a supporting character in the main story."
Also, some of my fav phrases got removed (e.g. "as we say in Latin America, to be young and not revolutionary is almost a biological contradiction.")
Shiri Bailem
in reply to Maho Pacheco π¦π»
tqwhite
in reply to Dan Sugalski
GLC
in reply to tqwhite
@tqwhite @Walrus
Is the misprint that makes one of your key instructions meaningless also part of your standard prompt?
(I have no real idea, but the layout suggests that it is.)
tqwhite
in reply to GLC
flere-imsaho
in reply to tqwhite
tqwhite
in reply to flere-imsaho