Foggyminds Social

Yesterday Cory Doctorow argued that refusal to use LLMs was mere "neoliberal purity culture". I think his argument is a strawman, doesn't align with his own actions and delegitimizes important political actions we need to make in order to build a better cyberphysical world.

EDIT: Diskussions under this are fine, but I do not want this to turn into an ad hominem attack to Cory. Be fucking respectful

tante.cc/2026/02/20/acting-eth…

Acting ethically in an imperfect world

Life is complicated. Regardless of what your beliefs or politics or ethics are, the way that we set up our society and economy will often force you to act against them: You might not want to fly somewhere but your employer will not accept another mod…

^{tante (Smashing Frames)}

This entry was edited (3 weeks ago)

cholling likes this.

reshared this

in reply to tante

Cory Doctorow

in reply to tante • 3 weeks ago • •

Dunno where you got the idea that I have a "libertarian" background. I was raised by Trotskyists, am a member of the DSA, am advising and have endorsed Avi Lewis, and joined the UK Greens to back Polanski.

in reply to Cory Doctorow

R.L. LE

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic My impression was, Tante meant this specific argument and the way it is structured, and the way it functions. I hold the both of you in high esteem, and I don't have the impression that he'd somehow characterize anything beyond that argument he discusses.

@Cory Doctorow

in reply to R.L. LE

Cory Doctorow

in reply to R.L. LE • 3 weeks ago • •

@herrLorenz
> Cory shows his libertarian leanings here...

> Many people criticizing LLMs come from a somewhat leftist (in contrast to Cory’s libertarian) background.

@R.L. LE

Shiri Bailem likes this.

in reply to Cory Doctorow

Cory Doctorow

in reply to Cory Doctorow • 3 weeks ago • •

@herrLorenz
This falls into the "you are entitled to your own opinions, but not your own facts" territory.

@R.L. LE

Shiri Bailem likes this.

in reply to Cory Doctorow

R.L. LE

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic I just spoke about my impression, but didn't lay claim to objective truth. I'll keep reading along. ✌️

@Cory Doctorow

in reply to Cory Doctorow

CJPaloma might be hiking

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic @herrLorenz that second example goes well into overreach territory, and I can see why you'd be not happy with it.

And/but a big part of libertarian appeal is that it does muddy how being "individually free from regulation" can be cast as liberatory. As if individual freedom is all that's needed. "I'm free when there are no regulations" is obviously shallow to lefties, but it (individual freedom) is also a component of why people are lefties, there's real overlap.

@Cory Doctorow @R.L. LE

in reply to CJPaloma might be hiking

Cory Doctorow

in reply to CJPaloma might be hiking • 3 weeks ago • •

@CJPaloma @herrLorenz
There is no virtue in being constrained or regulated per se.

Regulation isn't a good unto itself.

Regulation that is itself good - drawn up for a good purpose, designed to be administrable, and then competently administered - is good.

@R.L. LE @CJPaloma might be hiking

Shiri Bailem likes this.

in reply to Cory Doctorow

CJPaloma might be hiking

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic @herrLorenz Of course! Agreed.

The overlap ends around -when- reasons are "good" enough. Laws about how to treat other people are relatively easy.

But until enough people see rivers on fire, regulations on -doing certain things- aren't imposed, despite many people saying "hey, this isn't good" decades prior.

Not reining in/regulating until after -foreseeable- catastrophes results in all kinds of shit shows (from the MIC, to urban sprawl, to plastics, to tax laws, etc)

@Cory Doctorow @R.L. LE

in reply to Cory Doctorow

Joris Meys

in reply to Cory Doctorow • 3 weeks ago • •

Fair enough, but that's not the core of the argument
@tante made. He had the same complaint for starters (your argument was heavily drenched in 'you ppl are purists' ), but he also makes the valid argument that technology isn't neutral in itself. Open weights based on intellectual theft and forced labor is still a problem. Until we have a discussion on how the weights come to fruitition, LLM's are objectively problematic from an ethical view. That has nothing to do with purism.

@tante

This entry was edited (3 weeks ago)

in reply to tante

Simon Zerafa (Status: 😊)

in reply to tante • 3 weeks ago • •

That doesn't seem to be the best idea @pluralistic

AI and LLM output is 90% bullshit, and most people don't have the time nor the patience to work out which 10% might actually be useful.

That's completely ignoring the environmental and human impacts of the AI bubble.

Try buying DDR memory, a GPU or an SSD / HDD at the moment.

@Cory Doctorow

in reply to Simon Zerafa (Status: 😊)

Cory Doctorow

in reply to Simon Zerafa (Status: 😊) • 3 weeks ago • •

@simonzerafa
What is the incremental environmental damage created by running an existing LLM locally on your own laptop?

As to "90% bullshit" - as I wrote, the false positive rate for punctuation errors and typos from Ollama/Llama2 is about 50%, which is substantially better than, say, Google Docs' grammar checker.

@Simon Zerafa (Status: 😊)

Shiri Bailem likes this.

in reply to Cory Doctorow

kel

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic

I am astonished that I have to explain this,

but very simply in words even a small child could understand:

using these products *creates further demand*

- surely you know this?

Well, either you know this and are being facetious, or you are a lot stupider than I ever thought possible for someone with your privilege and resources.

I am absolutely floored at this reveal, just wow, "where's Cory and what have you done with him?" 🤷

Massive loss of respect!

@simonzerafa @tante

@Cory Doctorow @Simon Zerafa (Status: 😊) @tante

in reply to kel

Shiri Bailem

in reply to kel • 3 weeks ago •

@kel it sounds like your respect is rooted only in someone agreeing with you. If you respected them you'd maybe take a minute to listen to their arguments and ask yourself more about why they might disagree with you.

Namely the fact that you don't understand how "using these products creates further demand" doesn't relate to their arguments at all?

@Cory Doctorow @Simon Zerafa (Status: 😊) @tante

@Cory Doctorow @Simon Zerafa (Status: 😊) @tante @kel

like this

in reply to Cory Doctorow

Simon Zerafa (Status: 😊)

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic
Of course, I am speaking in generalities.

Encouraging the use of LLM's is counterproductive in so many ways, as I highlighted.

Pop a power meter on that LLM adorned PC and let us all know what the power usage looks like with and without your chosen LLM running on a typical task 🙂

That's power that generated somewhere, even if it's with renewable energy.

The main issue with LLM's is that they don't encourage critical thinking, in a world which is already suffering from a massive shortage.

@Cory Doctorow

in reply to Simon Zerafa (Status: 😊)

Cory Doctorow

in reply to Simon Zerafa (Status: 😊) • 3 weeks ago • •

@simonzerafa
As I wrote (and it seems you haven't read what I wrote, which is weird, because that seems like a good first step if you're going to criticize my conduct), I'm running Ollama on a laptop that doesn't even have a GPU.

Its power consumption is comparable to, say, watching a Youtube video.

I know this because my laptop is running free software that lets me accurately monitor its activity, and because the model is also free software.

@Simon Zerafa (Status: 😊)

Shiri Bailem likes this.

in reply to Cory Doctorow

Cory Doctorow

in reply to Cory Doctorow • 3 weeks ago • •

@simonzerafa
Checking for punctuation errors is does not discourage critical thinking. It's weird to laud "critical thinking" and also make this claim.

@Simon Zerafa (Status: 😊)

Shiri Bailem likes this.

in reply to Cory Doctorow

tante

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic @simonzerafa on this one for example I fully agree with Cory. This is not him having a genAI system write or anything like that.

@Cory Doctorow @Simon Zerafa (Status: 😊)

in reply to tante

David Huggins-Daines

in reply to tante • 3 weeks ago • •

@pluralistic @simonzerafa I agree in principle with Cory, but I really wish that he had clarified that:

1. Ollama is not an LLM, it's a server for various models, of varying degrees of openness.
2. Open weights is not open source, the model is still a black box. We should support projects like OLMO, which are completely open, down to the training data set and checkpoints.
3. It's quite difficult to "seize that technology" without using Someone Else's Computer to do so (a.k.a clown/cloud)

@Cory Doctorow @Simon Zerafa (Status: 😊)

in reply to David Huggins-Daines

David Huggins-Daines

in reply to David Huggins-Daines • 3 weeks ago • •

@pluralistic @simonzerafa But ALSO: using a multi-billion-parameter synthetic text extruding machine to find spelling and syntax errors is a blatant example of "doing everything the least efficient way possible" and that's why we are living on an overheating planet buried under toxic e-waste.

If I think about it harder I could probably come up with a more clever metaphor than killing a mosquito with a flamethrower, but you get the idea.

@Cory Doctorow @Simon Zerafa (Status: 😊)

This entry was edited (3 weeks ago)

in reply to David Huggins-Daines

Cory Doctorow

in reply to David Huggins-Daines • 3 weeks ago • •

@dhd6 @simonzerafa

No. It's like killing a mosquito with a bug zapper whose history includes thousands of years of metallurgy, hundreds of years of electrical engineering, and decades of plastics manufacture.

There is literally no contemporary manufactured good that doesn't sit atop a vast mountain of extraneous (to that purpose) labor, energy expenditure and capital.

@Simon Zerafa (Status: 😊) @David Huggins-Daines

in reply to Cory Doctorow

David Huggins-Daines

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic @simonzerafa As always, yes and no. A bug zapper is designed to zap bugs, it is a simple mechanism that does that one thing, and does it well. An LLM is designed to read text and generate more text.

That we have decided that the best way to do NLP is to use massively overparameterized word predictors that we have trained using RL to respond to prompts, rather than just, like, doing NLP, is just crazy from an engineering standpoint.

Rube Goldberg is spinning in his grave!

@Cory Doctorow @Simon Zerafa (Status: 😊)

in reply to David Huggins-Daines

Cory Doctorow

in reply to David Huggins-Daines • 3 weeks ago • •

@dhd6 @simonzerafa

Remember when Usenet's backbone cabal worried about someone in Congress discovering that the giant, packet-switched research network that had been constructed at enormous public expense was being used for idle chit chat?

The nature of general purpose technologies is that they will be used for lots of purposes.

@Simon Zerafa (Status: 😊) @David Huggins-Daines

in reply to Cory Doctorow

David Huggins-Daines

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic @simonzerafa indeed, I guess the question is whether the scale of the *ahem* waste, fraud and abuse *ahem* of resources that LLMs seem to imply, even in benign use cases like yours, is out of line with historical precedent or not.

Am I an old man yelling at a cloud?

No, it's the children who are wrong!

@Cory Doctorow @Simon Zerafa (Status: 😊)

in reply to David Huggins-Daines

Cory Doctorow

in reply to David Huggins-Daines • 3 weeks ago • •

@dhd6 @simonzerafa

Rockets were literally perfected in Nazi slave labor camps.

@Simon Zerafa (Status: 😊) @David Huggins-Daines

in reply to Cory Doctorow

elle

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic @dhd6 @simonzerafa what a shit take dude. rockets being perfected by nazis, project paperclip, and now a neonazi in charge of one of the largest space tech programs on the planet, along with a bullshit generating LLM.

so yeah, maybe this is all fash tech, and maybe taking a stand of "I'm not touching that shit with a thousand-meter pole" is not "neoliberal purity culture". and ollama of all things? the shit pumped out by fucking Meta? are you shitting me?

@Cory Doctorow @Simon Zerafa (Status: 😊) @David Huggins-Daines

in reply to elle

Cory Doctorow

in reply to elle • 3 weeks ago • •

@elle @dhd6 @simonzerafa

"You used the wrong open model because I don't like the company that made it" is the actual definition of nonsense purity culture.

@Simon Zerafa (Status: 😊) @elle @David Huggins-Daines

in reply to Cory Doctorow

elle

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic @dhd6 @simonzerafa you wrote a book on how much of a shitbag company corpos like Meta are. now you're saying "oh it's not that bad, look it's marginally better than Google Docs spell checker"?! did someone hack your fucking account?

there are legitimately open models that originate from academic institutions, train on open data with full consent. even those models take tens-of-thousands of euros to train. well outside the resources available to most open-source enjoyers

@Cory Doctorow @Simon Zerafa (Status: 😊) @David Huggins-Daines

in reply to Cory Doctorow

Papageier

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic @elle @dhd6 @simonzerafa I beg to differ. Demand is a powerful and legit tool of people responding to corporate behavior. Choosing a different product because you dislike a maker's conduct is nought but the invisible hand of the market slapping that maker for their conduct. Provenience does matter.

Smearing choice of provenience as wilful purism would be the perfect argument for any company to disregard social or ethical standards, constituting a right to demand for the 'best' offer. Which would take us into the field of classic objectivism, and be in itself as willful and naive as the purism it accuses consumers of.

@Cory Doctorow @Simon Zerafa (Status: 😊) @elle @David Huggins-Daines

in reply to Cory Doctorow

Jared White (ResistanceNet ✊)

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic @dhd6 @simonzerafa Good grief, these ad hoc rationalizations are absurd and you know it.

FYI, rockets are enormously environmentally destructive (fuel, pollution, noise, etc.). The planet would be better off with as few rockets launching as possible.

Saying an LLM is OK because some completely other "good" technology was invented by evil people is a *non argument*.

@Cory Doctorow @Simon Zerafa (Status: 😊) @David Huggins-Daines

in reply to Jared White (ResistanceNet ✊)

Cory Doctorow

in reply to Jared White (ResistanceNet ✊) • 3 weeks ago • •

@jaredwhite @dhd6 @simonzerafa You're right, that would be a silly thing to say.

Good thing I didn't say it.

@Simon Zerafa (Status: 😊) @Jared White (ResistanceNet ✊) @David Huggins-Daines

in reply to Cory Doctorow

Drew Crecente (they)

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic @dhd6 @simonzerafa

The patent system is designed around an acknowledgement that we invent while "standing on the shoulders" of those who have gone before -- this premise is built into the patent system; policy decisions support this approach.

Patent grants exclusive rights to the inventor for a limited time. In exchange, the.inventor must disclose how a person versed in the arts can replicate that invention.

The same is not true of literature / copyright. Different animal, different approach.

@Cory Doctorow @Simon Zerafa (Status: 😊) @David Huggins-Daines

in reply to Drew Crecente (they)

Radio Free Trumpistan

in reply to Drew Crecente (they) • 3 weeks ago • •

This is true, and ever since the landmark case of Apple vs Microsoft re: Windows, the birth of "intellectual property" was manifested by a Microsoft that didn't want other companies screwing it the way they screwed Apple. Patents have a limited time of protection, requires publication--and intellectual property protection is forever, keeping secrets.

This entry was edited (3 weeks ago)

in reply to Cory Doctorow

Ray McCarthy

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic @simonzerafa
But Google Docs anything is rubbish.

@Cory Doctorow @Simon Zerafa (Status: 😊)

in reply to Ray McCarthy

Cory Doctorow

in reply to Ray McCarthy • 3 weeks ago • •

@raymaccarthy @simonzerafa
I see. And do you have moral opinions about whether people should use Google Docs? Do you seek out strangers to tell them that it's dangerous to use Google Docs?

@Simon Zerafa (Status: 😊) @Ray McCarthy

Shiri Bailem likes this.

in reply to Cory Doctorow

Kid Mania

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic @simonzerafa
"What is the incremental environmental damage created by running an existing LLM locally on your own laptop?"

I dunno. But how about a couple of million people?

The person who coins the term 'enshittification' defends LLM. Just...wow. We truly are fucked.

Let's all do what Cory does!
☠️
Meanwhile:
technologyreview.com/2025/05/2…
#doomed #ClimateChange

#climatechange #doomed @Cory Doctorow @Simon Zerafa (Status: 😊)

in reply to Kid Mania

Cory Doctorow

in reply to Kid Mania • 3 weeks ago • •

@clintruin @simonzerafa
Which "couple million people" suffer harm when I run a model on my laptop?

@Simon Zerafa (Status: 😊) @Kid Mania

in reply to Cory Doctorow

Kid Mania

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic @simonzerafa
Missed the point, sir.

When one person does it...no big deal.

When a couple of million people do it...well, see the MIT article above.

@Cory Doctorow @Simon Zerafa (Status: 😊)

in reply to Kid Mania

Kid Mania

in reply to Kid Mania • 3 weeks ago • •

@pluralistic @simonzerafa
Subhead quote from the article:
"The emissions from individual AI text, image, and video queries seem small—until you add up what the industry isn’t tracking and consider where it’s heading next."

@Cory Doctorow @Simon Zerafa (Status: 😊)

in reply to Kid Mania

Cory Doctorow

in reply to Kid Mania • 3 weeks ago • •

@clintruin @simonzerafa
You are laboring under a misapprehension.

I will reiterate my question, with all caps for emphasis.

Which "couple million people" suffer harm when I run a model ON MY LAPTOP?

@Simon Zerafa (Status: 😊) @Kid Mania

in reply to Cory Doctorow

Kid Mania

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic @simonzerafa
I'll reiterate my response.

When you *alone* do it...no big deal.
When a couple of million do it ON THEIR OWN LAPTOPS...problem.

@Cory Doctorow @Simon Zerafa (Status: 😊)

in reply to Kid Mania

Cory Doctorow

in reply to Kid Mania • 3 weeks ago • •

@clintruin @simonzerafa
OK, sorry, i was under the impression that I was having a discussion with someone who understands this issue.

You are completely, empirically, technically wrong.

Checking the punctuation on a document on your laptop uses less electricity than watching a Youtube video.

@Simon Zerafa (Status: 😊) @Kid Mania

in reply to Cory Doctorow

Kid Mania

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic @simonzerafa
Fair enough, Cory. You're gonna do what you want regardless of my accuracy or inaccuracy anyway. And maybe I've misunderstood this. The same way many many will.

But visualize this:

"Hey...I just read Cory Doctrow uses an LLM to check his writing."
"Really?"
"Yeah, it's true."
"Cool, maybe what I've read about ChatGPT is wrong too..."

@Cory Doctorow @Simon Zerafa (Status: 😊)

in reply to Kid Mania

Cory Doctorow

in reply to Kid Mania • 3 weeks ago • •

@clintruin @simonzerafa
This is an absurd argument.

"I just read about a thing that is fine, but I wasn't paying close attention, so maybe something bad is good?"

Come.

On.

@Simon Zerafa (Status: 😊) @Kid Mania

in reply to Cory Doctorow

Kid Mania

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic @simonzerafa
Maybe...
Maybe not.

You have a good day.

@Cory Doctorow @Simon Zerafa (Status: 😊)

in reply to Cory Doctorow

na na na na na na chauve-souris

in reply to Cory Doctorow • 3 weeks ago • •

Which "couple million people" suffer harm when I run a model ON MY LAPTOP?

Anyone who's hosting a website, and is getting hammered by the bots that seek content to train the models on. Those of us are the ones who continue getting hurt.

Whether you run it locally or not, makes little difference. The models were trained, and training very likely involved scraping, and that continues to be a problem to this day. Not because of ethical concerns, but technical ones: a constant 100req/sec 24/7, with over 2.5k req/sec waves may sound little in this day and age, but at around 2.5k req/sec (sustained for about a week!), my cheap VPS's two vCPUs are bogged down trying to deal with all the TLS handshakes, let alone serving anything.

That is a cost many seem to forget. It costs bandwidth, CPU, and human effort to keep things online under the crawler DDoS - which often will require cold, hard cash too, to survive.

Ask Codeberg or LWN how they fare under crawler load, and imagine someone who just wants to have their stuff online having to deal with similar abuse.

That is the suffering you enable when using any LLM model, even locally.

in reply to Kid Mania

Kid Mania

in reply to Kid Mania • 3 weeks ago • •

@pluralistic @simonzerafa
But hey, you do you, Cory.
I'm nobody...your Cory Doctrow.
Let's all do what Cory does...

@Cory Doctorow @Simon Zerafa (Status: 😊)

in reply to Kid Mania

Cory Doctorow

in reply to Kid Mania • 3 weeks ago • •

@clintruin @simonzerafa
Well, you could "do what Cory does" by familiarizing yourself with the conduct that you are criticizing before engaging in ad hominem.

To be fair, that's not unique to me, but people who fail to rise to that standard are doing themselves and others no good.

@Simon Zerafa (Status: 😊) @Kid Mania

in reply to Cory Doctorow

twifkak

in reply to Cory Doctorow • 3 weeks ago • •

I hate to dive into what is clearly a heated debate, but I want to add an answer to your question with a perspective that I think is missing: the power consumption for inference on your laptop is probably greater than in a datacenter. The latter is heavily incentivized to optimize power usage, since they charge by CPU usage or tokens, not watt-hours. (Power consumption != environmental damage exactly, but I have no idea how to estimate that part.)

This entry was edited (3 weeks ago)

in reply to twifkak

Cory Doctorow

in reply to twifkak • 3 weeks ago • •

@twifkak @simonzerafa
parsing a doc uses as much juice as streaming a Youtube video and less juice than performing a gnarly transform on a hi-rez in the Gimp.

I measured.

@Simon Zerafa (Status: 😊) @twifkak

in reply to twifkak

Shiri Bailem

in reply to twifkak • 3 weeks ago •

@twifkak I believe datacenter models likely use more because they're explicitly not efficient. They're running the most cutting edge equipment constantly and this equipment is the stuff built to be the fastest not the most efficient.

But to add to your point though on power consumption vs environmental damage, you're absolutely right there and on the right track. The power consumption isn't the big deal with the datacenters, it's the heat generation.

While the power consumption isn't a non-issue, it's trivial in comparison to the heat management issue which is where we get these conversations of water consumption.

Your individual device, because it's spaced out from other devices running LLMs can just air cool without issue.

The datacenter on the other hand, because they're all crammed in a tight space, has to use bigger and costlier and more impactful systems to move the heat. They can't use air cooling and have to use something like water cooling just to get the heat out of the building.

If the chips didn't have safeties and the cooling system shut down, those buildings would catch fire.

@Cory Doctorow @Simon Zerafa (Status: 😊) @tante

@Cory Doctorow @Simon Zerafa (Status: 😊) @tante @twifkak

Natalie likes this.

reshared this

in reply to Simon Zerafa (Status: 😊)

Ray McCarthy

in reply to Simon Zerafa (Status: 😊) • 3 weeks ago • •

@simonzerafa @pluralistic
At best 40% junk, but unless you are so expert you don't need it, you can't know which is plausible rubbish.
Would you play Russian Roulette every day for hours?

@Cory Doctorow @Simon Zerafa (Status: 😊)

This entry was edited (3 weeks ago)

in reply to Ray McCarthy

Cory Doctorow

in reply to Ray McCarthy • 3 weeks ago • •

@raymaccarthy @simonzerafa
Again, what does checking the punctuation on a single essay per day have to do with "play[ing] Russian Roulette every day for hours?"

@Simon Zerafa (Status: 😊) @Ray McCarthy

in reply to Cory Doctorow

Shiri Bailem

in reply to Cory Doctorow • 3 weeks ago •

@Cory Doctorow I'd be disappointed if I didn't see myself in the pattern of engaging with people on a post like this who are worlds away from having a fair discussion...

They literally can't see the reality of AI beyond their arguments, they've decided it's inherently evil and wrong and locked in their viewpoint.

So their "russian roulette every day for hours" is because, despite you saying what you use it for, they can't comprehend how it can be used outside of the worst possible use cases.

Same reason they're accusing you of being a libertarian, but that's already the purity culture you were originally calling out.

@Simon Zerafa (Status: 😊) @Ray McCarthy @tante

@Cory Doctorow @Simon Zerafa (Status: 😊) @tante @Ray McCarthy

like this

in reply to tante

FediThing

in reply to tante • 3 weeks ago • •

I really like and admire @pluralistic and have utmost respect for him, and that's why I'm totally baffled about why he is claiming "fruit of the poisoned tree" arguments as cause of LLM scepticism.

The objections to LLMs aren't about origins but about what they they are doing right now: destroying the planet, stealing labour, giving power over knowledge to LLM owners etc.

The objections are nothing to do with LLMs' origins, they're entirely about LLMs' effects in the here and now.

@Cory Doctorow

This entry was edited (3 weeks ago)

in reply to FediThing

Cory Doctorow

in reply to FediThing • 3 weeks ago • •

@FediThing
Which parts of running a model on your own laptop are implicated in "destroying the planet?" How is checking punctuation "stealing labor?" Or, for that matter "giving power over knowledge to LLM owners?"

@FediThing

Shiri Bailem likes this.

in reply to Cory Doctorow

Nelson

in reply to Cory Doctorow • 3 weeks ago • •

I think you can answer these questions yourself.

Suppose you wore a coat made out of mink fur. The minks are already dead, simply wearing the coat won't kill more minks. What does wearing mink fur have to do with cruelty to minks?

Suppose you live in the time of the Luddites. Legislation prohibits trade unions and collective bargaining. Mill owners introduce machines, reducing wages. But you build your own machine. Problem solved? You helping labor or capital?

@FediThing @tante

@tante @FediThing

This entry was edited (3 weeks ago)

in reply to Nelson

Cory Doctorow

in reply to Nelson • 3 weeks ago • •

@skyfaller @FediThing
This is a "fruit of the poisoned tree" argument.

Suppose you use a computer to post to Mastodon, despite the fact that silicon transistors were invented by the eugenicist William Shockley, who spent his Nobel money offering bribes to women of color to be sterlized?

Suppose you sent that Mastodon post on a packet-switched network, despite the fact that this technology was invented by the war criminals at the RAND corporation?

@Nelson @FediThing

Shiri Bailem likes this.

in reply to Cory Doctorow

Cory Doctorow

in reply to Cory Doctorow • 3 weeks ago • •

@skyfaller @FediThing
Also, you're wrong about the Luddites, just as a factual matter. The guilds the Luddites sprang from weren't prohibited by law, they were *protected* by law, and the Luddites' cause wasn't about gaining new protections under statute, but rather, enforcing existing statutory protections.

(Also: the Luddites didn't oppose steam looms or stocking frames; their demands were for fair deployment of these)

@Nelson @FediThing

Shiri Bailem likes this.

in reply to Cory Doctorow

Nelson

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic Thank you for the fact check. I was paraphrasing that text from the popular Nib comic: thenib.com/im-a-luddite/

If this contains factual inaccuracies I will need to do more research and perhaps stop sharing that comic.

@FediThing @tante

@Cory Doctorow @tante @FediThing

Shiri Bailem likes this.

in reply to Nelson

Cory Doctorow

in reply to Nelson • 3 weeks ago • •

@skyfaller @FediThing I strongly recommend Brian Merchant's "Blood in the Machine" as the best modern history of the Luddites.

@Nelson @FediThing

Shiri Bailem likes this.

in reply to Cory Doctorow

Nelson

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic I don't think mink fur or LLMs are comparable to criticizing the origins of the internet or transistors. It's the process that produced mink fur and LLMs that is destructive, not merely that it's made by bad people.

For example, LLM crawlers regularly take down independent websites like Codeberg, DDoSing, threatening the small web. You may say "but my LLM is frozen in time, it's not part of that scraping now", but it would not remain useful without updates.

@FediThing @tante

@Cory Doctorow @tante @FediThing

in reply to Nelson

Cory Doctorow

in reply to Nelson • 3 weeks ago • •

@skyfaller @FediThing
No. Literally the same LLM that currently finds punctuation errors will continue to do so. I'm not inventing novel forms of punctuation error that I need an updated LLM to discover.

@Nelson @FediThing

Shiri Bailem likes this.

in reply to Cory Doctorow

Nelson

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic Ok, fair enough, if spell checking is literally the only thing you use LLMs for.

I still think you wouldn't rely on a 1950s dictionary for checking modern language, and language moves faster on the internet, but I'm willing to concede that point.

I still think a deterministic spell checker could have done the job and not put you in this weird position of defending a technology with wide-reaching negative effects. But I guess your post was for just that purpose.

@FediThing @tante

@Cory Doctorow @tante @FediThing

in reply to Nelson

Cory Doctorow

in reply to Nelson • 3 weeks ago • •

@skyfaller @FediThing
I'm not using it for spell checking.

Did you read the article that is under discussion?

@Nelson @FediThing

in reply to Cory Doctorow

Nelson

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic I apologize, I did in fact read the relevant section of your post, and I was using spell-checking as shorthand for all typo checking, because deterministic grammar checkers have also existed for some time, although not as long as spell checkers and perhaps they have not been as reliable. I understand that LLMs can catch some typos that deterministic solutions may not.

I just think we should put more effort into improving deterministic tools instead of giving up.

@FediThing @tante

@Cory Doctorow @tante @FediThing

in reply to Nelson

Cory Doctorow

in reply to Nelson • 3 weeks ago • •

@skyfaller @FediThing Thanks.

@Nelson @FediThing

in reply to Nelson

Shiri Bailem

in reply to Nelson • 3 weeks ago •

@Nelson Funny thing there... a frozen in time LLM doesn't really lose that much functionality. Most good uses of LLMs don't rely on timely knowledge.

For instance @Cory Doctorow 's use case is checking punctuation and grammar. So an LLM only loses functionality there at the rate grammar fundamentally changes... which is glacially.

Also, not all local LLMs are crawler based. For instance when training on wikipedia data to have more recent and accurate knowledge, they offer a bittorrent download of the whole site contents.

The ones creating problems with crawlers are the ones I'm certain Cory will agree are a problem, the big companies that are competing for investors by constantly throwing more and more data at their model in the drive for increasingly small improvements as the only way they have to compete for investors.

@tante @FediThing

@Cory Doctorow @Nelson @tante @FediThing

Natalie likes this.

in reply to Nelson

Correl Roush

in reply to Nelson • 3 weeks ago • •

This is precisely it; it's about the process, not their distance from Altman, Amodei, et al. (which the Ollama project and those like it achieve).

The LLM models themselves are, per this analogy, still almost entirely of the mink-corpse variety, and I think it's a stretch to scream "purity!" at everyone giving you the stink eye for the coat you're wearing.

It's not impossible to have and use a model, locally hosted and energy-efficient, that wasn't directly birthed by mass theft and human abuse (or training directly off of models that were). And having models that aren't, that are genuinely open, is great! That's how the wickedness gets purged and the underlying tech gets liberated.

Maybe your coat is indeed synthetic, that much is still unclear, because so far all the arguing seems to be focused on the store you got it from and the monsters that operate the worst outlets.

in reply to Correl Roush

Cory Doctorow

in reply to Correl Roush • 3 weeks ago • •

@correl @skyfaller @FediThing
More fruit of the poisoned tree.

"This isn't bad, but it has bad things in its origin. The things I use *also* have bad things in their origin, but that's OK, because those bad things are different because [reasons]."

This is the inevitable, pointless dead-end of purity culture.

@Nelson @FediThing @Correl Roush

in reply to Cory Doctorow

Nelson

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic This seems like whataboutism. Valid criticisms can come from people who don't behave perfectly, because otherwise no one would be able to criticize anything. Similarly, we can criticize society while participating in it.

The point I'd like to make (that doesn't seem to be landing) is that LLMs aren't just made by bad people, but are also made through harmful processes. Harm dealt mostly during creation can be better than continuing harm, but still harmful.
@correl @FediThing @tante

@Cory Doctorow @tante @FediThing @Correl Roush

ಚಿರಾಗ್ 🌹✊🏾Ⓥ🌱🇵🇸 (he/him) reshared this.

in reply to Nelson

Nelson

in reply to Nelson • 3 weeks ago • •

@pluralistic @correl @FediThing In the climate crisis we are often concerned about "embodied emissions", things made with fossil fuels that may not use fossil fuels once they're created. If we don't change our fossil fuel using production systems, those embodied emissions could be enough to kill us.

I'd say that the literal and figurative embodied emissions of even local LLMs are sufficient to make them problematic to use. Individuals avoiding them is insufficient but necessary.

@Cory Doctorow @FediThing @Correl Roush

in reply to Nelson

Cory Doctorow

in reply to Nelson • 3 weeks ago • •

@skyfaller @correl @FediThing
That is completely backwards.

The entire point of measuring embodied emissions is to *make use of things that embody emissions*.

We improve old, energy inefficient buildings *because they represent embodied emissions* rather than building new, more efficient buildings because the *net* emissions of building a new, better building exceed the emissions associated with a remediated, older building.

@Nelson @FediThing @Correl Roush

in reply to Cory Doctorow

Nelson

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic You're missing my point. Old houses should be used, but if new houses are built using fossil fuels, then we can cook ourselves by building them even if new buildings are fully electrified.

It feels like you're ignoring the context where LLMs are still being created. It's ethically different to use something made by slaves if slavery is not in the past. If you golf on a golf course maintained by prison labor yesterday, it matters that prisoners will clean it again tomorrow.

@correl

@Cory Doctorow @Correl Roush

in reply to Nelson

Cory Doctorow

in reply to Nelson • 3 weeks ago • •

@skyfaller @correl

I'm not ignoring that context, it is *entirely irrelevant*, because I am *not* using some prospective, as-yet-to-be-trained LLM to check punctuation on my laptop. I am using an *actual, existing* LLM.

So if your argument is, "If you did something that's not the thing you've done, that would be bad," my response is, "Perhaps that's true, but I have no idea why you would seek to a stranger to discuss that subject."

@Nelson @Correl Roush

in reply to Nelson

Cory Doctorow

in reply to Nelson • 3 weeks ago • •

@skyfaller @correl @FediThing
Yes, that is just more fruit of the poisoned tree.

This thing harmed people in its creation, therefore the thing is bad, as are all things derived from it.

However, the things *I* use don't count, because the bad things in their history are different because [insert incoherent rationalization].

@Nelson @FediThing @Correl Roush

in reply to Cory Doctorow

Correl Roush

in reply to Cory Doctorow • 3 weeks ago • •

While I can understand your argument and almost certain exhaustion at hollow criticism, that response feels very dismissive of the points being made against your application of that argument.

I'm not sure how fruitful of an argument can be had with regard to what you may or not be using, as you really haven't clarified that anyhow besides locally hosted software that could be used to run terrible models, so this whole mess is just an endless back and forth of "You seem to be dodging the nature of the evil you may be accepting" vs "You're over-concerned with purity", and I think that's justifiably leaving a bad taste in everyone's mouth.

in reply to Correl Roush

Cory Doctorow

in reply to Correl Roush • 3 weeks ago • •

@correl @skyfaller @FediThing
> as you really haven't clarified that anyhow

I'm sorry, this is entirely wrong.

The fact that you didn't bother to read the source materials associated with this debate in no way obviates their existence.

I set out the specific use-case under discussion in a single paragraph in an open access document. There is no clearer way it could have been stated.

@Nelson @FediThing @Correl Roush

in reply to Cory Doctorow

Radio Free Trumpistan

in reply to Cory Doctorow • 3 weeks ago • •

Chiming in to state that it's routine to re-state and re-re-state principles that get lost in long reads and long threads such as this one, where any late-comer needs to skim because of the tl;dr factor. There's a long standing principle based on this phenomenon: tell them in short what you're saying, explain what you said, then tell them in summary what you said.

in reply to Radio Free Trumpistan

Cory Doctorow

in reply to Radio Free Trumpistan • 3 weeks ago • •

@claralistensprechen3rd @skyfaller @FediThing @correl

I don't know what this has to do with someone stating "you haven't clarified" something, when you have.

Also, I have reposted the paragraph in question TWICE this morning.

@Nelson @Radio Free Trumpistan @FediThing @Correl Roush

in reply to Cory Doctorow

Correl Roush

in reply to Cory Doctorow • 3 weeks ago • •

Again, this feels dismissive, and dodges the argument. The clarity I was referring to wasn't the use case you laid out (automated proofreading) or the platform (Ollama), but (as has been discussed at length through this thread of conversation) which models are being employed.

This entire conversation has been centered around how currently available models not evil due to vague notions of who incepted the technology they're based upon, but the active harm employed in their creation.

To return to the discussion I'm attempting to have here, I find your fruits of the poisoned tree argument weak, particularly when you're invoking William Shockley (who is most assuredly had no direct hand in the transistors installed in the hardware on my desk nor their component materials) as a counterpoint to the stolen work and egregious cost that are intrinsic to even the toy models out there. It reads to me as employing hyperbole and false equivalence defensively rather than focusing on why what you're comfortable using is, well, comfortable.

in reply to Correl Roush

Cory Doctorow

in reply to Correl Roush • 3 weeks ago • •

@correl @skyfaller @FediThing
Scraping work is categorically not "stealing."

@Nelson @FediThing @Correl Roush

in reply to Nelson

Shiri Bailem

in reply to Nelson • 3 weeks ago •

@Nelson I think you should be able to answer these questions yourself, but clearly are struggling...

On your mink fur argument: the one ethical way to wear something like that is to only purchase used and old. The harm is done regardless of whether you purchase, you don't increase demand because your refusal to purchase new or recent means there's no profit in it. (This argument is also flawed because it's assuming local LLMs are made for profit when no profit is made on them)

And on your Luddite argument: When someone is using a machine to further oppress workers, the issue is not the machine but the person using it. You attack the machine to deprive them of it. But when an individual is using a completely separate instance of the machine, contributing nothing to those who are using the machine to abuse people... attacking them is simply attacking the worker.

@tante @FediThing @Cory Doctorow

@Cory Doctorow @Nelson @tante @FediThing

in reply to Shiri Bailem

Nelson

in reply to Shiri Bailem • 3 weeks ago • •

@shiri An used mink coat may not give money directly to mink farmers/killers, but wearing mink fur sends a message about the acceptability of mink. The average passerby can't tell if the mink was bought new. If you walk down the street and there are 10 new mink wearers, the 11th "ethical" mink wearer lends themselves to the message that mink farming is fine, unless they are constantly screaming "this is used mink!" which is strange and obnoxious.

@Shiri Bailem

in reply to Nelson

Shiri Bailem

in reply to Nelson • 3 weeks ago •

@Nelson that is a better argument and I'll definitely accept that.

I think for many of us, myself included, the big thing with AI there is the investment bubble. Users aren't making that much difference on the bubble, the people propping up the bubble are the same people creating the problems.

I know I harp on people about anti-AI rage myself, but I specifically harp on people who are overbroad in that rage. So many people dismiss that there are valid use cases for AI in the first place, they demonize people who are using it to improve their lives... people who can be encouraged now to move on to more ethical platforms, and when the bubble bursts will move anyways.

We honestly don't need public pressure to end the biggest abuses of AI, because it's not public interest that's fueling them... it's investor's believing AI techbros. Eventually they're going to wise up and realize there's literally zero return on their investment and we're going to have a truly terrifying economic crash.

It's a lot like the dot-com bubble... but drastically worse.

@Nelson

Natalie likes this.

in reply to Shiri Bailem

Shiri Bailem

in reply to Shiri Bailem • 3 weeks ago •

@Nelson Added detail: much of the perceived popularity of AI is propped up and manufactured.

We're all aware how we're being force fed AI tools left and right... and the presence of those tools is much of what the perceived popularity comes from.

Like Google force feeding AI results in it's search then touting people actively using and engaging with it's AI.

There's a great post I saw, that sadly I can't easily find, that highlights the cycle where business leaders tout that they'll integrate AI to make things look good to the shareholders. They then roll out AI, and when people don't use it they start forcing people to use it. They then turn around and report to the shareholders that people are using the AI and they're going to integrate even more AI!

Once the bubble pops, we stop getting force fed AI and it starts scaling back to places where people actually want to use it and it actually works.

@Nelson

Natalie likes this.

in reply to Cory Doctorow

FediThing

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic
(Hello Mr Doctorow! Just want to make clear I admire you a great deal and this isn't intended as an attack on you!)

Running a local LLM with no connection to outside providers might be a way of avoiding bad stuff, but I am not clear on how this relates to discussing origins of technologies?

It seems like there's ambiguity in your post about whether it applies just to people with homelabs wondering if they should try offline LLMs, or whether you are discussing LLMs as a general technology?

Almost everyone using LLMs will use the online kind, so objections to LLMs are (reasonably IMHO) based on that scenario.

@Cory Doctorow

This entry was edited (3 weeks ago)

in reply to FediThing

Cory Doctorow

in reply to FediThing • 3 weeks ago • •

@FediThing
> I am not clear on how this connects to discussing origins of technologies

Because the arguments against running an LLM on your own computer boil down to, "The LLM was made by bad people, or in bad ways."

This is a purity culture standard, a "fruit of the poisoned tree" argument, and while it is often dressed up in objectivity ("I don't use the fruit of the poisoned tree"), it is just special pleading ("the fruits of the poisoned tree that I use don't count, because __").

@FediThing

Shiri Bailem likes this.

in reply to Cory Doctorow

Cory Doctorow

in reply to Cory Doctorow • 3 weeks ago • •

@FediThing
> Almost everyone using LLMs will use the online kind, so objections to LLMs are (reasonably IMHO) based on that scenario.

Except that in this specific instance, you are weighing on an article that claims that it is wrong to run a local LLM for the purposes of checking for punctuation errors.

@FediThing

Shiri Bailem likes this.

in reply to Cory Doctorow

FediThing

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic
Thank you for the responses 🙏

"Because the arguments against running an LLM on your own computer"

...ahhh okay. So was this post aimed more at a very narrow homelab kind of audience?

It's just, as a reader, the article's emphasis on examples of tech origins imply it's trying to defend LLMs in general? This probably is my ignorance as a reader, but it's how it came across to me, and led to bafflement.

@Cory Doctorow

This entry was edited (3 weeks ago)

in reply to FediThing

Cory Doctorow

in reply to FediThing • 3 weeks ago • •

@FediThing This is the use-case that is under discussion.

pluralistic.net/2026/02/19/now…

There is one technology that has made my POSSE life better, and it might surprise you. This year, I installed Ollama – an open-source LLM – on my laptop. It runs pretty well, even without a GPU. Every day, before I run Loren's python publication scripts, I run the text through Ollama as a typo-catcher (my prompt is "find typos"). Ollama always spots three or four of these, usually stuff like missing punctuation, or forgotten words, or double words ("the the next thing") or typos that are still valid words ("of top of everything else").

Pluralistic: Six Years of Pluralistic (19 Feb 2026) – Pluralistic: Daily links from Cory Doctorow

^{pluralistic.net}

@FediThing

Shiri Bailem likes this.

in reply to Cory Doctorow

FediThing

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic
Thanks. Can totally see how that makes sense at a technical level for people who run their own offline services.

I think it's the ambiguity that is driving the discourse over this post. People are taking the "refusing to use a technology" section as a defence of LLMs in general?

If the angle was caging LLMs or something like that it, might make it clearer that you aren't endorsing the most common form of LLM?

Anyway, it's your call on this as author, just wanted to feed back on this because your writing matters and I hope feedback is helpful to it.

@Cory Doctorow

in reply to FediThing

Cory Doctorow

in reply to FediThing • 3 weeks ago • •

@FediThing Thank you.

@FediThing

in reply to FediThing

Shiri Bailem

in reply to FediThing • 3 weeks ago •

@FediThing I think the problem in discourse is the overwhelming amount of people experience anti-AI rage.

In the topic of LLMs, the two loudest groups by a wide margin are:
1. People who refuse to see any nuance or detail in the topic, who can not be appeased by anything other than the complete and total end of all machine learning technologies
2. AI tech bros who think they're only moments away from awakening their own personal machine god

I like to think I'm in the same camp as @Cory Doctorow , that there's plenty of valid use for the technology and the problems aren't intrinsic to the technology but purely in how it's abused.

But when those two groups dominate the discussions, it means that people can't even conceive that we might be talking about something slightly different than what they're thinking.

Cory in the beginning explicitly said they were using a local offline LLM to check their punctuation... and all of this hate you see right here erupted. If you read through the other comment threads, people are barely even reading his responses before lumping more hate on him.

And if someone as great with language as Cory can't put it in a way that won't get this response... I think that says alot.

@tante

@Cory Doctorow @tante @FediThing

like this

Daniel Lakeland reshared this.

in reply to Shiri Bailem

FediThing

in reply to Shiri Bailem • 3 weeks ago • •

@shiri
(Untagged Cory as I'm sure he is getting a lot of replies and I don't want to repeat myself at him.)

I don't think it's the first part that caused problems but the later parts, as they didn't explicitly mention offline LLMs and it was possible to read the later text as referring to all LLMs.

@Shiri Bailem

in reply to FediThing

Shiri Bailem

in reply to FediThing • 3 weeks ago •

@FediThing The link in question where he talked about it, and did explicitly say it, though he didn't use the "offline" label specifically he basically described it as such. (The label itself is not purely self explanatory, so wouldn't have helped much)

Here's the article link: pluralistic.net/2026/02/19/now…

On friendica the thumbnail of the page is what I've attached here, incidentally the key paragraph in question.

Screenshot of text reading: There is one technology that has made my POSSE life better, and it might surprise you. This year, I installed Ollama – an open-source LLM – on my laptop. It runs pretty well, even without a GPU. Every day, before I run Loren's python publication scripts, I run the text through Ollama as a typo-catcher (my prompt is "find typos"). Ollama always spots three or four of these, usually stuff like missing punctuation, or forgotten words, or double words ("the the next thing") or typos that are still valid words ("of top of everything else").

@tante

@tante @FediThing

R.L. LE likes this.

in reply to Shiri Bailem

FediThing

in reply to Shiri Bailem • 3 weeks ago • •

Yup, that's the start, but then the text goes onto a discussion of very broad technologies and refusal to use them, which is where the ambiguity sort of creeps in. It isn't clear in the later sections if it's referring to LLMs in general, or just the very specific niche of offline LLMs.

I'm not posting this to attack Cory but to give feedback as a reader. I (incorrectly) took him to be talking about LLMs in general in the later section of the post, and it's possible other people are interpreting the later sections in the same way.

This entry was edited (3 weeks ago)

in reply to Shiri Bailem

prince lucija

in reply to Shiri Bailem • 3 weeks ago • •

fully agree!

@pluralistic @tante @FediThing

@Cory Doctorow @tante @FediThing

in reply to FediThing

prince lucija

in reply to FediThing • 3 weeks ago • •

i feel in the similar way as big tech has taken the notion of AI and LLMs as a cue/excuse to mount a global campaign of public manipulation and massive investments into a speculative project and pumps gazillions$ into it and convinces everyone it's innevitable tech to be put in bag of potato chips, the backlash is then that anything that bears the name of AI and LLM is poisonous plague and people are unfollowing anyone who's touched it in any way or talks about it in any other way than "it's fascist tech, i'm putting a filter in my feed!" (while it IS fascist tech because it's in hands of fascists).

in my view the problem seems not what LLMs are (what kind of tech), but how they are used and what they extract from planet when they are used by the big tech in this monstrous harmful way. of course there's a big blurred line and tech can't be separated from the political, but... AI is not intelligent (Big Tech wants you to believe that), and LLMs are not capable of intelligence and learning (Big Tech wants you to believe that).

so i feel like a big chunk of anger and hate should really be directed at techno oligarchs and only partially and much more critically at actual algorithms in play. it's not LLMs that are harming the planet, but rather the extraction, these companies who are absolute evil and are doing whatever the hell they want, unchecked, unregulated.

or as varoufakis said to tim nguyen: "we don't want to get rid of your tech or company (google). we want to socialize your company in order to use it more productively" and, if i may add, safely and beneficialy for everyone not just a few.

in reply to prince lucija

bazkie 👩🏼‍💻 bitplanes 🎵

in reply to prince lucija • 3 weeks ago • •

@prinlu @FediThing @pluralistic I agree with most things said in this thread, but on a very practical level, I'm curious what training data was used for the model used by @pluralistic 's typo-checking ollama?

for me, that training data is key here. was it consensually allowed for use in training?

because as I understand, LLMs need vast amounts of training data, and I'm just not sure how you would get access to such data consensually. would love to be enlightened about this :)

@Cory Doctorow @FediThing @prince lucija

in reply to bazkie 👩🏼‍💻 bitplanes 🎵

Cory Doctorow

in reply to bazkie 👩🏼‍💻 bitplanes 🎵 • 3 weeks ago • •

@bazkie @prinlu @FediThing
I do not accept the premise that scraping for training data is unethical (leaving aside questions of overloading others' servers).

This is how every search engine works. It's how computational linguistics works. It's how the Internet Archive works.

Making transient copies of other peoples' work to perform mathematical analysis on them isn't just acceptable, it's an unalloyed good and should be encouraged:

pluralistic.net/2023/09/17/how…

How To Think About Scraping – Pluralistic: Daily links from Cory Doctorow

^{pluralistic.net}

@FediThing @bazkie 👩🏼‍💻 bitplanes 🎵 @prince lucija

in reply to Cory Doctorow

bazkie 👩🏼‍💻 bitplanes 🎵

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic @prinlu @FediThing I think the difference to search engines is how LLM reproduces the training data..

as a thought experiment; what if I'd scrape all your blogposts, then start a blog that makes Cory Doctorow styled blogposts, which would end up more popular than your OG blog since I throw billions in marketing money at it.

would you find that ethical? would you find it acceptable?

further thought experiment; lets say you lose most of your income as a result and have to stop making blogs and start flipping burgers at mcDonalds.

your blog would stop existing, and so, my copycat blog would, too - or at least, it would stop bringing novel blogposts.

this kind of effect is real and will very much hinder cultural development, if not grind it to a halt.

that is a problem - this is culturally unsustainable.

@Cory Doctorow @FediThing @prince lucija

in reply to bazkie 👩🏼‍💻 bitplanes 🎵

Cory Doctorow

in reply to bazkie 👩🏼‍💻 bitplanes 🎵 • 3 weeks ago • •

First: checking for punctuation errors and other typos *in my own work* in a model running on *my own laptop* has nothing - not one single, solitary thing - in common with your example.

Nothing.

Literally, nothing.

But second: I literally license my work for commercial republication and it is widely republished in commercial outlets without any payment or notice to me.

This entry was edited (3 weeks ago)

in reply to Cory Doctorow

bazkie 👩🏼‍💻 bitplanes 🎵

in reply to Cory Doctorow • 3 weeks ago • •

but then you consented to that, right? you are in control of that.

also my example IS similar - after all, it's data scraped without consent, used to create another work. the typo-checker changes your blogpost based on my training data, in the same way my copycat blog changes 'my' works based on your training data.

sure, it's on a way different scale - deliberately, to more clearly show the principle - but it's the same thing.

This entry was edited (3 weeks ago)

in reply to bazkie 👩🏼‍💻 bitplanes 🎵

Cory Doctorow

in reply to bazkie 👩🏼‍💻 bitplanes 🎵 • 3 weeks ago • •

@bazkie

Should we ban the OED?

There is literally no way to study language itself without acquiring vast corpora of existing language, and no one in the history of scholarship has ever obtained permission to construct such a corpus.

@bazkie 👩🏼‍💻 bitplanes 🎵

in reply to Cory Doctorow

bazkie 👩🏼‍💻 bitplanes 🎵

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic I gave it a good thought, and you know what, I'm gonna argue that yes, for me there is a degree of unethical-ness to that lack of permission!

the things that makes me not mind that so much are a variety of differences in method and scale;

(*btw just explaining my personal reasons here, not arguing yours)

- every word in the OED was painstakingly researched by human experts to make the most possible sense of it

- coming from a place of passion on the end of the linguists, no doubt

- the ownership of said data isn't "techno-feudal mega-corporations existing under a fascist regime"

- the OED didn't spell the end of human culture (heh) like LLMs very much might.

so yeah. I guess we do agree that, on some level, the OED and an LLM have something in similar.

it's the differences in method and scale that make me draw the line somewhere in between them; in a different spot from where you may draw it.

and like @zenkat mentioned elsewhere, it's the whole thing around LLMs that makes me very wary of normalizing anything to do with it, and I concede I wouldn't mind your slightly unethical LLM spellchecker as much, if we didn't live in this horrible context. :)

I guess this has become a bit of a reconciliatory toot. agree to disagree on where we draw the line, to each their own, and all that.

@Cory Doctorow @zenkat

in reply to Cory Doctorow

David in Tokyo

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic @bazkie

Dictionaries reference the sources they use for examples in the entries themselves.

LLMs lose the references at training time.

You've got this dead wrong.

@Cory Doctorow @bazkie 👩🏼‍💻 bitplanes 🎵

in reply to Cory Doctorow

FediThing

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic @bazkie @prinlu
This would be my take:

Search engines direct people to the work they index. They reward labour by directing people towards it.

Scraping without consent for training data lets people reproduce the work without crediting or rewarding the people who actually did the labour. That seems like labour theft?

If it is labour theft, then it isn't sustainable and that's part of why LLMs are so questionable as a technology.

@Cory Doctorow @bazkie 👩🏼‍💻 bitplanes 🎵 @prince lucija

This entry was edited (3 weeks ago)

in reply to FediThing

Cory Doctorow

in reply to FediThing • 3 weeks ago • •

@FediThing @bazkie @prinlu
There are tons of private search engines, indices, and analysis projects that don't direct text to other works.

I could scrape the web for a compilation of "websites no one should visit, ever." That's not "labor theft."

@FediThing @bazkie 👩🏼‍💻 bitplanes 🎵 @prince lucija

in reply to Cory Doctorow

FediThing

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic @bazkie @prinlu
Indexing works is a totally different thing to creating knock-offs of works, surely?

What Miyazaki said about AI knock-offs surely illustrates the difference?

@Cory Doctorow @bazkie 👩🏼‍💻 bitplanes 🎵 @prince lucija

in reply to FediThing

Cory Doctorow

in reply to FediThing • 3 weeks ago • •

@FediThing @bazkie @prinlu
No one is defending "creating knock offs of works." Why would you raise it here? Who has suggested that this is a good way to use LLMs or a good outcome from scraping?

@FediThing @bazkie 👩🏼‍💻 bitplanes 🎵 @prince lucija

in reply to Cory Doctorow

Cory Doctorow

in reply to Cory Doctorow • 3 weeks ago • •

@FediThing @bazkie @prinlu
The argument was literally, "It's not OK to check the punctuation in *your own work* if the punctuation checker was created by examining other peoples' work, because performing mathematical analysis on other peoples' work is *per se* unethical."

@FediThing @bazkie 👩🏼‍💻 bitplanes 🎵 @prince lucija

in reply to Cory Doctorow

Cory Doctorow

in reply to Cory Doctorow • 3 weeks ago • •

@FediThing @bazkie @prinlu
By this standard the OED is unethical.

@FediThing @bazkie 👩🏼‍💻 bitplanes 🎵 @prince lucija

in reply to Cory Doctorow

bazkie 👩🏼‍💻 bitplanes 🎵

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic @FediThing @prinlu I'd say "because performing [automated, mass scale] mathematical analysis on other peoples' work [without their consent] [with the goal of augmenting one's own work] is *per se* unethical" - and in that case, it's a statement I would agree with.

@Cory Doctorow @FediThing @prince lucija

in reply to bazkie 👩🏼‍💻 bitplanes 🎵

Cory Doctorow

in reply to bazkie 👩🏼‍💻 bitplanes 🎵 • 3 weeks ago • •

@bazkie @FediThing @prinlu
You've literally just made the case against:

* Dictionaries
* Encyclopedias
* Bibliographies

And also the entire field of computational linguistics.

If that's your position, fine, we have nothing more to say to one another because I think that's a very, very bad position.

@FediThing @bazkie 👩🏼‍💻 bitplanes 🎵 @prince lucija

in reply to Cory Doctorow

bazkie 👩🏼‍💻 bitplanes 🎵

in reply to Cory Doctorow • 3 weeks ago • •

I did not make that case, if you'd properly read my [additions] to the statement.

making dictionaries etc isn't automated on mass scales like feeding training data to LLMs is.

it's a very human job that involves a lot of expertise and takes a lot of time.

This entry was edited (3 weeks ago)

in reply to Cory Doctorow

zenkat

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic @bazkie @FediThing @prinlu I think part of the issue here is that GenAI is being pushed so hard and fast *everywhere* that's it's hard to be nuanced about what narrow use-cases might be acceptable or not.

We're living under a massive pro-LLM propaganda campaign. They have already set the terms of the debate with a maximalist position. It's no surprise that the backlash is similarly absolute.

@Cory Doctorow @FediThing @bazkie 👩🏼‍💻 bitplanes 🎵 @prince lucija

in reply to Cory Doctorow

Joris Meys

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic
No, because dictionnaries are about language which is a shared common, encyclopedias are about knowledge, which is a shared common, and bibliohraphies are a list of works, not a derivative.

Knowledge, language and a list of works cannot be copyrighted. You can use language, knowledge, words from the dictionary. You can quote an encyclopedia when refering to the source. None of that is even relevant to this discussion.

@bazkie @FediThing @prinlu @tante

@Cory Doctorow @tante @FediThing @bazkie 👩🏼‍💻 bitplanes 🎵 @prince lucija

in reply to Cory Doctorow

Zitrone

in reply to Cory Doctorow • 2 weeks ago • •

@pluralistic @bazkie @FediThing @prinlu encyclopedias don't say: look i made a painting, wanna buy?
but all that is kinda offtopic, imo
there are plenty of reasons to not use LLM, besides that commons/copyright stuff, wich are not purist and very much based on real-world-issues. in a perfect world, theft won't be an issue (imo), because we had overcome money and fossil energy. but using genLLM still would be morally wrong, because of its dangers, due to bias and failures. …/

@Cory Doctorow @FediThing @bazkie 👩🏼‍💻 bitplanes 🎵 @prince lucija

in reply to Cory Doctorow

Joris Meys

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic
The argument was "without the consent of the creators of said works." And you know that.

Don't be just another debate bro. Please.

@FediThing @bazkie @prinlu @tante

@Cory Doctorow @tante @FediThing @bazkie 👩🏼‍💻 bitplanes 🎵 @prince lucija

in reply to Cory Doctorow

FediThing

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic @bazkie @prinlu
If LLMs were only used for checking grammar that is one thing.

But by far the most common use of LLMs is labour theft through creating knock-offs, and that's something else.

I think the concern is that training data useful for the first case could be useful for the second case too? Hence the questions about where the training data comes from and where it ends up.

Kind of feels like it needs to be strictly ringfenced if it's to be ethical?

@Cory Doctorow @bazkie 👩🏼‍💻 bitplanes 🎵 @prince lucija

This entry was edited (3 weeks ago)

in reply to FediThing

Cory Doctorow

in reply to FediThing • 3 weeks ago • •

@FediThing @bazkie @prinlu
Once again, you a replying to a thread that started when someone wrote that using an LLM to check the punctuation in your own work is ethically impermissible because no one should assemble corpora of other peoples' works for analytical purposes under any circumstances, ever.

@FediThing @bazkie 👩🏼‍💻 bitplanes 🎵 @prince lucija

in reply to Cory Doctorow

bazkie 👩🏼‍💻 bitplanes 🎵

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic @FediThing @prinlu sure, but I'm responding here specifically to your statement that scraping for training isn't unethical per se.

@Cory Doctorow @FediThing @prince lucija

in reply to Cory Doctorow

Rafa

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic After reading so many comments, it is pretty clear who here would be opposing the creation of Napster and torrenting and be defending RIAA... They are also clearly very much against Internet Archive, shadow libraries, etc, simply because they can't take any disagreement.

Who knew running a local LLM, that uses the same energy as watching a youtube video, to spellcheck your own work would bring out such a mob.

@Cory Doctorow

in reply to Cory Doctorow

zivi

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic @FediThing you’re attempting to legitimize use of an unethical technology for something you don’t actually need a plausible-sounding-wall-of-text generator for

it goes beyond “it’s made by bad people in bad ways”. it’s a “”tool”” that actively causes cognitive decline and psychosis and sucks the soul out of everything it touches. and mind you promoting and legitimizing it is an act of support for those bad people and their bad ways. your deflection is a typical that of someone with no regard for ethics

“I installed Ollama” instantly gives a person away as a techbro

your not-so-friendly not-so-neighborhood “””liberal”””

@Cory Doctorow @FediThing

in reply to zivi

Cory Doctorow

in reply to zivi • 3 weeks ago • •

@zaire @FediThing
I'm not a liberal, I'm a leftist, so perhaps this is why I disagree with you.

The argument that "something is unethical because someone else used it in an unethical way" is so incoherent that it doesn't even rise to the level of debatability.

@FediThing @zivi

in reply to Cory Doctorow

Dilman Dila

in reply to Cory Doctorow • 3 weeks ago • •

So this is some kind of spell-checker, which is already in LibreOffice? I'm not sure why I would use that instead.

I use offline AI, esp for visual effects, subtitles, fixing dialogue errors, etc. There are "deep fake technologies" useful for mocap, camera tracking, and such other tedious works. They don't use prompts, and don't generate art, and are trained on your own inputs.

Perhaps we need a new name to differentiate it from the online genAI tech.

This entry was edited (3 weeks ago)

in reply to Cory Doctorow

Mark Saltveit

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic @FediThing
What's the difference between your argument here and "Slavery is OK because I didn't kidnap the slaves; I just inherited them from my dad." ??

@Cory Doctorow @FediThing

in reply to Mark Saltveit

Cory Doctorow

in reply to Mark Saltveit • 3 weeks ago • •

@taoish @FediThing
Because there are no slaves in this instance. Because no one is being harmed or asked to do any work, or being deprived of anything, or adversely affected in *any articulable way*.

But yeah, in every other regard, this is exactly that enslaving people.

Sure.

@Mark Saltveit @FediThing

in reply to Cory Doctorow

Mark Saltveit

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic @FediThing
Unless you consider stolen intellectual property (and ongoing copyright violations) a harm, a deprivation, &c.

But your general analogy against "fruit of the poison tree" morality would seem to also apply in the case of slavery -- in my hypothetical, the person didn't enslave anyone. They just inherited a slave from someone who did. That is indeed "fruit of a poisoned tree", even if they just continued an existing enslavement.

We have a real world recent example -- the cell lines stolen from Henrietta Lacks. Do you dismiss any moral concerns about using her cell line without consent as a neo-liberal moral purity trap?

@Cory Doctorow @FediThing

in reply to Mark Saltveit

Cory Doctorow

in reply to Mark Saltveit • 3 weeks ago • •

@taoish @FediThing
Scraping and training are not copyright infringements:theguardian.com/us-news/ng-int…

AI companies will fail. We can salvage something from the wreckage

AI is asbestos in the walls of our tech society, stuffed there by monopolists run amok. A serious fight against it must strike at its roots

^{Cory Doctorow (The Guardian)}

@Mark Saltveit @FediThing

in reply to Cory Doctorow

Lupino

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic i'd start with the part that the model probably came pre-trained. Or was it trained by you on your laptop...? @FediThing @tante

@Cory Doctorow @tante @FediThing

in reply to Lupino

Cory Doctorow

in reply to Lupino • 3 weeks ago • •

@LupinoArts @FediThing
This is a purity culture argument about the "fruit of the poisoned tree." The silicon in your laptop was invented by a eugenicist. The network your packets transit was invented by war criminals. The satellite the signal travels on was launched on a rocket descended from Nazi designs that were built by death-camp slaves.

@Lupino @FediThing

in reply to Cory Doctorow

Cory Doctorow

in reply to Cory Doctorow • 3 weeks ago • •

@LupinoArts @FediThing
To be clear, I completely reject this argument as a form of special pleading. Everyone has a reason why *their* fruit of the poisoned tree is OK, but other peoples' fruit of the poisoned tree is immoral.

@Lupino @FediThing

in reply to Cory Doctorow

Lupino

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic i guess this misses the point: the particular chip in my laptop wasn't made by war criminals (i hope...), but the model you do use was trained under vast amounts of energy and water consumption. I'm not sure this is completely comparable, tbh.

@FediThing @tante

@Cory Doctorow @tante @FediThing

in reply to Lupino

Lupino

in reply to Lupino • 3 weeks ago • •

@pluralistic and yes, i'm aware that producing a chip also costs vast amounts of energy and water... but at least my chip is used to solve a multitude of purposes, while a LLM that checks spelling and grammar is built and trained for one single use-case (that, nb, could also be done without an LLM). So yes, I do differenciate. @FediThing @tante

@Cory Doctorow @tante @FediThing

in reply to Lupino

Cory Doctorow

in reply to Lupino • 3 weeks ago • •

@LupinoArts @FediThing
Llama 2 was not built to check spelling and grammar. That's "not even wrong."

@Lupino @FediThing

Shiri Bailem likes this.

in reply to Lupino

Cory Doctorow

in reply to Lupino • 3 weeks ago • •

@LupinoArts @FediThing
No, this is just more "fruit of the poisoned tree" and your argument that your fruit of the poisoned tree doesn't count is the normal special pleading that this argument always decays into.

@Lupino @FediThing

Shiri Bailem likes this.

in reply to Cory Doctorow

Lupino

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic sorry, i'm just not good at making a point. To me, not "LLM" is the "forbidden fruit", but "using an LLM for certain purposes" is. I think there are actually use-cases for stochastic inference machines (like folding proteins or structuring references), but, as @tante wrote (better: as I understand him), there are use-cases that one very much can reject in its entirety. And that should be okay.

@Cory Doctorow @tante

in reply to Lupino

Cory Doctorow

in reply to Lupino • 3 weeks ago • •

@LupinoArts
I never denied the existence of "use-cases that...one can reject it its entirety."

@Lupino

in reply to Cory Doctorow

Matija Nalis

in reply to Cory Doctorow • 2 weeks ago • •

The problem with AI is not primarily with tech itself, but human traits of laziness, greed and ignoring unpriced externalities for short-sighted personal gains. Those of course exist even without AI, but AI allows for their damage to be multiplied many thousandfold which upgrades it from minor annoyance to existential crisis.

And the problem with Cory saying he's preferring using his AI is not exact cost of him using AI model (which is tiny), it is NORMALIZING 1/3

This entry was edited (2 weeks ago)

in reply to FediThing

Colman Reilly

in reply to FediThing • 3 weeks ago • •

@FediThing @pluralistic I’ve been utterly baffled why he’s so popular for decades.

@Cory Doctorow @FediThing

in reply to Colman Reilly

Cory Doctorow

in reply to Colman Reilly • 3 weeks ago • •

@Colman @FediThing That's interesting. I've never wondered that about you.

@Colman Reilly @FediThing

in reply to Cory Doctorow

Shiri Bailem

in reply to Cory Doctorow • 3 weeks ago •

A meme image showing hands pouring water on a burn with the text "Apply Cold Water To The Burned Area"

@Cory Doctorow @Colman Reilly @FediThing @tante

@Cory Doctorow @Colman Reilly @tante @FediThing

like this

in reply to Cory Doctorow

Ursa

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic @Colman @FediThing
This is...disappointing. To be fair, I'm disappointed in almost everyone in this thread for engaging in schoolyard shit throwing, but you're much higher in status and your shit sticks. Have a conversation. Figure out where these views can comingle. Find common understanding or you risk using your high status to fracture an already unstable alliance of people who want technology to operate safely and for the benefit of our shared humanity.

Do better.

@Cory Doctorow @Colman Reilly @FediThing

in reply to Cory Doctorow

komali_2

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic capital murder on the timeline

@Cory Doctorow

in reply to Cory Doctorow

Ghostrunner

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic @Colman @FediThing wow. punching down? I had a higher opinion of you.

@Cory Doctorow @Colman Reilly @FediThing

in reply to Ghostrunner

Martijn Vos

in reply to Ghostrunner • 3 weeks ago • •

@Ghostrunner @Cory Doctorow @Colman Reilly @FediThing @tante

You had a higher opinion of him, but not of yourself? And yet you wonder why he's popular.

I'm not a fan of all the shit throwing in this thread, but if you participate, you're going to get some on you.

@Cory Doctorow @Colman Reilly @tante @Ghostrunner @FediThing

in reply to FediThing

Ian Betteridge

in reply to FediThing • 3 weeks ago • •

@FediThing @pluralistic Some people - in fact quite a lot; if my reading is correct - do indeed argue that LLMs can *never* be ethically used because they are “trained on stolen work”.

@Cory Doctorow @FediThing

in reply to Ian Betteridge

Cory Doctorow

in reply to Ian Betteridge • 3 weeks ago • •

@ianbetteridge @FediThing
Performing mathematical analysis on large corpora of published work is not "stealing."

@FediThing @Ian Betteridge

in reply to Cory Doctorow

Hanno Rein

in reply to Cory Doctorow • 3 weeks ago • •

It really depends a bit on the details, doesn't it. If I copy a CD, I also perform some mathematical analysis on it, error checking etc. Maybe I even make a non-exact copy by passing it through some filter to make it sound better. But it's totally different from listening to a bunch of Beatles songs and then getting inspired to write my own songs in a similar style.

This entry was edited (3 weeks ago)

in reply to Hanno Rein

Shiri Bailem

in reply to Hanno Rein • 3 weeks ago •

@Hanno Rein It may seem totally different to you, but legally there's little difference between you being inspired after listening to a bunch of music and the LLM training off of it.

If you want to get into a deeper ethics conversation than the legal text, you're wading into some deep mud that goes back to the advent of copyright. (People take it for granted that there were ethical debates about copyright and the settled answer was that copyright is a compromise not a fundamental right)

@Cory Doctorow @tante @FediThing @Ian Betteridge

@Cory Doctorow @tante @FediThing @Hanno Rein @Ian Betteridge

Ian Betteridge likes this.

in reply to Shiri Bailem

Hanno Rein

in reply to Shiri Bailem • 3 weeks ago • •

@shiri

Sorry, I wasn't very clear. The comment was not about the LLM training data including a song but it's ability to reproduce the song afterwards. It becomes a Beatles impersonator.

LLMs can (mostly) output what they got as training data. So I don't buy the mathematical analysis argument.

@pluralistic @tante @FediThing @ianbetteridge

@Shiri Bailem @Cory Doctorow @tante @FediThing @Ian Betteridge

in reply to Hanno Rein

Shiri Bailem

in reply to Hanno Rein • 3 weeks ago from Fedilab •

@Hanno Rein
it varies and is a violation when they do... but here's the catch: that's the worst argument to challenge them on.

The big ecologically devastating companies behind all the worst of things? They benefit enormously if tackled from that front.

Because that argument doesn't make LLMs illegal or affect their training, it only makes it illegal to share the models required to run your own. Because the copyright violation is either in the output (which they can filter) or it's in the model files, which means they'd be the only ones you could go to for AI.

That argument could even solve the AI bubble... again in the worst way because their abuses would be allowed to continue (the bubble bursting is what will shut down most if not all of these datacenters)
@Cory Doctorow @tante @FediThing @Ian Betteridge

@Cory Doctorow @tante @FediThing @Hanno Rein @Ian Betteridge

in reply to Cory Doctorow

Ian Betteridge

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic @FediThing I agree - but that is the (incorrect imho) argument they make.

@Cory Doctorow @FediThing

in reply to Ian Betteridge

Cory Doctorow

in reply to Ian Betteridge • 3 weeks ago • •

@ianbetteridge @FediThing Yup.

@FediThing @Ian Betteridge

in reply to Cory Doctorow

David

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic @ianbetteridge @FediThing
It's still profit loss damage curable by income transfer if the illegally acquired data was used to create that profit. Dataset prominence should provide the percentage of profits and prominence is data size but also inference casualty. The primary literature should not be able to be diluted with free intellectual property.

I don't know if any of this is actual case law and I'm not a lawyer.

@Cory Doctorow @FediThing @Ian Betteridge

This entry was edited (3 weeks ago)

in reply to David

Cory Doctorow

in reply to David • 3 weeks ago • •

@drdrowland @ianbetteridge @FediThing
You're talking about ways of using models, not the creation of models. It's possible to make a model that does illegal things. But training a model is not illegal.

@David @FediThing @Ian Betteridge

in reply to Cory Doctorow

James Gleick

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic @ianbetteridge @FediThing “Mathematical analysis” is doing a lot of work here. It could mean gathering meaningless statistics. Or it could mean capturing the qualities (deviations from the average) that make a particular work of art (or author) special, creative, surprising—for use in simulacra.

I think that's harmful, to the culture as a whole, if not to the artworks and artists getting regurgitated.

@Cory Doctorow @FediThing @Ian Betteridge

in reply to James Gleick

Cory Doctorow

in reply to James Gleick • 3 weeks ago • •

@gleick @ianbetteridge @FediThing
Let's stipulate to that (I don't agree, as it happens, but that's OK). It's still not a copyright infringement to enumerate and analyze the elements of a copyrighted work.

For the record, I think AI art is bad and neither consume nor make it.

@FediThing @James Gleick @Ian Betteridge

in reply to Cory Doctorow

James Gleick

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic @ianbetteridge @FediThing I'm not claiming that's copyright infringement. Even if one respects the general framework of copyright, which I know you don’t, it seems hopeless to apply it to this AI mess.

But there is a kind of theft here. Not that it's actionable or measurable. But it’s nontrivial. It's related to questions of impersonation. It's an assault on individuality. Whatever your reasons for thinking AI art is bad (I have some sense), it's related to that, too.

@Cory Doctorow @FediThing @Ian Betteridge

in reply to James Gleick

James Gleick

in reply to James Gleick • 3 weeks ago • •

@pluralistic @ianbetteridge @FediThing Some authors have taken the view that they deserve some compensation for the use of their books in training the LLMs. Do the transaction and we're hunky-dory. That's not my view. I don't care about compensation; I just don't want my prose regurgitated in the LLMs, for reasons I'm not yet able to express properly. I feel I should have been asked, and I feel violated.

@Cory Doctorow @FediThing @Ian Betteridge

in reply to James Gleick

Dave Rahardja

in reply to James Gleick • 3 weeks ago • •

@gleick @pluralistic @ianbetteridge @FediThing I think the sense of “theft” that creators feel is directly caused by the fact that the AI industry (as it stands today) is a Ponzi scheme which is fundamentally built on remixing creators’ works and devaluing human labor. I have a feeling that most creators will not feel the same kind of outrage if an educational institution created the same technology for academic use, e.g. to generate insights into online culture and psychology.

In short, the GRIFT (i.e. the particular application of the technology) is the source of the feeling of theft, not the technology itself. I think the tech itself has value when used ethically.

FWIW I agree with Cory here that copyright is the *wrong* framework to use for criticizing AI, because for every case where copyright helps the individual creator, there are hundreds of cases where it helps incumbent megacorporations more.

#ai

humancode.us/2024/05/15/copyri…

Copyright will not save us from AI

^humancode.us

#ai @Cory Doctorow @FediThing @James Gleick @Ian Betteridge

This entry was edited (3 weeks ago)

in reply to Dave Rahardja

Martijn Vos

in reply to Dave Rahardja • 3 weeks ago • •

@Dave Rahardja @Cory Doctorow @tante @FediThing @James Gleick @Ian Betteridge

I think there's a couple of aspects to the "theft":
* the theft of material: they're trained on copyrighted material
* the theft of jobs: AI is being used to replace artists/writers/coders; it's the same thing that upset the Luddites
* the theft of style: not only does AI "learn" from the works of others, it can emulate it. On demand. Some artists have very unique, personal styles that are suddenly not their own anymore.

@Cory Doctorow @tante @Dave Rahardja @FediThing @James Gleick @Ian Betteridge

in reply to Martijn Vos

Ian Betteridge

in reply to Martijn Vos • 3 weeks ago • •

@mcv @pluralistic @drahardja @FediThing @gleick I mean, *I* was trained on copyrighted material. So were you. So is everyone. I even regurgitate phrases I’ve read, usually unknowingly.

@Cory Doctorow @Dave Rahardja @FediThing @Martijn Vos @James Gleick

in reply to Ian Betteridge

Martijn Vos

in reply to Ian Betteridge • 3 weeks ago • •

@Ian Betteridge @Cory Doctorow @Dave Rahardja @FediThing @James Gleick @tante

That is true, and probably the strongest argument to defend LLMs. But LLMs have more explicitly encoded the material they're trained on, and better able to reproduce it than we are. Still, I think copyright is by far the weakest of the three types of theft. And I think the duplication of specific, personal styles is probably the most personal and invasive. In a way it's kind of the same thing, and yet it feels very different to me.

The theft of jobs causes the most damage, but is also kind of unavoidable with many technological advances.

@Cory Doctorow @tante @Dave Rahardja @FediThing @James Gleick @Ian Betteridge

in reply to Cory Doctorow

Alaric Snell-Pym

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic @gleick @ianbetteridge @FediThing there's been documented cases of LLMs regurgitating stuff from their training set verbatim, which clearly IS copyright infringement; and that means some parts of the training set are.encodrd in the weights of the model, which looks like publishing a copyrighted work to me. If publishing a JPEG of an image without copyright to it would be infringing, isn't publishing a model that can recreate something also infringing?

@Cory Doctorow @FediThing @James Gleick @Ian Betteridge

in reply to Cory Doctorow

Bruno Nicoletti

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic @ianbetteridge @FediThing If that “mathematical analysis” regurgitates near verbatim works created by other people, it certainly is committing IP theft, and LLMs will happily do that. The “mathematical analysis” is effectively a form of lossy compression on its training data which a prompt can later extract.

@Cory Doctorow @FediThing @Ian Betteridge

in reply to Bruno Nicoletti

Cory Doctorow

in reply to Bruno Nicoletti • 3 weeks ago • •

@bjn @ianbetteridge @FediThing
Once again, you're talking about *using* a model, not training a model.

Also "IP theft" isn't a thing. Perhaps you mean copyright infringement?

@Bruno Nicoletti @FediThing @Ian Betteridge

in reply to Cory Doctorow

Bruno Nicoletti

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic @ianbetteridge @FediThing I’ll give you pedant points for copyright infringement, which is what most people mean by “IP theft”. As for training/using, the difference is somewhat moot. The models are trained to be used, and if trained on copyrighted data without a license, you’ve encoded that data into the model which might then regurgitate it thus facilitating copyright infringement.

@Cory Doctorow @FediThing @Ian Betteridge

in reply to Bruno Nicoletti

Cory Doctorow

in reply to Bruno Nicoletti • 3 weeks ago • •

@bjn @ianbetteridge @FediThing it is a bedrock of copyright law that devices 'capable of sustaining a substantial non-infringing use' are lawful. Decided in 1984 (SCOTUS/Betamax) and repeatedly upheld.

It is categorically untrue that merely because a model's output can infringe copyright that the model is therefore illegal.

There's not much that's truly settled in American limitations and exceptions, but this is.

@Bruno Nicoletti @FediThing @Ian Betteridge

in reply to Cory Doctorow

Cory Doctorow

in reply to Cory Doctorow • 3 weeks ago • •

@bjn @ianbetteridge @FediThing 'facilitating copyright infringement' just isn't a thing.

@Bruno Nicoletti @FediThing @Ian Betteridge

in reply to Cory Doctorow

Cory Doctorow

in reply to Cory Doctorow • 3 weeks ago • •

@bjn @ianbetteridge @FediThing and as befits UK fair dealing (and related limitations and exceptions), we've had opinions from IPREG affirming that training a model doesn't infringe.

@Bruno Nicoletti @FediThing @Ian Betteridge

in reply to Cory Doctorow

Bruno Nicoletti

in reply to Cory Doctorow • 3 weeks ago • •

Then the laws are not fit for purpose. The whole point of copyright is to encourage people produce works by being sure they get the benefit of those works. If my works can be encoded into a bunch of matrix weights and reproduced without attribution let alone financial recompense, then why should I bother? Google is doing its best to effectively steal the bread out of creators mouths with its AI summaries. It may be legal, but it stinks.

This entry was edited (3 weeks ago)

in reply to Bruno Nicoletti

Cory Doctorow

in reply to Bruno Nicoletti • 3 weeks ago • •

@bjn @ianbetteridge @FediThing by all means say 'i don't like this technology' but don't conflate that with 'therefore it is illegal'

@Bruno Nicoletti @FediThing @Ian Betteridge

in reply to Cory Doctorow

Bruno Nicoletti

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic @ianbetteridge @FediThing Well apart from Anthropic having to pay $1.5B for copyright infringement, it’s all above board, 🙄. It’s not a matter of liking the technology or not, machine learning is capable of cool and useful things. However, how LLMs are being used and pushed is both immoral and culturally destructive. I’m surprised you are buying into it.

@Cory Doctorow @FediThing @Ian Betteridge

in reply to Cory Doctorow

Else, Someone

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic
> IPREG affirming that training a model doesn't infringe.

What we now take the party line serious?

@bjn @ianbetteridge @FediThing @tante

@Cory Doctorow @tante @Bruno Nicoletti @FediThing @Ian Betteridge

in reply to Cory Doctorow

Else, Someone

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic
> untrue that merely because a model's output can infringe copyright that the model is therefore illegal.

Mhmmm naaah overfitting and memorization are very much a thing, especially in the case of LLM where they've completely given up on controlling data leaks, and where memorization has been demonstrated rather unambiguously e.g. with the suitesparse example...

Not to imply that "illegal" is bad ofc, or that copyright justifiable

@bjn @ianbetteridge @FediThing @tante

@Cory Doctorow @tante @Bruno Nicoletti @FediThing @Ian Betteridge

in reply to Cory Doctorow

The Secretbatcave

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic @bjn @ianbetteridge @FediThing
I’d argue that It’s a bit more nuanced. Training and inference are two separate stages with their own rules.

For non profit, academic research, excerpts are allowed to be collected, but not the whole work. You still can’t circumvent DRM either.

Llama might be argued is non profit, but lifting whole works to train on still isn’t allowed.

@Cory Doctorow @Bruno Nicoletti @FediThing @Ian Betteridge

in reply to Cory Doctorow

soc

in reply to Cory Doctorow • 3 weeks ago • •

@pluralistic @ianbetteridge @FediThing 🤡

@Cory Doctorow @FediThing @Ian Betteridge

Unknown parent

Shiri Bailem

Unknown parent • 3 weeks ago •

@Fruits @Cory Doctorow god, mood and a half right there...

Honestly the worst problem is that it's a self-defeating issue... the type of people who flock first to this platform are the type of people we're having issues with... and the solution is adding more people to dilute them, but they're driving off people in a self-reinforcing cycle...

The fact that many of them will out loud say they don't want regular people to join fedi still leaves my jaw on the floor...

@Cory Doctorow

like this

in reply to tante

Mastodon Migration

in reply to tante • 3 weeks ago • •

Hmmmm... How about this perspective?

LLM is just a programming technique. The ethicality of using LLMs relates to the type of use and the source of the data it was trained on.

Using LLMs to search the universe for dark matter using survey telescopic data or to identify drug efficacy using anonymized public health records is simply using the latest technology for good purpose. Cory's use seems like this.

LLMs trained on stolen data creating derivative work. That's just theft.

Shiri Bailem likes this.

in reply to Mastodon Migration

Shiri Bailem

in reply to Mastodon Migration • 3 weeks ago •

@Mastodon Migration tagging @Cory Doctorow because this is a good line of discussion and he might need the breath of fresh air you're bringing.

My own two cents: you're missing one of the big complaints in the form of "how they were trained" which is the environment impact angle. Not that it isn't addressed by Cory's use case, just a missing point in the conversation that's helpful to include.

The "stolen data" rabbit hole is sadly a neverending one that digs into deep issues that predate LLMs. Like the ethics of copyright (which is an actual discussion, just so old that it's forgotten in a time when copyright is taken for granted). Using it to create "art" and especially using it to replace artist jobs is however a much much more clear argument.

Nitpick: LLMs can't be used for checking drug efficacy or surveying telescopic data, I think in this line you're confusing LLM with the technology it's based on which is Machine Learning.

@tante

@Cory Doctorow @Mastodon Migration @tante

like this

Matt Rife reshared this.

in reply to Shiri Bailem

Mastodon Migration

in reply to Shiri Bailem • 3 weeks ago • •

@shiri @pluralistic
Thanks for these corrections. Completely agree with everything, and thanks for tagging Cory.

One of the really unfortunate things that the Silicon Valley scammers have achieved is to coopt new technologies for their despicable pump and dump schemes and apply their disingenuous hype factory which ends up tarring all uses with the same brush.

@Shiri Bailem @Cory Doctorow

Shiri Bailem likes this.

in reply to Mastodon Migration

David Fleetwood - RG Admin

in reply to Mastodon Migration • 3 weeks ago • •

@mastodonmigration @shiri @pluralistic The only ethical use of a LLM would be one where the training dataset was ethically acquired, the power was minimized to the level of other methods of providing the same benefits, and the 'benefits' were actually measureable and accurate.

None of those are true today, and so far as I know there is little to no path to them.

@Shiri Bailem @Cory Doctorow @Mastodon Migration

zivi likes this.

in reply to David Fleetwood - RG Admin

Mastodon Migration

in reply to David Fleetwood - RG Admin • 3 weeks ago • •

@reflex @shiri @pluralistic
Seems like Cory's local punctuation and grammer checker is such an example, no?

@Shiri Bailem @Cory Doctorow @David Fleetwood - RG Admin

in reply to Mastodon Migration

Shiri Bailem

in reply to Mastodon Migration • 3 weeks ago from Fedilab •

@Mastodon Migration
it's the "copyright" issue, the outlook that unless everyone who posted anything that was used receives a check for a hefty sum then it's unethical.

Copyright is in quotes because it's not really a violation of copyright (the LLMs are not producing whole copies of copywritten materials without basically being forced) nor is it a violation of the intent of copyright (people are confused, copyright was never intended to give artists total control, it's just to ensure new art continues to be created).

@Cory Doctorow @David Fleetwood - RG Admin @tante

@Cory Doctorow @Mastodon Migration @tante @David Fleetwood - RG Admin

like this

in reply to Shiri Bailem

David Fleetwood - RG Admin

in reply to Shiri Bailem • 3 weeks ago • •

Also it's incredibly unclear to me how a LLM is a good use case for punctuation and grammar checking,. something regular document editors have done incredibly well since the late 90's or so. Like that's your use case? Not promoting Microsoft here but Word has been fantastic at that since at least 2003.

Seems weird to use that as the case for an energy sucking plagiarism machine.

This entry was edited (3 weeks ago)

in reply to tante

Radio Free Trumpistan

in reply to tante • 3 weeks ago • •

Chiming in, in defense of people like me who are not "neoliberal" and refuse to use AI of any sort because we don't have a use for them and find them rather superfluous to the functioning of our own native-born great minds. We simply have no use or need for them at all. Such things are for the weak-minded and lazy.

in reply to tante

komali_2

in reply to tante • 3 weeks ago • •

questions for the leftists and liberals from a confused anarchist:

1. Do you think you can put the cat back in the bag with LLMs? How?

2. For those that believe that LLMs were trained on stolen data, what does it mean for data to be private, scarce property, that can be "stolen?"

3. What about models that just steal from the big boys, like the PRC ones? Theft from capitalists, surely ethical?

4. Will you not using any LLMs cause Sam Altman and friends to lose control of your country?

Shiri Bailem likes this.

in reply to komali_2

Radio Free Trumpistan

in reply to komali_2 • 3 weeks ago • •

Dear anarchist: the confusion you feel is due to the fact that the only people you see are "leftists" "liberals" and "anarchists". None of those are real everyday people. Because the political parties in the U.S. don't recognize real everyday people as voters, real everyday people are fed up and disgusted with partisan loyalties, especially and including anarchists.

in reply to komali_2

Shiri Bailem

in reply to komali_2 • 3 weeks ago •

@komali_2 answering as a leftist AI-moderate who just understands the arguments better than most:

It varies, they either haven't thought that far or they have such a flawed view of AI that they think it's possible (ie. if it can only exist on massive datacenters, then shutting down those datacenters would stop it, and if it was only a novelty then a shaming campaign would shut it down)
That's a a very hot button topic. People have taken copyright for granted to the point that the discussion about the ethics of copyright is just as offensive to people as the topic of AI. Even though this case isn't even covered by copyright law, they feel entitled to "ownership" of their art. The argument of "theft" didn't even kick off until AI got good enough that people started using it in lieu of paying artists, and is mostly inbetween a demand for payment (not realizing that any payment would be laughably tiny) or a way to try and pressure these into not existing (see question 1).
Most of these people know nothing about variations in training data, and they'll either dismiss this question (because they're siding with said capitalists) or they'll divert to arguing about the ecological impact of training the big models.
They're just trying to social pressure everyone into no longer using them, because again see question 1 where they think they can put the cat back in the bag.

@tante

@tante @komali_2

like this

in reply to tante

Growlph Ibex

in reply to tante • 3 weeks ago • •

I've struggled for a while to reconcile people's justified outrage at the AI bubble's incalculably harmful unsustainability with the need to deny that even the most banal utility of its outputs could ever possibly have merit.

Credit to @pluralistic for really articulating why I find the "Not Even Once" crowd to be largely missing the point, even if we all agree that the "AI True Believers" are ghoulish monsters.

@Cory Doctorow

in reply to Growlph Ibex

Radio Free Trumpistan

in reply to Growlph Ibex • 3 weeks ago • •

Au contraire--what's apparent is that those who misunderstand the Not Even Once people are missing the point. No person who doesn't have any use for it will even use it once, and calling them all something equivalent to Ludites is missing the point. You might as well be denigrating all people who have never bought a torque wrench because the ones that haven't bought one have zero use for one in the first place.

in reply to Radio Free Trumpistan

Growlph Ibex

in reply to Radio Free Trumpistan • 3 weeks ago • •

@claralistensprechen3rd "Shame on you for not using AI" isn't even close to the point I was making, and I'm honestly not sure how any good-faith reading of my take here could have led you to that conclusion.

If you don't have a use case for a torque wrench, no problem.

I you DO have a use case for a torque wrench, it's weird for me to say "no you don't, just use a hammer".

@Radio Free Trumpistan

in reply to Growlph Ibex

Growlph Ibex

in reply to Growlph Ibex • 3 weeks ago • •

@claralistensprechen3rd To be fair, AI advocates have been trying to wedge AI into so many inappropriate situations that legitimately useful applications for LLMs are the exception rather than the norm, and being on the defensive is understandable.

But I think it's detrimental to presume that legitimately useful and perfectly ethical applications simply don't exist.

@Radio Free Trumpistan

in reply to Growlph Ibex

Radio Free Trumpistan

in reply to Growlph Ibex • 3 weeks ago • •

That's exactly what I've been getting at. You've posited two extremes and most people are at neither end of those extremes. Some of us normies are at the point where refusing to use AI is a direct response to the shove-down regardless of its merits. Some of us normies have gotten along without it so far and can get along without it just fine enough because there's simply no use for it. As it is in politics with only two parties, there are only two extremes as far as the not-normies are concerned, the rest of us be damned.

in reply to tante

vy

in reply to tante • 3 weeks ago (Received 2 weeks ago) • •

Is refusing buy a Tesla or use X because I disapprove of Musks racism and suport of Trump, "neoliberal purity culture" ? Maybe putting solar on my house is "neoliberal purity culture" too, as is refusing to cross a picket line. Seems like Doctorow uses "purity culture" to mean what I would call "moral responsibility". As for "neoliberal", insofar as it retains any meaning, it's much more applicable to the kind of free market libertarian argument that Doctorow is making than otherwise.

in reply to vy

Martijn Vos

in reply to vy • 2 weeks ago • •

@vy @tante

Yeah, I don't quite understand this use of "neoliberal". I don't strongly disapprove of moderate, responsible use of AI, but I totally respect people who boycott it. But "neoliberal", as far as I'm aware, mostly refers to: privatise everything, deregulate everything, aim for short-sighted benefit for yourself, and don't worry about the externalities. That's the exact opposite of boycotting AI.

@tante @vy

⇧

tante 3 weeks ago • •

tante
3 weeks ago • •