Firefox 130 is bringing a game-changing feature: automatic alt-text generation for images using a fully private on-device AI model! 🙌🏾

It will initially be available in the built-in PDF editor, and our aim is to extend it to general browsing for screen reader users. hacks.mozilla.org/2024/05/expe…

in reply to Mozilla

This is how it's done! Private, open-source AI models running locally.

Q: How much storage do the models take? (EDIT: 200MB according to the post - yeah, in this case, this better be a downloadable 'module' instead of being built-in) Could you make this feature optional, which would require the user to opt-in and download or delete the model(s) themselves? I don't want Firefox to go the Microsoft Edge route, where they shovel every feature under the sun, the user has no choice, and there is no way to reduce the storage occupied by the browser.

This entry was edited (5 months ago)
in reply to AlexTECPlayz

@alextecplayz Just wait until you hear about Mozilla's brand new shopping toolbar. They bought a company that used to dabble in NFTs before switching to claiming to have AI.

And just for fun, this new Mozilla subsidiary will sell browsing history and location data to advertisers... as laid out here.

fakespot.com/privacy-policy
(Ctrl+F for "Personal Information is Sold")

Grrrr, Darth Moose Shark reshared this.

in reply to Mozilla

Looks interesting at first glance.

Thanks for being this open and transparent about the process, used model etc.

in reply to Mozilla

alt text cannot be automatically generated without human input because the function of alt text is highly contextual. if you actually gave a shit about a freer more independent web you'd support projects like @hannah's distributed alt text database which is currently supported via browser extension. it looks like you're too late now to take it over but if you have any funding left for non-AI bullshit i'm sure the 501(c)(3) would absolutely love your support as well as built-in browser integration social.alt-text.org/@hannah/11…


Hi friends,

In a last minute meeting between pre-op appointments preparing for my major craniotomy tomorrow, Scribely and I have agreed on terms to hand off the Alt-Text.org project.

The code will remain open source, and the project will become a 501(c)(3) with support from Scribely. Should the project be successful its operating costs will likely be considerable and I believe connections to industry are necessary to sustain its contributions to web accessibility.

💜 Hannah


in reply to d@nny "disc@" mc²

@hipsterelectron @hannah I like how you're yelling at one of the few organizations that go to great lengths to build a completely independent browser that they "don't give a shit" about the free web.

If they don't give a shit then literally nobody on earth does.

in reply to s92

@sasha92 i yelled at them on twitter a few days ago too i'm trying to get them to change so we don't have to throw the baby out with the bathwater x.com/hipsterelectron/status/1…
@s92
in reply to d@nny "disc@" mc²

@sasha92 and uhhh clearly hannah gives a shit which is why she built the distributed alt text database. mozilla clearly isn't doing its job as an org so individual hackers on the internet are doing it for them and that's obviously much harder. i spelled out how they can do better, you're giving them a pass because they did something cool once. guess which one of us cares more about mozilla's success?
@s92
in reply to d@nny "disc@" mc²

@hipsterelectron @sasha92 There are billions of images online and an external human can't describe each picture that's missing alt text.
That project has a different goal than Mozilla's alt-text AI and I am sure you can use both - human descriptions with that project for the few images that will have it and Mozilla's AI for the rest.
in reply to Emi

@paper @sasha92 you do know there are billions of humans right? wikipedia is widely used to train statistical models but it too is the compilation of many humans over time. adding alt text to an image takes about as much effort as editing a page (and often much less). do the math
@s92 @Emi
in reply to d@nny "disc@" mc²

@paper @sasha92 also, as noted above, it is generally accepted by experts on alt text (not experts on "AI", who have a conflict of interest) that it cannot be machine generated in any remotely meaningful way. this is like providing an escalator to wheelchair users
@s92 @Emi
in reply to d@nny "disc@" mc²

@hipsterelectron @sasha92 There are 3.2B images uploaded in a day (1.2T /year), many of them are repeating, google has 130B indexed. You can't describe all of that. Sure, human description will probably be better in many cases, but AI descriptions are still very useful.

Also, I doubt that project will get as many people editing it as Wikipedia has, so it can be great for a few popular images, memes, etc. but it can never cover random images on social media and websites without alt text.

in reply to Emi

@paper @sasha92 if you're not blind or visually impaired you absolutely don't get to make that call (and single individuals do not represent an entire community). hannah's project was built in direct conversation with the disabled people it serves. you're finding reasons to trash it to justify your preconceived notions and i think that's a really terrible thing to do. blocking now
@s92 @Emi
in reply to Mozilla

Nice work!!!

This would be particularly useful for postings to Mastodon, where alt-text is very much socially desirable.

in reply to Mozilla

you do realize that incorrect alt-text can be worse than no alt-text? Sometimes critically so? I do hope this will be strictly opt-in and alt-texts generated by the model will be clearly marked, if this ever goes into stable Firefox
in reply to Mozilla

hm, i think this can be useful, however the problem is when people will never look at the output and just accept it at face value.

Basically I hope you will add a warning box that says "Do note that the text generation is not perfect and you should make sure the text clearly fits the image" or something along those lines. Also when it generates the text, it should always add "This alt text was generated by Firefox language model." as the first sentence, so people who rely on alt text features will know that this may be inaccurate.

in reply to Mozilla

this is not how we push for better accessibility. PLEASE reconsider this "AI" shift
in reply to Mozilla

cool.
Is it time to invent a web browser that actually doesn't spy on you?
in reply to Mozilla

would be amazing if this offered an API for webapps to use. E.g. Mastodon's alt text field could detect that the feature is available and provide a suggested alt text to users.

On the other hand this might encourage lower quality alt texts, as that will always be quicker to do than writing down your own alt text.

Maybe keeping it "fallback" only for consumers of content that is missing alt text is best.

in reply to Mozilla

AI haters in your replies in 3 ... 2 ... oh nevermind, they are already here 🫠
in reply to Mozilla

please ensure there's an option to turn it off. I don't want any AI in my system.
in reply to Mozilla

cool. my visually impaired audience will be so happy i clicked the button to generate a bunch of gibberish, instead of spending 10 seconds to write a thoughtful description.
in reply to Mozilla

no. get this garbage out of here

  1. what the hell is it trained on? the same corpus of implicit bigotry and medical misinformation as every other "ai"?
  2. it's just going to lie a bunch like every other "ai". wrong alt text is worse than no alt text.
in reply to Mozilla

Or, I don't know, people could type them? On the things they post? "This way, people don't have to think" - ?
in reply to Mozilla

you really make it hard for decent people to not switch to other browsers for good. At this point there's no other explanation than your managers being bribed by competitors.
in reply to Mozilla

Context is important though. AI (rather, ASI) will fail to understand the intent of a picture. That's an easy way to render a person unable to understand the picture even with a physical explanation...
in reply to Mozilla

This is cool-ish (and love the private, on-device aspect), but seems like the wrong end of the pipeline. Does Firefox help people write their alt text when posting images (and prompt them to do it)? Now _that_ would be game-changing. No need for the cost of the power increase of hundreds or thousands of _readers_ to create it, do it when the _writer_ creates it.
in reply to Mozilla

Hard fucking pass. I don't want a tiny LLM using my device resources.
in reply to Mozilla

I would have preferred it if you had just focused on speed. I don't need AI built into every piece of software I use.
in reply to Mozilla

no no no. No AI in Firefox, PLEASE. Y'all are the one viable alternative browser, please don't start following silly trends like this. If people want AI nonsense, there's plenty of places to get it. I would happily pay to avoid having features like these creep into Firefox.
in reply to Mozilla

So, erm… how often will it phone home to teach a centralized model about how to do this? How much are you going to pay your laborers for their time to train your a.i. models?
in reply to Mozilla

What a shitty idea.

Wish I was surprised that Mozilla has jumped on the AI Highway to Hell.

But their priorities have not been user focused for years IMO.

youtube.com/watch?v=4hhlQU0zDp…

in reply to Mozilla

The amount of kvetching in the comments here is hilarious. I guess it should be expected if you combine AI, open source, and alt-text into a single post on Mastodon.
in reply to Mozilla

Automated alt-text isn't too bad. This is a fair use of AI that doesn't really step on any toes.

I could foresee this causing problems if the alt-text is very wrong, though.

in reply to Mozilla

ITT all 7 firefox users upset that AI might be useful for some things
in reply to Mozilla

"The first time the user adds an image, they’ll have to wait a bit for downloading the model (which can take up to a few minutes depending on your connection) but the subsequent uses will be much faster"

Hope you can disable this feature and the 200 MB download altogether
I understand it can be very useful for some users, but for me not at all
If I find porno images I don't understand I think I can skip them 🤪

in reply to Mozilla

ITT all 7 firefox users upset that AI might be useful for some things

Not pictured: 2 billion people using Chrome

in reply to Mozilla

Alt text will absolutely still be required on websites and social media posts; this is just to patch over where people couldn't be bothered to be inclusive, basically :)
in reply to Florian

@zersiax but do we really want to give someone who can’t be bothered a check box that generates confusing, shallow and often inaccurate alt text that would be more aggravating than not having any alt text at all?

This is not an “ai sucks” comment with no foundation. Artists have been using AI alt text on Instagram for a while now and it is truly awful.

in reply to james

@james I don't know, honestly :) I think it depends on the image.
Chrome and Edge have had this feature for a while now, sans LLM, and really the only time that is useful is when there's text in an image, which gets OCR'ed ...relatively ... well. So in that sense I can see it; PDFs often are pictures of text and this might bridge that divide. For practically any other purpose though ...no, probably not :)
in reply to Florian

@zersiax mastodon web and some Fediverse clients like Ivory already have that OCR, yet I still see tonnes of image posts of text that do not bother to use it.

Which is why I’m like “please don’t just launch this and expect people to check what comes out, otherwise you’ve just made the experience a special sort of crap”

😬

in reply to Mozilla

A clever way to make developers and site owners not forget to write proper alt text. "Write your images alt text, otherwise we'll generate gibberish and deface your websites!".
Smart move, Mozilla.
in reply to Mozilla

I'm still waiting for the feature that obligatory speaks back the comment to the user before posting.

(see xkcd.com/481/)

in reply to Mozilla

will I be able to turn it off? I don’t want AI everywhere and in everything.
in reply to Mozilla

I asked ChatGPT for a short prompt for the candle picture:
“The image shows a birthday cake with lit candles in the foreground and a smiling woman in the background, likely in a room with several people.”
That's indeed longer than the Firefox text but not absurdly lengthy and detailed.
However, I'm impressed that Firefox does that locally.
in reply to Mozilla

Sounds like a positive and non-intrusive integration of AI in an application for once!
in reply to Mozilla

This is pretty cool; but I was unaware that Firefox has a built in PDF "editor" and not just viewer.
in reply to Mozilla

BORING. Where's the focus on the actual browser standards, like PWA support!?
in reply to Mozilla

Nice! Your PDF functionality was neat, so this looks promising!
in reply to Mozilla

literally moved to Firefox to get AWAY from stoopid AI hype. WHYYYYYY????!!!!
in reply to Mozilla

I don't actually Want that in my browser.

If I did, I suppose I'd run, I don't know, Fucking Windows?

in reply to Mozilla

I'm ashamed to use this shitty browser that keeps getting worse. at this point the only way to save firefox is to fire all the developers who make these changes and never let them work on anything again except those little "square hole" baby toys.
This entry was edited (5 months ago)
in reply to Mozilla

I've been working on tools around issues like this for years, including a large amount of communication with folks who depend on alt text. Should this technology be available to those writing alt text, rather than just voluntarily to screen reader users, it will be immensely harmful to the accessibility of social media everywhere. Please do not do this. I'm happy to dive into why but please do not make it easy for people posting images to use this AI generation.
- Founder, Alt-Text.org
This entry was edited (5 months ago)
in reply to Hannah Kolbeck 🏳️‍⚧️

as a screen reader user who needs alt text, I agree fully with what @hannah says here and in her comments to others in this thread. Because the standard for alt-text is a bare minimum standard of it simply being present, this tool is likely to flood the internet with poor quality alt-text that has not been edited or adapted to consider the context of what good alt text is for any given situation.
in reply to 0x5DA

@0x5DA I don't have capacity in this moment for the full depth, but it's a bit more complex. Writing alt text that's actually equalizing of access, especially on social media, requires knowledge of multiple layers of context in which an image appears. Similar to other AI types, AI description works impressively *sometimes* but falls down hard on many types of image commonly appearing on SM, often in ways not obviously bad to those writing alt text.

An example: bsky.app/profile/hannah.the-vo…

in reply to Hannah Kolbeck 🏳️‍⚧️

@hannah
hm, i see. and i guess you don't think it's a training thing..

is it worse than _no_ alt text?
a human will have a superior understanding of context, and should be strongly preferred -but many people don't all the same, and i'm doubtful that will change. this seems (from an outside perspective) to be a reasonable, if unsatisfying, solution.

in reply to 0x5DA

@0x5DA It can be useful for a person using a screen reader to have access to an AI description, but crucial there is that said user needs to know that that's the source of said description. There are repeated patterns of those who feel pressured to include descriptions but don't actually care about accessibility doing the absolute minimum, manifesting here as using the direct AI output without examination or editing.

So yes, a lack of inline alt text is better than AI gen inline.

in reply to Hannah Kolbeck 🏳️‍⚧️

> said user needs to know that that's the source of said description.

this is a browser feature that the end user turns on or off themselves so yes they do know that. it's not being done by the publisher.

in reply to Spooky Sun

@sun @0x5DA That is true of the first steps here, however if you read the linked blog post you will find that extending easy use to publishers is a goal.
in reply to Hannah Kolbeck 🏳️‍⚧️

thank you for your reply, I agree with you that this is not desirable unless it's somehow marked, and still could discourage a good attempt at making alt text. I apologize.
in reply to Spooky Sun

@sun @0x5DA I misunderstood their plans, and have written a partial retraction: social.alt-text.org/@hannah/11…


Important partial retraction:

I thank @jcsteh for the correction on the @mozilla image description AI plans criticized by me in these last few days. I and many others, most of whom I saw celebrating, believed that those plans included providing such tools to all users writing alt text in the web browser. They don't.

I retract my criticism of Mozilla, because they are already doing what I was asking.

I maintain my criticism of alt writer AI tools, and I think Jamie has validated my concerns.


in reply to Mozilla

this looks like an awesome accessibility feature! thank you for including an easy-to-understand rundown in the article of how this tool differs from the ethically-dubious LLM stuff that's everywhere nowadays
in reply to Mozilla

I've serious concerns about this. I get that most likely other browsers will do it, but I feel like if we let people generate alt captions via AI models the people who will benefit from those captions will most likely miss out on the details that only a human looking at an image will actually see and be able to express? If anything the AI model will make things up and explain things in such detail that might not be relevant to the actual image/scene itself.
in reply to Mozilla

whilst I admire the desire to make accessibility easier and more present, instagram and similar have auto alt text and it is absolutely rubbish. The most famous users of it do not check it at all. My favourite artists will allow “possible three panel illustrated graphic comic of a green snake or reptile having a coversation” alongside their image

We must fund alt text education and have suitable controls and alerts as to the usefulness of the generated content, alongside bringing these “features” into the world.

in reply to james

I am anti AI but this tool is obviously being launched so I don’t need to throw in a “this ain’t it”, so I just implore you to make sure that you haven’t launched a bunch of fake accessibility into the world that only confuses the ones needing that accessibility.
This entry was edited (5 months ago)
in reply to james

@james Any modern ML-based image descriptions will wipe the floor with whatever Meta is using. That system is 10 years old and only recognizes basic objects and scenery, so you get image descriptions like "Image may contain: Outdoor, dog, two people, and text." Half the time it doesn't even transcribe the text.
Going by my own experiences using LLMs from OpenAI, Anthropic, and Google to describe images, there is never a scenario where I would rather have no description instead of one generated with AI, and I expect things to get better from here. Maybe a few people will be less inclined to describe their images if a browser can do it for them, but not everyone uses that browser, and I would guess most sighted people who are aware of alt text won't really know or care about the specifics of one browser's implementation of image descriptions. People either do or do not post alt text. If anything, maybe this announcement will make *more* people post alt text. Here on Mastodon, most people know what it is already, but I bet people are reading this post and thinking "Oh, I should probably describe my stuff so the AI doesn't do it worse."
in reply to james

@james This is where the AI feedback loop will ruin itself. AI generates alt text that contains errors. At the next step AI scrapes the internet and gets trained on the alt text it finds, AI generated alt text that has errors. Lovely! Now AI will hallucinate on its own hallucinations. What could possibly go wrong?
in reply to Mozilla

This is good, but I feel like this shouldn't be a long term solution. Instead we should be pushing websites to serve alt text, especially since HTML already has support for it.
in reply to Mozilla

I hate that you guys are doing this instead of giving us a proper tablet UI. I just want to browse the web, folks.
in reply to Mozilla

This is a good and well done use of AI. It may be particularly useful for posting images on mastodon and fediverse in general.
in reply to Mozilla

this is exactly how software becomes increasingly bloated
in reply to Mozilla

I wish Mozilla the best with this AI experiment, but I will transition to LibreWolf and use that browser until the AI is removed from Firefox.
in reply to Mozilla

if the point is to make more high-quality alt text available for people, can you imagine how much more effective it would be to ship a set of configurable UI hints, authorship guides, and a regular series of articles coaching authors which alt text writing patterns to cultivate and which to avoid? A metadata standard to allow embedding and attaching alt text alongside and inside images, with author attribution and strong multilingualization? This feels wasteful and ill-prioritized.
in reply to Mozilla

Really wish you'd let us set our own passcode instead of the device code as a backup to the biometrics to view our passwords. If anyone figures out our device passcodes, they can get our passwords if we have them saved in Firefox mobile. Given that bad guys are already aware of this, you need to be on top of it.
in reply to Mozilla

screen reader users? What about everyone who ever posts images?
in reply to Mozilla

sorry I yelled at you about AI. This is good, actually. Still suspicious of the urge Mozilla has of throwing away a bunch of money chasing the latest fads.
in reply to Mozilla

Who in the screen reader community asked for this? Aspirations are great, but AI generated captions might miss the mark for accuracy and value added...
Unknown parent

@paper @sasha92 I want to say explicitly that I don't fully agree w/ @hipsterelectron here, but I think that your dismissal of their concerns also substantially misses the mark

I've also talked at length about fine details that assist the Blind folks @weirdwriter talks about in using AI above while not incurring discussed harm. Making it easy for a screen reader user to knowingly get an AI description is relatively simple and worthwhile, but giving that tool to writers needs great care

in reply to Mozilla

thank you for doing this. Seems like a very useful feature. I hope you get the AI right for it.
in reply to Mozilla

These modules need to be opt-in from the beginning ... not mandatory with a "maybe we'll let you get rid of them some day, teehee" that seems to be your current plan.

I REALLY don't want that functionality, and especially not the bloat or other resource use. Guess it's time to start looking for a new browser.

in reply to Mozilla

this is awesome! An actually useful application of "AI", and on-device to boot. This and the on-device translations are the kinds of features I want more of!
in reply to Mozilla

except blind people need an alt-text depending on context and what you wish to say (which AI won't have)
in reply to Mozilla

These kinds of inclusive quality of life features are one of the best use cases for AI, so good on Mozilla for creating this.
in reply to Mozilla

I always hope that these kinds of things will be something like modules/plugins that need to be installed as an extra
in reply to Mozilla

- couldn't you teach this AI to mark the text of web links?
in reply to Mozilla

That's gonna be useful for web dev but i hope it's privacy focused since i would not use it otherwise