I just ran one of these locally on a Mac like this: uvx litert-lm run \ --from-h...

reactordev · 2026-06-05T20:44:36 1780692276

Not to mention the text-only 0.8GB version. Just crazy. You can now have basic real-time conversations on-device that are video and audio aware.

viccis · 2026-06-06T06:11:48 1780726308

I'll be honest with you. My main ask for on-device AI is that when I type "Going out for a quick j", it corrects to "jog" and not "Jonathan". I don't think that needs that many gigabytes.

taffydavid · 2026-06-06T07:25:37 1780730737

Who doesn't enjoy a quick Jonathan now and then.

But seriously, wouldn't predictive text on a 90s cell phone pass this test?
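For what it's worth, the "quick j" case really doesn't need gigabytes. A minimal sketch, assuming a tiny hand-made table of word and word-pair counts (the words and counts below are invented for illustration, not taken from any real keyboard), ranks completions of a typed prefix by how often each candidate follows the previous word, falling back to overall word frequency:

```python
from collections import Counter
from typing import Optional

# Hypothetical counts for illustration only; a real keyboard would learn these from usage.
UNIGRAMS = Counter({"jonathan": 50, "jog": 20, "job": 30, "just": 200})
BIGRAMS = Counter({
    ("quick", "jog"): 12,
    ("quick", "job"): 3,
    ("quick", "just"): 1,
})


def complete(previous_word: str, prefix: str) -> Optional[str]:
    """Return the most likely word starting with `prefix` after `previous_word`."""
    candidates = [w for w in UNIGRAMS if w.startswith(prefix)]
    if not candidates:
        return None
    # Rank by how often the pair was seen first; use overall word frequency as the fallback.
    # Counter returns 0 for unseen pairs, so unknown contexts fall through to UNIGRAMS.
    return max(candidates, key=lambda w: (BIGRAMS[(previous_word, w)], UNIGRAMS[w]))


print(complete("quick", "j"))  # jog
print(complete("with", "jo"))  # jonathan
```

The design choice is just "context first, frequency second": with no useful previous word, it degrades to the plain most-frequent-word behaviour that gives you "Jonathan".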

reactordev · 2026-06-06T11:08:10 1780744090

The autocomplete of a decade ago is better than what we have now.

It’s harder now because of emojis, draw-to-type, and pen input. We didn’t have these things 14 years ago, when “I’ll be right back” could be expanded from “I’ll b ri ba”.
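That kind of expansion can be approximated by treating each typed token as a prefix of the corresponding word in a stored phrase. A minimal sketch, assuming a small hypothetical phrase list (this is not a claim about how any particular phone actually implemented it):

```python
# Hypothetical phrase list; a real phone would build this from the user's history.
PHRASES = [
    "I'll be right back",
    "I'll be there soon",
    "running a bit late",
]


def expand(abbreviation: str, phrases: list[str] = PHRASES) -> list[str]:
    """Return phrases whose words start with the typed tokens, word for word."""
    typed = abbreviation.lower().split()
    matches = []
    for phrase in phrases:
        words = phrase.lower().split()
        # Same number of words, and every typed token is a prefix of its word.
        if len(words) == len(typed) and all(w.startswith(t) for w, t in zip(words, typed)):
            matches.append(phrase)
    return matches


print(expand("I'll b ri ba"))  # ["I'll be right back"]
```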

yalok · 2026-06-06T04:00:26 1780718426

0.8GB is for text only. It's more like ~1.1GB if you include the video/audio encoder.

reactordev · 2026-06-06T16:25:26 1780763126

And your point is what? That it’s more than 0.8GB if you include more than text-only?

ranguna · 2026-06-07T09:14:24 1780823664

Their point is that OP used the same string of period-separated sentences to mention both a 0.8GB model and an on-device audio/image model, which reads oddly.

simonw · 2026-06-05T21:38:41 1780695521

Have you seen a 0.8GB model file floating around yet? I couldn't find one earlier.

reactordev · 2026-06-05T23:06:31 1780700791

I think this is the one, but it’s 0.8GB of VRAM, not a 0.8GB file.

https://huggingface.co/google/gemma-4-E2B-it-qat-mobile-ct

But they could be cooking up a smaller one: the model card lists the Q_4 quants as bigger than the mobile or text-only versions, so I think we’ll need to wait for the Q_2_Distilled_Mobile_Textformer version. Still, just amazing work.
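The file-size vs. VRAM confusion is mostly arithmetic: raw weight storage is roughly parameter count × bits per weight ÷ 8, before adding metadata, layers kept at higher precision, and runtime memory such as the KV cache. A back-of-the-envelope sketch, where the 2-billion parameter count is a hypothetical round number, not the actual count for this model:

```python
def weight_file_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate size of the raw weights in gigabytes (1 GB = 1e9 bytes).

    Ignores metadata, mixed-precision layers, and runtime memory such as the KV cache.
    """
    return num_params * bits_per_weight / 8 / 1e9


# Hypothetical 2-billion-parameter model at a few precisions.
for bits in (16, 8, 4, 2):
    print(f"{bits:>2}-bit: {weight_file_gb(2e9, bits):.1f} GB")
```

Under that assumption, 4-bit weights land around 1.0 GB and 2-bit around 0.5 GB, which is why a size figure only means something once you know whether it refers to the download or to memory at runtime.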

madduci · 2026-06-06T05:57:24 1780725444

Where is it? On Ollama I only see the bigger one.

reactordev · 2026-06-06T16:26:22 1780763182

I don’t use Ollama. Can you pull from HF?

rcarmo · 2026-06-05T23:12:29 1780701149

Is that actually QAT? The MLX Community models have that in their names, but these don't, and the upload dates don't quite line up.

__mharrison__ · 2026-06-05T23:19:14 1780701554

As an aside, uvx is so pleasant to use... I wish Nvidia supported it as first-class rather than making folks jump through Docker hoops.

NamlchakKhandro · 2026-06-06T02:18:16 1780712296

I wish people would stop using Python for AI.

It's slow, and the package resolution is way too flat.

qwertox · 2026-06-06T07:44:05 1780731845

What do you use?