I just ran one of these locally on a Mac like this:
uvx litert-lm run \
--from-huggingface-repo=litert-community/gemma-4-E2B-it-litert-lm \
gemma-4-E2B-it.litertlm \
--backend=gpu \
--prompt="Generate an SVG of a pelican riding a bicycle"
The first time you run that it downloads 3.2GB to ~/.cache/huggingface/hub/models--litert-community--gemma-4-E2B-it-litert-lm
It can handle audio and image input too, which is pretty cool for a 3.2GB model. For images:
I'll be honest with you. My main ask for on device AI is that when I am typing "Going out for a quick j" it corrects to "jog" and not "Jonathan". I don't think it needs that many gigabytes.
The autocomplete of a decade ago is better than what we have now.
It’s harder now because emojis and draw-to-type as well as pen input. We didn’t have these things 14 years ago when “I’ll be right back” could be expanded from “I’ll b ri ba”
Their point is that OP used the same dot separated phrase to point out that there's a 0.8GB model and an audio/image model on device. Which reads weird.
But they could be cooking up a smaller one because the model card lists the Q_4 quants as being bigger than the mobile or text-only so I think we’ll need to wait for the Q_2_Distilled_Mobile_Textformer version. Still, just amazing work.
It can handle audio and image input too, which is pretty cool for a 3.2GB model. For images:
And for audio: (The pelican is rubbish, but it's only a 3.2GB file so the fact it even outputs valid SVG is impressive to me: https://gist.github.com/simonw/94b318afde4b1ce5ff67d4b5d0362... )