
batch12 · 2026-04-22T13:50:01 1776865801

I have been needing to cut back on my subscription services, too. I also canceled.

batch12 · 2026-04-13T02:57:20 1776049040

Planted a few fruit trees, some strawberries, and vegetables. Every time I water them, I think about automated irrigation.

batch12 · 2026-04-10T01:57:24 1775786244

Is it a specific picture of the face or any picture of it?

batch12 · 2026-03-20T17:13:23 1774026803

This one is a call center metric, similar to after-call work or first-call resolution. I believe this one is average handle time.
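For reference, a common industry definition of average handle time (AHT) adds talk time, hold time, and after-call work, then divides by the number of calls handled. The sketch below assumes that definition. The `Call` record and its field names are hypothetical, since real call center systems label these differently.

```python
# Minimal sketch of the commonly used AHT formula:
# (talk time + hold time + after-call work) / number of calls handled.
# The Call fields are hypothetical placeholders, not any specific system's schema.
from dataclasses import dataclass


@dataclass
class Call:
    talk_seconds: float
    hold_seconds: float
    after_call_work_seconds: float


def average_handle_time(calls: list[Call]) -> float:
    """Return AHT in seconds; raises ValueError for an empty list."""
    if not calls:
        raise ValueError("need at least one call")
    total = sum(
        c.talk_seconds + c.hold_seconds + c.after_call_work_seconds for c in calls
    )
    return total / len(calls)


calls = [Call(300, 60, 120), Call(200, 0, 40)]
print(average_handle_time(calls))
```

With the two sample calls (480 s and 240 s of total handling), the average comes out to 360 seconds.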

batch12 · 2026-03-15T16:56:36 1773593796

Sincerely, TH FART

batch12 · 2026-03-11T17:26:41 1773250001

Someone should tell that to the people who publish the gas station mugshot magazines.

batch12 · 2026-01-06T23:35:53 1767742553

Could they have added some swap?

geerlingguy · 2026-01-06T23:38:44 1767742724

No. I just updated the parent comment: I added -c 4096 to cut down the context size, and now the model loads.

I'm getting 6-7 tokens/sec for generation and 10-11 tokens/sec for prompt processing with their model. That seems quite good, actually: much more useful than llama 3.2:3b, which has comparable performance on this Pi.
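A likely reason a smaller `-c` lets the model load is that the KV cache grows linearly with context length, on top of the memory the weights need. The rough estimate below is a sketch under stated assumptions: the layer count, KV-head count, and head dimension are hypothetical placeholders (not the model discussed here), and it assumes a 2-byte (f16) cache element.

```python
# Rough KV-cache size estimate for a transformer decoder, illustrating why a
# smaller context (llama.cpp's -c) reduces memory use.
# Model dimensions below are hypothetical placeholders; bytes_per_elem=2 assumes f16.


def kv_cache_bytes(n_layers: int, n_ctx: int, n_kv_heads: int, head_dim: int,
                   bytes_per_elem: int = 2) -> int:
    """Keys and values (factor of 2) stored per layer, per position, per KV head."""
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem


GIB = 1024 ** 3
for ctx in (4096, 32768, 131072):
    size = kv_cache_bytes(n_layers=32, n_ctx=ctx, n_kv_heads=8, head_dim=128)
    print(f"-c {ctx:>6}: {size / GIB:.1f} GiB")
```

With these placeholder dimensions, a 4096-token context needs 0.5 GiB of cache, while a 131072-token context needs 16 GiB, which would not fit alongside the weights on a 16 GB board.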

Aurornis · 2026-01-07T12:40:42 1767789642

> I added -c 4096 to cut down the context size

That’s a pretty big caveat. In my experience, a small context size is only okay for very short questions and answers. The output looks coherent until you try to use it for anything. Then it turns into the classic LLM babble: the words are put in a plausible order, but the output as a whole is just rambling.

layoric · 2026-01-06T23:47:17 1767743237

Thanks for posting the performance numbers from your own validation. 6-7 tokens/sec is quite remarkable for the hardware.

geerlingguy · 2026-01-06T23:49:26 1767743366

After some more benchmarking: with larger outputs (like writing an entire, relatively complex TODO list app), it seems to drop to 4-6 tokens/s. Still impressive.

geerlingguy · 2026-01-07T03:54:06 1767758046

I decided to do an actual llama-bench run and let it go for the hour or two it needs. I'm posting my full results here (https://github.com/geerlingguy/ai-benchmarks/issues/47), but in short: 8-10 t/s pp and 7.99 t/s tg128, on a Pi 5 with no overclocking. An overclock could probably increase the numbers slightly.

You need a fan/heatsink to get that speed, of course; it's maxing out the CPU the entire time.
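To turn throughput figures like these into a rough wait time, you can divide the prompt length by the prompt-processing (pp) rate and the reply length by the generation (tg) rate, then add the two. This is a back-of-the-envelope sketch: it assumes both rates stay constant (the earlier comment notes generation slows to 4-6 t/s on longer outputs), ignores model load time, and uses a hypothetical 500-token prompt and 128-token reply.

```python
# Back-of-the-envelope response latency from llama-bench style throughput numbers.
# Assumes constant rates; real generation tends to slow as outputs grow longer.


def response_seconds(prompt_tokens: int, gen_tokens: int,
                     pp_tok_per_s: float, tg_tok_per_s: float) -> float:
    """Time to process the prompt plus time to generate the reply."""
    if pp_tok_per_s <= 0 or tg_tok_per_s <= 0:
        raise ValueError("rates must be positive")
    return prompt_tokens / pp_tok_per_s + gen_tokens / tg_tok_per_s


# 9.0 t/s is the midpoint of the reported 8-10 t/s pp range; 7.99 t/s is the tg128 figure.
# The token counts are hypothetical.
t = response_seconds(prompt_tokens=500, gen_tokens=128, pp_tok_per_s=9.0, tg_tok_per_s=7.99)
print(f"{t:.0f} s")
```

At these rates, prompt processing dominates: roughly 56 seconds to read the prompt versus about 16 seconds to write the reply.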

nallic · 2026-01-07T12:43:12 1767789792

For some reason I only get 3-4 tokens/sec. I checked, and the CPU isn't throttling or anything.
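On Raspberry Pi OS, one way to confirm throttling is `vcgencmd get_throttled`, which prints a hex bitmask such as `throttled=0x50000`. The decoder below is a minimal sketch that assumes the bit meanings from the Raspberry Pi `vcgencmd` documentation. The low bits report current conditions, and the bits from 16 upward report conditions that have occurred. The sample strings are illustrative, not output from this thread.

```python
# Decode the bitmask printed by `vcgencmd get_throttled` (e.g. "throttled=0x50000").
# Bit meanings assumed from the Raspberry Pi vcgencmd documentation.

THROTTLE_FLAGS = {
    0: "under-voltage detected",
    1: "ARM frequency capped",
    2: "currently throttled",
    3: "soft temperature limit active",
    16: "under-voltage has occurred",
    17: "ARM frequency capping has occurred",
    18: "throttling has occurred",
    19: "soft temperature limit has occurred",
}


def decode_throttled(output: str) -> list[str]:
    """Return the human-readable flags set in a get_throttled output string."""
    value = int(output.strip().split("=", 1)[1], 16)  # int() accepts the "0x" prefix with base 16
    return [desc for bit, desc in sorted(THROTTLE_FLAGS.items()) if value & (1 << bit)]


print(decode_throttled("throttled=0x0"))      # empty list: no flags set
print(decode_throttled("throttled=0x50000"))  # past under-voltage and past throttling
```

A non-zero value in the "has occurred" bits can explain lower-than-expected speeds even if the CPU is not throttled at the moment you check.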

batch12 · 2025-11-16T03:34:44 1763264084

Sounds like something that could be weaponized. Order a bunch of 'gifts' to be shipped to a target via UPS/FedEx or whichever vendor helpfully pays the tariffs for you. Then your victim has to fight collections or pay up.

batch12 · 2025-11-09T13:28:44 1762694924

It's a cool idea; just beware. I saw some dead kids and some NSFW material among the otherwise interesting content.

spencerc99 · 2025-11-10T18:25:58 1762799158

Really sorry you had to experience that! I added an NSFW flag. I'm just pulling content randomly by date and didn't know the Archive had that kind of graphic content :(

batch12 · 2025-11-02T12:28:26 1762086506

Lighten up. People spend their time doing lots of things they enjoy regardless of the value others place on their efforts. Instead of projecting embarrassment, go save the world if that makes you happy.