Sometimes people are not looking for fully fleshed out high-effort answers. They want a pointer (to documentation, or a repo) to get going from someone more experienced.
Google search may throw up too much information and it is hard to make a choice. A one sentence answer from an expert may be enough to set them on the right path.
If you're seeking the kind of hands-off Claude experience, obviously not. Local models are slow.
If you want to learn how these things work, train them locally, tinker, play with the code, grasp the fundamentals, or just out of sheer bloody-mindedness and principle refuse to tether the functioning of your application to a cloud API...
I have the same processor and RAM. The dense 30B-ish Gemma/Qwen models really don't break 10 TPS, with or without MTP. MoEs in this range feel more usable if they are smart enough for your work. Probably would still use hosted versions of these over local unless. MoEs feel somewhere between Sonnet 3.5 and 3.7 to me. Dense feels between Sonnet 3.7 and 4 in basic coding or local agentic capabilities (not close to those in chat or world knowledge).
From an economic point of view, there's almost no point in running these models locally. The only things they are good for would also be dirt cheap through an API using the smaller/older models. The hundreds or thousands extra you would spend on hardware easily fund a lot of that API usage. Unless you are using this stuff at scale, it's probably not going to be worth it.
I've dabbled with Qwen 3.x and Gemma 4 models a bit. They are alright but not that impressive. And my Mac gets super hot if I use them for extended periods of time. It's just not very nice to use locally.
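The break-even reasoning above can be sketched as simple arithmetic. This is only an illustration: the function name and the dollar figures are hypothetical placeholders, not numbers from this thread, and it ignores electricity, resale value, and the quality gap between local and hosted models.

```python
def breakeven_months(extra_hardware_cost: float, monthly_api_spend: float) -> float:
    """Months of API usage that the extra hardware money would have paid for."""
    if monthly_api_spend <= 0:
        # Nothing to offset, so the hardware never pays for itself.
        return float("inf")
    return extra_hardware_cost / monthly_api_spend


# Hypothetical inputs: 1500 extra on hardware vs. 20 per month on a cheap API.
print(breakeven_months(1500.0, 20.0))  # 75.0
```

Plug in your own hardware premium and realistic API bill; the larger your monthly usage, the sooner local hardware catches up.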
Some of these models will be a bit of a squeeze at Q4_0 I suspect; almost certainly they will be using CPU. Probably the 31B Gemma will be too much. Maybe not the Gemma-4 26B QAT.
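A rough way to check the "squeeze" is to estimate the weight size at Q4_0. The sketch below assumes the llama.cpp Q4_0 layout (blocks of 32 weights stored in 18 bytes, so about 4.5 bits per weight) and counts the weights only; KV cache, context buffers, and whatever else is using your memory come on top, and real GGUF files differ somewhat because not every tensor is quantized the same way. The function name is made up for illustration.

```python
def quantized_weight_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Approximate size of the weights alone, in GB (10^9 bytes)."""
    total_bits = params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


# Parameter counts are taken from the model names mentioned above.
for name, size in [("31B dense", 31.0), ("26B", 26.0), ("12B", 12.0)]:
    print(f"{name}: ~{quantized_weight_gb(size):.1f} GB of weights")
```

If the estimate is close to your total memory, expect layers to spill to the CPU or the model to fail to load at a useful context length.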
But if you just want to play around rather than code, you really might find the Gemma 4 12B model worth mucking about with just so you've gone through the steps. Especially if you want to muck about with image analysis or audio transcription.
If you're writing PHP I think you could even find it good enough. I've been modestly surprised. You can do that basic fiddling with the Edge AI Gallery app, which can enable thinking and has a customisable system prompt and some agent support.
You could also try the 14B Deepseek R1.
Honestly even if it is not good enough, if you are anything like me, I think you'll find that going through this process is really quite educational — it has made a lot of things more concrete for me in a way that I have found reassuring and valuable.
M4 24GB here. You'll be fine. If you're anything like me, minor latency is an acceptable price for (a) privacy (b) reliability (c) CI/CD/guardrails (d) network independence (e) future-proofing vs. AIaaS. https://omlx.ai/ gives you intelligent model download recommendations based on your local hardware. That said, it probably depends heavily on your workload, process and polish expectations. See also https://hackernews.hn/item?id=48089091
What are you using on yours? I've got an M4 Pro 24GB also. Tried the open source GPT one. It's alright, but I found it can get stuck at times. Maybe it's just my config in LM Studio.
pi + Qwen3-4B-Instruct-2507 / Qwen3.6-35B-A3B-4bit / Qwen3.6-40B-Claude-4.6-Opus-Deckard-Heretic-Uncensored-Thinking-4.5bit-msq depending how seat-of-pants I want to fly on memory.
I'm running an M3 on an Air with just 16GB. I can still get useful results without an internet connection in "chat mode". It's a different experience than using Claude, for sure, but it's workable. I typically use the Qwen variants these days.
This might be useful when ‘coding in chat mode’: I have a few scripts that I run in a project directory. They take a prompt from me and create a single long one-shot prompt that I can paste into a chat window, and they ask that any generated code be inside markdown code blocks for easier copy/pasting. Also, pardon the plug, but you can read my new tiny book free online that documents my experiences using agentic coding on my 16G Mac and my 32G Mac: https://leanpub.com/read/local-coding-agents
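A minimal sketch of what such a script could look like. This is not the commenter's actual code: the function name, the file-suffix filter, and the per-file size cap are all assumptions made for illustration.

```python
from pathlib import Path


def build_one_shot_prompt(project_dir, task, suffixes=(".py", ".md"), max_chars=20_000):
    """Combine a task description and the project's source files into one prompt."""
    root = Path(project_dir)
    parts = [
        f"Task: {task}",
        "Put any generated code inside markdown code blocks.",
        "",
        "Project files:",
    ]
    for path in sorted(root.rglob("*")):
        # Only include regular files with one of the chosen suffixes.
        if not path.is_file() or path.suffix not in suffixes:
            continue
        # Cap each file so one large file cannot blow up the context window.
        text = path.read_text(encoding="utf-8", errors="replace")[:max_chars]
        parts.append(f"\n--- {path.relative_to(root).as_posix()} ---\n{text}")
    return "\n".join(parts)


# Usage: print(build_one_shot_prompt(".", "Add a --verbose flag to the CLI"))
```

Printing to stdout keeps it easy to pipe the result into a clipboard tool before pasting it into the chat window.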
think of local models as "zero sugar" models and that's where we're at right now. I think it's crazy how good these models are compared to last year's frontier models
People are using a 3090 (24GB) to run models, and it is the most cost-effective way to run them. Yes, it is 2x faster, but memory-wise you can surely spend 24GB on an LLM.
Also, there are smaller, still useful models that can run on 8GB or less.
If it is a one-off task, it doesn't matter whether you use a GUI or terminal commands to do it. But if you do it more than once, the terminal starts paying off IMO.
Here are some advantages.
- Repeatability. You can do the exact same thing you did before. With ZSH history + FZF, recalling a command is a breeze.
- Auditability. The command in your shell history is there for you to revisit and serves as a permanent record of something you did (or didn't do).
- Consistency. A command line doesn't make a mistake on the 10th run due to fatigue, inattention, etc.
- Reusability. You may have to repeat the same command for different folders (or remote servers). A slight modification of the previous command will do it for you.
Not just _wrong_. It is confused! It is actually right in the second sentence.
This was Friday, Opus 4.6.
>I want to wash my car. The car wash is 50 meters away. Should I walk or drive?
Walk. It's 50 meters — you're going there to clean the car anyway, so drive it over if it needs washing, but if you're just dropping it off or it's a self-service place, walking is fine for that distance.
This is actually a good diagnostic of whether the model is skimping on the thinking loop. Try raising thinking effort and it should get it right. Of course, if you're running this in a coding harness with a whole lot of extraneous context, the model will be awfully confused as to what it should be thinking about.
That may be enough in some cases.