Hacker News | reddit_clone's comments

> Give a cursory answer

That may be enough in some cases.

Sometimes people are not looking for fully fleshed out high-effort answers. They want a pointer (to documentation, or a repo) to get going from someone more experienced.

Google search may throw up too much information and it is hard to make a choice. A one sentence answer from an expert may be enough to set them on the right path.


>64 GB

That's the rub. I have an M4 with 48 GB. I wonder if it is worth testing this out.

My past attempts (with Ollama and various LLMs) were too slow to use.


I have an M5 Max with 128 GB, and local models are toys compared to hosted ones. I've spent a lot of time and money trying to make them work even half as well.

It all depends on what you want to do, I guess.

If you're seeking the kind of hands-off Claude experience, obviously not. They are slow.

If you want to learn how these things work, train them locally, tinker, play with the code, grasp the fundamentals, or just out of sheer bloody-mindedness and principle refuse to tether the functioning of your application to a cloud API...


I have the same processor and RAM. The dense 30B-ish Gemma/Qwen models really don't break 10 TPS with or without MTP. MoEs in this range feel more usable if they are smart enough for your work. Probably would still use hosted versions of these over local unless. MoEs feel somewhere between Sonnet 3.5 and 3.7 to me. Dense feels between Sonnet 3.7 and 4 in basic coding or local agentic capabilities (not close to those in chat or world knowledge).

From an economic point of view, there's almost no point to using these locally running models. The only things they are good for would also be dirt cheap using the smaller/older models via some API. The hundreds or thousands you spend extra on hardware would easily fund a lot of that API usage. Unless you are using this stuff at scale, it's probably not going to be worth it.

I've dabbled with Qwen 3.x and Gemma 4 models a bit. They are alright but not that impressive. And my mac gets super hot if I use them for extended periods of time. It's just not very nice to use locally.


Some of these models will be a bit of a squeeze at Q4_0 I suspect; almost certainly they will be using CPU. Probably the 31B Gemma will be too much. Maybe not the Gemma-4 26B QAT.

But if you just want to play around rather than code, you really might find the Gemma 4 12B model worth mucking about with just so you've gone through the steps. Especially if you want to muck about with image analysis or audio transcription.

If you're writing PHP I think you could even find it good enough. I've been modestly surprised. You can do that basic fiddling with the Edge AI Gallery app, which can enable thinking and has a customisable system prompt and some agent support.

You could also try the 14B Deepseek R1.

Honestly even if it is not good enough, if you are anything like me, I think you'll find that going through this process is really quite educational — it has made a lot of things more concrete for me in a way that I have found reassuring and valuable.


M4 24GB here. You'll be fine. If you're anything like me, minor latency is acceptable to obtain (a) privacy, (b) reliability, (c) CI/CD/guardrails, (d) network independence, and (e) future-proofing vs. AIaaS. https://omlx.ai/ gives you intelligent, local-hardware-based model download recommendations. That said, it probably depends heavily on your workload, process and polish expectations. See also https://hackernews.hn/item?id=48089091

What are you using on yours? I've got an M4 Pro 24GB also. Tried the open source GPT one. It's alright, but I found it can get stuck at times. Maybe it's just my config in LM Studio.

pi + Qwen3-4B-Instruct-2507 / Qwen3.6-35B-A3B-4bit / Qwen3.6-40B-Claude-4.6-Opus-Deckard-Heretic-Uncensored-Thinking-4.5bit-msq depending how seat-of-pants I want to fly on memory.

I'm running an M3 on an Air with just 16GB. I can still get useful results without an internet connection in "chat mode". It's a different experience than using Claude, for sure, but it's workable. I typically use the Qwen variants these days.

This might be useful when ‘coding in chat mode’: I have a few scripts that I run in a project directory that take a prompt from me and create a single long one-shot prompt that I can paste into a chat window, asking that any generated code be inside markdown code blocks for easier copy/pasting. Also, pardon the plug, but you can read my new tiny book free online that documents my experiences using agentic coding on my 16G Mac and my 32G Mac: https://leanpub.com/read/local-coding-agents
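For anyone who wants to try the one-shot prompt idea, here is a minimal sketch of what such a script could look like. It is not the commenter's actual script: the function name `build_one_shot_prompt`, the file suffix list and the per-file character cap are all assumptions made up for illustration.

```python
from pathlib import Path

# Hypothetical defaults: which files to include and how much of each.
INCLUDE_SUFFIXES = {".py", ".md", ".toml"}
MAX_CHARS_PER_FILE = 4000

# Built at runtime so this listing never contains a literal code fence.
FENCE = "`" * 3


def build_one_shot_prompt(project_dir, task, suffixes=INCLUDE_SUFFIXES):
    """Concatenate the task and selected project files into one prompt string."""
    root = Path(project_dir)
    parts = [
        "You are helping with the project below.",
        "Put any generated code inside markdown code blocks.",
        "",
        f"Task: {task}",
        "",
    ]
    for path in sorted(root.rglob("*")):
        # Skip directories and files whose suffix is not on the list.
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        parts.append(f"File: {path.relative_to(root)}")
        parts.append(FENCE)
        parts.append(text[:MAX_CHARS_PER_FILE])  # truncate long files
        parts.append(FENCE)
        parts.append("")
    return "\n".join(parts)
```

Calling `print(build_one_shot_prompt(".", "Add a --verbose flag"))` from the project directory would produce text you can paste into a chat window. The cap per file is there because, as mentioned elsewhere in this thread, small local models do not need (or handle) much context.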

Looks cool, I’ll check out the book. Your download links (PDF and EPUB) are down for me.

> NoSuchKey: The specified key does not exist…


i’m running m4 pro 48gb right now

omlx + gemma 12b 6 bit + pi

it’s feasible for sure

MoEs for speed (qwen 35b, cohere 30b, gemma 26b)

Dense for more methodical work (qwen 27b [reigning champ], gemma 31b, gemma 12b)

MoE i recommend 5bit+

Dense i think 4 bit is okay

Play with your context size, you don’t really need that much, have lazy loading for tools and mcps

my pi extensions for anyone looking for a skinny quick setup, i use `--no-skills` right now too:

    "npm:pi-codex-goal",
    "npm:pi-simplify",
    "npm:pi-mcp-adapter",
    "git:github.com/elpapi42/pi-minimal-subagent",
    "npm:@wierdbytes/pi-statusline",
    "npm:@aliou/pi-guardrails",
    "npm:pi-lens",
    "npm:@juicesharp/rpiv-todo",
    "npm:pi-hashline-readmap",
    "npm:@mrclrchtr/supi-review",
    "npm:pi-cmux",
    "npm:@mrclrchtr/supi-context",
    "npm:pi-tool-search"

think of local models as "zero sugar" models and that's where we're at right now. I think it's crazy how good these models are compared to last year's frontier models

People are using the 3090 (24GB) to run models, and it is the most cost-effective way to run them. Yes, it is 2x faster, but memory-wise you can surely spend 24GB on an LLM.

Also there are smaller, still useful models that can run on 8GB or less.


I've an M1 Pro with 32GB ram and it's running pretty well

Converted all the tabs to spaces? :-)

You are right, this is not a rewrite like the Bun case.

The real news is, at 50M LOC, it is able to handle and do _something_ coherent.


If it is a one-off task, it doesn't matter if you use a GUI or terminal commands to do it. But once you do it more than once, the terminal starts paying off IMO.

Here are some advantages.

  - It is repeatable, you can do the same exact thing you did before. With ZSH history + FZF, recalling a command is a breeze.

  - Auditability. The command in your shell history is there for you to revisit and serves as a permanent record of something you did (or didn't do).

  - A command line doesn't make a mistake on the 10th run due to fatigue, inattention, etc.

  - Reusability. You may have to repeat the same command for different folders (or remote servers). A slight modification of the previous command will do it for you.

Vimium extension does that. Works well too. Works on Chrome and Firefox.

Try out Surfingkeys if Vimium isn't your cup of tea.

I am currently trying something called Shortcat. It is not just for the browser but works in other Mac applications too!

Look Ma, no mouse!


I noticed that too. Unless you _ask_ for a script, they throw away the scripts they write.

They are particularly bad at complex multiline parsing. They write all sorts of weird/crude Python/awk scripts and get confused in the process.

I wish they would use Perl6/Grammars or Haskell/Parsec or similar and write better parsing scripts.


For the non-Haskell folks like myself, what would that look like, and why is parsing better? Perl I get.


Perl has powerful regular expressions, but they only go so far. Doing multiline/nested structured parsing with them is too painful.

Perl6/Raku has built-in grammars that can do that idiomatically.

If you have a couple minutes, give this a glance. It will give you an idea.

https://andrewshitov.com/2018/10/31/a-simple-parser-in-perl-...

I am no expert in Haskell either. But Parsec is similar in concept.
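To make the concept a bit more concrete without Haskell or Raku, here is a rough sketch of the parser-combinator idea in plain Python. It is neither Parsec's API nor Raku grammar syntax; the helper names (`token`, `either`, `sep_by`, `nested_list`) are made up for this example. The point is that small parsers are ordinary functions, and a rule can refer to itself, which is exactly what nested input needs and what a single regex cannot express.

```python
import re

# A parser here is a function: (text, pos) -> (value, new_pos), or None on failure.


def token(pattern):
    """Match a regex at the current position, skipping leading whitespace."""
    compiled = re.compile(r"\s*(" + pattern + r")")

    def run(text, pos):
        m = compiled.match(text, pos)
        return (m.group(1), m.end()) if m else None

    return run


def either(*parsers):
    """Try each alternative in order and return the first success."""
    def run(text, pos):
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result
        return None

    return run


def sep_by(item, separator):
    """Zero or more items with a separator between them; yields a list."""
    def run(text, pos):
        values = []
        first = item(text, pos)
        if first is None:
            return values, pos
        value, pos = first
        values.append(value)
        while True:
            sep = separator(text, pos)
            nxt = item(text, sep[1]) if sep else None
            if nxt is None:
                return values, pos
            value, pos = nxt
            values.append(value)

    return run


def number(text, pos):
    result = token(r"-?\d+")(text, pos)
    return (int(result[0]), result[1]) if result else None


def nested_list(text, pos):
    """list := '[' (value (',' value)*)? ']', recursive through `value`."""
    opened = token(r"\[")(text, pos)
    if opened is None:
        return None
    items, pos = sep_by(value, token(","))(text, opened[1])
    closed = token(r"\]")(text, pos)
    return (items, closed[1]) if closed else None


def value(text, pos):
    return either(number, nested_list)(text, pos)


def parse(text):
    result = value(text, 0)
    if result is None or text[result[1]:].strip():
        raise ValueError(f"could not parse: {text!r}")
    return result[0]
```

With this, `parse("[1,\n  [2,\n   3]]")` gives back a nested Python list even though the input spans several lines, and unbalanced input such as `"[1, [2"` raises `ValueError` instead of silently matching a prefix. Real libraries add proper error messages and backtracking control on top of the same basic shape.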


Does this abstract over the package management systems (apt, yum, apk, etc.)? Or do we still have to write distro-specific install commands?


What can I run on an M4 Pro with 48 GB of RAM?


A sparser model like Qwen3.6 35B A3B is probably your best choice: https://qwen.ai/blog?id=qwen3.6-35b-a3b


The 35B MoE will run faster, but 48 GB of RAM is more than enough to run the 27B dense model as well. It's just that tokens/s will be on the lower side.
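A back-of-the-envelope way to check this is to estimate the size of the quantized weights as parameters times bits per weight divided by 8. The sketch below does only that. It ignores the KV cache, runtime overhead and quantization metadata, so real memory use will be higher. It also assumes the "A3B" in the model name means roughly 3B active parameters per token, which is the usual naming convention rather than something stated in this thread.

```python
def weight_gib(params_billion, bits_per_weight):
    """Approximate size of the quantized weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


# Total weights that must fit in RAM (the parameter counts come from the model names).
for name, params_b in [("35B MoE", 35), ("27B dense", 27)]:
    for bits in (4, 8):
        print(f"{name} at {bits}-bit: ~{weight_gib(params_b, bits):.1f} GiB of weights")

# Weights touched per generated token: all 27B for the dense model,
# but only the active experts (assumed ~3B) for the MoE.
print(f"dense, per token: ~{weight_gib(27, 4):.1f} GiB read")
print(f"MoE, per token:   ~{weight_gib(3, 4):.1f} GiB read")
```

At 4-bit both sets of weights come in well under 48 GB, which matches the comment above. Token generation is usually limited by how fast weights can be read from memory, so the much smaller per-token figure for the MoE is a plausible explanation for why it feels faster, though actual tokens/s depends on the runtime and hardware.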


That.

And I had to look down every time I had to use it. I am glad to see it go.


Not just _wrong_. It is confused! It is actually right in the second sentence. This was Friday, Opus 4.6.

>I want to wash my car. The car wash is 50 meters away. Should I walk or drive?

Walk. It's 50 meters — you're going there to clean the car anyway, so drive it over if it needs washing, but if you're just dropping it off or it's a self-service place, walking is fine for that distance.


This is actually a good diagnostic of whether the model is skimping on the thinking loop. Try raising thinking effort and it should get it right. Of course, if you're running this in a coding harness with a whole lot of extraneous context, the model will be awfully confused as to what it should be thinking about.

