For anyone else trying to run this on a Mac with 32GB unified RAM, this is what worked for me:
First, make sure enough memory is allocated to the GPU:
sudo sysctl -w iogpu.wired_limit_mb=24000
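The 24000 above is roughly three quarters of 32GB. Here is a rough sketch of that arithmetic in case your machine has a different amount of RAM. The 75% share is my assumption rather than a documented recommendation, and `wired_limit_mb` is just a hypothetical helper name:

```shell
# Hypothetical helper: print a wired-limit value in MB.
#   $1 = total RAM in bytes (on macOS: sysctl -n hw.memsize)
#   $2 = percentage of RAM to let the GPU wire (assumed: 75)
wired_limit_mb() {
  total_bytes="$1"
  percent="$2"
  # bytes -> MiB, then take the requested share (integer math)
  echo $(( total_bytes / 1048576 * percent / 100 ))
}

# Example with a hard-coded 32 GiB so it runs anywhere:
wired_limit_mb 34359738368 75
```

On the Mac itself you would feed it the real value, e.g. `sudo sysctl -w iogpu.wired_limit_mb="$(wired_limit_mb "$(sysctl -n hw.memsize)" 75)"`. Leave enough headroom for macOS and whatever else you have open.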
Then run llama.cpp but reduce RAM needs by limiting the context window and turning off vision support. (And turn off reasoning for now as it's not needed for simple queries.)
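The kind of invocation described above might look something like this sketch. The model path and the 8192-token context are placeholders I made up, not the values from my setup. I'm also assuming a recent llama.cpp build where `--reasoning-budget 0` disables thinking, so check `llama-server --help` on your version. Vision is "off" here simply because no `--mmproj` file is passed alongside the local GGUF.

```shell
# Sketch only: prints the command by default (DRY_RUN=1);
# set DRY_RUN=0 to actually launch llama-server.
start_llama() {
  model="$1"          # path to a local GGUF file (placeholder)
  ctx="${2:-8192}"    # context window in tokens (assumed default)
  # No --mmproj argument, so no vision projector is loaded.
  # --reasoning-budget 0 is assumed to turn off thinking.
  set -- -m "$model" --ctx-size "$ctx" --reasoning-budget 0
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "llama-server $*"
  else
    llama-server "$@"
  fi
}

start_llama "$HOME/models/model.gguf" 8192
```

A smaller context window shrinks the KV cache, which is where most of the savings beyond the weights themselves come from.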
As the post says, LM Studio has an MLX backend which makes it easy to use.
If you still want to stick with llama-server and GGUF, look at llama-swap which allows you to run one frontend which provides a list of models and dynamically starts a llama-server process with the right model:
I didn't know about llama-swap until yesterday. Apparently you can set it up so that it offers several 'model' choices that are really the same model with different parameters. So, e.g., you can have 'thinking high', 'thinking medium' and 'no reasoning' versions of the same model, but only one copy of the model weights would be loaded into llama-server's RAM.
Regarding mlx, I haven't tried it with this model. Does it work with unsloth dynamic quantization? I looked at mlx-community and found this one, but I'm not sure how it was quantized. The weights are about the same size as unsloth's 4-bit XL model: https://huggingface.co/mlx-community/Qwen3.5-35B-A3B-4bit/tr...
"When you're experiencing the story, in the back of your mind, you always know that there is someone who created the story to tell you some kind of message."
MiniMax has an incredibly affordable coding plan for $10/month. It has a rolling five hour limit of 100 prompts. 100 prompts doesn't sound like much, but in typical AI company accounting fashion, 1 prompt is not really 1 prompt. I have yet to come even close to hitting the limit with heavy use.
What surprised me was that it works with oauth tokens from consumer AI subscriptions, and even free ones like Antigravity. Some of this is inherited from pi, but the author did some extra work to build a proxy to allow auth to work even from within the Excel sidebar.
I'm not sure how it works and what the role of the backend is. But I hope to test it side by side with 'Claude for Excel', which I also happened to install today.
(I tried both on something basic just to check they were installed correctly.)
opencode is great, but I'm a big pi fan - pi has a minimal core that is hackable+extensible by design and is aware of its own docs. So your coding agent can mod your coding agent exactly as you like.
https://pi.dev/
I don't know that work, but I agree with you in general because of forewords etc. Or even appendices. And translations by different translators.
I "grew up with" a specific translation of Lord of the Rings into Norwegian, for example. There are two. They are very different. But the editions also differ in whether they include the appendices, whose illustrations are used, and more.
No, but many of the names are different, and stylistically they are very different. Depending on whether a translation tries to be fairly literal or to sound as if it were written in the target language, the way the result feels will be very different.
An example is the name Bilbo Baggins. In the "canonical" Norwegian translation, he's become Bilbo Lommelun. "Lomme" means pocket, and "lun" means snug, warm, or comfortable. It's not literal, but it fits the nature of hobbits well while referencing the "bag" in "Baggins", and the connotations come immediately in Norwegian without having to try to deconstruct the name.
In this case, I think the newer "canonical" translation is generally considered unambiguously the best, but people often have favourite translations. E.g. my favourite Scandinavian translation of Walt Whitman's Leaves of Grass isn't even Norwegian, but an old Danish translation which sounds much "softer" (it's hard to explain).
Yeah, so really there are at least 3 translations of Lord of the Rings, to continue that example, and I was being a typical Bokmål user and ignored Nynorsk.
The title differences are also a good illustration of how different it can be:
Bokmål: Kampen om Ringen, Ringenes Herre (the first one is literally "the battle for the ring")
I'm curious which one you're using.