Fwiw, with its predecessor, Qwen3.5-35B-A3B-Q6_K.gguf, on a laptop with 6 GB VRAM and 32 GB RAM, using default llama.cpp settings, I get 20 t/s generation.
Have you tried running llama.cpp with Unified Memory Access[1] so your iGPU can seamlessly grab some of the RAM? The environment variable is prefixed with CUDA, but this is not CUDA-specific. It made a pretty significant difference (>40% more t/s generation) on my Ryzen 7840U laptop.
I hadn't tried that, thanks! I found that simply defining GGML_CUDA_ENABLE_UNIFIED_MEMORY, whether as 1, 0, or "", was a 10x hit, down to 2 t/s. Perhaps because the laptop's RAM is already so over-committed there. But with the much smaller 4B Qwen3.5-4B-Q8_0.gguf, it doubled performance from 20 to 40+ t/s! Tnx! (An old Quadro RTX 3000 rather than an iGPU.)
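For anyone wanting to try this, a minimal sketch of the runtime toggle being discussed (model filename, prompt, and flags are illustrative placeholders; the value-insensitivity is an assumption based on the behavior observed above, i.e. the code path seems keyed on the variable being defined at all):

```shell
# Runtime toggle -- no separate build from source needed.
# Merely *defining* the variable appears to enable the unified-memory
# path; the value (1, 0, or "") seems not to matter.
GGML_CUDA_ENABLE_UNIFIED_MEMORY=1 ./llama-cli -m Qwen3.5-4B-Q8_0.gguf -p "Hello" -n 64

# To go back to default allocation, unset it entirely rather than setting it to 0:
unset GGML_CUDA_ENABLE_UNIFIED_MEMORY
```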
Your link seems to be describing a runtime environment variable; it doesn't need a separate build from source. I'm not sure though (1) why this info is in build.md, which should be specific to the building process, rather than in some separate documentation; and (2) if this really isn't CUDA-specific, why the canonical GGML variable name isn't GGML_ENABLE_UNIFIED_MEMORY, with the _CUDA_ variant treated as a legacy alias. AIUI, both of these should be addressed with pull requests for llama.cpp and/or the ggml library itself.
I am waiting for the 2x usage window to close to try it out today.
If they are charging 2x usage during the most important part of the day, doesn't this give OpenAI a slight advantage as people might naturally use Codex during this period?
I had the opposite experience, but it was for an SSD for a Raspberry Pi 5. I asked it to look online for a good price. It found a place, and I ordered it. It was not a well-known site like Walmart, but I got what I needed.
Is there any increase in work constraints that wouldn't cause this? It seems like it just means that industry interview practices are well calibrated, and so high performers find it easy to land another job.
How much money do they make from donations? I don't know, but: "In practice we frequently paid for travel and hardware."
Translation: nothing at all.
If such a fundamental project, one that is a revenue driver for so many companies, including Midas-level rich companies like Google, can't even pay decent salaries for core devs from donations, then the open source model doesn't work in terms of funding the work, even at the smallest possible level of "pay a reasonable market rate for devs".
You either get people who just work for free or businesses built around free work by providing something in addition to free software (which is hard to pull off, as we've seen with Bun and Astral and Deno and Node).
There's a wide gap between the arguments "the open source model doesn't work" and "the open source model failed to produce anything as good as uv after a couple decades of python tooling churn". The latter is why people are understandably unsure of where things go from here.
Seems like you're responding to the wrong person. The person I replied to said the open source model doesn't work. Nobody said the thing in your second quote.
I get the point you're making, but the way you introduced it isn't conducive to productive conversation.
I don't agree. If anything, I'd argue that the comments above were a lot less conducive to productive conversation than mine ("Would single maintainers of critical open source projects be a better situation?", "Are you not aware of foundations?", "But sure, the entire open source model doesn't work, lol").
The entire context of this subthread is whether the model that Astral was using was reasonable compared to an open source approach. From your initial comment, you've been touting alternatives, and the comment I responded to was giving specific examples of where you think the model worked. I don't think you've provided much evidence that there was a good alternative here, and when you're taking an opinionated stance, a productive conversation will sometimes involve people pointing out flaws they perceive in your arguments.
If Claude Code can parse these design documents, I would recommend making a skill to do an adversarial review of the document. Then just generate that review, do some minor edits to make it look like a human wrote it and send it back to them.