Is there some indication of how the different bit quantizations affect performance? I.e., I have a 5090 + 96 GB, so I want to get the best possible model, but I don't care about getting 2% better quality if I only get 5 tok/s.
It takes download time plus a minute to test the speed yourself, so you can just try different quants. It's hard to write down a table because it depends on your system (e.g. RAM clock) once you spill out of GPU memory.
I guess it would make sense to have something like a table of max context size/quant that fits fully on common configs: single GPU, dual GPUs, unified RAM on Macs, etc.
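As a rough starting point before downloading anything, you can ballpark whether a quant fits in VRAM from the parameter count and bit width. This is a sketch only: real quant formats (e.g. GGUF's Q4_K_M) mix bit widths per tensor, and it ignores KV cache and runtime overhead, which grow with context length. The 70B size and 32 GB VRAM figure below are illustrative assumptions.

```python
def model_size_gb(params_b: float, bits: float) -> float:
    """Approximate weight size in GB: billions of params * bits per weight / 8.

    Ignores KV cache, activations, and format overhead, so treat the
    result as a lower bound on actual memory use.
    """
    return params_b * 1e9 * bits / 8 / 1e9

VRAM_GB = 32  # e.g. an RTX 5090; anything above this spills to system RAM

for bits in (16, 8, 5, 4, 2):
    size = model_size_gb(70, bits)  # hypothetical 70B-parameter model
    verdict = "fits in VRAM" if size <= VRAM_GB else "needs CPU offload"
    print(f"{bits}-bit: ~{size:.0f} GB -> {verdict}")
```

Once weights spill into system RAM, token speed usually drops by an order of magnitude, which is why the "which quant fits fully" question matters more than the last few percent of quality.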
Any tips on that? My wife is starting her Etsy shop (https://www.etsy.com/shop/BravinaPrints), but she isn't really getting any visits and is still trying to figure out what kind of stuff she wants to draw.
Obviously biased since I'm part of the team building it, but we think it's super elegant. It uses a simple JSON manifest to connect to native APIs, so there's no need to build wrappers. Would love it if you gave it a try and shared your experience.
We're also implementing a UTCP-MCP bridge, as the last MCP server you'll ever need. (https://github.com/universal-tool-calling-protocol/utcp-mcp)
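To make the "JSON manifest instead of wrappers" idea concrete, here is a purely illustrative sketch of what such a manifest could look like. The field names (`name`, `tools`, `call_template`, etc.) are my assumptions for this example, not the actual UTCP schema; check the repo for the real format.

```json
{
  "name": "weather-service",
  "tools": [
    {
      "name": "get_forecast",
      "description": "Fetch a forecast for a city (hypothetical endpoint)",
      "call_template": {
        "type": "http",
        "method": "GET",
        "url": "https://api.example.com/forecast?city={city}"
      }
    }
  ]
}
```

The point being that the agent calls the native HTTP endpoint directly from this description, rather than going through a wrapping server.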
I've been trying Claude Code with Sonnet 4.0 for a week or so now for Rust code, but it feels really underwhelming (and expensive, since it's via Bedrock right now). Every time it does something, it misses half of it despite spending a lot of time planning at the beginning of the session. What am I missing?
Same. I have a very efficient workflow with Cursor Edit/Agent mode where it pretty much one-shots every change or feature I ask it to make. Working inside a CLI is painful. Are people just letting Claude Code churn for 10-15 minutes and then reviewing the diff? Are people even reviewing the code?
This sort of asynchronous flow will become more and more mainstream. chatgpt.com/codex, Google's Jules, and to a degree Claude Code (even though that's local) all follow that pattern: phrase a goal, send it off to the agent, review the diff and request changes, rinse and repeat until it's ready for PR review.
For me this only works for fairly tightly scoped tasks that aren't super complex, but it does work. And I think the days of staring at the IDE will come to a close for all but the most complex coding tasks.
Exact same experience. I have no clue what other people are doing. I was hunting for use cases where it could be used and it kept not working. I don't get it.
I'm not sure it's Rust-related. It manages to write the Rust code just fine; it just doesn't seem to
- think of everything that is needed for a feature (fixable via planning at the beginning)
- actually follow that plan correctly
I just tried it on a fairly big refactor to see if some changes would improve performance. I had it write the full plan and baseline benchmarks to disk, then let it go in yolo mode. When it was done, it had implemented only about half of the planned phases and claimed the results looked good, despite every benchmark having regressed.
I've encountered human Rust programmers that exhibit this behavior. I think that Rust may be falling victim to its own propaganda, as developers come to believe that the borrow checker will fix their logic errors, not just flag up memory management that's at risk of buffer overflow or UAF.
Also tried it with Python. The autocomplete there was ok-ish (although to me the "wait for it -> review suggested code" cycle is a bit too slow), but getting it to write even standalone, well-defined functions was a clear failure. I spent more time trying to fix prompts than it would have taken to write the functions in the first place.
It's not planned for rv; that's a whole other can of worms. Something like nix/docker should work, but I'm not working on that part myself so I can't comment.