Ran a bunch of 3D Modeling benchmarks on Gemini 3.1 vs Gemini 3.
Unsurprisingly, 3.1 performs a bit better. But surprisingly, it costs 2.6x as much ($0.37 vs. $0.14 per 3D model generation) and is 2.5x slower (3m 28s vs. 1m 24s).
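For anyone double-checking the ratios, here is a quick sanity check using only the figures quoted in the comment:

```python
# Recompute the cost/latency ratios quoted above (figures from the comment).
gemini_3 = {"cost_usd": 0.14, "seconds": 1 * 60 + 24}   # $0.14, 1m 24s
gemini_31 = {"cost_usd": 0.37, "seconds": 3 * 60 + 28}  # $0.37, 3m 28s

cost_ratio = gemini_31["cost_usd"] / gemini_3["cost_usd"]
speed_ratio = gemini_31["seconds"] / gemini_3["seconds"]

print(f"cost ratio: {cost_ratio:.1f}x")     # ~2.6x
print(f"latency ratio: {speed_ratio:.1f}x") # ~2.5x
```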
To me it feels like "let's increase our thinking budget and call it an improved model!"
I am building pretty much the same product as OP, and have a pretty good harness to test LLMs. In fact I have run a ton of tests already. It’s currently aimed at my own internal testing, but making something that is easier to digest should be a breeze. If you are curious: https://grandpacad.com/evals
Dimensionally accurate AI 3D modelling. My grandpa has a 3D printer but struggles to use any complex tools. So I am working on this chat interface to let him make some simple models.
So far he has triggered more than 150 generations. It’s getting better every model cycle and gives me something I enjoy working on.
Serial usecases ("fix these syntax errors") will go on Cerebras and get 10x faster.
Deep usecases ("solve Riemann hypothesis") will become massively parallel and go on slower inference compute.
Teams will stitch both together because some workflows go through stages of requiring deep parallel compute ("scan my codebase for bugs and propose fixes") followed by serial compute ("dedupe and apply the 3 fixes, resolve merge conflict").
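The stitching described above can be sketched as a two-stage pipeline: a parallel "deep" scan fanned out across workers, then a serial "fast" apply pass. This is a minimal sketch under assumptions; `llm_scan_file` and `llm_apply_fix` are hypothetical stand-ins, not real APIs.

```python
# Sketch of a parallel-scan-then-serial-apply workflow.
from concurrent.futures import ThreadPoolExecutor

def llm_scan_file(path: str) -> list[str]:
    # Stand-in for a slow, deep-reasoning call per file (parallel stage).
    return [f"possible bug in {path}"]

def llm_apply_fix(finding: str) -> str:
    # Stand-in for a fast call on low-latency inference (serial stage).
    return f"patched: {finding}"

def review_codebase(paths: list[str]) -> list[str]:
    # Stage 1: fan the deep scan out across workers.
    with ThreadPoolExecutor(max_workers=8) as pool:
        findings = [f for result in pool.map(llm_scan_file, paths) for f in result]
    # Stage 2: dedupe (dict.fromkeys preserves order), then apply fixes
    # one at a time, since later fixes may depend on earlier ones.
    return [llm_apply_fix(f) for f in dict.fromkeys(findings)]

print(review_codebase(["app.py", "db.py", "app.py"]))
```

The serial stage stays ordered on purpose: applying fixes concurrently is exactly where merge conflicts come from.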
I've been using 5.1-codex-max with low reasoning (in Cursor fwiw) recently and it feels like a nice speed while still being effective. Might be worth a shot.
Very interesting, thanks! I wonder what would happen if you kept running Gemini in a loop for a while. Considering how much faster it finished, it seems like there is a lot more potential.
Developers remember, you can always push back on design requirements instead of bringing in more bloat.
I was sitting next to one of the devs in a co-working space and he was trying to figure out some specific layout issue in react native. He spent 4 hours + installed a dependency to be able to do something completely tiny on a privacy policy screen. He asked me how I would do it, I told him to just ask if it can be laid out differently. He got it approved and implemented in 10 minutes. No bloat.
I'm finding it's better to use "javascript-less" UI frameworks [Pico.CSS, Skeleton, Bulma, Tailwind/daisyUI]. You get most of the benefits through good use of CSS. Anyone used these JS-free solutions and have recommendations?
My current choice is DaisyUI. It’s pretty good and because it’s based on tailwind you get the rest of the ecosystem benefits. Super easy to extend and change. Class bloat is much more manageable than raw tailwind.
Very cool and quite advanced compared to my tool. I've been working on something similar, although not an addon for SolidWorks, but a web SaaS. Initially started it as a tool to help my grandpa make some simple models (ChatGPT clicked for him; SolidWorks was impossible).
Still have a long way to go, but if anyone wants to try you can do it here: https://grandpacad.com
If you want more free credits send me an email and I'm happy to give you some.
I am really curious about speed/latency. For my use case there is a big difference in UX if the model is faster. Wish this was included in some benchmarks.
I will run an 80-generation 3D model benchmark tomorrow and update this comment with the results on cost/speed/quality.
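A harness for that kind of run only needs to record per-generation latency and cost. A minimal sketch, assuming a hypothetical `generate_model` call that returns the billed cost:

```python
# Minimal benchmark loop: time each generation and average cost/latency.
# generate_model() is a hypothetical stand-in for the real generation call.
import time
from statistics import mean

def generate_model(model: str, prompt: str) -> float:
    # Stand-in: would call the model and return the billed cost in USD.
    time.sleep(0.01)  # simulate work
    return 0.0

def benchmark(model: str, prompts: list[str]) -> dict:
    latencies, costs = [], []
    for prompt in prompts:
        start = time.perf_counter()
        costs.append(generate_model(model, prompt))
        latencies.append(time.perf_counter() - start)
    return {"model": model,
            "mean_latency_s": mean(latencies),
            "mean_cost_usd": mean(costs)}

print(benchmark("gemini-3.1", ["a 20mm M4 bolt"] * 3))
```

`time.perf_counter()` is used rather than `time.time()` because it is monotonic and meant for interval timing.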
Interesting. I’m building a SaaS around this idea. And I managed to do things waaay more complex than that using LLMs, especially "several times". My AI can do a parametric trophy cup from one prompt in a couple of attempts; I would be shocked if it didn’t know how to make a rectangular cube…