Ran a bunch of 3D Modeling benchmarks on Gemini 3.1 vs Gemini 3.
Unsurprisingly, 3.1 performs a bit better. But surprisingly, it costs 2.6x as much ($0.37 vs. $0.14 per 3D model generation) and is 2.5x slower (3m 28s vs. 1m 24s).
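For anyone double-checking the ratios, here is a quick sanity check using only the figures quoted in the comment:

```python
# Recompute the cost/latency ratios quoted above (figures from the comment).
gemini_3 = {"cost_usd": 0.14, "seconds": 1 * 60 + 24}   # $0.14, 1m 24s
gemini_31 = {"cost_usd": 0.37, "seconds": 3 * 60 + 28}  # $0.37, 3m 28s

cost_ratio = gemini_31["cost_usd"] / gemini_3["cost_usd"]
speed_ratio = gemini_31["seconds"] / gemini_3["seconds"]

print(f"cost ratio: {cost_ratio:.1f}x")     # ~2.6x
print(f"latency ratio: {speed_ratio:.1f}x") # ~2.5x
```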
To me it feels like "let's increase our thinking budget and call it an improved model!"
I am building pretty much the same product as OP, and have a pretty good harness to test LLMs. In fact I have run a ton of tests already. It’s currently aimed at my own internal testing, but making something that is easier to digest should be a breeze. If you are curious: https://grandpacad.com/evals
Dimensionally accurate AI 3D modelling. My grandpa has a 3D printer but struggles to use any complex tools. So I am working on this chat interface to let him make some simple models.
So far he has triggered more than 150 generations. It’s getting better every model cycle and gives me something I enjoy working on.
Serial usecases ("fix these syntax errors") will go on Cerebras and get 10x faster.
Deep usecases ("solve Riemann hypothesis") will become massively parallel and go on slower inference compute.
Teams will stitch both together because some workflows go through stages of requiring deep parallel compute ("scan my codebase for bugs and propose fixes") followed by serial compute ("dedupe and apply the 3 fixes, resolve merge conflict").
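The stitching described above can be sketched as a two-stage pipeline: a parallel "deep" scan fanned out across workers, then a serial "fast" apply pass. This is a minimal sketch under assumptions; `llm_scan_file` and `llm_apply_fix` are hypothetical stand-ins, not real APIs.

```python
# Sketch of a parallel-scan-then-serial-apply workflow.
from concurrent.futures import ThreadPoolExecutor

def llm_scan_file(path: str) -> list[str]:
    # Stand-in for a slow, deep-reasoning call per file (parallel stage).
    return [f"possible bug in {path}"]

def llm_apply_fix(finding: str) -> str:
    # Stand-in for a fast call on low-latency inference (serial stage).
    return f"patched: {finding}"

def review_codebase(paths: list[str]) -> list[str]:
    # Stage 1: fan the deep scan out across workers.
    with ThreadPoolExecutor(max_workers=8) as pool:
        findings = [f for result in pool.map(llm_scan_file, paths) for f in result]
    # Stage 2: dedupe (dict.fromkeys preserves order), then apply fixes
    # one at a time, since later fixes may depend on earlier ones.
    return [llm_apply_fix(f) for f in dict.fromkeys(findings)]

print(review_codebase(["app.py", "db.py", "app.py"]))
```

The serial stage stays ordered on purpose: applying fixes concurrently is exactly where merge conflicts come from.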
I've been using 5.1-codex-max with low reasoning (in Cursor fwiw) recently and it feels like a nice speed while still being effective. Might be worth a shot.
Very interesting, thanks! I wonder what would happen if you kept running Gemini in a loop for a while. Considering how much faster it finished, it seems like there is a lot more potential.
Developers remember, you can always push back on design requirements instead of bringing in more bloat.
I was sitting next to one of the devs in a co-working space and he was trying to figure out some specific layout issue in react native. He spent 4 hours + installed a dependency to be able to do something completely tiny on a privacy policy screen. He asked me how I would do it, I told him to just ask if it can be laid out differently. He got it approved and implemented in 10 minutes. No bloat.
I'm finding it's better to use "javascript-less" UI frameworks [Pico.CSS, Skeleton, Bulma, Tailwind/daisyUI]. You get most of the benefits through good use of CSS. Anyone used these JS-free solutions and have recommendations?
My current choice is DaisyUI. It’s pretty good and because it’s based on tailwind you get the rest of the ecosystem benefits. Super easy to extend and change. Class bloat is much more manageable than raw tailwind.
Very cool and quite advanced compared to my tool. I've been working on something similar, although not an addon for SolidWorks, but a web SaaS. Initially started it as a tool to help my grandpa make some simple models (ChatGPT clicked for him; SolidWorks was impossible).
Still have a long way to go, but if anyone wants to try you can do it here: https://grandpacad.com
If you want more free credits send me an email and I'm happy to give you some.
I am really curious about speed/latency. For my use case there is a big difference in UX if the model is faster. Wish this was included in some benchmarks.
I will run an 80-generation 3D model benchmark tomorrow and update this comment with the results on cost/speed/quality.
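A harness for that kind of run only needs to record per-generation latency and cost. A minimal sketch, assuming a hypothetical `generate_model` call that returns the billed cost:

```python
# Minimal benchmark loop: time each generation and average cost/latency.
# generate_model() is a hypothetical stand-in for the real generation call.
import time
from statistics import mean

def generate_model(model: str, prompt: str) -> float:
    # Stand-in: would call the model and return the billed cost in USD.
    time.sleep(0.01)  # simulate work
    return 0.0

def benchmark(model: str, prompts: list[str]) -> dict:
    latencies, costs = [], []
    for prompt in prompts:
        start = time.perf_counter()
        costs.append(generate_model(model, prompt))
        latencies.append(time.perf_counter() - start)
    return {"model": model,
            "mean_latency_s": mean(latencies),
            "mean_cost_usd": mean(costs)}

print(benchmark("gemini-3.1", ["a 20mm M4 bolt"] * 3))
```

`time.perf_counter()` is used rather than `time.time()` because it is monotonic and meant for interval timing.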
Interesting. I’m building a SaaS around this idea. And I managed to do things waaay more complex than that using LLMs, especially "several times". My AI can do a parametric trophy cup from one prompt in a couple of attempts; I would be shocked if it didn’t know how to make a rectangular cube…