I experimented with the Q2 and Q4 quants. First impression is that it's amazing ...

Kostic · 2026-02-03T20:53:19 1770151999

I would not go below Q8 if comparing to Sonnet.

anon373839 · 2026-02-04T12:30:18 1770208218

Yeah. Q2 in any model is just severely damaged, unfortunately. Wish it weren’t so.

cubefox · 2026-02-03T20:58:45 1770152325

> I experimented with the Q2 and Q4 quants.

Of course you get degraded performance with this.

Aurornis · 2026-02-03T22:40:15 1770158415

Obviously. That's why I led with that statement.

Those are the quant thresholds where people with mid-high end hardware can run this locally at reasonable speed, though.

In my experience Q2 is flaky, but Q4 isn't dramatically worse.
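For a rough sense of why those thresholds matter, here is a back-of-the-envelope sketch of weight memory at each quant level. The 80B parameter count is a hypothetical placeholder, not this model's actual size, and the bit widths are nominal: real quant formats usually store somewhat more than their nominal bits per weight, and the KV cache and activations come on top.

```python
def weight_memory_gib(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


# Hypothetical 80B-parameter model (an assumption for illustration only).
# Nominal bit widths; actual quant formats carry extra per-block overhead.
for name, bits in [("Q2", 2), ("Q4", 4), ("Q8", 8)]:
    print(f"{name}: ~{weight_memory_gib(80, bits):.1f} GiB")
```

Under these assumptions, each halving of the bit width halves the weight footprint, which is what moves a model from server-only into the range of a high-end desktop.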

cubefox · 2026-02-04T09:48:31 1770198511

> Obviously. That's why I led with that statement.

Then why did you write this?

> It's always possible that there are some bugs in early implementations that need to be fixed later, but so far I don't see any reason to believe this is actually a Sonnet 4.5 level model.

margalabargala · 2026-02-03T20:59:37 1770152377

Wonder where it falls on the Sonnet 3.7/4.0/4.5 continuum.

3.7 was not all that great. 4 was decent for specific things, especially self-contained stuff like tests, but couldn't do a good job with more complex work. 4.5 is now excellent at many things.

If it's around the perf of 3.7, that's interesting but not amazing. If it's around 4, that's useful.

Computer0 · 2026-02-04T01:50:24 1770169824

I have yet to find a "small" model that can make function calls consistently enough to not be frustrating. That is the most noticeable difference I see between even older "SOTA" models and the best-performing "small" models (<70B).