Amazing work and people should really appreciate that the opportunity costs of your work are immense (given the hype).
On another note: I'm a bit paranoid about quantization. People aren't good at discerning model quality at these levels of "intelligence" anymore, and I don't think a vibe check really catches the nuances. How hard would it be to systematically evaluate the different quantizations? E.g. on the Aider benchmark that you used in the past?
I was recently trying Qwen 3 Coder Next, and while there are benchmark numbers in your article, they seem to be for the official checkpoint, not the quantized ones. But that's never made really clear (and chatbots mistake them for benchmarks of the quantized versions, btw).
I think systematic/automated benchmarks would really bring the whole effort to the next level. Basically something like the bar chart from the Dynamic Quantization 2.0 article but always updated with all kinds of recent models.
Thanks! Yes, we actually did think about that - sadly it can get quite expensive. Perplexity benchmarks over short context lengths with small datasets are doable, but perplexity isn't an accurate measure. We're currently investigating the most efficient course of action for evaluating quants - will keep you posted!
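For context, the "cheap but inaccurate" perplexity check mentioned above is just the exponential of the mean per-token negative log-likelihood on a held-out text. A minimal sketch (the per-token loss values below are hypothetical, not from any real model):

```python
import math

def perplexity(nlls):
    """Perplexity = exp of the mean per-token negative log-likelihood (in nats).

    `nlls` would come from a causal LM's cross-entropy loss over held-out text.
    """
    return math.exp(sum(nlls) / len(nlls))

# Hypothetical per-token losses from a BF16 checkpoint vs a 4-bit quant:
bf16_nlls = [2.10, 1.95, 2.30, 2.05]
q4_nlls = [2.15, 2.01, 2.38, 2.09]

print(perplexity(bf16_nlls))  # lower is better
print(perplexity(q4_nlls))    # a small gap suggests mild degradation
```

The catch, as noted above, is that a small perplexity gap on a short generic text doesn't guarantee the quant holds up on long-context or task-specific benchmarks like Aider.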
Yes sadly very expensive :( Maybe a select few quants could happen - we're still figuring out what is the most economical and most efficient way to benchmark!
Oh it's more time that's the issue - each benchmark takes 1-3 hours ish to run on 8 GPUs, so running on all quants per model release can be quite painful.
Assume AWS spot pricing of say $20/hr for the 8 B200 GPUs; at 1-3 hours per benchmark that's $20-$60 ish per quant. Assuming we benchmark BF16, 8-bit, 6, 5, 4, 3 and 2 bits, that's 7 ish tests, so $140 to $420 ish per model. Time wise, 7 hours to 1 day ish.
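Spelling out that back-of-the-envelope estimate (all figures are the ballpark numbers from the comment, not real quotes):

```python
# Rough cost/time estimate for benchmarking one model release across quants.
GPU_NODE_RATE = 20.0   # assumed spot price for an 8x B200 node, $/hr
HOURS_PER_RUN = (1, 3)  # each benchmark takes roughly 1-3 hours
QUANTS = ["BF16", "8bit", "6bit", "5bit", "4bit", "3bit", "2bit"]  # 7 tests

low_cost = GPU_NODE_RATE * HOURS_PER_RUN[0] * len(QUANTS)
high_cost = GPU_NODE_RATE * HOURS_PER_RUN[1] * len(QUANTS)
low_hours = HOURS_PER_RUN[0] * len(QUANTS)   # if runs are sequential
high_hours = HOURS_PER_RUN[1] * len(QUANTS)

print(f"${low_cost:.0f}-${high_cost:.0f} per model, "
      f"{low_hours}-{high_hours} hours sequential")
```

That is, $140-$420 and 7-21 node-hours per model release, which is why running every quant on every release adds up fast.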
We could run them after a model release which might work as well.
I find it hard to trust post-training quantizations. Why don't they run benchmarks to see the degradation in performance? It sketches me out, because automatically running a suite of benchmarks should be the easiest thing to do.
Small feedback if any of the Antigravity people read here: "Fast" is not a great name for the "eager" option (vs. "Planning") because "Fast" is associated with "dumb" in LLMs (fast/flash/mini). Probably "Eager" would be a more descriptive name
It’s just a detail, the international financial market/banking system is basically under active US control, just look at what happened to Wegelin & Co. (at that point the oldest bank in Switzerland) when they thought that that was not the case.
Mechanically sure, but I still feel way safer when a Tesla (of any kind) is approaching me as a pedestrian or bicyclist than any other vehicle (except maybe Waymo) because I know they will alert the driver and brake if necessary. Any other car, especially older trucks, I'm quite afraid of, based on experience.
> because I know they will alert the driver and brake if necessary.
This is not necessarily accurate.
https://x.com/TaylorOgan/status/1681240264554209281 ("Warning: Graphic; Last month, a 76-year-old pedestrian was tragically mowed down by a Tesla Model S in Brooklyn, NY. Both of his legs were torn off, according to witnesses. New data from the NHTSA says the Tesla was engaged on Autopilot/Full Self-Driving mode.")
I own several Teslas, would not trust them to stop for a pedestrian while in any driver assist mode. It may work, but if you rely on it, be prepared for consequences when it fails, as you are the responsible party when it fails.
Tesla is currently renting vehicles for $60/day due to diminished demand; if one would like to test this personally, the cost is minimal. Avoid bodily injury whenever possible during testing.
Edit: @romaaeterna Are you willing to stand in front of it while it is at speed without a safety driver? I am trying to reconcile the mental model with risk appetite and potential gaps between priors and current state.
I have a Tesla and drive FSD back and forth to work every day. It's great
Edit in response to your edit:
Would I risk myself standing in front of a FSD Tesla versus in front of an Uber or an average human-controlled car with the standard percentage chance of the human texting or being otherwise distracted or drunk or tired? I would take FSD. And I think that a mathematical rather than emotional evaluation of the odds would make risk-minded people do the same.
You would need to compare the data against the data of non-smart trucks. I'm guessing it's an order of magnitude more dangerous to be a pedestrian around a normal truck.
Automatic emergency braking is a standard feature on many new cars, and will be mandatory for all new passenger cars and light trucks in the U.S. by September 2029. I am open to the assertion that Tesla's AEB, when scoped to pedestrian scenarios, is superior to other AEB systems, but this assertion requires independently verified data and evidence for support.
In my experience, Tesla drivers are some of the worst drivers on the road. They seem to pay the least attention to what's going on around them and are the most likely to play fast and loose with the rules of the road. I don't know what accounts for this. There has been at least one study out of Berkeley that suggests that people who drive more expensive cars are more likely to break the rules of the road. It's possible that (at least here in Seattle) this is more likely to be the driver's first car, since many people driving them are highly paid tech workers who often hail from other countries and may not have as good of a grasp of driving in the US. Or it may be that this is enabled by Autopilot itself (if your car is taking care of the safety, you don't have to pay as much attention).
The last reason is the biggest imo. Previously if you didn't pay attention you would crash relatively often. Now you aren't punished in the same way. In the same way spell check made us worse spellers. You aren't required to pay attention to detail, so you never develop that skill.
I taught my kids to drive both manuals and automatics. Usually they got the hang of driving an automatic first, and then we added the manual into the mix.
But with one of my kids, it was exactly as above. They scared the crap out of me, because they just would not focus well enough. We transitioned to a manual so that they were required to focus on the task at hand, and they then turned into a good driver.
(Aside: my kids, now college+ age have all gotten great deals on cars on college budgets, because they were willing to take a manual that cost far less due to reduced demand).
> There has been at least one study out of Berkeley that suggests that people who drive more expensive cars are more likely to break the rules of the road.
In Germany, we have a joke - BMWs don't need turn signal indicators, they have built-in precedence that comes with paying the money one needs to have to afford a BMW.
Could you give me some numbers about deaths caused by Tesla versus other brands per mile driven? It seems to be very difficult to find enough information to draw any conclusions.
Why did you stop training shy of the frontier models? From the log plot it seems like you would only need ~50% more compute to reach frontier capability
Makes sense! I like that you guys are more open about it. The other labs just drop stuff from the ivory tower. I think your style matches better with engineers who are used to datasheets etc. and usually don't like poking a black box