scw · 2026-06-11T15:37:00 1781192220

The original post [1] now includes an update:

  UPDATE! Within a day of this blowing up on Hacker News, AMD reached back 
  out to me and said they would be looking into the matter after all.

[1] https://mrbruh.com/amd2/

scw · 2026-04-11T21:50:45 1775944245

They're unlikely to find much unless they instead try “dispositive”.

bookofjoe · 2026-04-12T13:25:26 1776000326

Yikes! Corrected. Thank you.

scw · 2026-03-30T14:09:14 1774879754

TurboQuant's specific benefit is compressing the KV cache at a negligible cost to quality. That mainly means context lengths can go up for the same amount of memory. However, the KV cache only accounts for something like 20% of the overall model size, so this will not dramatically decrease memory demands in the way that some of the more sensationalist reporting has stated.
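
A back-of-the-envelope sketch of that arithmetic, assuming the rough 20% share above and a purely hypothetical 4x compression ratio (not a figure claimed for TurboQuant); the function name is made up for illustration:

```python
def remaining_memory_fraction(kv_fraction, compression_ratio):
    """Fraction of the original memory still needed after compressing only the KV cache."""
    if not 0.0 <= kv_fraction <= 1.0:
        raise ValueError("kv_fraction must be between 0 and 1")
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    # Everything that is not KV cache is untouched; the KV share shrinks by the ratio.
    return (1.0 - kv_fraction) + kv_fraction / compression_ratio


# Assumed inputs: 20% KV share (from the comment), 4x compression (hypothetical).
remaining = remaining_memory_fraction(0.20, 4.0)
print(f"memory still needed: {remaining:.0%} of the original")
```

Under those assumptions the total only shrinks to about 85% of the original, while the same KV budget could hold roughly 4x as many tokens, which is why the gain shows up mostly as longer context.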

lostmsu · 2026-03-30T14:53:59 1774882439

In large providers KV caches are the main bottleneck, no?

scw · on Feb 23, 2025

Still active, but with far fewer resources than in the past. Many backends, like CUDA for Windows, have been dropped, and others have been pushed off to partners with varying levels of support. TensorFlow 2.19 is going to release soon without Python 3.13 support; it's hard not to imagine that resource constraints are at play.

scw · on Sept 26, 2024

The Microsoft stake finally allowed him to let loose and buy a keyboard with a working shift key.

scw · on Aug 10, 2024

Exciting concept! Note that the LLM-corrected version does drop a full paragraph from the output at the bottom of the second page (starting with an asterisk and "My views regarding inflationary possibilities"). I'm not sure if there is a simple way to mitigate this risk, but it would be nice to fall back on the uncorrected text if the LLM can't produce valid results for some region of the document.
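
One possible shape for that fallback, as a minimal sketch rather than anything the project actually does: it assumes the raw OCR text and the LLM output are already split into aligned chunks (e.g. paragraphs), and the function name and the 0.6 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def merge_with_fallback(raw_chunks, corrected_chunks, min_ratio=0.6):
    """Keep a corrected chunk only if it still resembles its raw OCR source."""
    merged = []
    for raw, fixed in zip(raw_chunks, corrected_chunks):
        # A dropped or heavily rewritten chunk scores low against the raw text.
        ratio = SequenceMatcher(None, raw, fixed or "").ratio()
        merged.append(fixed if ratio >= min_ratio else raw)
    return merged


raw = ["Tbe quick brown f0x", "* My views regarding inflationary possibilities"]
corrected = ["The quick brown fox", ""]  # second chunk was dropped by the LLM
print(merge_with_fallback(raw, corrected))
```

A similarity check like this would catch dropped or hallucinated regions but not subtle wrong corrections, so the threshold would need tuning on real documents.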

scw · on Nov 13, 2023

I recently had occasion to evaluate a database of 1200+ NVIDIA GPUs and can tell you that the only thing consistent about the model numbers is their inconsistency. For example, what is an RTX 4000? It could be the 2018 Quadro RTX 4000, the Quadro RTX 4000 Max-Q, or the Quadro RTX 4000 Mobile (all Turing cards), but it could also be the RTX 4000 Mobile Ada Generation (an Ada Lovelace card released in 2023).

scw · on Feb 15, 2020

Reminds me of “boko-maru” of the made-up religion [1] in Vonnegut's Cat's Cradle.

[1] https://en.m.wikipedia.org/wiki/Bokononism

scw · on Jan 28, 2020

The methodological approach and data sources are detailed in the associated post by JHU professor Lauren Gardner: https://systems.jhu.edu/research/public-health/ncov/

scw · on Oct 12, 2018

Check out: https://medium.com/microsoft-open-source-stories/how-microso...