estsauver's comments | Hacker News

They're certainly welcome to do whatever they like, and for a microkernel-based OS it might make sense--I think there's probably pretty "meh" output from a lot of LLMs.

I think part of the battle is actually just getting people to identify which LLM made it, to understand whether someone's contribution is good or not. A JavaScript project with contributions from Opus 4.6 will probably be pretty good, but if someone is using Mistral Small via the chat app, it's probably just a waste of time.


I have a recurring problem where I can't even read one of my favorite recipe websites (seriouseats.com) from my phone because the series of popups completely blocks the page, and can't be dismissed.

But if I ask Claude or Gemini for a nice version of the recipe, it works perfectly. I think there's a lot of own goals out there.


For what it's worth, most people are already doing this! Some of the subagents in Claude Code (Explore, I think even compaction) default to Haiku, and then you have to manually override it with an env variable if you want to change it.
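A minimal sketch of that override, assuming the env-variable names from Anthropic's Claude Code docs (`ANTHROPIC_MODEL` for the main agent, `ANTHROPIC_SMALL_FAST_MODEL` for background/subagent work); the model IDs here are illustrative, not a recommendation:

```shell
# Main agent model (illustrative ID)
export ANTHROPIC_MODEL="claude-opus-4-6"
# Model used for background/subagent tasks like Explore (illustrative ID)
export ANTHROPIC_SMALL_FAST_MODEL="claude-haiku-4-5"
echo "$ANTHROPIC_SMALL_FAST_MODEL"
```

Put these in your shell profile (or a project `.envrc`) before launching `claude` and the subagent default is swapped out.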

Imagine the quality of life upgrade of getting compaction down to a few second blip, or the "Explore" going 20 times faster! As these models get better, it will be super exciting!


> Imagine the quality of life upgrade of getting compaction down to a few second blip, or the "Explore" going 20 times faster! As these models get better, it will be super exciting!

I'm awaiting the day the small and fast models come anywhere close to acceptable quality. As of today, neither GPT-5.3-Codex-Spark nor Haiku is very suitable for compaction or similar tasks; they'll miss so much, considering they're quite a lot dumber.

Personally I do it the other way: the compaction is done by the biggest model I can run, the planning as well, but actually following the step-by-step "implement it" plan is done by a small model. It seemed to me like letting a smaller model do the compaction or write overviews just makes things worse, even if they get a lot faster.


The explore step with Codex-5.3-Spark and Opus 4.6 Fast both feel incredible.


I think the fast inference options have historically been only marginally more expensive than their slow cousins. There's a whole body of research about Pareto curves across efficiency, speed, and intelligence. If you can deliver even an outdated, low-intelligence model at high efficiency, everyone will be interested. If you can deliver a model very fast, everyone will be interested. (If you can deliver a very smart model, everyone is obviously the most interested, but that's the free space.)

But to be clear, 1000 tokens/second is WAY better. Anthropic's Haiku serves at ~50 tokens per second.
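Back-of-envelope arithmetic on what that throughput gap means for something like a compaction pass (the 2,000-token summary length is an illustrative assumption; the ~50 and ~1,000 tok/s figures are from the comment above):

```python
# Time to emit a 2,000-token compaction summary at each serving speed.
summary_tokens = 2_000
slow_tps = 50     # ~Haiku today, per the comment above
fast_tps = 1_000  # fast-inference serving

slow_s = summary_tokens / slow_tps  # 40.0 s
fast_s = summary_tokens / fast_tps  # 2.0 s

print(f"{slow_s:.0f}s vs {fast_s:.0f}s ({slow_s / fast_s:.0f}x faster)")
# → 40s vs 2s (20x faster)
```

That's the difference between a coffee-break pause and the "few second blip" described above.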


Do you all think you'll be able to convert open-source models to diffusion models relatively cheaply, à la the d1 // LLaDA series of papers? If so, that seems like an extremely powerful story where you get to retool the much, much larger capex of open models into high-performance diffusion models.

(I can also see a world where it just doesn't make sense to share most of the layers/infra and you diverge, but curious how you all see the approach.)


I think there's clearly a "speed is a quality of its own" axis. When you use Cerebras (or Groq) to develop an API, the turnaround speed of iterating on jobs is so much faster (and cheaper!) than using the frontier high-intelligence labs that it's almost a different product.

Also, I put together a little research paper recently--I think there's probably an underexplored option of "Use frontier AR model for a little bit of planning then switch to diffusion for generating the rest." You can get really good improvements with diffusion models! https://estsauver.com/think-first-diffuse-fast.pdf


I'm very worried for both.

Cerebras requires a $3K/year membership to use APIs.

Groq's been dead for about 6 months, even pre-acquisition.

I hope Inception is going well; it's the only real democratized option here. Gemini 2.5 Flash Lite was promising but it never really went anywhere, even by the standards of a Google preview.


Taalas is interesting. 16,000 TPS for Llama on a chip.

https://taalas.com/


On a very old model, it's more like 16,000 garbage words/s.


Llama 3.1 8B is pretty useful for some things. I use it to generate SQL pretty reliably, for example.

They are doing an updated model in a month or so anyway, then a frontier level one "by summer".


But Taalas had to quantize Llama 3.1 8B to death to get it to fit. It can't produce coherent non-English text at all.


I do wonder if there are tasks where 16k garbage words/s are more useful than 200 good words per second. Does anyone have any ideas? Data extraction perhaps?


A politician communication agent maybe...


Neat! I had been wondering if anyone was trying to implement a model directly in silicon. We're getting closer to having chatty talking toasters every day now!




I wonder how many tokens per second they could get if they put Mercury 2 on a chip.


It's exciting to see, but look at the die size for only an 8B model.


You can call Cerebras APIs via OpenRouter if you specify them as the provider in your request fyi. It's a bit pricier but it exists!
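A minimal sketch of what that request body looks like, assuming the field names from OpenRouter's provider-routing docs (`provider.order`, `allow_fallbacks`); the model slug is illustrative and no request is actually sent here:

```python
import json

# Chat-completion payload pinned to Cerebras as the serving provider.
payload = {
    "model": "qwen/qwen3-coder",  # illustrative slug; pick one Cerebras serves
    "messages": [{"role": "user", "content": "Hello"}],
    "provider": {
        "order": ["cerebras"],     # try Cerebras first
        "allow_fallbacks": False,  # fail rather than route to another provider
    },
}
print(json.dumps(payload["provider"], sort_keys=True))
```

POST that to OpenRouter's `/chat/completions` endpoint with your API key and you get Cerebras-speed serving (at the higher price noted above).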


I used their API normally (pay per token) a few weeks ago. Their Coding Plan appears to be permanently sold out though.


I don't think it's a good comparison given Inception work on software and Cerebras/Groq work on hardware. If Inception demonstrate that diffusion LLMs work well at scale (at a reasonable price) then we can probably expect all the other frontier labs to copy them quickly, similarly to OpenAI's reasoning models.


Definitely depends on what you're buying, maybe some of the audience here was buying Groq and Cerebras chips? I don't think they sold them but can't say for sure.

If you're a poor schmuck like me, you'd be thinking of them as API vendors of ~1000 token/s LLMs.

Especially because Inception v1's been out for a while and we haven't seen a follow-the-leader effect.

Coincidentally, that's one of my biggest questions: why not?


What do you mean by Groq being dead for about 6 months? Not refuting your point, but I'm curious.


No new model since GPT-OSS 120B, or maybe Kimi K2 non-thinking? Basically there were a couple of models they would obviously have been expected to support, and they didn't.

Something about that Nvidia sale smelled funny to me, because the number was yuge, yet the software side shut down well before the acquisition.

But that's 100% speculation, wouldn't be shocked if it was:

"We were never looking to become profitable just on API users, but we had to have it to stay visible. So, yeah, once it was clear an Nvidia sale was going through, we stopped working 16 hours a day, and now we're waiting to see what Nvidia wants to do with the API"


The Groq purchase was designed to not trigger federal oversight of mergers: you buy out the 'interesting' part and leave a skeleton team and a line of business you don't care about -> no CFIUS, no mandatory FTC reporting -> smoother process.


I am currently using their APIs on a paygo plan, I think it might just be a capacity issue for new sign ups.


Cerebras are on OpenRouter.


Once again, it's a tech that Google created but never turned into a product. AFAIK in their demo last year, Google showed a special version of Gemini that used diffusion. They were so excited about it (on the stage) and I thought that's what they'd use in Google search and Gmail.


Google did not create it. It is correct that there was a Gemini that used diffusion; you could apply for access (not via API). It was okay.


This is really, really cool. I would replicate and would be happy to share data on this if you want to let me help!

username at gmail if you want to chat


Sure! Pinged you with the email!


It works for me! (Edited link since the original had my laptop's serial number in it: https://screen.studio/share/3CEvdyji)

Claude Code v2.1.37

EU region, Claude Max 20x plan

Mac -- Tahoe 26.2


Good to know it works for some people! I think it's another issue where they focus too much on macOS and neglect the Windows and Linux releases. I use WSL for Claude Code since the Windows release is far worse and currently unusable due to several neglected issues.

Hoping to see several missing features land in the Linux release soon.

I'm also feeling weak and the pull of getting a Mac is stronger. But I also really don't like the neglect around being cross-platform. It's "cross-platform" except a bunch of crap doesn't work outside MacOS. This applies to Claude Code, Claude Desktop (MacOS and Windows only - no Linux or WSL support), Claude Cowork (MacOS only). OpenAI does the same crap - the new Codex desktop app is MacOS only. And now I'm ranting.


What version are you on? Did you run a Claude update?


I'm on v2.1.37 and I have it set to auto-update, which it does. I also tend to run `claude update` when I see a new release thread on Twitter, and usually it has already updated itself.


Claude Code CLI 2.1.39 released a few hours ago fixes the problem. They didn't note it in the changelog though. Seems like a significant bug fix. ¯\_(ツ)_/¯


I think the closest that this has come is in the form of GitLab, which pretty famously did a ton of the corporate work in the format of a very open Handbook (https://handbook.gitlab.com/)

In the early years, it was extremely, extremely open and comprehensive. I've definitely looked through it when I wasn't sure how to handle something at work.


And that site is available as a git repository - on Gitlab of course:

https://gitlab.com/gitlab-com/content-sites/handbook


That's pretty cool. Wonder if it is still deployed and updated religiously. If they wanted to deploy an 'Agent' worker, that source is a goldmine for context.


I don't know about "religiously" but as of right now it was last updated 16 minutes ago:

https://gitlab.com/gitlab-com/content-sites/handbook


I have a hard time believing that the right move for most organizations that aren't already bought into an OpenAI enterprise plan is to build their entire business around something like this. It ties you to one model provider that has been having trouble keeping up with the other big labs, and it offers what superficially look like some extremely useful tools, but with unclear rigor. I don't think I would want to build my business on this as an AI-native company just starting out right now, unless they figure out how to make it much more legible and transparent.

