avaer's comments

Some people said in their manifestos that AI was going to be Open and democratized and would benefit all of humanity.

It was 100% a rugpull but charitably not quite 100% predictable.


What people?

What do you mean by "open"?

By "open" did those people mean "free as in beer"?

What does rugpull mean in this context?


OpenAI was originally founded on the parent's concepts, but obviously that changed at some point.

Currently, Elon is suing Altman because Altman turned it into a for-profit. Oddly, in that Elon is taking the moral road?


> Oddly, in that Elon is taking the moral road?

Holding other people to standards he wouldn't hold himself to?

Sounds totally on-brand.


I was talking about OpenAI, referencing their original mission statement. Other companies have said similar things.

Rugpull means they "pulled the rug out" from underneath those who believed their mission statement, and decided instead to take over the world and screw everyone else.


  ~20 FPS at 518×378
But on what hardware? I couldn't find it.

It seems to be a relatively small model, so it probably doesn't require much, but it would be nice if projects like this grounded their numbers.


Whenever I hear this kind of argument, I basically let the person know their argument is fine as long as

  1) they are going to pay me competitive money to "go slowly", "polish my code", or whatever, or
  2) they are actively working on getting me UBI
Otherwise I just shake my head.

It is widely agreed that users are not paying enough to cover the costs of inference. That is what the "subscription" plans amount to. So, many users are losing the companies money.

This is not discussed publicly and is papered over by funding raises, because there is growth and the hope that at some point the economics could work out. Which remains to be seen.

It's a variant on a Ponzi scheme. Investor hope is that at some point someone invents a way to stop losing money.

If at any point investors start to lose faith that this is going to be the case, the bubble pops.


If companies start to raise token prices, at some point it won't be affordable for people. I think that no matter what they do, they will just keep losing money: if they raise prices, fewer people will buy the paid plans, and if they don't, they keep losing money like they do now.

What percentage of Anthropic’s and OpenAI’s revenue is subscriptions?

I hope it's not reductionist, but this kind of thinking always feels like cope in the face of The Bitter Lesson.

> It takes no time to include all this

In some cases it does. Which engine?


I don't use an engine for vibing; they tend to be built around managing content and using their editors. You can drive engines from code, but they are usually built more for invoking code from content. So I just use frameworks like SDL, Raylib C#, and Ebitengine. Most of what I've done recently was in Ebitengine, because I felt Go was the best language to have LLMs write when I started those projects.

Right now I have my own framework with a host written in Rust and game code written in AssemblyScript - too early to tell how well this will work out, but it is very promising to me right now.

If I were just getting started I would probably pick some framework in Rust, or maybe Bevy, which I believe is considered an engine but is code-centered.


Actually, if I were just getting started I might pick an engine at this point. LLMs are probably great at Godot now, and maybe with MCP servers they can work fine with any other engine by having access to the editor.

It works; I've shipped this as a "local inference" / poor person's Ollama for low-end LLM tasks like search. The main win is that it's free and privacy preserving, and (mostly) transparent to users in that they don't have to do anything, which is great for giving non-technical users local inference without making them do scary native things.

But keep in mind the actual experience for users is not great; the model download is orders of magnitude greater than downloading the browser itself, and something that needs to happen before you get your first token back. That's unfixable until operating systems start reliably shipping their own prebaked models that an API like this could plug into.


It's a one-time download shared by all websites that use the Prompt API.

A bigger issue is that the models on most standard PCs are both tiny and slow. I was going to try using the Prompt API to change the text of (Infocom) text adventures on the fly. But for many PCs, this will currently be too slow to be feasible.


> That's unfixable until operating systems start reliably shipping their own prebaked models that an API like this could plug into.

Maybe the next big thing will be premium software subscription offers with a bunch of 5090s thrown in as an extra.


> But keep in mind the actual experience for users is not great; the model download is orders of magnitude greater than downloading the browser itself, and something that needs to happen before you get your first token back.

With MoE models, you could fetch expert layers from the network on demand by issuing HTTP range requests for the corresponding offsets, similar to how BitTorrent downloads file chunks from multiple hosts. You'd still have to download the shared layers, but time to first token would now be proportional to active size rather than total size. Of course this wouldn't be totally "offline" inference anymore, but for a web browser feature that's not a key consideration.
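
Roughly, the per-expert fetch could look like this sketch (TypeScript; the function name, parameters, and the idea of an offset index are assumptions for illustration, not any existing API, and it requires a server that honors Range requests):

  // Hypothetical sketch: pull one expert's weights by byte range.
  // A real setup would ship an index mapping expert id -> (offset, length)
  // within the weights file; those numbers are assumptions here.
  async function fetchExpertWeights(
    weightsUrl: string,
    offset: number,
    length: number,
  ): Promise<ArrayBuffer> {
    const res = await fetch(weightsUrl, {
      headers: { Range: `bytes=${offset}-${offset + length - 1}` },
    });
    // 206 Partial Content means the server actually served just the range.
    if (res.status !== 206) {
      throw new Error("server did not honor the Range request");
    }
    return res.arrayBuffer();
  }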


> With MoE models, you could fetch expert layers from the network on demand

This is a common misconception, probably due to the unfortunate naming. Expert layers are not "expert" at any particular subject, and active size only refers to the parameters activated per token. You'd still need all (or most) of the layers for any particular query, even if some layers have a very low chance of being activated.

All in all, you'd be better off lazy-loading the entire model; at least then you'd know you have the capability to run inference from that point on.


Ultimately it would amount to lazy-loading the model, but the parameters themselves would be fetched from the network as needed, which still decreases time-to-first-token. It's true that "expert" choices will span most of the model, regardless of any particular "subject" or "topic" choice, but if we simply care about time-to-first-token it's still a viable strategy.

Perhaps you could generate a few tokens before the entire model is downloaded, but since every token takes a potentially different "path" through an MoE model, you'd still need to wait for the entire download before getting deeper than a handful of tokens... which is not really a UX improvement imo.

Even at its worst, it's a minor UX improvement compared to having to download everything before getting the first token. Ultimately we will complete the download, but we can still prioritize the order so that the first handful of tokens gets through.

> operating systems start reliably shipping their own prebaked models

Here's to hoping that that dystopia will never happen.


Would it be less dystopian for Operating Systems to ship with their own browser that ships with their own models? Or do you find the current situation where Operating Systems ship with browsers dystopian?

I find the very idea of this AI thing making its way, like a virus, onto our computer systems (either in centralised form, as mostly happens now, or installed locally, as this article describes) quite dystopian. On the other hand I do not find the idea of the Internet browser dystopian (even though a data and corporate behemoth like Alphabet being the entity behind Chrome is indeed dystopian, I agree on that).

On second thought, you're on to something: maybe if we hadn't let the Internet browser take over our computer-lives as much as it has over the last 20 or so years, then Chrome (in its current manifestation, that is) wouldn't have happened. At least we are now aware of the dangers awaiting us when it comes to AI.


Congratulations, it's already here.

> It works; I've shipped this as a "local inference" / poor person's Ollama for low-end LLM tasks like search

fantastic!

> the model download is orders of magnitude greater than downloading the browser itself, and something that needs to happen before you get your first token back

sure, but does this mean the model is lazily downloaded? that is, if I used this and mine were the first site to call the model, the user would be waiting until the model finished downloading at that point?

that sounds like a horrible user experience - maybe chrome reduces the confusion by showing a download status dialog or similar?

also, any idea what the on disk impact is?


The model download is lazy and cached, so it's a one-time cost presumably across all origins (I assume so since the alternative would be a trivial DoS waiting to happen).

So it's once per browser, not once per site.

You can track the download state yourself and pop whatever UI you want.
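
For example, something along these lines, going from the Prompt API explainer (the LanguageModel global, availability(), and the downloadprogress event are what the explainer documents, but the exact shape has changed between Chrome versions, so treat this as a sketch):

  // Sketch: check availability, then surface download progress in your own UI.
  // LanguageModel isn't in the standard TS DOM typings yet, so declare it loosely.
  declare const LanguageModel: any;

  async function createSessionWithProgress(onProgress: (fraction: number) => void) {
    const availability = await LanguageModel.availability();
    if (availability === "unavailable") return null;

    // create() kicks off the model download if needed; downloadprogress
    // events report a 0..1 fraction you can render however you like.
    return await LanguageModel.create({
      monitor(m: EventTarget) {
        m.addEventListener("downloadprogress", (e: any) => onProgress(e.loaded));
      },
    });
  }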


chrome://on-device-internals reports "Model Name: v3Nano Version: 2025.06.30.1229 Folder size: 4,072.13 MiB" on a random Windows machine I just checked.

Thank you, stranger! I would have assumed the size would vary based on whether your hardware supports the high-quality GPU backend (4 GB) or defaults to a smaller CPU-compatible version (3 GB), but the 22 GB note on that page is really confusing. Even if it includes the model server, where is the remaining 18 GB going?

I'd imagine that the 22 GB was decided through modelling various scenarios. For a start, it's not just a 4 GB current model; it's 2x4 GB so it can be updated without a window where the machine has no model, which is up to 8 GB.

Then it's possible the model you get scales with the CPU/GPU/RAM available, so if you have a 12 GB GPU you probably get a better model; perhaps that's a 10-11 GB model. At 2x, that's 22 GB.

Then consider that a machine is not static: GPUs/hardware come and go, VRAM allocation in integrated graphics changes, etc. You end up just needing to pick a number that won't confuse users.


(Former Chrome built-in AI team member here.)

This is part of it, and also we just didn't want to use up the last of the user's disk space! It's disrespectful to use up 3 GB if the user only has 4 GB left; it's sketchy if the user only has 10 GB. At 22 GB, we felt there was more room to breathe.

One could argue that users should have more agency and transparency into these decisions, and for power users I agree... some kind of neato model management UI in chrome://settings would have been cool. But 99% of users would never see that, so I don't think it ever got built.


> Storage: At least 22 GB of free space on the volume that contains your Chrome profile.

Yes, but that is then followed by:

  > Built-in models should be significantly smaller. The exact size may vary slightly with updates.

Lmao and here I am still staunchly treating Blazor’s 2MB runtime as a deal-breaker

If it doesn't fit on a floppy...!

Emacs had long ago exceeded eight megs!

> > Storage: At least 22 GB of free space on the volume that contains your Chrome profile.

Yes, I can read and comprehend English, and you should assume I read the page. Because of the "At least" wording, I was curious what a person who has actually used the feature has noticed, i.e., learning from people who have actually done it already.


Doesn't sound great, but consider how much better this is than every webpage trying to load their own models.

If it turns out useful enough I'm sure browsers will just start including it as (perhaps optional?) part of installation.


Is it actually privacy preserving? Chrome mostly exists to extract all the information it can from a user without immediately getting a lawsuit whose penalty exceeds what is gained through ads, military contracts, etc. Android isn't too far off either. I would welcome any alternative to this. I can see applications for this being things like "while the device is at rest and charging, summarize all of the user's recent text communications" or whatever else as a legal loophole for wiretap laws.

>I can see applications for this being things like "while the device is at rest and charging, summarize all of the user's recent text communications" or whatever else as a legal loophole for wiretap laws

This just exposes an API for sites to use. If they wanted to do the types of spying you're cynically suggesting, they could just add it without an API and you'd be none the wiser. Chrome contains closed source components so you wouldn't even know.


Do you think no-one would notice that the Chrome download was 20GB larger?

Who says they'll be using the 20 GB model? You'd hardly need frontier-level intelligence to detect CSAM or ad keywords. Moreover, it's downloaded on first use, not bundled with the browser, so you won't really notice unless you're checking the Chrome user data directory, and since that also contains caches and other site data you'd likely chalk it up to random sites.

It's a lot easier to hide the language they need in a EULA for a feature like this than it would be elsewhere.

I appreciate you feel this is a cynical take. But have you seen the class action lawsuits against Google over the last 5 years? They exceed a billion dollars as far as I can remember and they are for more blatant things than this.


>It's a lot easier to hide the language they need in a EULA for a feature like this than it would be elsewhere.

Why would adding a ML API or library require an EULA change?


So they don't get sued when the LLM inevitably instructs someone to drink bleach, hurt a stranger, delete the production db (I guess vibe coders say this is a good thing now though?), etc.

Hey, I'm the Chrome PM for the built-in AI APIs. I wanted to jump in on the privacy concern mentioned here.

It’s a totally valid question, and transparency is the only way this can work. On-device processing is an important core design goal of these APIs.

There are NO logs of the input/output interactions sent to any server, not even for training purposes. The only metrics we have are on performance, stability, and other generic usage signals, the same as for any other API. These are all controlled by existing user preferences in Chrome.


> I think it won't be long before the whole world is mapped, and "playable".

People already don't want to use VR; why would they get into (or allow) scanning, which has even less immediate value?

I agree with the spirit though, I just think rendering the world is gonna come from a few generations of iteration on world-modeling tech like World Labs/Marble.


Someone should tell the mathematicians that if they use a calculator or a whiteboard or, heaven forbid, a computer, they are "bad at math".

1) That's not related to the chain-of-thought point I was replying to. Someone asked about the "bad at math" claim and pointed out "but it seems good to me", so I added some color on why that might be the case. Your retort seems to imply I'm arguing that because something uses tools for a job, it cannot be good at the thing it's using the tool for. Which is not the case.

2) If you have something to say, just say it. Don't put words in my mouth and then argue with a thing I didn't say.


Right, but your narrative was incorrect and based on faulty premises, which you haven't acknowledged. That's fine, except you're still pressing the argument.

Can you please present a reasonable maths problem that I can bounce off GPT so we can see it fail? I can give you many hundreds of relatively complex problems, none of which have appeared in a textbook, that GPT has not only solved, but critiqued my own crappy solutions for. I'm only asking you for one counterpoint.


> your narrative was incorrect and based on faulty premises

I am referring to specific, documented behavior of LLMs. Google it.


Google any plausibly reasonable math problem, and even the terrible LLM that powers the Google search page will almost certainly solve it correctly for you.

I don't need to reconstruct my argument axiomatically from folk beliefs.


You seem to have misunderstood my comment. I'm happy to accept the fault for poor communication. But you're making it hard. You're signaling that any clarifications on my behalf will be treated as further arguments instead of some sort of shared desire to hear one another. I don't care to continue.

TLDR (of the paper): prediction markets are basically a scam with extra steps. If you're not in on the scam (top 3%), you're the sucker (the liquidity).
