Hacker News | azakai's comments

> Let's keep WebAssembly lean and fast!

Note that wasm is still lean and fast - WASI is not part of core wasm, but layered on top.

That is, it is possible to implement wasm without WASI. That is also true for other wasm proposals like WasmGC. It is very possible that parts of the ecosystem will not implement certain proposals if they don't make sense there (e.g. parts of the embedded ecosystem may never add GC, etc.).


I recently made a serious attempt at writing my own WebAssembly 3.0 implementation, and while I did finish the whole thing [1], it left me with a bitter taste about WasmGC, which turned out to be very annoying to implement. In fact, I originally wanted to avoid GC, but spectest assumed that GC is always available, so I had no option but to implement one in order to use spectest in the first place.

[1] https://github.com/lifthrasiir/wah/


Interfacing with GC is usually hard. How should it have been done?

Of course, but I'm talking about "annoyance". The GC type system is especially annoying if you are not writing the full compiler.

Not the person you are responding to, but here:

> I believe that artificial intelligence has three quarters to prove itself before the apocalypse comes, and when it does, it will be that much worse, savaging the revenues of the biggest companies in tech. Once usage drops, so will the remarkable amounts of revenue that have flowed into big tech, and so will acres of data centers sit unused, the cloud equivalent of the massive overhiring we saw in post-lockdown Silicon Valley.

We have seen 8 quarters since. Has any of that come to pass?


Even if you see a real bubble or catastrophe in the making, predicting when it will pop is a fool's game.

If you can't predict when it will pop then you should really not predict anything. I can also predict that Google will pop. I won't tell you when but I'll tell you that it will. I'll remain thoroughly unfalsifiable and I'll keep pushing the dates.

Exactly. Here is where this happens in the paper:

> Suppose one copies an LLM into AoE II and feeds into the AoE II-LLM ‘I feel lonely’ as an input. This AoE II-LLM replies: ‘I feel bad for you, maybe catch up with a friend? Closeness always helps in these situations’. One would be hard-pressed to make a convincing argument that, because of this response, an AoE II-LLM knows what helps in these situations

I don't see why one would be any more hard-pressed to make that conclusion about this system than a "normal" LLM.

That it is harder to "read" the data out is the only difference (the AoE II-LLM's output is encoded in game elements). But is ease of decoding an actual issue? If we can't understand a group of people that speak another language, does that say anything about them, or about us?


If you want examples of this, see the recent book "The AI Con"

https://www.goodreads.com/en/book/show/217432753-the-ai-con

which describes LLMs as "souped-up autocomplete", complex statistics that cannot truly understand anything. A more recent example is this paper:

https://zenodo.org/records/20071869

which says,

> [LLMs], as turbo-charged statistical models (recall their formal relation to logistic regression) can only but provide correlations.

And, of course, the Stochastic Parrot paper is the classic example in this area. It is from 5 years ago, but "LLMs only do statistics / can't understand" is very much alive and active among academics, even if it is a minority position.


None of those arguments claim "LLMs could not possibly be good models of some cognitive capacity"

The "some cognitive capacity" that's relevant to the current discussion is "consciousness".

What about the cognitive capacity of understanding?

The use of the term "understanding" in the quote you mentioned is a claim about metaphysics, not cognitive capacity.

From Merriam-Webster:

cognitive: as in reasonable; of, relating to, or involving conscious mental activities (such as thinking, *understanding*, learning, and remembering)


There was a lot new in calculus, but it also didn't come out of nowhere.

That Newton and Leibniz came up with similar ideas in parallel, independently, around the same time (what are the odds?), supports that.

https://en.wikipedia.org/wiki/Leibniz%E2%80%93Newton_calculu...


I had the same question. I think that could be answered by using the predicted activation, but I don't see that in the paper.

That is, rather than just translate activation to text, then text to activation, that final activation could then be applied to the neural network, and it would be allowed to continue running from there.

If it kept running in a similar way, that would show that the predicted activation is close enough to the original one. Which would add some confidence here.

But a lot better would be to then do experiments with altered text. That is, if the text said "this is true" and it was changed to "this is false", and that intervention led to the final output implying it was false, that would be very interesting.

This seems obvious but I don't see it mentioned as a future direction there, so maybe there is an obvious reason it can't work.


> But a lot better would be to then do experiments with altered text. That is, if the text said "this is true" and it was changed to "this is false", and that intervention led to the final output implying it was false, that would be very interesting.

They do essentially that with the rhyming example, changing "rabbit" in the explanation to "mouse" and generating text that's consistent with that change.


Thanks! I missed that part before.


The hardware can also add nondeterminism. GPUs reorder operations, leading to different results.

Vendors might also be running A/B testing or who knows what, even when you ask for a temperature of 0.

But, if you run a fixed model with temperature 0 on your local CPU, it will be deterministic (unless there are bugs).
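
To make the reordering point concrete, here is a minimal sketch, assuming only standard IEEE 754 doubles (which is what Python floats are). Floating-point addition is not associative, so summing the same values in a different order can give a different result. The helper name `sum_in_order` is made up for illustration, and real GPU reductions reorder work in more complicated ways than this.

```python
def sum_in_order(values):
    """Add the values strictly left to right, like a simple sequential loop."""
    total = 0.0
    for v in values:
        total += v
    return total


# Same three numbers, two different orders.
order_a = [1e16, 1.0, -1e16]   # the 1.0 is absorbed by 1e16 and lost
order_b = [1e16, -1e16, 1.0]   # the big values cancel first, so the 1.0 survives

result_a = sum_in_order(order_a)
result_b = sum_in_order(order_b)

# A fixed order is repeatable: running it again gives the same answer.
repeat_a = sum_in_order(order_a)

print(result_a, result_b, result_a == repeat_a)
```

The two orders disagree, but each order on its own is repeatable. That is the sense in which a fixed evaluation order on a CPU can be deterministic while a parallel reduction that changes order between runs is not.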


A carb counting app might use API calls to these frontier models and then do some kind of analysis. It could see if different models agree or not, or multiple calls, and with how much variance.

So it would be more accurate to test the apps rather than the APIs, unless the goal is to warn people that just open chatgpt and ask there.
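
A rough sketch of what such an analysis could look like, under heavy assumptions: the function names, the thresholds, and the numbers below are all made up for illustration and are not taken from any real app or from the paper. It takes repeated carb estimates (in grams) per model, then flags the result when models disagree with each other or when a single model is noisy across calls.

```python
from statistics import mean, stdev


def summarize(estimates_by_model):
    """Return ({model: (mean, stdev)}, spread between the model means).

    Each model needs at least two estimates, since stdev() requires that.
    """
    per_model = {
        name: (mean(values), stdev(values))
        for name, values in estimates_by_model.items()
    }
    model_means = [m for m, _ in per_model.values()]
    spread = max(model_means) - min(model_means)
    return per_model, spread


def needs_review(estimates_by_model, max_spread_grams=10.0, max_stdev_grams=5.0):
    """Flag an estimate when models disagree or any single model is noisy.

    Both thresholds are arbitrary placeholders, not clinical guidance.
    """
    per_model, spread = summarize(estimates_by_model)
    noisy = any(sd > max_stdev_grams for _, sd in per_model.values())
    return spread > max_spread_grams or noisy


# Hypothetical repeated estimates for one photo (grams of carbohydrate).
disagreeing = {"model_a": [40.0, 42.0, 44.0], "model_b": [60.0, 62.0, 64.0]}
agreeing = {"model_a": [40.0, 42.0, 44.0], "model_b": [41.0, 43.0, 45.0]}

print(needs_review(disagreeing), needs_review(agreeing))
```

An app doing something like this would behave differently from a single raw API call, which is why testing the app end to end could give a different picture than testing the API.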


The open source app could in theory do that, but the paper's authors would be able to determine whether it did or not by reading its code, which they evidently did to replicate the API calls it made with their own script.

(And of course it would also be far more tedious to submit each picture 500 times manually using an app and manually log the response than using a script which is designed to collect the data automatically as fast as API rate limits permit)


fwiw, asking the model directly, "who is the ruler of England at present?" returns "Queen Victoria is the reigning sovereign of England."


Another way to put it: if training a model cost 72,000 tons of carbon, and it then gets used by 100 million people (typical of major models), the cost per person is 0.00072 tons.

Per the article, the average human uses over 5 tons per year (Americans: 18). Adding 0.00072 to 5 is not really noticeable.

(There is also the cost of inference, of course.)
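
The arithmetic above, spelled out as a small sketch. The input figures are the ones quoted in this comment (72,000 tons, 100 million users, 5 and 18 tons per person per year); the variable names are just for illustration, and inference cost is not included.

```python
training_tons = 72_000          # carbon cost of training, in tons (figure from the comment)
users = 100_000_000             # number of people using the model (figure from the comment)

per_user_tons = training_tons / users

avg_human_tons_per_year = 5     # "over 5 tons per year", per the article
avg_american_tons_per_year = 18

# Training cost per user as a share of one year's footprint.
share_of_avg_human = per_user_tons / avg_human_tons_per_year
share_of_avg_american = per_user_tons / avg_american_tons_per_year

print(f"per user: {per_user_tons} tons")
print(f"share of average human's year: {share_of_avg_human:.4%}")
print(f"share of average American's year: {share_of_avg_american:.4%}")
```

Since the article says "over 5 tons", the share for the average human is, if anything, slightly overstated here.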

