Hacker News | measurablefunc's comments

Sampling over a probability distribution is not as catchy as "stochastic parrot", but I have personally stopped telling believers that their imagined event horizon of transistor scaling is not going to deliver them to their wished-for automated utopia, b/c one can not reason w/ people who did not reach their conclusions by reasoning.

What's the latest novel insight you have encountered?

Not the person you asked, and “novel” is a minefield. What’s the last novel anything, in the sense you can’t trace a precursor or reference?

But... I recently had an LLM suggest an approach to negative mold-making that was novel to me. Long story, but basically isolating the gross geometry and using NURBS booleans for that, plus mesh addition/subtraction for details.

I’m sure there’s prior art out there, but that’s true for pretty much everything.


I don't know, that's why I asked b/c I always see a lot of empty platitudes when it comes to LLM praise so I'm curious to see if people can actually back up their claims.

I haven't done any 3D modeling so I'll take your word for it, but I can tell you that I am working on a very simple interpreter & bytecode compiler for a subset of Erlang & I have yet to see anything novel or even useful from any of the coding assistants. One might naively think that there is enough literature on interpreters & compilers for coding agents to pretty much accomplish the task in one go, but that's not what happens in practice.


Which agents are you using, and are you using them in an agent mode (Codex, Claude Code etc.)?

The difference in quality of output between Claude Sonnet and Claude Opus is around an order of magnitude.

The results that you can get from agent mode vs. using a chat bot differ by around two orders of magnitude.


Can you clarify a bit more about this two-orders-of-magnitude claim? In what context? Sure, they have "agency" and can do more than output text, but I would like to see a proper example of this claim.

The workflow is not the issue. You are welcome to try the same challenge yourself if you want. Extra test cases (https://drive.proton.me/urls/6Z6557R2WG#n83c6DP6mDfc) & specification (https://claude.ai/public/artifacts/5581b499-a471-4d58-8e05-1...). I know enough about compilers, bytecode VMs, parsers, & interpreters to know that this is well within the capabilities of any reasonably good software engineer, but the implementations from Gemini 3.1 Pro (high & low) & Claude Opus 4.6 (thinking) have been less than impressive.

sorry, needed to edit this comment to ask the same question as the sibling:

have you run these models in an agent mode that lets them execute the tests, view the output, and iterate on their own for a while? up to an hour or so?

you will get vastly different output if you ask the agent to write 200 of its own test cases, and then have it iterate from there


Possibly a dumb question: but are you running this in claude code, or an ide, or basically what are you using to allow for iteration?

I'm using Google's Antigravity IDE. I initially had it configured to run allowed commands (cargo add|build|check|run, testing shell scripts, performance profiling shell scripts, etc.) so that it would iterate & fix bugs w/ as little intervention from me as possible, but all it did was burn through the daily allotted tokens, so I switched to more "manual" guidance & made a lot more progress w/o burning through the daily limits.

What I've learned from this experiment is that the hype does not live up to the reality. Maybe the next iteration will manage the task better than the current one, but it's obvious that basic compiler & bytecode virtual machine design in a language like Rust is still beyond the capabilities of the current coding agents, & whoever thinks I'm wrong is welcome to implement the linked specification to see how far they can get by just "vibing".


That's roughly where I'm at too. I have seen people have more success after having practiced, though. Possibly the actual workflows needed for full auto are still kind of tacit. Smaller green-field projects do work for me already, though.

It’s taken me a while to get good at using them.

My advice: ask for more than what you think it can do. #1 mistake is failing to give enough context about goals, constraints, priorities.

Don’t ask “complete this one small task”, ask “hey I’m working on this big project, docs are here, source is there, I’m not sure how to do that, come up with a plan”


The specification is linked in another comment in this thread & you can decide whether it is ambitious enough or not, but what I can tell you is that none of the existing coding agents can complete the task even w/ all the details. If you do try it, you will eventually get something that mostly works on simple tests but fails miserably on slightly more complicated test cases.

There is prior art, so it’s not novel.

What goods?

Anything that AI makes more efficient to produce. You can make a lot of money if you can predict the scope of that.

So you don't have any actual examples. Just a general vague feeling about some magical outcome.

If you’re confident that AI won’t raise productivity significantly in a broad range of industries, there are likely some very attractive bets out there in the market to take the other side of.

Keep your financial advice to yourself instead of handing it out to random strangers on the internet. That way you have more "alpha", but since you already offered, you should feel free to give everyone else in the forum the benefit of your wisdom so they can also see how smart you are for betting that AI is going to make everything much cheaper.

> Anything that AI makes more efficient to produce. You can make a lot of money if you can predict the scope of that.

So slop? And maybe bespoke software?

Those aren't the goods that unemployed workers need.

AI won't lead to abundance, because of the simple fact that it can't produce energy. The things people need will still be resource-constrained, and many of those resources are being redirected away from people to power AI.


Software is a "good", as far as economic statistics go.

AI is helping produce more software, right? Including more software that is for sale?[1] Or more online services that are for sale?

[1] One of the interesting things here is going to be liability. You can vibecode an app. You can throw together a corporation to sell it. But if it malfunctions and causes damage, your thrown-together corporation won't have the resources to pay for it. Yeah, you can just have the company declare bankruptcy and walk away, leaving the user high and dry.

After that happens a few times, the commercial market for vibecoded apps may get kind of thin. In fact, the market for software sold by any kind of startup may also get thin.


Software stopped being a good when it no longer came in a box with finite inventory, that you had to pay for only once. It's part of the services economy, same as insurance or car rental services, regardless of how the Fed classifies it.

So is the premise here that making more software is going to have a deflationary effect on the entire economy of material goods? If so then that's obviously nonsensical.

That's not what I said, no. More software is going to have a deflationary effect on software, which is part of the "goods" economy if it's sold in a box, or even (I think) if it's sold as a download. If it's just online, it's probably considered a service. Either way, more of it, more cheaply produced, decreases the value of each piece.

I haven't paid for any software in a long time & my monthly subscriptions for data storage & basic AI add up to less than $100/month. Data storage is already as cheap as it could possibly get, so AI is not going to make it any cheaper. More money in the economy is not going to have a deflationary effect; prices for everything will go up, including software services like data backups, b/c the cost of the service has nothing to do w/ the software & the hardware is only going to get more expensive.

This time is different. The global system is not going to fall apart like isolated kingdoms in the past.

You seem very confident. This seems to imply you feel the haves will know when to leave enough on the table for the have-nots to still feel like they are part of the haves. I'm not so confident in that.

Far more likely is that we head back to a feudal era where data mining tech is used to identify and eliminate potential rabble-rousers. Once enough production is automated, all remaining have-nots are exterminated.

The weak link is that for “the haves” to have, the “have-nots” are needed. To have or to have not is just a comparison: a millionaire needs the poor in order to be rich and to feel special, because when everyone is special, nobody is.

People in technologically advanced societies have more than enough & the people who are not as advanced can not do anything that will have any effect on the people who own the fighter jets, missiles, robot factories, & "internet" satellites. The current system has no historical precedent. It is very close to an almost perfect panopticon w/ an associated media & police apparatus to keep everyone docile & complacent. Like I said, this time is different.

It will instead eventually fall apart in more thoroughly destructive ways. But not until it does a possibly unrecoverable (at least in the medium term) amount of damage to civilization, humanity, and life on Earth first.

I agree but my point was that it will not be like any previous collapse.

Yep. There is too much infrastructure now. It's going to take a lot for this to end.

“Whatever it is you’re seeking won’t come in the form you’re expecting.” – Haruki Murakami

Goliath's Curse by Luke Kemp covers it pretty well I think.

Likewise, thank you for the recommendation. I obviously haven't read Goliath's Curse yet, but it seems like Joseph Tainter's The Collapse of Complex Societies (1988) might also be interesting for the same readers.

Thanks for the recommendation.

Great. How do I use this in my life to make things better?

The SPRT is probably already making your life better: it's used to decrease the cost of medical trials, optimize classifications in high-stakes examinations (e.g. for medical certifications), detect defective manufacturing processes, etc. It sounds like this paper extends the method to groups of hypotheses, whereas the basic version is limited to a single null hypothesis and a single alternative hypothesis.
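The basic two-hypothesis version is small enough to sketch. This is a minimal illustration, assuming a Bernoulli data stream; the thresholds are Wald's standard approximations, and the function name is mine:

```python
import math

def sprt_bernoulli(samples, p0, p1, alpha=0.05, beta=0.05):
    """Wald's SPRT for H0: p = p0 vs H1: p = p1 on a Bernoulli stream.

    Returns ("accept H0" | "accept H1" | "continue", observations used).
    """
    upper = math.log((1 - beta) / alpha)   # cross above -> accept H1
    lower = math.log(beta / (1 - alpha))   # cross below -> accept H0
    llr = 0.0
    n = 0
    for n, x in enumerate(samples, start=1):
        # Add this observation's contribution to the log-likelihood ratio.
        llr += math.log(p1 / p0) if x else math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "accept H1", n
        if llr <= lower:
            return "accept H0", n
    return "continue", n
```

With p0 = 0.5, p1 = 0.8, and α = β = 0.05, a stream of all-successes crosses the upper threshold after only about 7 observations, which is the whole appeal: you stop as soon as the evidence is decisive instead of committing to a fixed sample size up front.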

This helps with determining when you have observed enough data to make a decision.

A/B tests, metrics monitoring, health checks, and quality control all use this.

If you use LLMs, you might use this to determine if a model update or prompt change impacts results using fewer tokens.


Implement a statistical software suite that ubiquitously uses this framework instead of the usual hierarchical mixed-modeling tools, whose assumptions often don't match the experiments that were actually done.

You can search for the "peeking" problem in A/B testing.

SPRT also very likely helped win a major war that involved many nations.
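The "peeking" problem is easy to demonstrate with a small simulation (all numbers here are illustrative, and the function is my own sketch): run A/A tests with no true effect, check a z-test after every batch of data, and stop at the first p < 0.05. The realized false-positive rate ends up far above the nominal 5%, which is exactly the inflation that sequential methods like the SPRT are designed to control.

```python
import math
import random

def peeking_false_positive_rate(n_experiments=1000, looks=20, batch=50, seed=0):
    """Simulate A/A tests (true mean is 0) where the analyst peeks at the
    running z-statistic after every batch and stops at the first p < 0.05."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_experiments):
        total, count = 0.0, 0
        for _ in range(looks):
            for _ in range(batch):
                total += rng.gauss(0.0, 1.0)  # unit-variance noise, mean 0
                count += 1
            z = total / math.sqrt(count)       # z-statistic for mean == 0
            p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
            if p < 0.05:                       # "significant" -> stop & ship
                false_positives += 1
                break
    return false_positives / n_experiments
```

With 20 peeks per experiment the simulated rate lands well above 0.05 despite there being no effect at all; a single test at the end would hold the nominal 5%.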


This is AI slop & if you can't tell from a glance then you should figure out why you believe the nonsense on this page is actually sensible.

vibecoded cryptography will never stop being funny https://github.com/olserra/agent-semantic-protocol/blob/9d15...

You owe me a coffee and keyboard

Because "latent semantic vectors" sounds way cooler.

And my secret is epistemology. AMA.

mine is axiology, DNAMA. ;)

well my MA said my DNA is secret.

Compilers preserve semantics; that is part of their contract. Whether the output has instructions in one order or another does not matter as long as the output is observationally/functionally equivalent. The article does not do a good job of actually explaining this & instead meanders around sources of irrelevant "stochasticity" like timestamps & build-time UUIDs, & concludes by claiming that LLMs have solved the halting problem.

C and C++ compilers only promise to preserve semantics for data-race-free code, though. They are allowed to turn a single load into multiple loads, or even a single store into multiple stores: transformations that won't affect anything if only one thread accesses the memory, but in multithreaded programs, changing compilers, or just making seemingly unrelated changes and recompiling, can make existing data-race bugs have visible effects or not.

Attempting to get consistent results from floating-point code is another rabbit hole. GCC and clang have various flags for "fast math" which can enable different optimisations that reduce precision.

Before SSE, FP on x86 was done by the "x87" FPU, which always computed at 80-bit precision even when the type in the source code was 32 or 64 bits, and it used to be considered acceptable to sometimes get more precision than asked for. Java got its "strictfp" mode mainly because of x87.
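The underlying issue is that floating-point addition is not associative, so any optimization (or extra-precision hardware path) that regroups or reorders sums can change the bits of the result. A quick illustration, in Python for brevity:

```python
# Floating-point addition is not associative, which is why "fast math"
# reorderings (or 80-bit x87 intermediates) can change results bit-for-bit.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c    # 0.6000000000000001
right = a + (b + c)   # 0.6
groupings_agree = (left == right)  # False: the groupings disagree

# Summation order matters at scale too: adding many tiny values to a big
# one individually vs. pre-summing them gives different totals.
big = 1e16
naive = big
for t in [1.0] * 10:
    naive += t               # each lone 1.0 is lost to rounding at 1e16
reordered = big + sum([1.0] * 10)  # summing the small terms first keeps them
orders_agree = (naive == reordered)  # False: naive stayed at 1e16
```

A compiler that rewrites `(a + b) + c` into `a + (b + c)` under a fast-math flag is doing precisely this kind of regrouping, which is why such flags are off by default in GCC and clang.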


Runtime semantics are different from output semantics. You can build a nondeterministic program with a deterministic compiler, and the bytes of that program should be identical every time (notwithstanding the stupid metadata that most compilers inject, which is not semantically relevant).

Data races are undefined behavior¹, so in that case the compiler is still technically preserving semantics. If you use the proper primitives to remove the undefined behavior (atomic operations, locks) for any shared state, then the compiler will not generate code w/ undefined behavior & all modifications/mutations will be serialized in some order. You can then further refine the code if you want the operations to happen in a certain order. At the end of the day you must assume that the compiler will preserve your intended semantics, otherwise we'd still be writing assembly b/c no high-level specification would ever mean what it was intended to mean for the low-level executable machine model/target of the compiler.
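The "serialized in some order" point can be sketched quickly (in Python rather than C or Rust, purely for brevity; the function and names are mine): every read-modify-write of the shared counter goes through a lock, so no interleaving of threads can lose an update, even though the order of the increments is left unspecified.

```python
import threading

def locked_counter(n_threads=8, increments=10_000):
    """All mutations of the shared counter go through a lock, so each
    read-modify-write is serialized in *some* order and none is lost."""
    counter = 0
    lock = threading.Lock()

    def work():
        nonlocal counter
        for _ in range(increments):
            with lock:       # the "proper primitive": serializes the RMW
                counter += 1

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter           # always n_threads * increments
```

Dropping the lock turns `counter += 1` into an unsynchronized read-modify-write, i.e. exactly the kind of data race the parent comments describe; in C++ or Rust you would reach for `std::atomic` or `AtomicUsize` to get the same serialization guarantee without a lock.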

¹https://cppreference.com/w/cpp/language/multithread.html


Yes, but the spooky thing is still that the code containing the data-race bug and the code whose behavior changes when a compiler change triggers the bug's effects can be in two different modules.

It matters for things like verifiable builds.

Full determinism is also highly prized by compiler writers because it massively simplifies the task of reproducing and debugging problematic executions.

You have to specify what exactly you're verifying.

Thank you for reading!

I wrote "We have not remotely solved the halting problem in the formal sense", which does not read like a claim that LLMs have solved the halting problem to me, but I'm open to rewording it. How would you put it?

I added in a bit about compiler contract, wdyt?


We haven't solved it in the formal sense b/c it is formally unsolvable, so the hedging adds no semantic content. It's not a formal article, so you can phrase things however you want, but an uncritical reading will leave the reader confused about what exactly you were trying to explain.
