Hacker News | past | comments | ask | show | jobs | submit | codeflo's comments

> My solution was to kexec into a new kernel+initramfs which has a DHCP client and cURL in it - that effectively stops any filesystem access while the image is being written over the disk, then to just reboot.

That's what I was expecting from the article.

Update: It's not obvious, but it turns out that this is a multipart article, and kexec is reserved for part 3: https://astrid.tech/2026/03/24/2/how-to-pass-secrets-between...
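A minimal sketch of the quoted kexec approach, for anyone unfamiliar with it. This is my own illustration, not taken from the article: the kernel path, initramfs path, image URL, and target device are all placeholders.

```shell
# Hypothetical sketch of the quoted kexec reimaging flow.
# Assumes kexec-tools is installed and you are root; all paths are placeholders.

# Stage a rescue kernel + initramfs (one that bundles a DHCP client and curl)
kexec --load /boot/vmlinuz-rescue \
      --initrd=/boot/initramfs-rescue.img \
      --command-line="ip=dhcp console=ttyS0"

# Jump into it immediately; the old root filesystem is no longer touched
kexec --exec

# Then, from inside the new initramfs, something like:
#   curl -fsSL https://example.com/disk.img | dd of=/dev/sda bs=4M
#   reboot -f
```

Because `kexec --exec` replaces the running kernel without going through firmware, nothing on the old root can be mid-write while the image lands on disk.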


I totally missed part 2/3, thanks for linking!

I know the argument I'm going to make is not original, but with every passing week, it's becoming more obvious that if the productivity claims were even half true, those "1000x" LLM shamans would have toppled the economy by now. Where are the slop-coded billion dollar IPOs? We should have one every other week.

They’re busy writing applications for their dogs and building “jerk me off” functionality into their OpenClaw fork. Once they’re done you’ll be sorry you ever asked.

Last time I read about a Codex update, I think it mentioned that a million developers tried the tool.

Don't most companies use AI in software development today?

And yes, I know that some companies are not doing that because of privacy and reliability concerns or whatever. With many of them it's a bit of a funny argument considering even large banks managed to adopt agentic AI tools. Short of government and military kind of stuff, everybody can use it today.


On a serious note it would be great to run an experiment - get 20 people on here claiming 100x productivity, and let’s see what they muster up in 1 month working together.

I’m happy to put money in towards it ;)


Writing pieces of code that beat average human level is solved. Organizing that code is on its way to being solved (posts like this hint at it). Finding problems that people will pay money to have solved by software is an entirely different, more complicated matter (tbh I doubt anyone could prove right now that this absolutely is or isn't solvable - but given the change we've seen already I place no bets against AI).

Also, even if agents could do everything, the societal obstacles to change are extensive (sometimes for very good, sometimes for bad reasons), so I'm expecting it to take another year or two for serious change to occur.


“Finding problems that people will pay money to have solved by software is an entirely different, more complicated matter”

Come on you’re taking the piss, surely.


> Also, lines of code is not completely meaningless metric.

Comparing lines of code can be meaningful, but mostly if you keep a lot of other things constant: coding style, developer experience, domain, tech stack. There are many style differences between LLM-generated and human-written code, so I expect 1000 lines of LLM code to do a lot less than 1000 lines of human code, even in the exact same codebase.


>> the architect → developer → reviewer pipeline actually produces better results than just... talking to one strong model in one session?

> There's a 63 page paper with mathematical proof if you really into this.

> https://arxiv.org/html/2601.03220v1

I'm confused. The linked paper is not primarily a mathematics paper, and to the extent that it is, proves nothing remotely like the question that was asked.


> proves nothing remotely like the question that was asked

I am not an expert, but by my understanding, the paper proves that a computationally bounded "observer" may fail to extract all the structure present in the model in one computation, aka you can't always one-shot perfect code.

However, arranging many "observer" roles into pipelines may gradually get you there.


Perhaps this paper might be more relevant with regards to multi-agent pipelines https://arxiv.org/html/2404.04834v4

> Or am I reduced to buying an Apple TV device and unplugging the TV from the internet entirely ?

At least until TV makers wise up to that strategy and build a TV that requires internet access to unlock the HDMI port, that's the way to go.


I don't have a reference, but I remember reading that some Samsung TVs require internet access to get past initial setup and allow access to HDMI. So we might already be there...


It's likely that "knows" has no separate definition, but is used in some definition of "operator". If so, then "operator" should probably connect to "know", and "knows" shouldn't appear in the graph at all. But calling that edge case "broken" is a bit harsh, I think.


All of the examples on the linked page seem to be "good" outputs. Attribution sounds most useful to me in cases where an LLM produces the typical kind of garbage response: wrong information in the training data, hallucinations, sycophancy, over-eagerly pattern matching to unasked but similar, well-known questions. Can you give an example of a bad output, and show what the attribution tells us?


You got it exactly right. Guilty as charged. Over the coming weeks, we will be showcasing exactly how you can debug all of these examples.

I agree that attribution is most useful for debugging and auditing. This is a prime use case for us. We have a post with exciting results lined up for this. It should be out in a week; we wanted to get the initial model out first :)


What I am reading here is that when the model is wrong, it still (at least sometimes) confidently attributes the answer to some knowledge base, is that correct? If that is the case, how is this different from simply predicting the vibe of a given corpus and assigning provenance to it? Much less impressive imo and something most models can do without explicit training. All precision no recall as it were.


I think this was answered before, with the constraints of the architecture of the model. You can't expect something fundamentally different from an LLM, because that's how they work. It's different from other models because they were not designed for this. Maybe you were expecting more, but that's not OP's fault or demerit.


What you're saying fits my understanding/expectations. However the post and the user I am replying to seem to imply different. This makes me wonder, is my understanding incomplete or is this post marketing hype dressed up as insight? So I am asking for transparency.


It is not hype. You can try the model on huggingface yourself to see its capabilities. My reply here was clarifying that the examples we showed were ones where the model didn't make a mistake. This is intentional, because over the next few weeks, we will show how the concepts and attribution we enable can allow you to fix these mistakes more easily. All the claims in the post are supported by evidence, no marketing here.


We are probably at the point where hype and insight aren't that distinguishable, other than by what bears fruit in the future, but I agree with you.


This is often quoted, but I wonder whether it's actually strictly true, at least if you keep to a reasonable definition of "works". It's certainly not true in mechanical engineering.


The definition of a complex system is the qualifier for the quote. Many systems that are designed, implemented and found working are not complex systems; they may be complicated systems. To paraphrase Dr. Richard I. Cook's "How Complex Systems Fail": complex systems are inherently hazardous, operate near the edge of failure, and cannot be understood by analyzing individual components. These systems are not just complicated (like a machine with fixed parts) but dynamic, constantly evolving, and prone to multiple, coincidental failures.

A system of services that interact, where many of them are depending on each other in informal ways may be a complex system. Especially if humans are also involved.

Such a system is not something you design. You just happen to find yourself in it. Like the road to hell, the road to a complex system is paved with good intentions.


Then what precisely is the definition of complex? If "complex" just means "not designed", then the original quote that complex systems can't be designed is true but circular.

If the definition of "complex" is instead something more like "a system of services that interact", "prone to multiple, coincidental failures", then I don't think it's impossible to design them. It's just very hard. Manufacturing lines would be examples, they are certainly designed.


The manufacturing lines would be designed, and they'd be designed in an attempt to affect the "design" of the ultimate resulting supply chain they're a part of. But the relationship between the design of some lines and the behavior of the larger supply chain is non-linear, hard to predict, and ultimately undesigned, and therefore complex.

The design of the manufacturing lines and the resulting supply chain are not independent of each other -- you can trace features from one to the other -- but you cannot take apart the supply chain and analyze the designs of its constituent manufacturing lines and actually predict the behavior of the larger system.

AFAIK there's not a great definition of a complex system, just a set of traits that tend to indicate you're looking at one: non-linearity, feedbacks, lack of predictability, resistance to analysis (the "you can't take it apart to reason about the whole" characteristic mentioned above). All of these traits are also kind of the same thing... they tend to come bundled with one another.


A complex system is one that has chaotic behavior.

(And no, this is not "my" definition, it's how it's defined in the systems-related disciplines.)


What's considered chaotic? Multiple causes, hard to track?


Consider systems that require continuous active stabilization to not fail because the system has no naturally stable equilibrium state even in theory. Some of our most sophisticated engineering systems have this property e.g. the flight control systems that allow a B-2 bomber to fly. In a software context you see these kinds of design problems in large-scale data infrastructure systems.

The set of system designs that exhibit naturally stable behavior doesn't overlap much with the set of system designs that deliver maximum performance and efficiency. The capability gap between the two can be large but most people choose easy/simple.

There is an enormous amount of low-hanging opportunity here but most people, including engineers, struggle with systems thinking.
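As a toy illustration of "no naturally stable equilibrium without active control" (my own example, not the commenter's): the scalar system x' = a·x + u with a > 0 diverges on its own, but continuous proportional feedback u = -k·x with k > a makes it decay to zero.

```python
# Toy sketch: an inherently unstable scalar system that only stays bounded
# under continuous active stabilization. All constants here are illustrative.

def simulate(a, k, x0=1.0, dt=0.01, steps=1000):
    """Euler-integrate x' = a*x + u with proportional feedback u = -k*x."""
    x = x0
    for _ in range(steps):
        u = -k * x            # active stabilization (k=0 means no controller)
        x += (a * x + u) * dt
    return x

unstable = simulate(a=1.0, k=0.0)  # no feedback: state blows up
stable = simulate(a=1.0, k=3.0)    # feedback faster than the instability: decays
```

The point mirrors the B-2 example: the stabilized system can be "designed", but its safe operation depends on a control loop running continuously, not on a naturally stable design.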



IMHO, the key is where you add complexity. In software you have different abstraction layers. If you make a layer too fat, it becomes unwieldy. A simple system evolves well if you're adding the complexity in the right layer, avoiding making a layer responsible for tasks outside its scope. It still "works" if you don't, but it's increasingly difficult to maintain.

The law is maybe a little too simplistic in its formulation, but it's fundamentally true.


You built this gear using the knowledge from your last gear. You didn't start with no knowledge, read a manual on operating a lathe, grab a hunk of metal and make a perfect gear the first time.


> It's certainly not true in mechanical engineering.

Care to exemplify?


Why the hell would you need a blockchain for automatic payments? Bots that performed financial transactions existed long before “crypto”.


The ordinary financial system is constrained by a lot of regulation and it won't let an AI open an account.


Good, KYC exists for a reason. Why does AI need to open an account, anyway? Just give it a debit card with a limit, not a whole new account and contract with a bank.


The limit of a debit card is the money in your account.

The bank would argue that an AI using your account on your behalf is fraud.


Those are much easier problems to solve (and surely already solved by some fintechs) than bringing cryptocurrencies up to minimum legal compliance and meeting performance requirements.


They don't intend to meet the legal compliance requirements. That's the reason for using cryptocurrencies — avoiding compliance.


My debit card has specific limits, far less than all my funds. There also exist pre-paid cards, ideal for things like this.


How unreasonable that I can’t make my computer pretend to be a fiduciary.

Those awful regulations won’t let me say the “computer ate my homework”. Imagine.


"We promise that we will not enforce" is perhaps a funny way of not granting a license while making it sound like they do. This seems almost purposefully designed to look open-source to laypeople, while being carefully written in a way that ensures it will be vetoed by any corporate lawyer vetting the license.


Is "we promise not to enforce" legally binding I wonder, in which case it is de-facto a license? IANAL but it's an interesting concept.

