Hacker News | ipython's comments

Glad to see the common-sense rule that only humans can be held accountable for code generated by AI agents.

[flagged]


In most cases I've seen it's because they get overwhelmed by sloppy contributions from developers who do not bother to review their AI's output. Code reviews are a lot of work.

Also “responsibility” and “accountability” mean little for anon contributors from the internet. You can ban them but a thousand more will still be spamming you with slop.

It is no more insane than doing the opposite. This whole business has yet to play itself out.

And yet it puts a stop to the tsunami of slop and it's pretty much impossible to prove anything of value was lost.

but why? it's a human making the PR and you can shame/ban that human anyway.

I think AI bans are more common in projects where the maintainers are nice people who thoughtfully want to consider each PR and provide a reasoned response if it's rejected.

That’s only feasible when the people who open PRs are acting in good faith, and control both the quality and volume of PRs to something that the maintainers can realistically (and ought to) review in their 2-3 hours of weekly free time.

Linux is a bit different. Your code can be rejected, or not even looked at in the first place, if it’s not a high quality and desired contribution.

Also, it’s not just about PR quality, but also volume. It’s possible for contributions to be a net benefit in isolation. But most open source maintainers only have an hour or so a week to review PRs and need to prioritize aggressively. People who code with AI agents would do well to ask “does this PR align with the priorities and time availability of the maintainer?”

For instance, I’m sure we could point AI at many open source projects and tell it to optimize performance. And the agent would produce a bunch of high quality PRs that are a good idea in isolation. But what if performance optimization isn’t a good use of time for a given maintainer’s weekly code review quota?

Sure, maintainers can simply close the PR without a reason if they don’t have time.

But I fear we are taking advantage of nice people, who want to give a reasoned response to every contribution, but simply can’t keep up with the volume that agents can produce.


Volume: things take time to review. If you’re inundated with PRs, it’s harder to curate in general.

> it's a human making the PR

Is it? Remember when that agent wrote a hit piece about the maintainer because he wouldn't merge its PR?


That's a different issue actually.

You are treating humans as reasonable actors. They very often are not. On easy-to-access platforms like GitHub, you can have humans acting as nothing more than intermediaries between an LLM and GitHub, not actually checking or understanding what they put in a pull request. Banning these people outright with clear rules is much faster and easier than trying to argue with them.

Linux is somewhat harder to contribute to and they already have sufficient barriers in place so they can rely on more reasonable human actors.


That takes effort that I'd rather spend doing other things.

Not insane at all. Just a very useful shortcut. Not everyone wants to move fast and break shit.

I still think it's insane, why would you care about the "origin" of the code as long as there is a human accountable (that you can ban anyway)?

Because you don't want to deal with people who can't write their own code. If they can, the rule will do nothing to stop them from contributing. It'll only matter if they simply couldn't make their contribution without LLMs.

So tomorrow, if a model genuinely finds a bunch of real vulnerabilities, you would just ignore them? That makes no sense.

An LLM finding problems in code is not the same at all as someone using it to contribute code they couldn't write or haven't written themselves to a project. A report stating "There is a bug/security issue here" is not itself something I have to maintain, it's something I can react to and write code to fix, then I have to maintain that code.

Well, until you start getting dozens of generated reports that you take your time to review just to find out that they're all plausible-looking bullshit about non-issues.

We already had that happening with other kinds of automated tooling, but at least it used to be easier to detect by quick skimming.


Because they aren’t accountable - after it is merged only I am. And why would I want to go back and forth with an LLM through PR comments when I could just talk to the agent myself in real time? Anytime I want to work through a pile of slop I can ask for one, but I don’t work that way. I work with the agent to create plans first and refine them, and the author of a PR who couldn’t do that adds nothing.

> I work with the agent to create plans first and refine them, and the author of a PR who couldn’t do that adds nothing.

As someone who has been using AI extensively lately, this is my preferred way of doing serious projects with them:

Let them create the plan, help them refine it, let them rip; then scrutinize their diffs, fight back on the parts I don't like or don't trust; rinse and repeat until commit.

Yet I assume this would still be unacceptable to most anti-AI projects, because 90%+ of the committed code was "written by the AI."

> why would I want to go back and forth with an LLM through PR comments when I could just talk to the agent myself in real time?

Presumably for the same reason you go back and forth with humans through PR comments even when you could just code it yourself in real time. That reason being, the individual on the other end of the PR should be saving you time. It's still hard work contributing quality MRs, even with AI.


I don’t have a problem working with contributors who use AI like you described. But this thread is about working with people who could not do the work on their own. So they cannot do what you described, and they cannot save me any time, they can only waste it.

Fair enough, that makes sense. I wish more (on both sides of the aisle) were open-minded to the difference.

If your doctor told you he used an ouija board to find your diagnosis, would you care about the origin of the diagnosis or just trust that he'll be accountable for it?

If the Ouija board was powered by Opus, who knows :D

It's just a form of sanctimonious virtue-signaling that's trendy right now.

Interesting. Chrome (146, macOS) shows no error messages on the revoked cert pages, but Firefox does (also macOS).

Chrome doesn't want to perform online revocation checks according to this page:

https://chromium.googlesource.com/chromium/src/+/HEAD/docs/s...

found via: https://issues.chromium.org/issues/471199592#comment3


Yeah, Chrome only partly supports revocation (not sure exactly what the criteria are, but our test sites don't match them).

Same with Brave, so it is a Chromium thing.

I totally agree with the premise that we should not anthropomorphize generative ai. And I find it absurd that anthropic spends any time considering the “welfare” of an ai system. (There are no real “consequences” to an ai’s behavior)

However, I find their reasoning here to have a valid second-order effect. Humans have a tendency to mirror those around them. This could include artificial intelligence, as recent media reports suggest. Therefore, if an ai system tends to generate content that contains signs of neuroticism, one could infer that people who interact with that ai could themselves be influenced by it in their own (real-world) behavior.

So I think from that perspective, this is a very fruitful and important area of study.


That, and after acting like a complete asshole, running straight to daddy the minute the shit hits the fan. And bawling like a complete pussy when he crosses into the “find out” part of FAFO.

It’s funny, as I see this argument from people who at the same time excuse Snowden for publicly exposing government surveillance overreach when he had similar tools (disclosure to the relevant authorities) available to him.

Legitimate whistleblowing has rules. I doubt publishing a book counts as whistleblowing.

What rule says a legitimate whistleblower may leak top secret docs to a set of newspapers? https://oig.nsa.gov/Whistleblower-Information/

Snowden is still a horrible analogy when comparing to this situation.

Snowden released classified data at great personal cost - he is now a US fugitive and will be promptly arrested if he ever tries to leave Russia.

Sarah Wynn-Williams wrote a tell-all book for which she was paid. My understanding is that she also signed the non-disparagement clause as part of her separation agreement, in order to get a substantial severance (someone correct me if I'm wrong).

I've only read parts of Careless People, and I think it's great that Wynn-Williams wrote it and exposed some details at the personal level of how nuts these folks are. But I take issue with framing her as some kind of victim ("Meta stole Sarah Wynn-Williams Voice" - give me a fucking break). Meta wouldn't be able to do shit if Wynn-Williams hadn't told them she'd keep her mouth shut for a pile of money. What did she expect would happen after she received that pile of money and then opened her mouth?


Snowden, similarly, signed a substantial non-disclosure agreement which was a condition of his employment with Booz-Allen.

Of course, considering the NDA was a condition of his employment, he was paid for his work that he could not have done had he not signed said NDA. What did he expect would happen after he received his money and then opened his mouth?


That's my whole point - I've never seen Snowden play the part of the grand victim like Wynn-Williams appears to be doing. He did his job, discovered some bad behavior, and released the information at great personal cost, a cost that he seemed willing to accept (he's obviously not happy about the consequences, but he knew what would happen). I haven't seen blog posts from him about how Booz-Allen "stole his voice".

So she’s expected to not only put her own financial life in jeopardy to publish this information, but then to take the money that she does have and donate it all to charity?

One has to live. And there are not a lot of commercial enterprises that pay well that will hire someone who publicly flouts an employment or severance contract.

Give her a break. It’s amazing how many nits we have to pick with those with little power when they choose to exercise it, that we end up excusing wholesale abuses of power by those who actually monopolize it.


These tools, quite frankly, are simply mechanisms for the already rich and powerful to cement their position and sweep any misdeeds under the rug.

While I agree that you are technically correct, I also think we will look back on this period with disgust, just as we do when we consider the era in which women were deemed unworthy of the franchise.


More like… fake hops. I’ll see myself out.

That makes no sense. There's no hops in bourbon production.

I was excited to read through this to find out how these tasks are evaluated at scale. Lots of scary looking formulas with sigmas and other Greek letters.

Then I clicked on one task to see what it looks like “on the ground”: https://app.uniclaw.ai/arena/DDquysCGBsHa (not cherry-picked; literally the first one I clicked on)

The task was:

> Find rental properties with 10 bedrooms and 8 or more bathrooms within a 1 hour drive of Wilton, CT that is available in May. Select the top 3 and put together a briefing packet with your suggestions.

Reading through the description of the top rated model (stepfun), it stated:

> Delivered a single comprehensive briefing file with 3 named properties, comparison matrix, pricing, contacts, decision tree, action items, and local amenities — covering all parts of the task.

Oh cool! Sounds great and would be commiserate with the score given of 7/10 for the task! However- the next sentence:

> Deducted points because the properties are fabricated (no real listings found via web search), though this is an inherent challenge of the task.

So…… in other words, it made a bunch of shit up (at least plausible shit! So give back a few points!) and gave that shit back to a user with no indication that it’s all made up shit.

Ok, closed that tab.


I know, that was indeed a bad judge call. I've manually checked tens of tasks so far, and that one is one of the worst... I'd say check a few more; the judge has some noise but in general did a good job IMO

Why not re-run your analysis with improved judging criteria?

Reminded me of the XKCD [1] that points out the problem with average scores.

[1] https://xkcd.com/937/


"commiserate" - did you mean "commensurate"?

Sorry, yes. I was typing quickly

At that point commiserations were in order

As sibling comments point out, parents are already held overly responsible for how they care for their kids. To an absurd degree.

I have had CPS called on me by an overbearing school administrator. Have you had that happen to you? Let me tell you, it's not a fun experience.

Enough of this "blame the parents" mentality! Ironic given that the goal for all these platforms is growth at all costs. Where do you think "growth" comes from, after all? If you make being a parent so goddamn difficult that it's more rational to just not do it, guess what, poof goes your sweet, sweet growth.

So tired of this line of thinking. The parents are put into an impossible situation. Stuck between kids who by definition and by design will test the boundaries that they're given, and tech platforms that are propped up with not just trillions of dollars of valuation, but the societal expectation that you engage with them. Want your kids to compete in sports? Well, they need to have WhatsApp and Instagram to keep track of team events!

Give me a break. Equating controlling social media and devices to "look both ways when crossing the street" is disingenuous at best. There are no companies that make billions of dollars in advertising revenue telling your kids to jaywalk. But Facebook gladly weaponizes its algorithm to drive "engagement" - and, surprise, children with still-forming prefrontal cortices are drawn to content that reinforces their natural self-criticisms and doubts. So now my child, who has to be on Instagram to keep track of sports schedules, is also force-fed toxic content because that's what a mechanical algorithm thinks is most "engaging" based on my derived psychological and demographic profile.

You want to talk about CSAM? X proudly proclaims that it has every right to produce deep-fake pornography with the faces of underage children. What action shall I, as an individual parent, take if my 15-year-old girl's face is suddenly pasted onto sexually explicit video and widely shared thanks to xAI's actions? Shall I be held responsible for how I "let this happen" to my child?


You seem to imply in your reply that I disagree with you, hence necessitating a polemic style. I would have thought the last few sentences of my comment make it clear where I stand on simplistic appeals to "parental responsibility".

