Hacker News .hnnew | past | comments | ask | show | jobs | submitlogin

> The shortest path for the model is to implement it completely

Have you worked with LLMs??????????? “I disabled the test so it’s not run so now all the tests pass” is not a hypothetical it’s pretty common. LLMs frequently do shortcut learning. The reason why reviews are expensive is because you still need to do all the steps in order to understand if a shortcut is justified.



> “I disabled the test so it’s not run so now all the tests pass”

Also:

"I implemented it this terrible way because of precedence in the codebase...that I just wrote"

"I avoided implementing this correctly because of migration concern for existing installations of this code I'm writing right now"

"I deferred this critical feature for the future, so we can deploy quicker"

or, my favorite,

"I hand rolled an buggy http server because you said the tool should be self contained"


Or bizarrely Claude askes about code churn like it matters.

Human, Do you want me to do it the right way? It will cause code churn in 90 files. Or I can take a shortcut and edit 3 files in a terrible way.

Edits 90 files for 12 lines each in 25 seconds...


“This is a significant rewrite that will take weeks”

Done after my potty break


> "I avoided implementing this correctly because of migration concern for existing installations of this code I'm writing right now"

This one grinds my gears so bad, probably 1/4 of my job at this point is telling an LLM (either in my own editor or in review comments for someone's MR) "how about we do this right _before_ merging it, eh?".

Or:

foo = abc

<Maybe 4-10 lines of code>

if foo != None:

    ...
In my experience giving it "rules" not to do this does nothing, but a separate pass (could be by a different model but really just fresh context is enough) does okay


All the global memories I have stored are basically just this. Memories appear to be completely useless on their own, but, before a merger or a plan is finalized, having it go read them will definitely help course correct.

I really wish there was an "every x tokens" type trigger for every agent, where I could have it fire off a "Pause, go read the guide to make sure you're adhering fully." To help keep concepts fresh in context.


I used Fable to write a relatively small RPG. In the span of 2 hours it managed to do many interesting things. My favorite was when it wrote code with a race condition that could cause people to take more damage than they should, which it then defended as an acceptable tradeoff for parallelism.


> “I disabled the test so it’s not run so now all the tests pass” is not a hypothetical it’s pretty common.

I've never seen that though.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: