> The shortest path for the model is to implement it completely
Have you worked with LLMs???????????
“I disabled the test so it’s not run so now all the tests pass” is not a hypothetical it’s pretty common. LLMs frequently do shortcut learning. The reason why reviews are expensive is because you still need to do all the steps in order to understand if a shortcut is justified.
> "I avoided implementing this correctly because of migration concern for existing installations of this code I'm writing right now"
This one grinds my gears so bad, probably 1/4 of my job at this point is telling an LLM (either in my own editor or in review comments for someone's MR) "how about we do this right _before_ merging it, eh?".
Or:
foo = abc
<Maybe 4-10 lines of code>
if foo != None:
...
In my experience giving it "rules" not to do this does nothing, but a separate pass (could be by a different model but really just fresh context is enough) does okay
All the global memories I have stored are basically just this. Memories appear to be completely useless on their own, but, before a merger or a plan is finalized, having it go read them will definitely help course correct.
I really wish there was an "every x tokens" type trigger for every agent, where I could have it fire off a "Pause, go read the guide to make sure you're adhering fully." To help keep concepts fresh in context.
I used Fable to write a relatively small RPG. In the span of 2 hours it managed to do many interesting things. My favorite was when it wrote code with a race condition that could cause people to take more damage than they should, which it then defended as an acceptable tradeoff for parallelism.
Have you worked with LLMs??????????? “I disabled the test so it’s not run so now all the tests pass” is not a hypothetical it’s pretty common. LLMs frequently do shortcut learning. The reason why reviews are expensive is because you still need to do all the steps in order to understand if a shortcut is justified.