
I’d say it’s not only determinism, but also the social contract that’s missing.

When I’m calling ‘getFirstChar’ from a library, the author and I have a good understanding of what the function does, based on a shared context of common solutions in the domain we’re working in.

When you ask ChatGPT to write a function that does the same, your social contract is between you and untold billions of documents that you hope the algorithm weights correctly according to your prompt (we should probably avoid programming by hope).

You could probably get around this by training on your codebase as the corpus, but until we answer all the questions about what that entails it remains, well, questionable.



> we should probably avoid programming by hope

I use Cursor at work, which is basically VSCode + an LLM for code generation. It's guess and check, basically. Plenty of people look up Stack Overflow answers to their problem, then verify that the answer does what they want. (Some people don't verify, but those people are probably not good programmers, I guess.) Well, sometimes I have the LLM complete something, then verify that the code it produced is what I would have written (and correct it if not). This saves me time and typing in the long run, even if I have to correct it at times. And I don't see anything wrong with this: I'm not programming by hope, I'm just saving time.


This increases the time you spend proofing others’ work (tedious) versus the time you spend developing a solution in code (fun). Also, if the LLM output is correct 95% of the time, one tends to get sloppier with the checking, since it will feel unnecessary most of the time.


> This increases the time you spend proofing others’ work (tedious) versus time you spend developing a solution in code (fun).

I find that I don't use it as much for generating code as I do for automating tedious operations. For example, moving a bunch of repeated code into a function, then converting the repeating blocks into function calls. The LLM is really good at doing that quickly without requiring me to perform dozens of copy-paste operations, or a bunch of multi-cursor-fu.
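The kind of mechanical refactor described above might look something like this, a minimal before/after sketch with made-up field names; the function and data are purely illustrative:

```python
# Before: the same strip/lowercase steps were repeated for every field.
# After: one small helper, and each repeating block becomes a single call.

def normalize_field(record, key, default=""):
    """Strip whitespace and lowercase a single field, with a fallback."""
    return str(record.get(key, default)).strip().lower()

record = {"name": "  Alice ", "city": "Paris ", "role": " Admin"}

name = normalize_field(record, "name")
city = normalize_field(record, "city")
role = normalize_field(record, "role")
```

The refactor itself is trivial; the point is that an LLM can apply it across dozens of call sites in one pass, which is exactly the tedious part.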

Also, I don't use it to generate large blocks of code or complicated logic.


Just what I was thinking about lately: what if LLMs are not 95% precise, but 99.95%? After 50-100 checks you find nothing, so you just dump the whole project on it to be implemented - and there come the bugs.

However ... your colleagues just do the same.

We'll see how this unfolds. As of now the industry seems to be a bit stuck at this level. Big models are too expensive to train for marginal gains, and smaller ones are getting better but that doesn't change this. Until someone comes up with a new idea for how LLMs should work, we won't see the 99.95% anyway.


one idea is obvious: a multi-model approach. it's partially done today for safety checks; the same can be done for correctness. one model produces a result, a different model only checks its correctness. optionally the first produces several results, and the second checks correctness and selects the best. this is more expensive, but should give better final output. not sure, this may have been done already.
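The generate-then-verify pattern described above can be sketched as plain control flow. The `generator` and `checker` functions here are stand-ins for two independent models (everything about them is assumed for illustration); only the best-of-n selection logic is the point:

```python
def generator(prompt, n=3):
    # Stand-in for model A: returns n candidate answers.
    # A real implementation would sample a model n times.
    return [f"{prompt} v{i}" for i in range(n)]

def checker(candidate):
    # Stand-in for model B: returns a correctness score in [0, 1].
    # A real checker would be a separate model judging the candidate;
    # this placeholder just prefers one hard-coded candidate.
    return 1.0 if candidate.endswith("v2") else 0.5

def best_of_n(prompt, n=3):
    """Generate n candidates with one model, keep the one the other model rates highest."""
    candidates = generator(prompt, n)
    return max(candidates, key=checker)
```

The cost is roughly n generations plus n checks per query, which is why this trades compute for (hopefully) fewer of those rare, hard-to-spot errors.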


Yeah, I’m more worried about the middle ground that would make software quality (even) worse than it is today.



