
I’d say it’s not only determinism, but also the social contract that’s missing.

When I’m calling ‘getFirstChar’ from a library, the author and I have a good understanding of what the function does, based on a shared context of common solutions in the domain we’re working in.

When you ask ChatGPT to write a function that does the same, your social contract is between you and untold billions of documents that you hope the algorithm weights correctly according to your prompt (we should probably avoid programming by hope).

You could probably get around this by training on your codebase as the corpus, but until we answer all the questions about what that entails it remains, well, questionable.



> we should probably avoid programming by hope

I use Cursor at work, which is basically VSCode + an LLM for code generation. It's guess and check, basically. Plenty of people look up Stack Overflow answers to their problem, then verify that the answer does what they want. (Some people don't verify, but those people are probably not good programmers, I guess.) Well, sometimes I have the LLM complete something, then verify that the code it produced is what I would have written (and correct it if not). This saves me time and typing in the long run, even if I have to correct it at times. And I don't see anything wrong with this: I'm not programming by hope, I'm just saving time.


This increases the time you spend proofing others’ work (tedious) versus the time you spend developing a solution in code (fun). Also, if the LLM output is correct 95% of the time, one tends to get sloppier with the checking, since it will feel unnecessary most of the time.


> This increases the time you spend proofing others’ work (tedious) versus time you spend developing a solution in code (fun).

I find that I don't use it as much for generating code as I do for automating tedious operations. For example, moving a bunch of repeated code into a function, then converting the repeating blocks into function calls. The LLM is really good at doing that quickly without requiring me to perform dozens of copy-paste operations, or a bunch of multi-cursor-fu.
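The kind of mechanical refactor described above might look something like this, a minimal before/after sketch with made-up field names; the function and data are purely illustrative:

```python
# Before: the same strip/lowercase steps were repeated for every field.
# After: one small helper, and each repeating block becomes a single call.

def normalize_field(record, key, default=""):
    """Strip whitespace and lowercase a single field, with a fallback."""
    return str(record.get(key, default)).strip().lower()

record = {"name": "  Alice ", "city": "Paris ", "role": " Admin"}

name = normalize_field(record, "name")
city = normalize_field(record, "city")
role = normalize_field(record, "role")
```

The refactor itself is trivial; the point is that an LLM can apply it across dozens of call sites in one pass, which is exactly the tedious part.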

Also, I don't use it to generate large blocks of code or complicated logic.


Just what I was thinking about lately: what if LLMs are not 95% precise, but 99.95%? After 50-100 checks you find nothing, so you just dump the whole project on it to be implemented - and there come the bugs.

However ... your colleagues just do the same.

We'll see how this unfolds. As of now the industry seems to be a bit stuck at this level. Big models are too expensive to train for marginal gains, and smaller ones are getting better but that doesn't change this. Until someone comes up with a new idea for how LLMs should work, we won't see the 99.95% anyway.


one idea is obvious: a multi-model approach. it's partially done today for safety checks; the same can be done for correctness. one model produces a result, a different model only checks its correctness. optionally the first produces several results, and the second checks correctness and selects the best. this is more expensive, but should give better final output. not sure, this may have been done already.
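The generate-then-verify pattern described above can be sketched as plain control flow. The `generator` and `checker` functions here are stand-ins for two independent models (everything about them is assumed for illustration); only the best-of-n selection logic is the point:

```python
def generator(prompt, n=3):
    # Stand-in for model A: returns n candidate answers.
    # A real implementation would sample a model n times.
    return [f"{prompt} v{i}" for i in range(n)]

def checker(candidate):
    # Stand-in for model B: returns a correctness score in [0, 1].
    # A real checker would be a separate model judging the candidate;
    # this placeholder just prefers one hard-coded candidate.
    return 1.0 if candidate.endswith("v2") else 0.5

def best_of_n(prompt, n=3):
    """Generate n candidates with one model, keep the one the other model rates highest."""
    candidates = generator(prompt, n)
    return max(candidates, key=checker)
```

The cost is roughly n generations plus n checks per query, which is why this trades compute for (hopefully) fewer of those rare, hard-to-spot errors.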


Yeah, I’m more worried about the middle ground that would make software quality (even) worse than it is today.



