
I've been on this site for 8 years, and I have 8 favorite comments. This comment just made it into a very exclusive club.

Have you tried the latest models at best settings?

I've been writing software for 20 years, Rust for 10 of them. I don't consider myself a median coder, but quite above average.

For the last 2 years or so, I've been trying out the AI models every couple of months, and they have been consistently disappointing. Sure, with edits and many prompts I could get something useful out of them, but often I would have spent the same amount of time or more coding it manually.

So yes, while I love technology, I'd been an LLM skeptic for a long time, and for good reason: the models just hadn't been good. While many of my colleagues used AI, I didn't see the appeal. It would take more time and I would still have to think just as much, while it made mistakes everywhere and I had to constantly ask it to correct things.

Then, 5 months or so ago, this changed: the models actually figured it out. The February releases sealed things for me.

The models still make mistakes, but their number and severity are lower, and the output fits the specific coding patterns in that file or area. It won't import a random library but will use the one that was already imported. If I ask it not to do something, it follows (earlier iterations just ignored me, which was frustrating).

At least for the software development areas I'm touching (writing databases in Rust), LLMs have turned into a genuinely useful tool where I can now exploit the fundamental advantages the technology offers, i.e. writing 500 lines of code in 10 minutes, reducing something that would have taken me two to three days to half a day (of course I still need to review it and fix the mistakes/wrong choices the tool made).

Of course this doesn't mean that I am now 6x faster at all coding tasks, because sometimes I need to figure out the best design or such.

I am talking about Opus 4.6 and Codex 5.3 here, at high+ effort settings, and not about the tab auto completion or the quick edit features of the IDEs, but the agentic feature where the IDE can actually spend some effort into thinking what I, the user, meant with my less specific prompt.


> I am talking about Opus 4.6 and Codex 5.3 here, at high+ effort settings

So you have to burn tokens at the highest available settings to even have a chance of ending up with code that's not completely terrible (and then only in very specific domains), and then you still have to review it all and fix all the mistakes it made. So where's the gain exactly? The proper goal is for those 500 lines to almost always be truly comparable to what a human would've written, and not turn into an unmaintainable mess. And AIs aren't there yet.


You really do need to try the latest ones. You can’t extrapolate from your previous experiences.

I do not think they are impartial - all I can see is lots of angst.

I feel like we're talking about different things. You seem to be describing a mode of working that produces output that's good enough to warrant the token cost. That's fine, and I have use cases where I do the same. My gripe was with the parent poster's quote:

> Claude and GPT regularly write programs that are way better than what I would’ve written

What you're describing doesn't sound "way better" than what you would have written by hand, except possibly in terms of the speed that it was written.


Yeah, it writing stuff that's way better than mine is not the case for me, at least in areas I'm familiar with. In areas I'm not familiar with, it's way better than what I could have produced.

I still think the source code is the preferred form for modification because it is what you point the AI at when you want it to make a change.

Sure, there might be md documents that you created and that the AI used to implement the software, but maybe those documents themselves were AI-written from prompts (due to how context works in LLMs, it's better for larger projects to first write an md document about them, even if an LLM is used for that in the first place).

As for proprietary software, the Chinese models are not far behind the cutting edge of the US models.


I think it's a complicated issue.

A lot of low quality AI contributions arrive via the free tiers of these AI models, the output of which is pretty crap. On the other hand, if you max out the model configs, i.e. get "the best money can buy", those models are actually quite useful and powerful.

OSS should not miss out on the power LLMs can unleash. I'm talking about the maxed out versions of the newest models only, i.e. stuff like Claude 4.5+ and Gemini 3, so developments of the last 5 months.

But at the same time, maintainers should not have to review code written by a low quality model (and the high quality models, for now, are all closed, although I've heard good things about Minmax 2.5, I just haven't tried it).

Given how hard it is to tell which model produced a specific output without doing an actual review, I think it would make the most sense to have a rule restricting AI use to trusted contributors only: maintainers as a start, and maybe some trusted group of contributors who you know use the expensive but useful models, not the cheap but crap ones.


It's the difference between raw LLM output vs LLM output that was tweaked, reviewed and validated by a competent developer.

Both can look like the same exact type of AI-generated code. But one is a broken useless piece of shit and the other actually does what it claims to do.

The problem is just how hard it is to differentiate the two at a glance.


> It's the difference between raw LLM output vs LLM output that was tweaked, reviewed and validated by a competent developer.

This is one of those areas where you might have been right 4-6 months ago. But if you're paying attention, the floor has moved up substantially.

For the work I do, last year the models would occasionally produce code with bugs, linter errors, etc, now the frontier models produce mostly flawless code that I don't need to review. I'll still write tests, or prompt test scenarios for it but most of the testing is functional.

If the exponential curve continues I think everyone needs to prepare for a step function change. Debian may even cease to be relevant because AI will write something better in a couple of hours.


This very much depends on the domain you work in. Small projects in well-trodden domains are incredible for AI. SaaS projects can essentially be one-shot. But large projects, projects with specific standards or idioms, projects with particular language versions, performance concerns, hardware concerns (all things the Debian project has to deal with) aren't "solved" in the same way.

The tacit understanding in all of this is that valued contributors can use AI as long as they can "defend the code", if you will, because AI used lightly and in that way would be indistinguishable from knuthkode.

The problem is having an unwritten rule is sometimes worse than a written one, even if it "works".


You can't train LLMs on proprietary data, at least not if you want to make that LLM as accessible as Gemini. Otherwise random people can ask it your home address.

So it matters less than one would think. Also, ChatGPT can already do 'internet search' as a tool, so it already has access to, say, Google Maps' POI database of SMBs.

And ChatGPT also gets a lot of proprietary data of its own as well. People use it as a Google replacement.


>You can't train LLMs on proprietary data, at least not if you want to make that LLM as accessible as Gemini. Otherwise random people can ask it your home address.

If this is your only criterion, I think you have a misunderstanding of what proprietary data is and of the ways companies can mitigate the situation at the inference stage.


In my opinion, the Block layoffs were a test, to see a) how a software company manages with only half of its employees now that there are powerful LLMs, and b) how the remaining employees react to the imminent threat of being laid off as well.

If Block succeeds, we'll see more layoffs of that kind, probably even more extreme ones. You are not a top senior level employee? Out. You don't single-handedly cause 30% of the AI spend on your 15 person team? Out.

People say that in five years there won't be seniors because companies stopped junior hiring... in five years the seniors won't be needed either. Already today, we have single person billion dollar exits and high schoolers making millions from food apps. This is thanks to LLMs.

The technology is there to replace most of the white collar work, it's just not applied widely enough yet. The economic system needs to adapt to labor no longer being such a big redistributor.


I was there for three years. Every year a new top-level initiative, every year the new initiative failed to make a dent in the market. I think this shift was just an admission that the business is now in maintenance mode, harden up the existing cash cows and drop the new initiatives. That said, the existence of AI will impede hiring because if investors say "you should look into blub!", corp can say "our AI is already looking into it," rather than keeping extra humans on hand.

Yep.

I have started to say that it will be irresponsible for people to manually write code a year or two from now, and I am setting up the systems I work on accordingly.

It will happen sooner than later.

Already now I cannot compete with agentic programming.


> single person billion dollar exits

Single person, or single founder? I guess there's n0tch, but he hired people when he started making money. (There may very well be truly solo cases that I don't know about.)

A few others have commented that the job becomes a kind of hybrid. I already think of it like that. If you're a person who can talk to a client and then immediately implement something to solve a problem, that's still going to be part of the process for a while. The sales cycle is still going to be competitive, whether it's based on timing or insider connections. Software people are going to have to start thinking of themselves as small firms; you have to go close a deal and then your agent army can help you deliver.


That billion dollar figure is being thrown around for Steinberger's exit to OpenAI, but I couldn't find any reputable source claiming it. It might be a wrong number, idk.

> the block layoffs were a test, to see how a) a software company manages with only half of its employees now that there's powerful LLMs, and b) how the remaining employees react to the imminent threat of them being laid off as well.

The Block layoffs were due to years of overhiring.

> Already today, we have single person billion dollar exits

It was nowhere near that much, and this was more a coordinated marketing move by OpenAI than an organic process.

> high schoolers making millions from food apps

This app is a sign of the massive bubble we’re in. The developer should be ashamed to make people think they could estimate calories from an image.

There’s trillions of dollars behind these AI companies succeeding. A lot of the hype you’re seeing is paid for. If you’re reading news articles, blogs, etc and not digging any further you’re being manipulated.


They also have their own global CDN, while Disney/HBO et al use various third party CDNs.

I suppose eventually we'll see something like Google's OSS-Fuzz for core open source projects, maybe replacing bug bounty programs a bit. Anthropic already hands out Claude access for free to OSS maintainers.

LLMs have made it harder to run bug bounty programs where anyone can submit stuff, since a lot of people flooded them with seemingly well-written but ultimately wrong reports.

On the other hand, the newest generation of these LLMs (in their top configuration) finally understands the problem domain well enough to identify legitimate issues.

I think a lot of judging of LLMs happens on the free and cheaper tiers, and quality on those tiers is indeed bad. If you set up a bug bounty program, you'll necessarily get bad quality reports (as the cost of submission is usually zero).

On the other hand, if instead of a bug bounty program you have a "top tier LLM bug searching program", then the quality bar can be ensured and maintainers will get high quality reports.

Maybe one can save bug bounty programs by requiring a fee to be paid, idk, or by using LLMs there, too.


Google already has an AI-powered security vulnerability project, called Big Sleep. It has reported a number of issues to open source projects: https://issuetracker.google.com/savedsearches/7155917?pli=1

>where a lot of people flooded them with seemingly well-written but ultimately wrong reports.

Are there any projects to auto-verify submitted bug reports? Perhaps by spinning up a VM and then having an agent attempt to reproduce the bug report? That would be neat.
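The core loop of such a harness is simple enough to sketch. Everything below is hypothetical (the report format, the image choice, the exit-code convention are all assumptions): each report ships a repro script, the script runs in a throwaway network-less container, and only reports whose script actually fails get forwarded to maintainers.

```python
import subprocess

def build_sandbox_cmd(image: str = "ubuntu:24.04") -> list[str]:
    # Throwaway container: removed afterwards, no network access,
    # repro script fed in on stdin.
    return ["docker", "run", "--rm", "--network=none", "-i", image, "bash", "-s"]

def verify_report(repro_script: str, runner=subprocess.run, timeout: int = 300) -> bool:
    """Return True only if the repro script actually demonstrates a
    failure (non-zero exit code) inside the sandbox."""
    try:
        result = runner(build_sandbox_cmd(), input=repro_script.encode(),
                        capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False  # a hang counts as "unverified", not as a confirmed bug
    return result.returncode != 0
```

An agent could sit on top of this, translating a prose report into a repro script first; the container boundary is what keeps a malicious "repro" from touching the triage host.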


> Anthropic already hands out Claude access for free to OSS maintainers.

Free for 6 months after which it auto-renews if I recall correctly.


> Free for 6 months after which it auto-renews if I recall correctly.

They don't ask for credit card information when signing up this way, so even if that's true you won't be charged if you forget to cancel.


No mention of auto renewal is made as far as I (and Claude) could determine.

Their OSS offer is first-hit-is-free.


> But the problem is that Toys R Us is spending $15, 20, or maybe even $50 (who knows?) to sell a $10 toy.

It's like how Uber and Airbnb in the early days were burning loads of cash to build market share. People went to these services because they were cheaper. Then they would increase prices once they had a comfortable position.

OpenAI is also in a rapidly transforming field where there are a lot of cost reductions happening, efficiency gains etc. Compared to say Uber which didn't provide a lot of efficiency gains.


A little bit, but the scale is another magnitude higher. I just saw a chart yesterday showing Uber burning $18B, Tesla burning $9B, and Netflix burning $11B before reaching profitability. OpenAI has so far spent $218 billion.


The opportunity is disproportionately greater as well though.

Unfortunately that doesn't change the fact that even a small miscalculation could have an enormous impact. We are approaching levels of risk comparable in size to the subprime crisis of 2008.


Is it? AI isn't going to be a winner-take-all market. Competition between American AI labs, and even Chinese ones, has seen to that.

The winners for AI will be the product companies, because soon enough the top-tier models are all going to have good enough performance that companies can just pick the cheapest. It'll be a race to the bottom for inference and OpenAI is very poorly placed to compete in that kind of thing.


> It's like how Uber and Airbnb [...]

I disagree. It's like Uber and Airbnb in how they try to gain market share, but with a big difference: for Uber (and when it got big, basically everybody I know used it once in a while) and Airbnb, you paid for each transaction. With OpenAI, most people are on the free tier. And if there is something incredibly hard, it's converting free users to paid users. That will, IMHO, be the thing that blows up (many of) the AI companies. They won't ever break even.


I agree with this. For the casual user, I feel AI is only a "nice to have".

> OpenAI is also in a rapidly transforming field where there are a lot of cost reductions happening, efficiency gains etc.

But also ever increasing quality requirements. So we can't possibly know at this point if this is a market with high margins or not.


And unlike Uber and Airbnb, OpenAI has no way to maintain marketshare. It’s a domain name with no moat.

Google has to pay Apple billions of dollars to make Google.com the default search engine. I just looked it up: over 15% of search revenue goes to paying to be the default search engine.

Every Android device defaults to Gemini.

Every Microsoft device defaults to Copilot.

I’d love to see where these cost reductions are. If costs are going to decrease rapidly why does OpenAI’s spending plan look so insane?


> Every Android device defaults to Gemini.

> Every Microsoft device defaults to Copilot.

I don't think it's right to say that these devices "default" to their vendors' AI software when it's impossible to replace it with something else. Yes, I can install Claude as a standalone app, but I don't get the OS-wide integration that Gemini has on Android, for example.


Where are the cost reductions exactly, apart from using AI hype as an excuse for layoffs? Can you share a reference? Genuinely interested.


Uber and Airbnb have network effects. You can't increase prices when there is no cost to switching.


I don't see how network effects apply to Uber/Airbnb, because nothing stops drivers/hosts from listing their property in multiple such apps.


People continue using Airbnb because that's where the properties are listed. And owners keep listing properties because that's where the users are.


My point was that nothing stops hosts from listing their properties on Airbnb as well as on a competitor. Unless Airbnb penalizes delisting or enforces price parity, I guess?


Do you understand network effects? They're not handcuffs. I can also sell my rare baseball cards outside of eBay. But…

I'd say it has advantages and disadvantages.

One advantage is that whales can't play around with the stock price, say VCs dumping stock at an unfortunate moment and putting pressure on the price. But it's also Wall Street folks doing price manipulation for options schemes that can be an issue (it's illegal, but enforcement is low if you are rich and well connected). There's also a lower chance of activist investors, less quarterly pressure to show nice numbers, etc.

The advantage is also a disadvantage: minority shareholders of non-public companies have far fewer rights than those of public ones, and that includes employees. That's part of why you are dependent on the founder's goodwill as to whether a startup exit screws over rank and file employees or not. I'm not sure how much that danger still exists if the company is doing tender offers, but it might. Similarly, you can structure tender offers in ways that disadvantage, say, former employees, among many other arbitrary criteria.

Note that this depends greatly on the jurisdiction. E.g. in Germany there is legislation that's unfriendly to minority shareholders even in public companies, visible for example in the Varta takeover; imo that's part of why the idea of adding stocks to pensions will be ripe for money grabbing schemes by whales against the smaller owners.

I'm also an employee of a private company with tender offers, but not Stripe. Opinions my own.

