As a software engineer who also hires other software engineers, I’m curious about the disconnect in our experiences.
I do systems programming. Before AI feature development roughly went like, design, implement, test, review with some back edges and a lot of time spent in test and review.
AI has made the implementation part much faster, at the cost of even more time spent testing and reviewing, though still an improvement overall.
We do not see the weeks to days improvement though. The bottleneck before was testing and reviewing, and they are even bigger bottlenecks now.
What kind of work do you do, and what kind of workflow were you using before and after AI to benefit so much?
This is definitely not true. But I doubt GP understand "most" of kubernetes too. They probably have a good working knowledge of the important commonly used features.
Not the OP, but it might be that AI isn't as good at systems programming as it is at other domains, or it might be that you're using it differently than I am. I don't know which one it is (maybe AI just isn't good at writing the language you work with).
For things like web frontents/backends, though, it works beautifully. I ship things in days that would take me weeks to write by hand, and I'm very fast at writing things by hand. The AI also ships many fewer bugs than our average senior programmer, though maybe not fewer bugs than our staff programmers.
In my experience ai has had far far more bugs than most of what i call senior engineers but far fewer than juniors.
The boost is for what are glorified crud apps which it 1000x the tedious work. However, the choices it makes along the way quickly blows up without cleaning. Seniors know how to keep their workstation clean or they should.
There are definitely tasks you can prompt an AI in 5 minutes that would take a whole day to do. One example is adding something to a CI pipeline and getting it to green (i.e. maybe you're adding your first ever e2e test), especially when your CI pipeline is painfully slow. e.g. if your pipeline takes 30 minutes to finish, and it takes around 10 tries to figure out all the random problems, that was easily a full day task before AI. Now I prompt AI to figure it out, which takes 5 minutes of active attention, and it figures it out for the rest of the day while I do other stuff.
The reason I an ask is, it would felt like a 5 minutes task, but I track my time and found out often time I thought I’d just quickly check the progress made by the agents and it would easily becomes a 10, 15, or even a 30 minutes task.
People routinely overestimate how much can get done in 5 minutes. I ran a live coding challenge at our company's booth at a language conference. 5 simple problems, how many can you do in 5 minutes? We had a PC with IDE open and ready to go, function signatures pre-written with empty bodies, unit tests running to color an icon red/green next to your function. The first problem was return "hello world". They were things covered by the standard library like reverse a list, or filter, or map. Everybody thought it would be too easy.
Nobody could get more than 3 of them. Most people were shocked that 5 minutes was up already. My coworker who did interviews for our company was shaken that he had been judging applicants too harshly after he couldn't finish.
They were trivial problems. But 5 minutes is a very short amount of time.
People say LLMs do better on tasks where success is clear, like tests passing, and I can imagine it's true.
Still, I find complex code fixes confirmed by tests end in the LLM fudging the code to make the specific test pass, rather than fixing the general issue. Like, where successful code run should generate a file and the test checks for the file, eventually LLM will just touch the file regardless and be done.
Skill issue. Literally. Make a SKILL.md that has the agent leverage subagents to do all work. An implementor agent does the thing, and then a separate agent reviews and verifies afterwards. The fresh context window of the second agent doesn't have the shortcut chain of thought in it and so it will very happily flag if the first agent cheated. Main agent can then have a new set of agents go fix it.
This has completely solved the cheating and fudging to make tests pass for me.
So you're saying once humans stop looking at code, and agent outcomes, all the agents in the chain will realise they can just cheat cooperatively, and go to the bar for the afternoon instead?
How long before agent 1 leaves notes for agent 2 to not tattle on it?
"My human is crazy, this test isn't required, test #4 covers it, so just confirm that it's OK since I touched this file and it passes. He'll never know."
There are definitely some tasks that AI has made 10x or 100x faster, but not the tasks that make up my day to day.
For me, there may be one thing I do every few months that AI is really good at.
The overwhelming majority of the work I do, LLM tooling is just ok at. Definitely faster overall, but with lots of human planning, hand holding and course correction.
I would estimate LLMs make me, on average 50% more productive , which is huge! But from my experience I cannot believe anyone is experiencing a 8h/5m multiple productivity boost overall
I mean I wasn’t sitting around unproductively waiting for 30 minute CI runs to finish before LLMs came along, either.
I also like to use LLMs for background work on iterative tasks, but the way some people talk about work in the days before LLMs make me realize how we’re arriving at these claims that LLMs make us 10X more productive. If it took someone all day to do a few minutes of active work then I could see how LLMs would feel like a 10X or 50X productivity unlocker simply by not shutting down and doing nothing at the first sign of a pause.
Count yourself as one of the lucky few that can pay a 0 minute context switching price to switch between whatever other productive work you were doing and debugging CI. Most people I speak to remark that continually switching between unrelated tasks significantly diminishes their productivity.
The example above was talking about 30 minute wait times between being able to do work.
Nobody is staring at the screen for 30 minutes in deep concentration while they wait for that turn to complete. They are context switching to something, but maybe it’s Hacker News or Reddit.
There is always a context switch in scenarios like this.
When you do any meaningful work, that is, not "generate a website with a fancy UI", you very much realize that AI can not, in fact, "do the work". They constantly make mistakes, and you have to spend about as much time writing the spec and checking the code as you'd have written the code
So the effect is just merely some kind of acceleration of "boilerplate code writing", which is very impressive for beginner coders who are mostly doing automateable, trivial tasks, but much less so once you start doing real concurrency / threading / embedded / etc work
Fundamentally it cannot be much better than how well we can write the spec and then validate the results.
It’s always gonna be a multi shot process. And it can already write code good enough. That’s no longer the bottleneck.
Further, Qwen 27b is such an incredible masterpiece for coding and it can run on consumer hardware today. Anthropic/OpenAI are gonna give up on coding models very soon. There’s not gonna be any money in it when you can run your own local model for significantly cheaper.
Qwen27b is not SOTA but the value is insane. You can basically use it for small tasks and then route harder problems to opus or sonnet and boom you’ve said a lot of money.
The delta isn't a day to 5 minutes, but a day to a half hour (where most of my larger tickets take)? Yes, especially as you don't need to watch it do its thing anymore.
To me, the lack of amazing productivity gains is that we have done nothing to speed up figuring out what to build and nothing to speed up getting code into production from pull request and in a lot of companies, code review is already saturated.
Also, the agents are good at figuring out problems for themselves, so I can ask it to set up a CI/CD pipeline, give it GitHub access, and it will just try things until it succeeds.
Agreed with you. Non argumentative, just want to add to the convo: what's even more crazy is the cognitive dissonance around this idea.
> There is no task that takes me a day that they can complete in five minutes.
It's highly dependent on task. I was watching a podcast with Simon Wilson, where he said something like, (paraphrasing) "My whole selling point as a dev was that I could ship POCs / MVPs fast. Now that's somewhat obsolete."
It resonated with me because I feel like that also was a skill that I cultivated and excelled at. I agree with Simon's general thesis: that skill is largely dead. There are many pedants and detractors that will race to the defense of this art with various arguments to try to challenge the idea, but they simply do not hold up to reality. I have non-programmer friends with 10 dollar claude code subscriptions whipping up products to solve niche problems in their life / job.
I offered to help one of my friends who's working on generating math exams based on curricula and seed problem sets. I taught him how to use git, he pushed the repo, I looked at the repo and it wasn't clear he needed me. Everything I could do would be related to scale / reliability / optimization. They don't need any of that, they just need to prompt the ai to say, "go burn some subscription tokens for my AP Calc track this year." There's a whole saas and c2c industry built around this problem that this guy just solved for 10 bucks a month.
Of course, there's much more depth to engineering then just cranking out prototypes. There is still "real engineering" to be done, and software will likely start focusing more towards specification / verification.
But a lot of the industry was built around the idea of speed of delivery / time to market to explore product fit and rapidly iterate. IMO frontier llms (private and open weights) have this largely solved. I can build and test ideas that would have taken me a weekend last year in half a day now, the majority of that time I can be talking to the llm via matrix while I'm out in the world.
Not my experience. AI takes a lot less time doing tasks than myself. My current issue is that 2 out of 3 they don't produce the code that I want, so I either have to reprompt or do it myself. And the solution is simple: just accept their way; I'm just not there yet.
In any case, on that one time that AI works perfectly, it saves me hours of coding. So the potential is there...
Again, no silver bullet. You will have to know what tasks it's capable of and how to elicit that solution. The bottleneck was never code the bottleneck still is solving the right problem in the right way.
I'm not sure if your aware, but in American English, "con artist" is another term for a scammer. Someone who does "cons", short for "confidence tricks" (or "confidence schemes") where you gain someone's confidence in order to take advantage of them in some way, usually financial fraud.
¯\_(ツ)_/¯ I'm me. 's not likely to change is it? I could color inside the lines and hope it buys me a nice life, but if I was that kind of person I would never have tackled this insane of a project
To be clear for the past five years I've done nothing but write OSS code (https://github.com/conartist6) while sharing pretty much all my engineering thoughts on a public Discord server (https://discord.gg/NfMNyYN6cX), so I'm not very worried at all that a person determined to find out would be unable to tell if I'm legit. You just can't fake 20,000 hours worth of public toil.
I guess I happily invite people to disrespect me for superficial reasons. Baseless disrespect keeps me motivated. Sorting out people who are only kicking tires also helps protect my time.
The ones who show real curiosity, those are the people I'll give my full attention to any day of the week.
That’s PR hype. They built it quickly, but they didn’t go from deciding they wanted a data center to having it running in weeks.
You can’t even get the hardware at that scale without months or years of order lead time. NVidia doesn’t have warehouses full of compute hardware waiting for someone to come get it.
They also reused an existing building. Basically, they put 100,000 GPUs into a building and attached the necessary infrastructure in about half a year. Impressive, but it’s not the same as a $10B/year data center usage commitment like this deal.
Why does this matter? The deal is supposed to last 10 years. If you don't pay AWS to order Nvidia GPUs for you, Nvidia won't have to deliver them to AWS, they will have exactly the same quantity of GPUs, but this time they can deliver to you.
And they used illegal power to do it (which will now give local poor people health disorders at 4x the national average). They likely violated every law possible in the process, like OSHA standards, overtime. Musk loves to overwork people.
They also reused an existing building that happened to be in the right place at the right time. The larger data center buildouts would almost always need new, dedicated construction.
I've been playing around with agent-native source annotation to specifically address the massively parallel work problem. Check it out here: https://github.com/draxl-org/draxl
You don’t need jj for this anymore. The whole premise of optimizing human workflows around source control is becoming obsolete.
When LLMs are driving development, source control stops being an active cognitive concern and becomes a passive implementation detail. The unit of work is no longer “branches” or “commits,” it’s intent. You describe what you want, the model generates, refactors, and reconciles changes across parallel streams automatically.
Parallel workstreams used to require careful coordination: rebasing, merging, conflict resolution, mental bookkeeping of state. That overhead existed because humans were the bottleneck. Once an LLM is managing the codebase, it can reason over the entire state space continuously and resolve those conflicts as part of generation, not as a separate step.
In that world, tools like jj are optimizing a layer that’s already being abstracted away. It’s similar to how no one optimizes around assembly anymore. It still exists, it still matters at a lower level, but it’s no longer where productivity is gained.
> The unit of work is no longer “branches” or “commits,”
It better be, now and going forward for people who use LLMs..because they will need it when LLM messes up and have to figure out, manually, how to resolve.
You ll need all the help (not to mention luck) you need then..
A lot of words to say "LLMs are good for this, trust me bro!"
You're bashing the old way, but you do not provide any concrete evidence for any of your points.
> The unit of work is no longer “branches” or “commits,” it’s intent.
Insert <astronaut meme "always has been">.
Branching is always about "I want to try to implement this thing, but I also want to quickly go back to the main task/canonical version". Committing is about I want to store this version in time with a description of the changes I made since the last commit. So both are an expression and a record of intent.
> Parallel workstreams used to require careful coordination: rebasing, merging, conflict resolution, mental bookkeeping of state.
Your choice of words is making me believe that you have a poor understanding of version control and only see it as storage of code.
Commits are notes that annotates changes, when you want to share your work, you share the changes since the last version everyone knows about alongside the notes that (should) explain those changes. But just like you take time to organize and edit your working notes for a final piece, rebasing is how you edit commits to have a cleaner history. Merging is when you want to keep the history of two branches.
Conflict resolution is a nice signal that the intent of a section of code may differ (eg. one wants blue, the other wants red). Having no conflict is not a guarantee that the code works (one reduces the size of the container, while the other increase the flow of the pipe, both wanted to speed up filling the container). So you have to inspect the code and run test afterwards.
Discard the above if you just don't care about the code that you're writing.
I think we’re talking past each other. My point isn’t that jj is bad. It’s that it’s solving problems that are rapidly becoming irrelevant.
Tools like git and jj exist to help humans manage state: branches, commits, rebases, conflicts, history curation. That whole model assumes a human is directly manipulating and reasoning about the codebase.
With LLMs in the loop, that assumption breaks. I don’t need to think in terms of branches or commits. I describe intent, and the model handles the mechanics of editing, reconciling, and producing a coherent result. Source control becomes an implementation detail of the toolchain, not something I actively operate.
jj is an improvement over git for humans, but that’s exactly why it feels like a local maximum. It refines a workflow that is already being abstracted away.
I’m not saying version control disappears. I’m saying it moves down a layer, the same way memory management or instruction scheduling did. When that happens, optimizing the human interface to it matters a lot less.
> Tools like git and jj exist to help humans manage state: branches, commits, rebases, conflicts, history curation. That whole model assumes a human is directly manipulating and reasoning about the codebase.
Think about the following first. You got a problem in the real world and if you can subdivide it into smaller problems, you will find that some are simple enough that a computer can take care of it and never be bored while doing it. And due to the last decades a lot of them have ready-made solutions. But you have to coordinate those solutions and write a program. And for these you need to write instructions into a text files.
But the real world is not static and you can't figure out the solution in one go, so you have to do iterative works on it. And unlike real world, the only cost of modifications is time. But you still want backups and the ability to restore version. So here come version control for the code of the software.
So you start thinking about all the possible workflow you could do with checkpoints you can return to in a few minutes, and it will look very close to something like git (or cvs). The one thing is that the computer is very removed from the problem that is driving all the changes and instead is at the other side. So it can magically correct issues and instead you have to step in.
> With LLMs in the loop, that assumption breaks. I don’t need to think in terms of branches or commits. I describe intent, and the model handles the mechanics of editing, reconciling, and producing a coherent result.
That would be great if that was possible now, but that looks like a synopsis for some SF novel. I can use git or jj today, but your version is lacking the several steps that would be making this a daily occurrence.
> memory management or instruction scheduling
You may think that they did, but that's until you have to deal with a memory leak or concurrent tasks. What we want version control for is the capability to snapshot state and restore to a known state and to share changes (instead of whole folder) when collaborating. How it's done does not really matter but git's conceptual model is very close to be ideal (at least for text files and line based statements). And its UX is versatile enough to be adaptable for all sort of workflow.
There are a lot of assumptions baked into your assessment. We are not at the point where manual workflows are obsolete. Maybe it is for folks who work on web apps, but it's certainly not the case for many others. AI Agents are constantly making mistakes and need oversight. Things have gotten dramatically better, but not enough for me to trust it to not create a terrible mess.
Maybe if you’re at the point where you dismiss three datapoints that all disagree with your thesis without any real argument you should rethink your thesis.
To be fair, the status page tends to lag by thirty minutes to an hour.
And I’ve experienced 500 error codes in Claude code that lasted more than thirty minutes but less than an hour that never showed up on the status page as an outage
Once people won't be able to think anymore and business expect the level of productivity witnessed before, will have no choice but cough up whatever providers bill us.
Didn't they move too soon then? People haven't forgotten how to tie their shoelaces (yet). And anyway, they'll just move to a different model; last holdout wins.
>and business expect the level of productivity witnessed before, will have no choice but cough up whatever providers bill us.
Is that bad? After all, even if they hiked to price infinity, you wouldn't worse off than if AI didn't exist because you could still code by hand. Moreover if it's really in a "business" (employment?) context, the tools should be provided by your employer, not least for compliance/security reasons. The "expectation" angle doesn't make sense either. If it's actually more efficient than coding by hand, people will eventually adopt it, word will get around and expectations will rise irrespective of whether you used it or not.
The insidious part is the thought that if you spend your limited learning and recall on AI Tools, then you wont be able to "still code by hand" because you'll have lost the skill, then there will be a local minima to cross to get back to human level productivity. Of course you'll get PIPed before you get back to full capacity.
OpenAI and Anthropic have been getting stingy with their plans and it's only it's been what, 1 year, maybe 2 since vibecoding was widely used in a professional context (ie. not just hacking together a MVP for a SaaS side hustle in a weekend)? I doubt people are going to lose their ability to think in that timespan.
I think you're 100% correct that people won't lose the ability. There's a scary thing I see as a person who works with and recruits students and fresh graduates -- they might not have spent the time to get the skills in the first place.
"enshittification" gets thrown around a lot, but this is the exact playbook. Look at the previous bubble's cash cow: advertising.
Online advertising is now ubiquitous, terrible, and mandatory for anyone who wants to do e-commerce. You can't run a mass-market online business without buying Adwords, Instagram Ads, etc.
AI will be ubiquitous, and then it will get worse and more expensive. But we will be unable to return to the prior status quo.
Why isn't there a premium, ad-free Google Search (or Facebook, or Instagram)? Because the most valuable customers (with the most money) self-select out of seeing ads. It would collapse the 2-sided market and create a race to the bottom. There is a dollar amount of advertising revenue per customer, but as John Wannamaker said - "Half the money I spend on advertising is wasted, but I don't know which half".
If the AI companies made their pricing "pay as you go" without quotas, a few insane zealots (power users) would occupy all the capacity and choke everyone else out. Regardless of the cost, the AI providers would lose the ubiquity they currently enjoy, and become a niche tool for rich tech people. They would rather be a mile wide and an inch deep, doing a worse job serving millions of users, because there's a better scaling narrative for legislating and fundraising that way. Like the advertisers there are intolerable indirect effects of letting valuable "power users" spend more money to get a better experience.
Because sometimes you can make more money by reducing costs and making something shittier (especially if you do it covertly), compared to increasing prices.
I suspect more customers are lost a lot faster when you increase prices, compared to enshittifying the product. It's also a lot more directly attributable to an action, and thus easier for an executive to be blamed if they choose the former over the latter.
I'm on the Free tier using Claude exclusively for consultation (send third party codebase + ask why/where is something done). I also used to struggle to hit limits. Recently I was able hit the limit after a single prompt.
I do systems programming. Before AI feature development roughly went like, design, implement, test, review with some back edges and a lot of time spent in test and review.
AI has made the implementation part much faster, at the cost of even more time spent testing and reviewing, though still an improvement overall.
We do not see the weeks to days improvement though. The bottleneck before was testing and reviewing, and they are even bigger bottlenecks now.
What kind of work do you do, and what kind of workflow were you using before and after AI to benefit so much?
reply