From what I've read, every major AI player is losing a lot of money on running LLMs, even just with inference. It's hard to say for sure because they don't publish the financials (or if they do, it tends to be obfuscated), but if the screws start being turned on investment dollars they not only have to increase the price of their current offerings (2x cost wouldn't shock me), but some of them also need a massive influx of capital to handle things like datacenter build obligations (tens of billions of dollars). So I don't think it's crazy to think that prices might go up quite a bit. We've already seen waves of it, like last summer when Cursor suddenly became a lot more expensive (or less functional, depending on your perspective).
Dario Amodei has said that their models actually have a good return, even when accounting for training costs [0]. They lose money because of R&D, training the next bigger models, and I assume also investment in other areas like data centers.
Sam Altman has made similar statements, and Chinese companies also often serve their models very cheaply. All of this makes me believe them when they say they are profitable on API usage. Usage on the plans is a bit more unknown.
Their whole company has to be profitable, or at least not run out of money/investors. If you have no cash, you can't just point to one part of your business as being profitable, given that the models it serves will quickly become hopelessly out-of-date when other models overtake them.
Other models will only overtake as long as there is enough investor money or margins from inference for others to continue training bigger and bigger models.
We can see from the inference prices at third-party providers that inference is profitable enough to sustain even third-party hosts of proprietary models, models they are undoubtedly paying licensing/usage fees for, so these models won't go away.
Yeah, that’s the whole game they’re playing. Compete until they can’t raise more and then they will start cutting costs and introducing new revenue sources like ads.
They spend money on growth and new models. At some point that will slow and then they’ll start to spend less on R&D and training. Competition means some may lose, but models will continue to be served.
> Sam Altman has made similar statements, and Chinese companies also often serve their models very cheaply.
Sam Altman got fired by his own board for dishonesty, and a lot of the original OpenAI people have left. I don't know the guy, but given his track record I'm not sure I'd just take his word for it.
> You’re probably gonna say at this point that Anthropic or OpenAI might go public, which will infuse capital into the system, and I want to give you a preview of what to look forward to, courtesy of AI labs MiniMax and Zhipu (as reported by The Information), which just filed to go public in Hong Kong.
> Anyway, I’m sure these numbers are great-oh my GOD!
> In the first half of this year, Zhipu had a net loss of $334 million on $27 million in revenue, and guess what, 85% of that revenue came from enterprise customers. Meanwhile, MiniMax made $53.4 million in revenue in the first nine months of the year, and burned $211 million to earn it.
This is my understanding as well. If GPT made money, wouldn't the companies that run these models be publicly traded?
Furthermore, the companies which are publicly traded show that, overall, the products are not economical. Meta and MSFT are great examples of this, though investors have recently appraised their results in opposite ways. Notably, OpenAI and MSFT are more closely linked than any other Mag7 company is with an AI startup.
Going public also brings with it a lot of pesky reporting requirements and challenges. If it wasn't for the benefit of liquidity for shareholders, "nobody" would go public. If the bigger shareholders can get enough liquidity from private sales, or have a long enough time horizon, there's very little to be gained from going public.
> From what I've read, every major AI player is losing a lot of money on running LLMs, even just with inference.
> It's hard to say for sure because they don't publish the financials (or if they do, it tends to be obfuscated)
Yeah, exactly. So how the hell do the bloggers you read know that AI players are losing money? Are they whistleblowers? Or are they pulling numbers out of their asses? Your choice.
Some of it's whistleblowers, some of it is pretty simple math and analysis, and some of it's just common sense. Constantly raising money isn't sustainable and just increases obligations dramatically: if these companies didn't need the cash to keep operating, they probably wouldn't be asking for tens of billions a year, because doing so creates profit expectations that simply can't be delivered on.
Maybe this is just a skill issue on my part, but I'm still trying to wrap my head around the workflow of running multiple Claude agents at once. How do they not conflict with each other? Also, how do you have a project well specified enough that you can have these agents working heads-down for hours on end? My experience as a developer (even pre-AI) has mostly been that writing-code-fast has rarely been the progress limiter; usually the obstacles are more like underspecified projects, needing user testing, disagreements on the value of specific features, subtle hard-to-fix bugs, communication issues, dealing with other teams and their tech, etc. If I have days where I can just be heads down writing a ton of code, I'm very happy.
I'm curious, outside of AI enthusiasts have people found value with using Clawdbot, and if so, what are they doing with it? From my perspective it seems like the people legitimately busy enough that they actually need an AI assistant are also people with enough responsibilities that they have to be very careful about letting something act on their behalf with minimal supervision. It seems like that sort of person could probably afford to hire an administrative assistant anyway (a trustworthy one), or if it's for work they probably already have one.
On the other hand, the people most inclined to hand over access to everything to this bot also strike me as people without a lot to lose? I don't want to make an unfair characterization or anything, it just strikes me that handing over the keys to your entire life/identity is a lot more palatable if you don't have much to lose anyway?
From my perspective, not everybody is busy, but they are using AI to take the load off themselves anyway.
You might think: but that is great, right??
I had a chat with a friend who's also in IT; ChatGPT and the like are doing all the "brain part and execution" in most cases.
Entire workflows are done by AI tools; in some cases he just presses a button.
People forget that our brains need stimulation: if you don't use it, you forget things and it gets dumber.
Watch the next generation of engineers who are very good at using AI but unable to do troubleshooting on their own.
Look at what happened with ChatGPT 4 -> 5: companies' workflows worldwide stopped working, setting them back by months.
Do you want a real-world example???
Watch people who spent their entire lives within a university getting all sorts of qualifications but never really touched the real thing, unable to do anything.
Sure, there are the smarter ones who put things to the test and found awesome jobs, but many are jobless because all they did was "press a button". They're just like the AI enthusiasts: remove such tools and they can no longer work.
Yeah, the scary problem comes from people who are trying to abdicate their entire understand-and-decide phase to an outside entity.
What's more, that's not fundamentally a new thing, it's always been possible for someone to helplessly cling to another human as their brain... but we've typically considered that to be a sign of either mental-disorder, psychological abuse, or a fool about to be parted from their money.
The whole premise of this thing seems to be that it has access to your email, web browser, messaging, and so on. That's what makes it, in theory, useful.
The prompt injection possibilities are incredibly obvious... the entire world has write access to your agent.
I guess that's one reason. If I'm perfectly honest I always turn Siri off because I don't trust Siri either; but that's less of a "malicious actors" thing and more of an "it doesn't work well" thing. Although to be honest, outside of driving in a car I don't really want a voice interface. With a lot of things I feel like I need to overspecify if I have to do it verbally. Like "play this song, but play it from my Spotify Liked playlist so that when the song is over it transitions to something I want" (I've never tried that since I figure Siri can't do it -- just an example).
I can see how it could be fun, but I'm a bit skeptical that it's a practical path forward. The security problems it has (prompt injection for example) don't seem solvable with LLMs in general
I'm working in AI, but I'd have made this anyway: Molty is my language learning accountability buddy. It crawls the web with a sandboxed subagent to find me interesting stuff to read in French and Japanese. It makes Anki flashcards for me. And it wraps it up by quizzing me on the day's reading in the evening.
All this is running on a cheap VPS, where the worst it has access to is the LLM and Discord API keys and AnkiWeb login.
I asked Codex to write some unit tests for Redux today. At first glance it looked fine, and I continued on. I then went back to add a test by hand, and after looking more closely at the output there were like 50 wtf worthy things scattered in there. Sure they ran, but it was bad in all sorts of ways. And this was just writing something very basic.
This has been my experience almost every time I use AI: superficially it seems fine, once I go to extend the code I realize it's a disaster and I have to clean it up.
The problem with "code is cheap" is that it's not. GENERATING code is now cheap (while the LLMs are subsidized by endless VC dollars, anyway), but the cost of owning that code is not. Every line of code is a liability, and generating thousands of lines a day is like running up a few thousand dollars of debt on a credit card thinking you're getting free stuff and then being surprised when it gets declined.
EWD 1036: On the cruelty of really teaching computing science (1988)
“My point today is that, if we wish to count lines of code, we should not regard them as ‘lines produced’ but as ‘lines spent’: the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger.”
A better formulation is "every feature is a liability". Taking it to the line of code level is too prescriptive. Occasionally writing more verbose code is preferable if it makes it easier to understand.
> A better formulation is "every feature is a liability". Taking it to the line of code level is too prescriptive.
Amount of code is a huge factor but maybe not the best wording here. It's more about complexity, where the amount of code is a contributing metric but not the only one. You can very easily have a feature implemented in an overly complex way with too much code (especially if an LLM generated the code, but also with human developers). Also, not every feature is equal.
> Occasionally writing more verbose code is preferable if it makes it easier to understand.
I think this is more a classic case of "if the metric becomes a goal it ceases to be a metric" than it being a bad metric per se.
This sounds wrong; features have to be the value of your code. The required maintenance and the slowdown in building more features (technical debt) are the liability, which is how I understood the relationship to "lines of code" anyway.
I can sort of understand it if I squint: every feature is a maintenance burden, and a risk of looking bad in front of users when you break or remove it, even if those users didn't use this feature. It's really a burden to be avoided when the point of your product is to grow its user base, not to actually be useful. Which explains why even Fisher-Price toys look more feature-ful and ergonomic than most new software products.
Agreed: every line you ship, whether you wrote it or not, you are responsible for. In that regard, while I write a lot of code completely with AI, I still endeavor to keep the lines as minimal as possible. This means you never write both the main code and the tests using AI. I'd rather have no tests than AI tests (we have a QA team writing those up). This kinda works.
But more code from AI means stocks go up. Stocks are assets. If you generate enough code the assets will outnumber the liabilities. It’s accounting 101. /s
The only people I've known that share this perspective are those that hate abstraction. Going back to their code, to extend it in some way, almost always requires a rewrite, because they wrote it with the goal of minimum viable complexity rather than understanding the realities of the real world problem they're solving, like "we all know we need these other features, but we have a deadline!"
For one-offs, this is fine. For anything maintainable, that needs to survive the realities of time, this is truly terrible.
Related: my friend works in a performance-critical space. He can't use abstractions, because the direct, bare-metal, "exact fit" implementation will perform best. They can't really add features, because it'll throw the timing of other things off too much, so they usually have to re-architect. But that's the reality of their problem space.
Maybe I am? How is it possible to abstract without encapsulation? And also, how is it possible to encapsulate without abstracting some concept (intentionally or not) contained in that encapsulation? I can't really differentiate them, in the context of naming/referencing some list of CPU operations.
> How is it possible to abstract without encapsulation?
Historically, pure machine code with jumps etc. lacked any form of encapsulation, as any data could be accessed and updated by anything.
However, you would still use abstractions. If you pretend the train is actually going 80.2 MPH instead of somewhere between 80.1573 MPH and 80.2485 MPH, which you got from different sensors, you don’t need to do every calculation that follows twice.
I'm using the industry definition of abstraction [1]:
> In software, an abstraction provides access while hiding details that otherwise might make access more challenging
I read this as "an encapsulation of a concept". In software, I think it can be simplified to "named lists of operations".
> Historically, pure machine code with jumps etc. lacked any form of encapsulation, as any data could be accessed and updated by anything.
Not practically, by any stretch of the imagination. And, if the intent is to write silly code, modern languages don't really change much, it's just the number of operations in the named lists will be longer.
You would use calls and returns (or just jumps if not supported), and then name and reference the resulting subroutine in your assembler or with a comment (so you could reference it as "call 0x23423 // multiply R1 and R2"), to encapsulate the concept. If those weren't supported, you would use named macros [2]. Your assembler would use named operations, sometimes expanding to multiple opcodes, with each opcode having a conceptually relevant name in the manual, which abstracted a logic circuit made with named logic gates, consisting of named switches, that shuffled around named charge carriers. Say your code just did a few operations: the named abstraction for that list of operations (which all of these things are) would be "blink_light.asm".
> If you pretend the train is actually going 80.2 MPH instead of somewhere between 80.1573 MPH and 80.2485 MPH, which you got from different sensors, you don’t need to do every calculation that follows twice.
I don't see this as an abstraction as much as a simple engineering compromise (of accuracy) dictated by constraint (CPU time/solenoid wear/whatever), because you're not hiding complexity as much as ignoring it.
I see what you're saying, and you're probably right, but I see the concepts as equivalent. I see an abstraction as a functional encapsulation of a concept. An encapsulation, if not nonsense, will be some meaningful abstraction (or a renaming of one).
I'm genuinely interested in an example of an encapsulation that isn't an abstraction, and an abstraction that isn't a conceptual encapsulation, to right my perspective! I can't think of any.
Incorrect definition = incorrect interpretation. I edited this a few times, but the separation is that you can use an abstraction even if you maintain access to the implementation details.
> assembler
Assembly language, which is a different thing. Initially there was no assembler; someone had to write one. In the beginning, every line of code had direct access to all memory, in part because limiting access required extra engineering.
Though even machine code itself is an abstraction across a great number of implementation details.
> I don't see this as an abstraction as much as a simple engineering compromise (of accuracy) dictated by constraint (CPU time/solenoid wear/whatever), because you're not hiding complexity as much as ignoring it.
If it makes you feel better, consider the same situation with 5 sensors, X of which have failed. The point is you don’t need to consider all the information at every stage of a process. Instead of all the underlying details, you can write code that asks: do we have enough information to get a sufficiently accurate speed, and what is it?
It doesn’t matter if the code could still look at the raw sensor data, you the programmer prefer the abstraction so it persists even without anything beyond yourself enforcing it.
I.e.: “hiding details that otherwise might make access more challenging”
You can use TCP/IP or anything else as an abstraction even if you maintain access to the lower level implementation details.
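To make that concrete, here's a tiny sketch (the sensor setup, names, and threshold are all invented for illustration): the raw readings stay fully accessible, so nothing is encapsulated, but downstream code works at the "speed" semantic level anyway.

    // Hypothetical TypeScript sketch: abstraction without encapsulation.
    interface SensorReading {
      mph: number | null; // null = this sensor has failed
    }

    // Raw readings are a plain exported array -- any code can still poke at them.
    export const rawReadings: SensorReading[] = [
      { mph: 80.1573 },
      { mph: 80.2485 },
      { mph: null },
    ];

    // The abstraction: "do we have enough information for a sufficiently
    // accurate speed, and what is it?" Callers prefer this even though the
    // raw data is right there.
    export function currentSpeed(minWorkingSensors = 2): number | null {
      const working = rawReadings
        .map((r) => r.mph)
        .filter((mph): mph is number => mph !== null);
      if (working.length < minWorkingSensors) return null;
      return working.reduce((sum, mph) => sum + mph, 0) / working.length;
    }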
I genuinely appreciate your response, because there's a good chance it'll result in me changing my perspective, and I'm asking these questions with that intent!
> You are thinking of assembly language which is a different thing. Initially there was no assembler, someone had to write one.
This is why I specifically mention opcodes. I've actually written assemblers! And...there's not much to them. It's mostly just replacing the names given to the opcodes in the datasheet back to the opcodes, with a few human niceties. ;)
> consider the same situation with 5 sensors, X of which have failed
Ohhhhhhhh, ok. I kind of see. Unfortunately, I don't see the difference between abstraction and encapsulation here. I see the abstraction, speed, as being the encapsulation of a set of sensors, ignoring irrelevant values.
I feel like I'm almost there. I may have edited my previous comment after you replied. My "no procrastination" setting kicked in, and I couldn't see.
I don't see how "The former is about semantic levels, the later about information hiding." are different. In my mind, semantic levels exist as compression and encapsulation of information. If you're saying encapsulation means "black box" then that could make sense to me, but "inaccessible" isn't part of the definition, just "containment".
Computer Science stole the term abstraction from the field of Mathematics. I think mathematics can be really helpful in clearing things up here.
A really simple abstraction in mathematics is that of numeric basis (e.g. base 10) for representing numbers. Being able to use the symbol 3 is much more useful than needing to write III. Of course, numbers themselves are an abstraction- perhaps you and I can reason about 3 and 7 and 10,000 in a vacuum, but young children or people who have never been exposed to numbers without units struggle to understand. Seven… what? Dogs? Bottles? Days? Numbers are an abstraction, and Arabic digits are a particular abstraction on top of that.
Without that abstraction, we would have insufficient tools to do more complex things such as, say, subtract 1 from 1,000,000,000. This is a problem that most 12 year olds can solve, but the greatest mathematicians of the Roman empire could not, because they did not have the right abstractions.
So if there are abstractions that enable us to solve problems that were formerly impossible, this means there is something more going on than “hiding information”. In fact, this is what Dijkstra (a mathematician by training) meant when he said:
The purpose of abstraction is not to be vague, but to create a new semantic level in which one can be absolutely precise
When I use open(2), it’s because I’m operating at the semantic level of files. It’s not sensible to think of a “file” at a lower level: would it be on disk? In memory? What about socket files? But a “file” isn’t a real thing, it’s an abstraction created by the OS. We can operate on files, these made up things, and we can compose operations together in complex, useful ways. The idea of a file opens new possibilities for things we can do with computers.
Expanding on this regarding the difference between abstraction vs encapsulation: abstraction is about the distillation of useful concepts while encapsulation is a specific tactic used to accomplish a behavior.
To continue with the idea of numbers, let’s say you asked someone to add 3 and 5. Is that encapsulation? What information are you hiding? You are not asking them to add coins or meters or reindeer. 3 and 5 are values independent of any underlying information. The numbers aren’t encapsulating anything.
Encapsulation is different. When you operate a motor vehicle, you concern yourself with the controls presented. This allows you, as the operator, to need only a tiny amount of knowledge to interact with an incredibly complex machine. Those details have been encapsulated. There may be particular abstractions present, such as the notions of steering, acceleration, and braking, but the way you interact with these will differ from vehicle to vehicle. Additionally, encapsulation is not concerned with the idea of steering; it is concerned with how to present steering in this specific case.
The two ideas are connected because using an abstraction in software often involves encapsulation. But they should not be conflated, or the likely result is bad abstractions and unwieldy encapsulation.
> It's mostly just replacing the names given to the opcodes in the datasheet back to the opcodes
Under the assumption that the input data is properly formatted, you can generate machine code. This is, however, an abstraction which can fail, as nothing forces a user to input valid files.
So we have an abstraction without any encapsulation.
I don't see how the two are related, personally. I'm regularly accused of over-abstraction specifically because I aspire to make each abstraction do as little as possible, i.e. fewest lines possible.
"Abstracting" means extracting the commnon parts of multiple instances, and making everything else a parameter. The difficulty for software is that developers often start by writing the abstraction, rather than having multiple existing instances and then writing code that collects the common parts of those multiple instances into a single abstraction. I guess that is what "refactoring" is about.
In the sciences and humanities, abstraction is applied the proper way: studying the instances first, then describing a multitude of existing phenomena by giving names to their common repeating descriptions.
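A contrived sketch of that order of operations (the endpoints and names are made up): two concrete instances exist first, then the common part is extracted and the differences become parameters.

    // Two existing, concrete instances (hypothetical endpoints)...
    function fetchUserJson(id: string): Promise<unknown> {
      return fetch(`/api/users/${id}`).then((r) => r.json());
    }

    function fetchOrderJson(id: string): Promise<unknown> {
      return fetch(`/api/orders/${id}`).then((r) => r.json());
    }

    // ...and the abstraction extracted from them afterwards: the common part
    // stays, and everything that varied across the instances becomes a parameter.
    function fetchJson(resource: "users" | "orders", id: string): Promise<unknown> {
      return fetch(`/api/${resource}/${id}`).then((r) => r.json());
    }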
This matches my "ideal" way of writing software, which is something close to "reverse waterfall". Start with the non-negotiable truths at the lowest level, then work your way up towards the goal, which is sufficiently defined. As you go, the patterns become apparent, collapsing into nice abstractions.
The code always ends up nice and clean and modular. And, since I'm working towards the idea, I can say "here are the modular pieces I have to work with, dictated by the fundamentals beneath; how do I use them to accomplish the task?". When working from the idea, I think it's easier to want to write something to achieve the immediate task, in a "not see the forest for the trees" kind of way (abstractions are about the goal, rather than the reality underneath). Of course, both directions are required, but I get the best "separation of concerns" going in reverse.
I call that lasagna code! From what I've seen, developers start with spaghetti, overcompensate with lasagna, then end up with some organization more optimized for the human, that minimizes cognitive load while reading.
To me, abstraction is an encapsulation of some concept. I can't understand how they're practically different, unless you encapsulate true nonsense, without purpose or resulting meaning, which I can't think of an example of, since humans tend to categorize/name everything. I'm dumb.
Hi, I'm the primary Redux maintainer. I'd love to see some examples of what got generated! (Doubt there's anything we could do to _influence_ this, but curious what happened here.)
FWIW we do have our docs on testing approaches here, and have recommended a more integrated-style approach to testing for a while:
Unfortunately I think I cleaned up the code before committing it, so I don't have an exact example! I did actually read that usage page though after looking at those tests, and that helped me in fixing the tests (maybe in retrospect I should have pointed the AI at the docs page first).
I think the main issue I was having with it was reusing the store object instead of creating a new one for each test. The other issue I was seeing was that it was creating mock objects and APIs for things that weren't even being tested (a lot of scope creep), and one of those APIs was basically copy-pasted between two files (code duplication). It was also just testing things that weren't really necessary (i.e., testing Redux itself instead of my usage of it).
Another issue was just taking a complex approach to fixing something that could be more easily solved. For instance, I had debug: true turned on for redux-undo, so I was seeing some unnecessary log messages in the tests. Codex identified this and asked if I wanted to disable them, so I said yes. What it did, though, instead of setting debug: false or disabling it in tests, was patch console.log to look for redux-undo prefixes. Technically worked, but kind of byzantine!
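For what it's worth, this is roughly the shape I ended up with after reading the docs (the slice and action names here are made up for illustration, not my actual code): a fresh store per test instead of a shared module-level one, and debug simply turned off in the redux-undo config.

    import { configureStore } from "@reduxjs/toolkit";
    import undoable, { ActionCreators } from "redux-undo";
    // hypothetical slice, standing in for my real reducer/actions
    import { documentReducer, textChanged } from "./documentSlice";

    function makeStore() {
      // each test builds its own store, so no state leaks between tests
      return configureStore({
        reducer: { document: undoable(documentReducer, { debug: false }) },
      });
    }

    test("undo restores the previous text", () => {
      const store = makeStore();
      store.dispatch(textChanged("hello"));
      store.dispatch(ActionCreators.undo());
      expect(store.getState().document.present.text).toBe("");
    });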
None of this was a terrible disaster or anything, especially since I started pretty small, but I think what made me miss some of the issues at first glance is this is my first usage of Redux in a non-toy project so while I understand the fundamentals fine, it was easy to sneak working-but-bad stuff past me until I sat down with the code to write a test on my own and started to see the issues.
The equivalent of "draw me a dog" -> not a masterpiece!? Who would have thought? You need to come up with a testing methodology, write it down, and then ask the model to go through it. It likes to make assumptions about unspecified things, so you've got to be careful.
More fundamentally I think testing is becoming the core component we need to think about. We should not vibe-check AI code, we should code-check it. Of course it will write the actual test code, but your main priority is to think about "how do I test this?"
You can only know the value of code up to the level of its testing. You can't commit your eyes into the repo, so don't do "LGTM" vibe-testing of AI code; that's walking a motorcycle.
Generating code was always cheap. That’s part of the reason this tech has to be forced on teams. Similar to the move to cloud, it’s the kind of cost that’s only gonna show up later - faster than the cloud move, I think. Though, in some cases it will be the correct choice.
ATM I feel like LLMs writing tests can be a bit dangerous at times; there are cases where it's fine and cases where it's not. I don't really think I could articulate a systemised basis for identifying either case, but I know it when I see it, I guess.
Like the other day, I gave it a bunch of use cases to write tests for; the use cases were correct, the code was not, and when it saw one of the tests break it sought to rewrite the test. You risk suboptimal results when an agent is dictating its own success criteria.
At one point I did try to use separate Claude instances to write tests, then I'd get the other instance to write the implementation unaware of the tests. But it's a bit too much setup.
I work with individuals who attempt to use LLMs to write tests. More than once, it's added nonsensical, useless test cases. Admittedly, humans do this, too, to a lesser extent.
Additionally, if their code has broken existing tests, it "fixes" them by not fixing the code under test, but changing the tests... (assert status == 200 becomes 500 and deleting code.)
Tests "pass." PR is opened. Reviewers wade through slop...
The most annoying thing is that even after cleaning up all the nonsense, the tests still contain all sort of fanfare and it’s essentially impossible to get the submitter to trim them because it’s death by a thousand cuts (and you better not say "do it as if you didn’t use AI" in the current climate..)
Yep. We've had to throw PRs away and ask them to start over with a smaller set of changes since it became impossible to manage. Reviews went on for weeks. The individual couldn't justify why things were done (and apparently their AI couldn't, either!)
Luckily those I work with are smart enough that I've not seen a PR thrown away yet, but sometimes I'm approving with more "meh, it's fine I guess" than "yeah, that makes sense".
This is how you do things if you are new to this game.
Get two other, different, LLMs to thoroughly review the code. If you don’t have an automated way to do all of this, you will struggle and eventually put yourself out of a job.
If you do use this approach, you will get code that is better than what most software devs put out. And that gives you a good base to work with if you need to add polish to it.
I actually have used other LLMs to review the code in the past (not today, but in the past). It's fine, but it doesn't tend to catch things like "this technically works but it's loading a footgun." For example, in the Redux tests I was mentioning in my original post, the tests were reusing a single global store variable. It technically worked, the tests ran, and since these were the first tests I introduced in the code base there weren't any issues even though this made the tests non-deterministic... but it was a pattern that was easily going to break down the line.
To me, the solution isn't "more AI", it's "how do I use AI in a way that doesn't screw me over a few weeks/months down the line", and for me that's by making sure I understand the code it generated and trim out the things that are bad/excessive. If it's generating things I don't understand, then I need to understand them, because I have to debug it at some point.
Also, in this case it was just some unit tests, so who cares, but if this was a service that was publicly exposed on the web? I would definitely want to make sure I had a human in the loop for anything security related, and I would ABSOLUTELY want to make sure I understood it if it were handling user data.
The quality of generated code does not matter. The problem is when it breaks at 2 AM and you're burning thousands of dollars every minute. You don't own code that you don't understand, but unfortunately that does not mean you don't own the responsibility as well. Good luck writing the postmortem; your boss will have lots of questions for you.
AI can help you understand code faster than without AI. It allows me to investigate problems that I have little context in and be able to write fixes effectively.
> If you do use this approach, you will get code that is better than what most software devs put out. And that gives you a good base to work with if you need to add polish to it.
If you do use this approach, you'll find that it will descend into a recursive madness. Due to the way these models are trained, they are never going to look at the output of two other models and go "Yeah, this is fine as it is; don't change a thing".
Before you know it you're going to have change amplification, where a tiny change by one model triggers other models (or even itself) to make other changes, which trigger further changes, etc., ad nauseam.
The easy part is getting the models to spit out working code. The hard part is getting it to stop.
I've never done this because I haven't felt compelled to, since I want to review my own code, but I imagine this works okay and isn't hard to set up by asking Claude to set it up for you...
What? People do this all the time. Sometimes manually, by invoking another agent with a different model and asking it to review the changes against the original spec. I just set up some reviewer/verifier subagents in Cursor that I can invoke with a slash command. I use Opus 4.5 as my daily driver, but I have reviewer subagents running Gemini 3 Pro and GPT-5.2-codex, and they each review the plan as well, and then the final implementation against the plan. Both sometimes identify issues, and Opus then integrates that feedback.
It’s not perfect so I still review the code myself, but it helps decrease the number of defects I have to then have the AI correct.
The setup is much simpler than you might think. I have 4 CLI tools I use for this setup. Claude Code, Codex, Copilot and Cursor CLI. I asked Claude Code to create a code reviewer "skill" that uses the other 3 CLI tools to review changes in detail and provide feedback. I then ask Claude Code to use this skill to review any changes in code or even review plan documents. It is very very effective. Is it perfect? No. Nothing is. But, as I stated before, this produces results that are better than what an average developer sends in for PR review. Far far better in my own experience.
In addition to that, we do use CodeRabbit plugin on GitHub to perform a 4th code review. And we tell all of our agents to not get into gold-plating mode.
You can choose not to use modern tools like these to write software. You can also choose to write software in binary.
these two posts (the parent and then the OP) seem equally empty?
by level of compute spend, it might look like:
- ask an LLM in the same query/thread to write code AND tests (not good)
- ask the LLM in different threads (meh)
- ask the LLM in a separate thread to critique said tests (too brittle, not following testing guidelines, testing implementation and not behavior, etc). fix those. (decent)
- ask the LLM to spawn multiple agents to review the code and tests. Fix those. Spawn agents to critique again. Fix again.
- Do the same as above, but spawn agents from different families (so Claude calls Gemini and Codex).
---
these are usually set up as /slash commands like /tests or /review so you aren’t doing this manually. since this can take some time, people might work on multiple features at once.
I'm not sure I understand the point of this. If mistakes (or lies) aren't tolerable in the output, then whoever ran the agent should be responsible for reviewing the output and the resulting actions. I don't see how LLMs can be "responsible" for anything because they don't think in the way we do nor do they have motives in the way we do.
The "Hallucination Defense" isn't a defense because end of the day, if you ran it, you're responsible, IMO.
Well, if you consider Maslow's hierarchy of needs, "creatively enabled" would be a luxury at the top of the pyramid with "self actualization". Luxuries don't matter if the things at the bottom of the pyramid aren't there -- i.e. you can't eat or put a shelter over your head. I think the big AI players really need a coherent plan for this if they don't want a lot of mainstream and eventually legislative pushback. Not to mention it's bad business if nobody can afford to use AI because they're unemployed. (I'm not anti-AI, it's an interesting tool, but I think the way it's being developed is inviting a lot of danger for very marginal returns so far)
You can be poor and creative at the same time. Creativity is not a luxury. For many, including myself, it's a means of survival. Creating gives me purpose and connection to the world around me.
I grew up very poor and was homeless as a teenager and in my early 20s. I still studied and practiced engineering and machine learning then, I still made art, and I do it now. The fact that Big Tech is the new Big Oil is beside the point. Plenty of companies are using open training sets and producing open, permissively licensed models.
> I think the big AI players really need a coherent plan for this if they don't want a lot of mainstream and eventually legislative pushback.
That's by far not the worst that could happen. There could very well be an axe attached to the pendulum when it swings back.
> Not to mention it's bad business if nobody can afford to use AI because they're unemployed.
In that sense this is the opposite of the Ford story: the value of your contribution to the process will approach zero so that you won't be able to afford the product of your work.
We were going to have to reckon with these problems eventually as science and technology inevitably progressed. The problem is the world is plunged in chaos at the moment and being faced with a technology that has the potential to completely and rapidly transform society really isn't helping.
Hatred of the technology itself is misplaced, and it is difficult sometimes debating these topics because anti-AI folk conflate many issues at once and expect you to have answers for all of them as if everyone working in the field is on the same agenda. We can defend and highlight the positives of the technology without condoning the negatives.
I think hatred is the wrong word. Concern is probably a better one, and there are many technologies that it is perfectly ok to be concerned about. If you're not somewhat concerned about AI, then you probably have not yet thought about the possible futures that can stem from this particular invention, and not all of those are good. See also: atomic bombs, the machine gun, and the invention of gunpowder, each of which I'm sure may have some kind of contrived positive angle but whose net contribution to the world we live in was not necessarily a positive one. And I can see quite a few ways in which AI could very well be worse than all of those combined (as well as some ways in which it could be better, but for that to be the case humanity would first have to grow up a lot).
I'm extremely concerned about the implications. We are going to have to restructure a lot of things about society and the software we use.
And like anything else, it will be a tool in the elite's toolbox of oppression. But it will also be a tool in the hands of the people. Unless, that is, anti-AI sentiment gets compromised and redirected into support for limiting access to capable generative models to the State and research facilities.
The hate I am referring to is often more ideological, about the usage of these models from a purity standpoint. That only bad engineers use them, or that their utility is completely overblown, etc. etc.
It's just bad timing, but the ball is already rolling downhill, the cat's already out of the bag, etc. Best we can do at the moment is fight for open research and access.
In which the CEO of the single largest social media company, who's responsible (and whose algorithms are responsible) for the mediation of human interactions across the entire planet, really dislikes losing board games, so his staff is instructed to let him win.
Quality QA folk are able to reason and develop an understanding of a system without ever seeing a line of code. As long as we're discussing fundamentals, being able to develop such an understanding is a skill that will pay returns even after AI comes and goes. Even when given the code, rushing to throw prints everywhere, or rushing to throw it at a debugger, both come behind someone who understands the system and is able to observe the bug, then sit and reason about it, and then in some cases just fix it. I've worked with a couple of programmers that good; it's awesome to experience.
Point is though, you don't need to see the code to debug it, so the fact that the code was generated should not be the thing holding you back.
---
When given only three words, is the rewrite any good? When given to a human intern, would it be any good? Instead, "refactor the foo that's repeated in bar, baz and qux into one reusable shared class/component/the appropriate thing for the given language" is good, actual software engineering that a human would do, or ask an intern to do.
From an ethical standpoint, I think it's .. murky. Not ads themselves, but because the AI is, at least partially, likely trained on data scraped from the web, which is then more or less regurgitated (in a personalized way) and then presented with ads that do not pay the original content creators. So it's kind of like, lets consume what other people created, repackage it, and then profit off of it.
I don't see a reason to think we're not going to hit a plateau sooner or later (and probably sooner). You can't scale your way out of hallucinations, and you can't keep raising tens of billions to train these things without investors wanting a return. Once you use up the entire internet's worth of Stack Overflow responses and public GitHub repositories, you run into the fact that these things aren't good at doing things outside their training dataset.
Long story short, predicting perpetual growth is also a trap.
You can only scale your way out in verifiable domains, like code, math, optimizations, games, and simulations. In all the other domains, the AI developers still get billions (trillions) of tokens daily, which are validated by follow-up messages, minutes or even days later. If you can study longitudinally you can get feedback signals, such as when people apply the LLM's idea in practice and come back to iterate later.
> Once you use up the entire internet's worth of Stack Overflow responses and public GitHub repositories, you run into the fact that these things aren't good at doing things outside their training dataset.
I think the models reached that human training data limitation a few generations ago, yet they still clearly improve via various other techniques.