engineercodex's comments | Hacker News

We all know this isn’t replacing doctors. We do know that when applied correctly, this will assist doctors and other clinical specialists in their work. With doctor burnout at all-time highs, I think stuff like this is amazing.


> We all know this isn’t replacing doctors. We do know that when applied correctly, this will assist doctors and other clinical specialists in their work.

Let's say a clinic currently has eleven doctors and is able to treat X number of patients per week. Let's say one of them retires, and instead of finding a human replacement, the remaining ten doctors choose an AI clinical assistant to free up 10% of their workload. Now it only takes ten AI-assisted doctors to continue serving the same number of patients as eleven doctors used to do.
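
A quick back-of-the-envelope sketch of that arithmetic (the numbers are hypothetical, just to make the substitution concrete):

    patients_per_doctor = 100    # hypothetical weekly capacity of one unassisted doctor
    assisted_capacity = 110      # same doctor with 10% of their workload freed up by the AI
    print(11 * patients_per_doctor, 10 * assisted_capacity)   # 1100 1100 -- same throughput, one fewer doctor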

This is just an example to show that there is no meaningful difference between "assisting" and "replacing". Any time an assistant, AI or not, takes some workload off somebody's plate, they have partially replaced them, and it adds up.


There is a difference, but with an unwritten assumption: that the AI can do all the things the doctor can do.

If the AI can't do all the things that a doctor can do, then even when it can take up the slack for one doctor retiring, that doesn't tell you anything at all about whether or not it can take up the slack for two retiring.

Right now there's more work to be done than there are doctors to do it; this means that the same number of doctors are getting more things done as the AI improves… but not infinitely more, because there's still stuff the current AI can't do, that only humans can do.

We have a lot of things where technology has fully saturated demand: in food, this is why we've got an obesity problem in much of the world[0]; in medicine, this is how we wiped out smallpox entirely, and are very close to wiping out a few other diseases entirely; in telephony, this is why video conference calls are basically free.

But in each of those fields, there are other things we still have demand for, they're not complete post-scarcity: restaurants, old age, and bandwidth costs are still a thing.

[0] not so for transport, which is one reason why we simultaneously have some people starving


Except there aren’t enough doctors in many places. Also physician performance varies widely. So really this won’t replace doctors, but hopefully patient care will be affected positively. Doctors aren’t always brilliant outside their field and AI “assistance” can turn out to be a clever troll sometimes.


1. There is a meaningful difference between replacing and assisting. Replacing implies being able to take over the entire function. Technology that assists doctors with one part of their function, or process, can improve their output but is not capable of producing like-for-like output on its own. So the question of whether AI can replace or only assist doctors is very relevant to determining its impact on the role. Power tools didn't replace tradesmen, for example. If AI was able to replace doctors, then your clinic would be able to scale down to 0 rather than 10.

2. If a clinic can use an AI tool to make doctors 10% more productive, doctors become worth more rather than less. Firms are incentivized to hire more rather than fewer in this scenario. What you're invoking here is the "lump of labor" fallacy. There are market conditions where increasing efficiency really does reduce quantity demanded, but it's not clear that medicine is one of them. As far as I can tell, far from there being a fixed lump of medical work, the general population in most of the West is under-serviced and struggles to get reliable, timely, cost-effective access to medical expertise.


Whether or not there is pent-up demand for healthcare, the emergence of AI clinical assistants, like any other form of efficiency increases, effectively expands the supply of healthcare services. In a free market, an increase of supply lowers the price at equilibrium. And as their salaries go lower, fewer people become interested in joining the profession.

We saw this play out in agriculture, in manufacturing, and now it is starting to happen in some services. I do not understand why we would think it will be any different this time around.


In theory, yes. But in practice, at least here in Eastern Europe, there is such a shortage of doctors that even if they became three times more productive, there wouldn't be any meaningful changes in demand. For example, I haven't had a personal doctor for the past five years because they don't have any free capacity. Last month, I called the doctor twice because my child was sick, and they told me I shouldn't call them so often. So I think we're a long way from that happening.


>increase of supply lowers the price at equilibrium

The thing you're missing here is that "healthcare services" and "doctor's labor" aren't the same unit. Ceteris paribus, efficiency increases allow the price of healthcare to decrease while the price of doctor's labor increases. The thing that makes this non-contradictory is that a single doctor can now produce "more" healthcare. Economics says the opposite of what you think it does here. Increasing productivity drives expansion in market size, which drives up the ratio of value in the market to its labor inputs which drives up salaries.
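
As a toy illustration of that distinction (all numbers hypothetical): even if the unit price of care falls as supply expands, the value produced per doctor can still rise, because each doctor now produces more units.

    visits_per_doctor = 100    # hypothetical units of care per doctor per week, before AI
    price_per_visit = 200      # hypothetical market price per unit, before AI
    productivity_gain = 1.30   # assume AI assistance lets a doctor produce 30% more
    price_drop = 0.90          # assume the unit price falls 10% as supply expands

    before = visits_per_doctor * price_per_visit                                  # 20000
    after = visits_per_doctor * productivity_gain * price_per_visit * price_drop  # 23400
    print(before, after)   # value per doctor rises even though the unit price fell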

Like I said, there might be important real-world reasons why these scenarios won't play out in medicine the way the theory predicts. But so far, you haven't provided any.

Manufacturing has also seen the opposite of what you are saying here. Global manufacturing production value has exploded over the last century, quite literally lifting billions of people out of abject poverty. In particular the last 3 decades of enormous per capita income increases in China have been driven by industrialization. I'm guessing you're taking a US-centric view that is exclusively focused on the local collapse of US manufacturing. This is to do with globalization and free trade, not improvements to labor efficiency.


For doctors, as a sibling comment mentions, there’s way more medical demand than doctor-time available.

This isn’t even considering quality of care, only quantity.

For other professions, if there’s already a glut of supply… well, we don’t really need more ads or reality tv shows or sensational/viral clickbait.


Yeah, I see this "not replacing, assisting" argument all the time, and it just doesn't work. If you can do more with fewer people, fewer people will be employed for the same task. In the past, nearly everyone was a farmer. We're not all farmers with, like, really easy jobs now because the machines do the boring bits. No, there are far fewer farmers.


I don't think I want to have diagnostic support from an LLM. Perhaps it works most of the time, but then you wake up with 13 fingers. The "battle" of Gemini vs GPT-4 doesn't really add much here.

There are quite good specialized systems for medical applications that were thoroughly tested and vetted against quite high barriers for entry.

I hate ad companies' approach to medical problems. Of course you need patient data for clinical studies, but far more interesting would be to collect data that hint at medical indications and offer up this knowledge to doctors, who cannot know about all of them.

LLMs probably will just grow a new generation of hypochondriacs because they certainly will never say that you are healthy if diagnostic supports ever make it into production.


Quality has to be backed by metrics or else it will never happen.

PS. Big fan of this person’s recent blog posts. He’s been on a roll!


It used to happen because people believed it was simply the right thing to do.


I inserted these because I personally like reading related discussions and articles of topics at hand. Not sure how this is a negative :/


You’re right. That was undeserved, my apologies.

Edit: I’d like to note that your writing was fun to read—my comment was instead leaking a bad mood I had at the time.


Ouch - half of the article (the "Actionable Takeaways" section) was my own commentary. The summary was for those who didn't want to necessarily parse through the entire paper's PDF.

Happy to listen to any constructive feedback if you have any, though!


Hey! Thanks for your comment - I'm the one who wrote this article. I wasn't trying to say that the paper authors talked about "unexpected edge cases" or "thinking outside the box." I edited the post to be more clear that some of these takeaways are my own opinions.

This article is less a summary of the paper and more a commentary on what its results entail. After all, Hacker News is meant for discussion :)

I will say, though, that I still stand by the "exponentially more valuable" portion. I think the fact that LLMs can fluke their way into "hitting a jackpot" in terms of test coverage is exactly why they're so valuable. When you have something constantly trying out different combinations, if it hits even one jackpot, like in the paper, it's extremely valuable to the team. It's a case that could have been either non-obvious or simply too tedious to write a test for manually. I think there's tremendous value in that, especially speaking as someone who has spent way too much time simply figuring out how to test something within a Big Tech codebase (F/G) when I already knew what to test.


Pedantic warning here. In fast-and-loose, day-to-day English, "exponentially more" means "fast growth" or "a whole lot". But that usage is meaningless. Why? Because, technically, you can't have exponential growth without an independent variable. You can have exponential growth as a function of time, height, spend, distance, any freaking metric or variable. But it has to be as a function of a value.

You CAN'T have exponential growth that is not a function of some value or variable or input.

I suppose in this case you could argue you have exponential growth as a function of the discrete using-an-LLM or not-using-an-LLM, but I've never heard of exponential growth as a function of a discrete.

Often people using the term "exponential growth" in common English don't understand what it means. Sorry.


Good point! I used exponential to emphasize the nature of the value of a certain test compared to the rest of the generated tests, but you're right it's not the right word. I updated the article to remove the usage. :)


Spot on.

FWIW, exponential growth as a function of a discrete variable is very common (e.g. all of algorithmic complexity), but it has to be (at least modeled as) an unbounded numeric variable.

You can't have exponential growth as a function of a binary variable.


Seconding digdugdirk's comment :) Thanks for the thoughtful response and I apologize if I came across as mean.

My problem is we have no clue what those lines actually were. If it was effectively dead code, then it's not surprising that it was untested, and the LLM-generated test wouldn't be valuable to the team. We have no clue what the value of the test actually was, and using a single stat like "lines of code covered" doesn't actually tell us anything. Saying the test was "exponentially more valuable" is pure speculation, and IMO not an especially well-founded one. (Sort of like saying people who write more lines of code are more productive.)

This speculation seems downright irresponsible when the paper specifically emphasizes that this result was a fluke. When the authors said "hit the jackpot" they did not mean "hit the jackpot with a valuable test", they meant "hit the jackpot with an outlier that somewhat artificially juked the stats." I truly believe if the LLM managed to write an unusually valuable test with such broad coverage they would have mentioned it in the qualitative discussion. Instead they went out of their way to dismiss the importance of the 1,326 figure.


You're right. I've edited my wording to be more realistic about the value of the test. I believe you're right that the test is not an outlier in terms of value provided.

Some of my comments within the article are more aspirational than realistic in this case, and I've made edits to reflect that.

I want to clarify that I view this LLM as a junior dev that submits PRs that pass presubmits and other verifiable, programmatic checks. A human dev then reviews the PR manually. In this case, the LLM + its processing is used to make sure that no BS is sent out of review - only potential improvements.

In no scenario should its auto-generated code be auto-submitted into the codebase. That becomes a nightmare really fast.


Thanks for engaging with the above constructive criticism, it's a refreshing change from what is sadly the norm.

One additional question - do you foresee any issues with this application where LLMs enter a non-value-add "doom loop"? I can imagine a scenario where a test-generation LLM gets hooked on the lower-value, simplistic tests, and yet management sees such a huge increase in the test metric ("100x increase in unit tests in an afternoon? Let's do it again!") that they continue to bloat the test suite to near-infinity. Now we're in a situation where all future training data is a complete cesspool of meaningless tests that technically add coverage, but mostly just cover edge cases that only an LLM would create.

Not sure if that makes sense, but tl;dr - having LLMs in the loop for both code creation and code testing seems like it's a feedback loop waiting to happen, with what seems like solely negative repercussions for future LLM training data.


I could see that, and I wouldn't want LLMs just generating tests willy-nilly with no human oversight. I don't have any trust in the reasoning ability of LLMs at all.

Rather, I prefer to view LLMs as a junior dev that submits PRs that pass presubmits and other verifiable, programmatic checks. A human dev then reviews the PR manually. In this case, the LLM + its processing is used to make sure that no BS is sent out of review - only potential improvements.
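
A minimal sketch of that gating idea (the callables here are hypothetical placeholders, not a real API): generated changes must clear programmatic checks and then human review, and are never auto-submitted.

    def gate_generated_change(generate, presubmits_pass, human_approves):
        candidate = generate()                 # LLM-authored test or patch
        if not presubmits_pass(candidate):     # build, lint, existing test suite
            return None                        # drop anything that fails verifiable checks
        if not human_approves(candidate):      # manual PR review, like any junior dev's PR
            return None
        return candidate                       # only now is the change eligible to merge

    # Trivial stand-ins, just to show the flow:
    print(gate_generated_change(lambda: "test_foo()", lambda c: True, lambda c: True))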


Perhaps there should be domains of focus for the test LLMs - even if they are clones, each assigned to only a particular domain - and their results would still have to go through PRs, etc.

Why not treat every LLM as a dev contributing to git, such that humans, or other LLMs, need to gatekeep in case something like that happens? (Start by treating them as interns, rather than professors with office hours.)


This makes so much more sense now. Thanks for investigating


Correct. This is a retrospective of an artifact from the early 2010s. The architecture is much different now.


Do you have any straightforward leads or pointers on what the tech stack looks like today?

It would probably be very interesting to compare then and now.


I’m the writer of this. I linked to this presentation very clearly throughout the article as a source.

This was an article meant to resurface learnings from something that happened a decade ago, with added images and a clear distillation so that somebody doesn’t have to watch the 45 minute presentation to understand what happened.

I’m sorry you didn’t like it.

But I hope you can see the value I’m trying to provide here, as someone myself who doesn’t really have the time to sit through hour-long presentations to learn something.


I think most people here see the value of the article. Thank you for publishing it.

Parent comment was also valuable to me, for seeing how important it is to listen to others before jumping to conclusions about the worthiness of something.


There was a time when many of the OGs here would write original content, not for subscribers, not to make money, but to share our ideas, facilitate our own learning, to have to defend our ideas, to go deeper, learn more, and get better at our art. That was the way, once.


Can't wait to see what you submit. Don't criticize and offer nothing better.


Check my history. Over 280 submissions going back to 2007. Welcome to HN btw.


I'm on my nth account. 12 or 13 submissions a year, and barely a few this year, isn't going to feed the masses. Produce more or accept that others will submit articles not up to your standards.


I got great value out of the article and shared it with my colleagues. Were it not for your effort, we'd never stumble upon this info, even though it existed in some other form.


Uh, I really liked both the article and the posts in this thread. Both are timely and right on target for me as I work to get my Web site running on the Internet.

I like the 11 million unique users a month early in the company's life -- simple architecture, popular programming tools and languages, small team, and likely enough ad revenue to pay the bills and get some earnings!

I wrote my code using Microsoft's Visual Basic .NET, ASP.NET, SQL Server, and one use of platform invoke. The code appears to run as intended.

I like the mention of DB (relational database): I wrote the code using a free version of Microsoft's SQL Server. Right, it's only for development work, has some severe limits on DB size, gets expensive for a production version, and is also a pain since you have to count processor cores, so I was glad to see Postgres and MySQL since I have been planning to use one of those. I liked seeing the mention, and remark on power, of key-value stores (e.g., Redis), since I wrote my own key-value store, with all the data just in main memory, using two instances of the .NET collection class.

Now with TB (trillion byte) main memories, I am even toying with the outlandish idea of keeping nearly all the DB data in main memory with SQLite -- outlandish!
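
For what it's worth, a minimal sketch of that in-memory idea (shown here with Python's built-in sqlite3 module rather than the VB.NET stack above, purely as an illustration):

    import sqlite3

    conn = sqlite3.connect(":memory:")   # the whole database lives in RAM
    conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT)")
    conn.execute("INSERT INTO kv VALUES (?, ?)", ("user:1", "alice"))
    print(conn.execute("SELECT value FROM kv WHERE key = ?", ("user:1",)).fetchone())
    conn.close()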

And, liked seeing the steps up in capacity. That there are likely better architectural ideas now doesn't disappoint me!

The outlines of the architectures of even huge Web server farms were really good to see. You mean Google, Facebook, Amazon, Microsoft, etc. have surprisingly simple architectures??? WOW.


Your article was clear, informative, and I learned a few things from it.

You can’t please everyone.


[flagged]


I don’t really understand this comment.

I didn’t “slap” any date on top of the article. 2 Oct 2023 is the date I published my article. (This is how blogs on the Internet work.)

The first line of the article says that this all happened in 2012. I’m not sure I could get any clearer there.

The “GPT distillation” phrase is pretty rude. Anyone reading the article can see the difference between it and the presentation. Every “hand-drawn” image in the article is created by me in Excalidraw, and any images from the presentation are sourced with a very clear link to the presentation.

If you have any constructive feedback, I am happy to listen and take it into account.

Otherwise, I’m not really a fan of the rudeness. Thanks.


Don't worry about it man. Some people just like to complain. Don't take it to heart.

I found it interesting and valuable. It doesn't matter if the base content originated from another source. I would never have seen that original content. I saw this. It's ok.


I missed the original InfoQ presentation so this post was definitely useful to me. In particular I'm working on something related to the clustering vs simple sharding dilemma so finding another big name use case is quite helpful. Thanks and don't let the haters get you down.


How do you figure this 45 minute talk with slides is "literally the same" as a five minute article with pictures?


Try brilliant.org or swequiz.com

Watch some YouTube videos or read some CS articles.

If you have a project idea, break down the next steps into tiny actionable tasks and draw out how you would implement certain features.


A lot more founders and employees are heavily diluted than you think. I always just assumed massive exits meant people were rich.

But when I calculated out how much an exec would make for a diluted $1B exit compared to working at FAANG for the same amount of time... FAANG turned out better many times (unfortunately).

I think preventing dilution should be one of the highest priorities for founders for themselves and for retention. Every single dilution event sets the outcome bar higher and, thus, harder to achieve.


Yeah I mean, it can get diluted pretty quick right!

Say you co-found with 1-2 others, so you start at 30-51%. You want to attract some good talent, so you give away 10% to early hires. Each funding round you give up 15-25%. You can see how your equity can quickly get below 10%.
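
Rough sketch of how that compounds (hypothetical numbers; each event that issues new shares multiplies every existing stake by one minus the fraction sold):

    stake = 1 / 3                                   # hypothetical: three co-founders splitting evenly
    events = [0.10, 0.25, 0.25, 0.20, 0.20, 0.20]   # early-hire pool, then five rounds at 20-25%

    for sold in events:
        stake *= (1 - sold)

    print(f"{stake:.1%}")                           # ~8.6% left by the later rounds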

Zuck managed to hold onto ~30% but your typical control is much lower these days. Musk had 28% of Tesla at IPO. Kalanick had 8% of Uber at IPO. Foley had 6% of Peloton at IPO. Splunk 3 co-founders had 5-7.5% each at IPO. Butterfield had 8% at Slack IPO.

So you may be foregoing 10 years of $700K FAANG comp for a ~$80M payday (8% equity of $1B unicorn exit).

To me it raises the question of whether non-founder startup equity is basically worthless. If a founder ends up diluted down to 8% at exit, what happens to the 0.5-1% you get promised at hire? 0.06-0.125%?? So your first senior engineer hire's unicorn equity might end up worth a grand total of $1M at IPO? Ouch. Go grind some leetcode and work for FAANG and make that in 2 years.

