IanCal · 2026-06-15T11:07:54 1781521674

Which models were you using for this? If you used the quality default offered in the interface, it makes sense that it was ~4x the cost, as it'd be 3 frontier models judged by one of those.

The idea would be to use fusion with simpler, cheaper models.

IanCal · 2026-06-15T08:41:53 1781512913

I’m a little lost, though this seems like it could be fun. On mobile Safari, whatever I build keeps losing some of its connections as I tap on other things, so it’s hard to get far with it.

IanCal · 2026-06-11T10:47:49 1781174869

> software development, as a “decide-execute-deliver sandwich”. AI compresses the “execute” layer — the middle of the sandwich — but the other two layers resist automation in a way that will not be overcome by capability improvements alone.

I really struggle to see why improved capabilities cannot deal with those other layers. I do not believe you have substantiated the claim that this stays impossible as capabilities improve.

> At one end of the pipeline, development teams need to decide what to build.

Developers are largely not the ones who do this. This role sits far more on the "Product Owner" side. Sometimes your job covers both, but this is not the majority of the work and mostly does not require SE knowledge, though usually some input.

> This layer is hard to automate because it requires thinking about user needs, market signals, organizational priorities, and in some cases regulatory constraints.

Hmm, these are language models that can talk through much of this already - but more importantly none of what is mentioned there requires software engineering. For parts that do (I'm sure someone would come to correct me if I said that there was none or seemed to suggest it is never ever ever relevant) this is a much smaller slice.

> As AI capabilities improve, the kinds of decisions that can be delegated to AI increase over time. But this does not make the “decide” layer thinner — once a decision can be delegated to AI, it is no longer a source of competitive advantage, and the value of human decision-making migrates upward. Software increases in complexity over time, so there is no ceiling to this process.

Now this is rather hidden, but it is a huge leap in logic. The decide layer does get thinner for any given project, and then you simply assert that software will get more complex and so this cancels it all out.

A team of 5 may end up being able to ship what a team of 50 used to, and maybe now there are 10 teams outputting more - but is there not a clear limit to this? At some point do we not just need 45 fewer people? That there needs to be some engineers is not the same as needing anywhere near as many as we have.

For a time I think we will see increased output meaning more software, but that tails off as they get better.

> At the other end of the sandwich, human teams need to be accountable for what they deliver.

Why? And if we assume so, why does that need a software engineer?

> It is possible that some day in the future teams will ship mission-critical code without fully testing and understanding it,

You don't need to read code to test it, and people choose to ship products without fully understanding the code all the time. Literally any decision maker who is not a software engineer who knows the entire codebase does this. Companies fully ship systems that are far too complex for any single developer to even understand.

And much of software isn't mission critical. Or at least, if you want to say it is then the mission is low stakes.

> today’s AI is so unreliable that such haphazard practices would represent an existential threat to software teams and their customers.

I'd argue for a bunch of stuff this isn't true, and the whole point of the article is "never even if they get better" which is different.

> A central insight of AI as Normal Technology is that we can collectively choose to keep humans accountable through shared norms, law, and policy.

Sure, we can ban AI writing code, but will we? Is there a huge collective concern for all us highly paid engineers being replaced by AI?

IanCal · 2026-06-09T08:27:50 1780993670

> provided you're willing to label 200 or so images

A quick note to say that this is also a task you can hand to things like Gemini.

dekhn · 2026-06-09T23:28:29 1781047709

Yep, this is what I do. I use a high-quality VLM to generate labelled boxes (in my case, around tardigrades in a microscope image), do some light editing to fix the small number of errors, and then train YOLO26 with it. Works great, saved me tens of hours of labelling. It's a bit scary that there is a VLM that works as well as my fine-tuned model (although much slower).
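A minimal sketch of the glue step between the VLM and YOLO training, assuming the VLM returns boxes in pixel coordinates as `(x_min, y_min, x_max, y_max)`. The `PixelBox` type and function names are hypothetical, not from any library; the output is the usual YOLO text label format of `class cx cy w h`, normalised to the image size.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class PixelBox:
    """One box in pixel coordinates (assumed VLM output shape, hypothetical)."""
    class_id: int
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def to_yolo_line(box: PixelBox, img_w: int, img_h: int) -> str:
    """Convert a pixel box to one 'class cx cy w h' line, normalised to [0, 1]."""
    cx = (box.x_min + box.x_max) / 2 / img_w
    cy = (box.y_min + box.y_max) / 2 / img_h
    w = (box.x_max - box.x_min) / img_w
    h = (box.y_max - box.y_min) / img_h
    return f"{box.class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


def to_yolo_label_file(boxes: Iterable[PixelBox], img_w: int, img_h: int) -> str:
    """Build the contents of the .txt label file for one image."""
    return "\n".join(to_yolo_line(b, img_w, img_h) for b in boxes)


# Example: one box in a 1000x800 image.
example = to_yolo_label_file([PixelBox(0, 100, 200, 300, 400)], 1000, 800)
```

The light manual editing pass would happen on the boxes before this conversion, so the label files written to disk are already the corrected ones.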

globalnode · 2026-06-10T00:49:06 1781052546

That's a fantastic strategy, thank you, and thanks to all the other helpful posters here as well. Do you have any tips for how to choose the base YOLO model? Or will any generic one do?

IanCal · 2026-06-09T08:25:35 1780993535

They can however be extremely useful for curating training data, as can things like SAM and the DINO (/Grounding DINO) models.

Also, if they are better, you can have a flow where a cheap model handles most cases and the marginal ones go to something more complex (and you can chain several of these).
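A rough sketch of that kind of flow, assuming each model returns a `(label, confidence)` pair and that a per-stage confidence threshold is how "marginal" is decided. All names and thresholds here are hypothetical, and the lambdas stand in for real models.

```python
from typing import Callable, Sequence, Tuple

Prediction = Tuple[str, float]          # (label, confidence in [0, 1])
Model = Callable[[str], Prediction]     # takes an item (e.g. an image path)


def cascade(item: str,
            stages: Sequence[Tuple[Model, float]],
            fallback: Model) -> Tuple[str, int]:
    """Run stages in order, cheapest first.

    The first stage whose confidence meets its threshold answers.
    If none does, the fallback (most expensive) model answers.
    Returns (label, index of the stage that answered).
    """
    for index, (model, threshold) in enumerate(stages):
        label, confidence = model(item)
        if confidence >= threshold:
            return label, index
    label, _ = fallback(item)
    return label, len(stages)


# Stand-in models for illustration only.
cheap = lambda item: ("tardigrade", 0.95 if item == "clear.png" else 0.50)
expensive = lambda item: ("debris", 0.99)

easy = cascade("clear.png", [(cheap, 0.9)], expensive)
hard = cascade("blurry.png", [(cheap, 0.9)], expensive)
```

Here `easy` is answered by the cheap stage and `hard` falls through to the expensive one; adding more `(model, threshold)` pairs to `stages` gives the chain.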

The YOLO models are really shockingly good for their cost, and for how well they work with not much training data.

IanCal · 2026-06-06T16:20:09 1780762809

This is an interesting read: https://ai-2027.com/

I'm not going to say it's a perfect prediction, but the trajectory from "can write something reasonable" to "oh, can write snippets of code" and on towards larger and larger systems feels like it has played out. The common thing I see more now is people talking about "taste" as what the humans contribute, more than the raw coding part.

I get what you mean with this rather automated research. I've done it on a smaller scale with performance work, because it can run/test/measure/propose changes/debug and loop. I can throw a vague idea at it, guide it or discuss with it, and go and make a coffee.

lowbloodsugar · 2026-06-06T18:26:57 1780770417

That was a “fun” read. Like Nick Bostrom’s Superintelligence [1].

[1] https://www.goodreads.com/book/show/20527133

adastra22 · 2026-06-08T10:00:53 1780912853

A book that has been thoroughly discredited by actual events.

IanCal · 2026-06-04T10:48:09 1780570089

> If we agree weights producing text may emerge consciousness, given large enough, then DBs must've gained it long ago.

If we agree that silicon can perform calculations, then beaches must have been working out log tables long ago.

dmd · 2026-06-04T12:14:08 1780575248

Greg Egan wrote an entire (fabulous) book about exactly that, "Permutation City".

larodi · 2026-06-04T21:13:56 1780607636

Diaspora is mind-blowing, yet highly improbable and speculative, even though carefully threaded to sound plausible on all levels. The whole introdus idea and simulation from VM perspective sounds incredible, but I don't think the body runs a simulation of anything. It is something else.

bobson381 · 2026-06-04T12:53:59 1780577639

a la https://xkcd.com/505/

larodi · 2026-06-04T21:15:07 1780607707

Working things out... is not calculating.

IanCal · 2026-06-03T08:21:58 1780474918

> 98% smaller in terms of active parameters (since it's a mixture of experts model).

I don’t think that’s right. This flash model is 5B active params; Qwen3.6-35B-A3B is 3B, so 40% smaller.
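The arithmetic behind that, as a quick check. The only inputs are the 5B and 3B active-parameter figures from the comment; the function name is just for illustration.

```python
def relative_reduction(baseline: float, smaller: float) -> float:
    """Fraction by which `smaller` is below `baseline`."""
    return (baseline - smaller) / baseline


# 3B active params against a 5B baseline.
reduction = relative_reduction(5e9, 3e9)

# What "98% smaller" than 5B would imply in active params.
implied_active = 5e9 * (1 - 0.98)
```

So the gap works out to 40%, whereas a 98% reduction from 5B would mean roughly 0.1B active parameters.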

IanCal · 2026-06-02T20:09:06 1780430946

51% does not mean it randomly gets things wrong half the time.

These things can be useful if you can accurately predict which tasks they will reliably do, and which they will usually fail on. Then you can get much more reliable work from them.
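A toy illustration of the point, using made-up numbers (the task categories and counts below are invented purely for the example, not measured results). An overall 51% can hide one category that almost always works and another that almost never does.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def success_rate_by_category(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Group (category, succeeded) pairs and return the success rate per category."""
    totals = defaultdict(lambda: [0, 0])  # category -> [successes, attempts]
    for category, succeeded in results:
        totals[category][0] += int(succeeded)
        totals[category][1] += 1
    return {c: wins / attempts for c, (wins, attempts) in totals.items()}


# Invented data: 100 tasks, 51 successes overall.
results = (
    [("refactor", True)] * 48 + [("refactor", False)] * 2
    + [("novel-algorithm", True)] * 3 + [("novel-algorithm", False)] * 47
)

overall = sum(ok for _, ok in results) / len(results)
by_category = success_rate_by_category(results)
```

With this data the overall rate is 51%, but only sending it the first category of work would give 96%.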

IanCal · 2026-06-02T14:18:46 1780409926

Why would it not be reproducible?

cmxch · 2026-06-03T03:02:03 1780455723

Its analysis from prompt/harness to end products.

IanCal · 2026-06-03T10:03:31 1780481011

You can't do that for me either.

Why would you refuse to use a patch that deals with a valid PoC exploit?

If a random contributor posted an explanation of an exploit, showed it worked in an executable way, presented a patch and you could see that the exploit no longer worked - would you refuse to use the fix until the contributor showed how they figured it out?

cmxch · 2026-06-03T13:10:50 1780492250

Given where Mythos alleges to go, reproducibility far beyond a hash promise, an alleged (but not really proven) existence of a PoC, and “Trust me bro” is necessary.

When an ungated (or even abliterated) public model can repeatedly, easily, and accurately embarrass Anthropic’s models, that might change.