I used to assume they pushed people into the prompt-only workflows because you’re paying them for the tokens, and not paying them for the scaffolding you built. However, I think what they’re really worried about is that a person needs to design and implement that stuff… It throws a wet blanket on their insistence that this will replace entire people in entire workflows or even projects, and I just don’t buy it. I do think it’s going to increase productivity enough to disastrously affect the developer job market/pay scale, but I just don’t think this particular version of this particular technology is going to actually do what they say it will. If they said they were spending this much money bootstrapping a super useful thingy that can reduce a big chunk of the busy work of a human dev team— what most developers really want, and most executives really don’t— a bunch of investors would make them walk the plank.
I also think having granular, tightly controlled steps is much friendlier to implementing smaller, cheaper, more specialized models rather than using some ginormous behemoth of a model that can automate your tests, or crank out 5 novels of CSI fan fic in a snap.
> However, I think what they’re really worried about is that a person needs to design and implement that stuff… It throws a wet blanket on their insistence that this will replace entire people in entire workflows or even projects, and I just don’t buy it.
I think you are on to something. But I also think this sort of system lends itself to not needing really good LLMs to do impressive things. I've noticed that the quality of a lot of these LLMs just gets worse the more datapoints they need to track. But if you break it up into smaller and easier to consume chunks, all of a sudden you need a much less capable LLM to get results comparable to or better than the SOTA.
Why pay extra money for Opus 4.7 when you could run Qwen 3.6 35b for free and get similar results?
And then you realize that what you’re using the smaller models for is ALSO decomposable and part of it is just a few if statements, and then you realize that for this feature you don’t actually need or want a model, because plain code gives you performance, reliability, and reproducibility that are cheaper and better for you and your users.
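A toy version of that replacement, with an invented ticket-triage subtask (the categories and keywords here are made up for illustration):

```python
# Hypothetical example: a subtask that started as an LLM call and, once the
# pattern became clear, turned out to be a few if statements.

def classify_ticket(subject: str) -> str:
    """Deterministic replacement for what was originally a model call.

    Cheaper, reproducible, and trivially testable.
    """
    s = subject.lower()
    if "refund" in s or "charge" in s:
        return "billing"
    if "crash" in s or "error" in s:
        return "bug"
    return "general"

print(classify_ticket("Unexpected charge on my card"))  # -> billing
```

Unlike the model it replaces, this behaves identically on every run, which is exactly the performance/reliability/reproducibility win described above.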
Additionally, developers tend to become less expensive as venture capitalists turn off the spigot, while access to giant frontier models becomes way more expensive. Beyond that, a developer might go out and have a beer with you after work, which appeals to the sickos that have the gall to prioritize humanity over fanatical efficiency for corporate gains.
Indeed, I've been experimenting with agent workflows for complicated tasks, where I essentially have a graph of agents with different roles/capabilities, including such things as breaking down complex tasks into simpler ones. There seems to be a point where a complex enough task is better performed by a group of cheaper agents/models than by one agent using one of the SOTA big models, in terms of both quality and cost.
The big SOTA models win in world knowledge, that's what all those parameters are for. But a huge fraction of agentic tasks is going to be plain clerical work that needs no special knowledge at all, a much simpler model can do them in a straightforward way.
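A minimal sketch of that routing idea. The step names, model labels, and the hard-coded `plan` function are all placeholders; in a real system an agent would produce the breakdown:

```python
# Decompose a task and route each step to the cheapest capable model:
# clerical steps go to a small model, knowledge-heavy steps to a big one.

CLERICAL = {"extract_fields", "rename_files", "format_report"}

def route(step: str) -> str:
    """Pick a model tier for a step; clerical work needs no world knowledge."""
    return "small-local-model" if step in CLERICAL else "big-frontier-model"

def plan(task: str) -> list[str]:
    # Placeholder: a planner agent would generate this breakdown.
    return ["extract_fields", "summarize_findings", "format_report"]

assignments = {step: route(step) for step in plan("quarterly report")}
print(assignments)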
It is also interesting because you get people arguing about the effectiveness of various models while doing very different things with them.
It's one thing for a model to be very clearly instructed to add a REST endpoint to an existing Django app and add a button connected to it on the front end, vs. "Design me a YouTube". The smaller models can pretty dependably do the first and fall flat on the second.
> However, I think what they’re really worried about is that a person needs to design and implement that stuff… It throws a wet blanket on their insistence that this will replace entire people in entire workflows or even projects
You can have the AI design the custom harness in advance. It's not especially hard work! In fact, the AI could even come up with the workflow itself; that's a different and much simpler problem than trying to stick to it after the fact, with a filled-in context.
I would prefer that be deterministic, though. This thread has me considering what, if anything, I can do to make it forced. Like, I could do it with hooks, but that's not elegant at all.
The designing and implementing of a code harness in your workflow can be as simple as running something like /skill-builder.
You prompt for what you want it to do, and it will write e.g. Python scripts as needed for the looping part, and for example use claude -p for the LLM call.
You can build this in 10 minutes.
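A minimal sketch of the loop described above, assuming the `claude` CLI is installed and on PATH; the step list and prompt format are invented, and `call` is injectable so the loop can be exercised without the CLI:

```python
# A plain Python script drives the workflow and shells out to `claude -p`
# (print mode) for each LLM step, feeding each step's output forward.
import subprocess

def llm(prompt: str) -> str:
    """One LLM call via the claude CLI's print mode."""
    return subprocess.run(
        ["claude", "-p", prompt], capture_output=True, text=True, check=True
    ).stdout

def run_harness(steps, call=llm):
    """Run each step in order, passing the previous output into the next prompt."""
    output = ""
    for step in steps:
        output = call(f"{step}\n\nPrevious output:\n{output}")
    return output

# Example: run_harness(["plan the change", "implement it", "review the diff"])
```

The harness itself is deterministic; only the `llm` calls are not, which is the point of pushing the looping into plain code.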
I don’t use a cloud platform, so I can’t comment on that part. I’d say just run it on your own hardware, it’s probably cheaper too.
I always knew the dev world leaned more toward interesting technical challenges and interoperability than maximizing the benefit to humanity- it’s why I switched to design. However, I didn’t realize the intensity of that preference until the entire industry got ridiculously AI-pilled.
While the author does mention the barriers to adoption, the premise— Apple was waiting for people to do something, but people weren’t doing it— subtly casts Apple as a passive entity in this scenario. The solution seems to be presented as Apple stepping in to make up for developers’ inaction. If it’s been 14 years and there’s been very little adoption, this is clearly a UX problem. How many small venues or libraries have developers, let alone developers that do enough Apple-specific development work to have an Apple Developer account? In 14 years they couldn’t come up with an alternate solution? Maybe a less expensive administrative version of a developer account? It’s not users’ job to sell themselves on Apple’s products.
What there really should be is a wallet equivalent of an ics file. It doesn't need to support everything, static images would be enough for most use cases. Advanced features could then require the current model.
But that would require collaboration, and standards, which seem to have gone away as smart phones came in.
W3C Verifiable Credentials [1] does almost exactly what you suggested and was recently approved as a top-level W3C standard. Adoption has been sluggish outside of digital identity (with Android [2] and the EU digital identity wallet being notable exceptions), but I think it is because the family of standards is relatively new.
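For a sense of what such a pass could look like: here is a minimal unsigned credential shaped after the VC 1.1 data model. The issuer/subject identifiers, the `EventTicketCredential` type, and the subject fields are all invented for illustration, and a real credential would also carry a `proof` produced by the issuer's key:

```python
# A minimal, unsigned event pass in the Verifiable Credentials 1.1 shape.
import json

event_pass = {
    "@context": ["https://www.w3.org/2018/credentials/v1"],
    "type": ["VerifiableCredential", "EventTicketCredential"],  # made-up subtype
    "issuer": "did:example:venue-123",
    "issuanceDate": "2025-01-01T00:00:00Z",
    "credentialSubject": {
        "id": "did:example:holder-456",
        "event": "Example Night",
        "seat": "GA",
    },
}

serialized = json.dumps(event_pass, indent=2)
print(serialized)
```

The whole thing is just JSON, which is what makes it plausible as an open, ics-like interchange format for simple passes.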
This has existed since the first version, except it needs to be signed with a valid Apple cert.
A .pkpass file is a zipped directory that has a json file and some assets. There's no need to have a more limited version, a pass is already very limited.
The issue is spoofing. Major event ticketers are unwilling to publish passes if there's nothing to stop someone else from publishing a pass that is indistinguishable from theirs and thus is an avenue for fraud.
The difference with events is that an ics file is not something someone's going to try to sell you or that you'd want to buy. But anyway, all Apple would have to do is stop checking the signing.
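For reference, the structure described above really is that simple to sketch. The field values here are illustrative and incomplete; a real pass also needs keys like `passTypeIdentifier` and `teamIdentifier` in pass.json, plus a manifest.json and a signature from an Apple certificate:

```python
# A pass is essentially a zip archive containing pass.json and some assets.
import io
import json
import zipfile

pass_json = {
    "formatVersion": 1,
    "description": "Example ticket",
    "organizationName": "Example Venue",
    "serialNumber": "0001",
}

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("pass.json", json.dumps(pass_json))
    z.writestr("icon.png", b"")  # placeholder for a real image asset

names = zipfile.ZipFile(io.BytesIO(buf.getvalue())).namelist()
print(names)
```

Without the signing step, iOS will refuse the result, which is exactly the gatekeeping being discussed.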
This exists, .pkpass. You mostly don’t know about them because iOS tries to abstract away the file system, and because each one has to be code signed by a registered Apple Developer account.
The problem is that those are treated almost like an app, you need a $99/year developer certificate to publish them.
Many third party ticketing solutions venues and events use do support this, but for instance if you want to sell tickets for a party and self-host, you need another external integration, or a developer account. Generating a PDF with a QR code, and publishing an .ics file is essentially free.
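To make the "essentially free" point concrete: a minimal iCalendar file is just text you can generate and host yourself. The event details here are made up, and this covers only the basics of the format:

```python
# Generate a minimal .ics file by hand: no signing, no accounts, no fees.

def make_ics(summary: str, dtstart: str, dtend: str, uid: str) -> str:
    """Build a single-event iCalendar document as a CRLF-joined string."""
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//party//EN",
        "BEGIN:VEVENT",
        f"UID:{uid}",
        "DTSTAMP:20250101T000000Z",
        f"DTSTART:{dtstart}",
        f"DTEND:{dtend}",
        f"SUMMARY:{summary}",
        "END:VEVENT",
        "END:VCALENDAR",
    ]
    return "\r\n".join(lines) + "\r\n"

ics = make_ics("House Party", "20250601T200000Z", "20250602T020000Z",
               "1@example.com")
print(ics)
```

Compare that to the signed-zip hoops a .pkpass requires, and the adoption gap is unsurprising.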
My guess is that they are requiring this in order to reduce the amount of fraud there (I am sure there still is some, but...). Apple really does not want to be involved when someone can't get into the Taylor Swift concert because the Apple Wallet ticket they paid a scammer a lot of money for turned out to be fake.
Having an authenticated developer account at least provides some level of speed bump to scammers, and a better starting point for the police.
There are many events that still send you a PDF file with your tickets. Until fairly recently, that included major venues too.
The charitable explanation is that the wallet was designed for credit cards, and tickets were an afterthought. Though I suspect it is really Apple trying to keep a walled garden, just like they always have.
Excellent take. Had Apple made a dummy-proof "Pass" portal for clubs, venues, etc. to use to visually design and manage passes (and maybe even distribute them?) when they launched this, I think it would have exploded, and the ecosystem lock-in would have just been all that much deeper. But Apple doesn't really think or operate like that.
It'll be really interesting to see how their approach evolves over the next couple of years, with sea changes happening all around them in this moment.
Code is pretty much the perfect use case for LLMs… text-based, very pattern-oriented, extremely limited complexity compared to biological systems, etc.
I suspect even prose is largely considered acceptable in professional uses because we haven’t developed a sensitivity to the artifice, and we probably won’t catch up to the LLMs in that arms race for a bit. However, we always manage to develop a distaste for cheap imitations and relegate them to somewhere between the ‘utilitarian ick’ and ‘trashy guilty pleasure’ bins of our cultures, and I predict this will be the same. The cultural response is already bending in that direction, and AI writing in the wild— the only part that culturally matters— sounds the same to me as it did a year and a half ago. I think they’re prairie dogging, but when(/if) they drop that bomb is entirely a matter of product development. You can’t un-drop a bomb and it will take a long time to regain status as a serious tool once society deems it gauche.
The assumption that LLMs figuring out coding means they can figure out anything is a classic case of Engineer’s Disease. Unfortunately, this hubris seems damn near invisible to folks in the tech industry, these days.
I think that might help a little, but is not a solution. When you’re figuring out some new way to combine code instructions to perform novel coding tasks, you’re just finding new configurations for existing patterns to get results you can easily test. The world outside of computers is infinitely more complex, random, and novel.
Well, there’s a hell of a lot more false confidence among people who think they can evaluate the merits of a design than among designers who take on major interface projects without knowing the purpose of what they’re doing. And there are different kinds of designers out there. If you hire a database genius who has only done serious, involved database work, and then add a bunch of front-end web dev work to their tasks because they’re ‘a developer,’ it’s neither an indictment of that developer nor of developers in general if your web front end is structurally wack. If you hired someone who’s only modified a few existing Wordpress plugins for a green field project, is it their fault or yours if they do a bad job?
The complexity in dev is a lot more obvious than the complexity in design. There’s a big long clear approach to Dunning-Kruger’s Mt. Stupid with dev work. With design work, the whole idea is to make something that clearly communicates its purpose. That makes a lot of people think they understand what went into it, because if it’s done well, the solution should feel ‘obvious.’ Getting something that feels obvious is way more nebulous and convoluted than getting from point A to point B in most dev tasks.
> There’s a big long clear approach to Dunning-Kruger’s Mt. Stupid with dev work.
Is it really that clear? Or do we just think so because most of us here are devs, while everyone else is thinking “Wow what happened? That codebase was great until suddenly it wasn’t.”
Yes it’s really that clear. The moment non-technical people see code syntax or hear technical jargon, they instantly nope out. That’s why people ask their developer friends and family to fix their computers — they don’t know the difference. It’s all just ‘tech’ to them. It’s also why people toil away at bonkers “no-code” machinations that would have been far simpler with a little more tech knowledge… it’s very intimidating to outsiders. OTOH, far fewer people even know what problems designers are meant to solve, let alone judge the solution, but many think ‘well I have good taste and I’ve worked with designers before’ and confidently wield their broken ideas based on false assumptions.
Ok, let’s see that consent form and how explicitly it states that random call center people will possibly look at anything you record. I’ll bet you a crisp $50 it was a form designed to be as click-through-worthy as possible, being sure to not trigger the “wait, should I do this?” reflex in users, and also not loudly disclosing that you could still use the device without agreeing, if you even can, while still technically “””disclosing””” this information. The tech world has turned consent into a fucking joke.
Right. The whole point is that click-through consent forms get users’ “clear” “consent” legally, but not morally. They’re deliberately opaque about the implications (ask 10 users if they consider recording a video on a device voluntarily ‘sharing’ it with anybody and I’ll bet 9 will say no), are pretty inscrutable to regular people, are designed to not raise suspicions like a social engineering attack, often mean not being able to use the product they just bought if they don’t consent (which is manipulative as hell when you’re talking about inessential functionality like telemetry), and are extremely consequential. The only evidence you need for that is how pissed off people get when they find out what these companies actually do with that consent.
But it means that the appellate decision will retain its precedential force, no? Wouldn’t losing precedent be the primary legal effect of overturning that decision? All case law that hasn’t touched the Supreme Court could theoretically be challenged, but most of it isn’t, and it’s considered the law until it isn’t anymore, right? How would this be any different?
The decision is binding only within the jurisdiction of the Court of Appeals for the D.C. Circuit.
So it’s not correct to say “because SCOTUS denied cert, Thaler is now binding national copyright law.”
Practically speaking, it is binding on the US Copyright office (one of the parties in the case) in CADC. And that’s important. But copyright litigation happens all across the country, while this ruling only directly constrains the relatively small number of cases within CADC.
Although this decision is not binding in other circuits, it is still something you can bring to a judge in other courts. They are not required to follow it because they are not in that circuit, but they will still consider what other courts have said, and that is an incentive to think hard before ruling differently. A judge who rules differently is generally expected to write up the reasons why, and that write-up would go to an appeals court for consideration of why the other court was wrong.
Yeah, I’ve heard lawyers use decisions in other jurisdictions to give weight to their line of reasoning. The SC saying they aren’t reviewing an appeal might not make that universally binding, but it signals that they don’t categorically reject the lower court’s decision.
I doubt any lawyer would mention that the SC didn't review this: that is meaningless, and judges know it. They will, however, mention this case. Even if the case goes against them they will mention it, so they can say why it is wrong (the opposition will be sure to mention it, so they have to be prepared to take it down).
And you’re saying the SC not taking up the issue has no effect on the weight of that non-binding citation in their argument, even if it was effectively the same situation in a different jurisdiction? The argument was that because the decision only had precedent in that circuit, the fact that the SC did not take the issue up has zero effect on decisions outside of that jurisdiction, even for essentially the same situation. If that’s what you’re arguing, I don’t buy it.
Yes, I didn’t imply national precedence. I imagine it would also signal to attorneys appealing cases in other circuits that the same challenge will likely yield the same result.
Didn’t they say Sora will only be used to internally create training data? Integrated image generation seems more in the neat feature category than some fundamental advantage, but maybe someone has use cases I haven’t considered.
They made some of my favorite products. Their having GDP-level revenue doesn’t benefit me… at all. Their putting less effort into those products negatively affects me. There are more losers than beneficiaries, here. I couldn’t care less how many billions investors got. Monetarily, it’s a net gain. Societally, it’s a net loss.