I mostly agree with the article - I believe the distinction should be between documents and applications.
While HTML serves its purpose, especially for documents, the modern web is a giant mess of that legacy, combined with unfriendly ergonomics and layers of glue/hacks built on top just so we as developers can have a better DX for creating complex software on top of it.
Building a browser means having to deal with all that legacy, whether we like it or not, so most of the browser market got captured by the big players who have enough manpower to cover all those edge cases. That also means we have to deal with whatever technical choices or bloat they make, causing an infinite stream of issues, from memory usage to size to limitations that don't make sense in 2026 but are still there because someone 20 years ago decided to write them that way. As I deal with mobile webviews a lot in my daily work, I unfortunately had to get familiar with quite a few gotchas and edge cases, and some are just... absurd in this day and age.
I believe we need a separation between an application layer and the document layer, and especially between the UI language and the actual application code - script tags serve their purpose, but again, they are a hacky solution with their own bag of tricks, and those tricks impact all of the software built on top of them.
Now, a bit of a shameless plug: I've been working on something to fill that gap, at least for myself and hopefully for others who run into the same issue - it's called Hypen (https://hypen.space) and it's a DSL for building apps that work natively on all platforms, with strict separation of code/UI/state, and support for as many languages and platforms as I can maintain, not "just javascript". While it's currently focused on streaming UI, it's built with Rust and WASM at its core and will soon allow fully compilable apps.
While it may not be the future of software, once you get into building something like that, it becomes obvious that the way we are building now is at best wrong, and at worst kafkaesque.
Back in the day, I worked at a startup in the cinema world, where one app let you buy tickets for all cinemas - even those that did not "officially" support it.
Among them were arthouse theaters in Hamburg, which I often used for testing, since reserving a few seats usually didn't matter - they would be empty anyway, at least during the day. Some of them had screenings of old movies, and I thought "if I lived there, I'd go every day".
Ironically, now I live between 2 art cinemas in my city and rarely go to any of them :)
We never left waterfall in the end.
Having worked with and for dozens of software companies, and collaborated with probably a hundred at different scales, every single one said:
We do agile
Guess what?
Every single one of them was doing waterfall.
Their agile included pre-planning and pre-specifying the full spec and every task before the project kicked off. We'd have meetings where we'd drill down into tasks, and folks would write them down in so much detail that there was no other way to do them. Agile would be claimed, but the start date, end date, final spec and number of developers were always fixed.
Sometimes the end date was too late, so a panic would ensue. Most of the time, the date was too late because developers had "unknowns" which then had to be "drilled down and specced so they wouldn't be unknowns". Sometimes nearly 50% of the workweek was spent in meetings.
A few times, a project was running late - so to make sure we were _really_ doing it agile, we'd have morning standups, evening standups, weekly plannings, retrospectives, and backlog refinement. It would waste time, and the "unknowns" aka "tickets to refine" were, as always, dependent on the PM/PO/CEO's wishes, which wouldn't crystallize until the _really last minute_.
One customer wanted us to do a 2 year agile plan on building their product. We had gigantic calls with 20+ people in them, out of which at least half had some kind of "Agile SCRUM Level 3 Black belt Jirajitsu" certificates.
To them, Agile was just a thing you say before you plan things. Agile was just an excuse for a project being late by pinning it on Agile. Agile was just a cop-out for "the PM didn't know what to do here so he didn't write anything down". Agile was a "we are modern and cool" sticker for a company.
And unfortunately, to most of them, agile was just a thing you say for the job, because their minds worked in waterfall mode, their obligations worked in waterfall mode, their companies worked in waterfall mode, and if they failed their obligations to the waterfall, their jobs were on the line.
So while we were doing the Agile ceremonies, prancing around with our Scrum master hats, using the right words to fit into the Agile™ worldview - we were doing waterfall all along.
And after 15 years, I'm not even sure - did agile really ever exist?
Continuous integration and demos to stakeholders (devs, designers, product managers etc.) every 2 weeks - these practices are now ingrained :-) It's common to then make corrections after these demos, and that really helps ensure the product manager is getting what their customers need.
Easy to forget that waterfall in the 1970s/80s really meant teams working on their own for months and then realizing there was no way to assemble the whole product from the parts. Or that the industry had moved on and the product was obsolete.
Agile as "devs can do what they want" never really existed ;-) Managers always have to plan / T-Shirt size resources (time, devs) to some degree. For stuff that's really hard to break into tasks, the magic word is "the plan is to do a POC first".
Coming from someone who also doesn't like teams being asked to break their unknowns into 30 known tasks - it's a compromise... I agree with all your points on how Agile is abused / misunderstood. Yet I believe the progress from continuous integration and regular demos to stakeholders is a sign we did change something...
Most companies don't do that much of a regular demo to customers anyway - turns out most customers aren't even interested for the first 30-50% of the project, then they become mildly amused, until the project is about 80% done - that's when they start getting incredibly interested and opinionated.
> Agile as "devs can do what they want" never really existed ;-)
No real agile ever really exists in the end :)
But it's not devs not getting to do "what they want" that bothers me - it's the absurdly over-planned project estimates and timelines, with every detail of the project specced out, not a lot of margin for error, and the name of "agile principles" invoked as a way to deal with exactly the things the PMs don't want to deal with at that moment.
I'd be fine with some degree of planning ahead, or starting with prototypes/PoCs, but such a huge part of the industry just turns it into "same boat, but we'll put agile stickers on the holes", with a whole industry of ceremonies around it, to the point that it breaks the "core principles" of agile.
A joke says that it's because once you get it, you lose the ability to explain it like a normal person :)
And another joke says the best way to explain a monad is to write yet another monad tutorial, so sorry for this.
Just think of it as a box.
If Amazon shipped items bare, they would be hard to pack, there would be no way to standardize, and things would break or go missing often.
Now, if you put everything into one of the standardized boxes, things get 100x easier. Now you can put them on a conveyor belt, now you can have robots sort them, now you can use tape to close them, and standardization becomes easy because it's not "t-shirt, tennis ball, drill" but just "box box box".
So now you can do all kinds of things because it's all a box. And you can also stress test the box.
It's the same with these.
A. You can just have a function that: calls something on IO, maps its values, does a calculation, retries if wrong, stores the result, spits it out.
Or B. You can have functions that call any function on IO, functions that map any value to any other value, functions that take any other function and, if that function fails, call another function or retry, one that stores any value given to it and returns information on whether it saved or not, etc.
The result is the same in the end, but while A makes the workflow strictly defined only for that case, so you have to handle every twist and turn manually (did the save save? what if not? write a check, write a test that ensures the check works when it didn't save, and another for when it did...), B lets you define workflows with pre-tested, pre-built blocks that work with any part of your codebase.
And it makes your life 1000x easier because now you have common components that work with any data type inside your codebase, do things your way always, are 100% tested and make it easier to handle good cases, bad cases, wiring and logistics. And you can build pipelines out of them. Because at the end, what it does is just lets you chain functions that return wrapped values.
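To make the "box" less abstract, here's a minimal sketch in Kotlin (the Box type and these two functions are just something I'm making up to illustrate the idea, not a real library):

// a made-up minimal "box": it either holds a value or an error
sealed class Box<out T> {
    data class Ok<out T>(val value: T) : Box<T>()
    data class Err(val error: Throwable) : Box<Nothing>()
}

// map: transform the value inside the box without opening it yourself
fun <T, R> Box<T>.map(f: (T) -> R): Box<R> = when (this) {
    is Box.Ok -> Box.Ok(f(value))
    is Box.Err -> this
}

// flatMap: chain another box-returning step; errors just flow through untouched
fun <T, R> Box<T>.flatMap(f: (T) -> Box<R>): Box<R> = when (this) {
    is Box.Ok -> f(value)
    is Box.Err -> this
}

Once map and flatMap exist, every other block (retries, tracing, storing, logging) is just a function from box to box, so they all snap together.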
And you end up with code like:
val profileData = asAsync { network.userData(userId) }
    // returns an Async<Result<UserData, Error>>
    .withRetries(3) // works on the Async, returns a Result, retries the async call if it fails
    .withTraceId(userId) // wrapped flatMap that wraps the success value into Trace<T> and adds a traceId
    .mapTrace(onError = { ErrorMappingProfile }, { user -> Profile(user.name, user.profileId) }) // our mapTrace is a flatMap for Trace objects, so it knows how to extract them, call the functions and wrap them again
    .store("profile_data") // wrapped mapCatching, again explicitly for storage, that works on Trace objects, knows how to unwrap them and store them
    .logInto(ourLogger) // maps Trace objects into the shared logger
Before, each of these things would have to be written manually inside the function, and the whole function tested for each edge case: if/elses, try/catch, match/when/switch.
This way, the only thing you need to cover with tests now is `network.userData()`, as all the other parts are already written, already tested, and do what they say they do. And you can reuse this everywhere in your projects. Instead of being a function you call with data, it becomes a function you give a box and it returns a box. Then you can give it to any other function that needs a box. If boxes make no sense, think of the little connectors on Lego bricks, or pipe connectors in plumbing, or stacking USB adapters or power strips.
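As a rough sketch of what "written once, tested once" means here, using Kotlin's built-in Result (the helper name is hypothetical, and the network.userData call is the one from the example above):

fun <T> withRetries(times: Int, block: () -> Result<T>): Result<T> {
    var last = block()                 // first attempt
    repeat(times - 1) {
        if (last.isSuccess) return last
        last = block()                 // retry only if the previous attempt failed
    }
    return last
}

// the retry logic is tested once, then reused for any call that returns a Result, e.g.:
// val profile = withRetries(3) { runCatching { network.userData(userId) } }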
I can't stress enough how much this approach has helped me in real-life cases - refactoring old codebases especially, as once you establish some base primitives, the code surface area starts collapsing massively while the tested surface area grows.
Not sure if we read the same post, as I cannot agree with this claim, especially under a post that goes into exactly the details of what happened.
>LLM is a sorcery tech that we don't understand at all
We do, and I'm sure that people at OpenAI did intuitively know why this was happening. As soon as I saw the persona mention, it was clear that the "Nerdy" behavior puts it in the same "hyperdimensional cluster" as goblins, dungeons and dragons, orcs, fantasy, and quirky nerd-culture references. Especially since they instruct the model to be playful, and playful + nerdy is quite close to goblin or gremlin. Just imagine a nerdy, funny subreddit, and you can probably imagine the heavy usage of "goblin" or "gremlin" there. And the reward system will of course hack it, because a text containing goblin or gremlin is much more likely to be nerdy and quirky than not. You don't need GPT-5 for that; you would probably see the same behavior on text-completion-only GPT-3 models like Ada or Davinci. They specifically dissect how it came to this and how they fixed it. You can't do that with "sorcery we don't understand". Hell, I don't know their data and I easily understood why this was going on.
>they want you to think that LLMs are smart beasts (they are not)
I mean, it depends on what you consider smart. It's hard to measure what you can't define - that's why we have benchmarks for model "smartness" - but we cannot expect full AGI from them. They are smart in their own way, in some kind of technical-intelligence way that finds the most probable average solution to a given problem. A universal function approximator. A "common sense in a box" type of smart. Not your "smart human" smart, because their exact architecture doesn't allow for that.
>and that we know what LLMs are doing (we don't)
But we do.
We understand them, we know how they work, we have built thousands of different iterations of them, probing systems, replications in Excel, graphical implementations, all kinds of LLMs. We know how they work, and we can understand them.
The big thing we can't do as humans is the same math that they do at the same speed, combining the same weights and keeping them all in our heads - it's a task our minds are just not built for. But instead of thinking you have to do "hyperdimensional math" to understand them 100%, you can just develop an intuition for what I call "hyperdimensional surfing", and it isn't even prompting - more like understanding what words mean to an LLM and which pocket of their weights they will bring you to.
It's like saying we can't understand CPUs because there are maybe 10 people on earth who can hold modern x86-64 opcodes in their head together with a memory table, so CPUs must be magic. But you don't need to be able to do that to understand how CPUs work. You can take a 6502, understand it, develop an intuition for it, and understanding gets 100x easier. Yes, a 6502 is nothing close to modern CPUs, but the core ideas and concepts help you build the foundations. And the same goes for LLMs.
>personally side with Yann Le Cun in believing that LLM is not a path to AGI
I agree, but it is the closest we currently have and it's a tech that can get us there faster. LLMs have an insane number of uses as glue, as connectors, as human<>machine translators, as code writers, as data sorters and analysts, as experimenters, observers, watchers, and those uses will just keep growing. Maybe we won't need them when we reach AGI, but the amount of value we can unlock with these "common sense" machines is amazing and they will only speed up our search for AGI.
We understand the low-level details of how they are constructed. But we do not fully understand how higher-level behavior emerges - it is a subject of active research.
We do understand, though - it is exactly what they were made for.
If you train it on a dataset of Othello games, or a dataset including them, you are basically creating a map of all possible moves and states that have ever happened, the odds of transitions between them, and which transitions were effective and which were not.
By querying it, you basically start navigating that map from a spot, and it just follows the semi-randomly sampled highest-confidence weights as it navigates "the map".
And in the multidimensional cross-section of all these states and transitions, the existence of a "board map" is implied, as it is a set of common weights shared between all of them. It becomes even more obvious with the championship models in the Othello paper, which were trained on better games where the wider state of the board mattered more than the local one, so the overall board state mattered more for the responses.
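To illustrate just the "map" intuition (and only the intuition - a real model learns this implicitly in shared weights, not as a lookup table; everything below is a made-up toy in Kotlin):

// count which move followed each sequence of moves across all training games
fun buildMoveMap(games: List<List<String>>): Map<List<String>, Map<String, Int>> {
    val counts = mutableMapOf<List<String>, MutableMap<String, Int>>()
    for (game in games) {
        for (i in 1 until game.size) {
            val movesSoFar = game.take(i)          // the "state" we have seen so far
            val next = game[i]                     // the move that followed it
            val nextCounts = counts.getOrPut(movesSoFar) { mutableMapOf() }
            nextCounts[next] = (nextCounts[next] ?: 0) + 1
        }
    }
    return counts
}

// "querying" is just navigating that map: given the moves so far,
// follow the most frequent continuation seen in training
fun nextMove(map: Map<List<String>, Map<String, Int>>, movesSoFar: List<String>): String? =
    map[movesSoFar]?.entries?.maxByOrNull { it.value }?.key

The point is that a consistent board state is already implicit in which continuations ever appear after a given history - the toy table makes it explicit, while the network compresses the same information into weights shared across histories.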
The second piece of research you linked also has a pretty obvious conclusion. It tells us more about us as humans than about LLMs - about our culture, our colors, and how we communicate their perception through text.
If you want to try something similar, try kiki/bouba-style experiments on old diffusion models or old LLMs. A "Dzzkwok grWzzz" will get you much rougher and darker-looking things than Olulola Opolili's cloudy vibes.
The active research is mostly:
- probing and seeing "hey, let's see if the funky machine also does X"
- finding ways to scientifically verify and explain LLM behaviors we already know about
- pure BS in some cases
- academics learning about LLMs
And that is not proof of where our understanding/frontier is. It is basically standardizing and exploring the intuition that people who actively work with models already have. It's like saying we don't understand math because people outside math circles still do not know all the behaviors and possibilities of a monoid.
@hypendev I am not trying to start a flame war, but let me take a very simple example.
As another commenter put it, we know how to build deep-learning machines. No question about that. My statement is that we don't clearly understand why they output the observed results.
Let's imagine that you have a model that can detect cats in an image, with 95% accuracy. If you understood how the model worked, I could give you an image of a cat and you could reliably _predict_ whether the model would detect the cat.
Yet we are not able to do that: you have to give the image to the model to observe the result. We can't predict the result reliably (i.e. scientifically), and we don't know how to better train the model to detect the cat without altering the other results. (Of course, including the test image in the training set is forbidden.)
Back to LLM: we can't predict how they will behave. Therefore, even world-class scientists at OpenAI, knowing about a Goblin issue and making assumptions about the cause, are not able to edit the model directly to fix it. They would if they understood it fully. But they are reduced to test-and-hack their way through.
Sorry if it sounded like that, not trying to have a flame war, just trying to understand which part we don't _understand_, as it seems silly to me.
Yeah, we cannot predict the results of a model with 100% accuracy - not mentally, as to do that we would have to do the same math in our heads, and that's just ultra-rare, next-level intelligence. And while we can make a reliable predictor, a reliable prediction model of a model's results would end up being the same model.
So the closest that we can get to "understanding" it fully, is learning how it works, and developing intuition around it. And I think we pretty much have that, at least among the people in the field. Those who worked on training it especially have some intuitive understanding of what is going on, otherwise they would not know where to "test and hack".
It's math all the way down, but I feel like the angle some people in early days used about "magic emergent properties" or "signs of consciousness" ended up making it seem more mystical than it is.
I have a similar story, but I was playing on the beach. There was a mound right next to it that I loved playing on, and the mound had some funny stones.
One of them was square with something painted on it; I was fascinated by the Romans, so I annoyed my parents with "I found a mosaic!" and took it with me.
Turns out, years later, they excavated a Roman villa there.
Funnily enough, the same beach has Roman villas, dinosaur prints, Austro-Hungarian tunnels and Yugoslav bunkers. Quite a lot of history on one pretty beach.
I am building Hypen, a UI framework that enables you to stream native UI to any platform (Android, iOS, Web, Desktop soon) from the server.
It supports languages like Rust, TS, Kotlin, Swift and Go for the backend.
It comes with reactivity, Tailwind support and routing out of the box.
It basically lets you update apps without the app store, use the same codebase for all platforms or have custom server-driven modules in your apps.
Upcoming cool things:
- Working on canvas support so you can easily switch to, or render anything in, canvas.
- Building a stdlib so apps can also be compiled and run client-only
- An easy way to deploy apps
Open sourcing this in a few days - it's still early alpha for now.
Depends - on the purely technical "implementation" level, the only limits I've found are the ones caused by outdated knowledge: libraries, platforms, availability of some things.
But one big limit is DX. Their choice of DX is usually abysmal - ironically, just like an average dev's. They seem to lack the aesthetic instinct for code, so you have to point them hard in the right direction or provide a sample of the expected DX, and even then they fight against it at every turn.
While understandable in a way, since they are trained on average code and most code will now be written by machines anyway, making DX "less relevant", it's also a giant code smell, as bad DX tends to point towards bad internals and wrong decisions along the way.
So not really a technical limit - they swallow anything you throw at them, even the most complex cases - but more of an aesthetic limit in terms of taste.