I think these articles may benefit from a more thorough table of contents at the beginning, or from some kind of abstract. If you briefly presented the whole list of topics in a single article, it would be clearer that your treatment of the subject is more complete than any single piece suggests. I initially thought the table of contents was scoped to the article itself rather than connecting it to the adjacent ones.
I had never heard of you, and this article appeared very biased to me. I found the information ecology piece superior; a shame that it went unnoticed. I will try to go through all of them. I admire the breadth of topics you’re covering and appreciate the many sources. They’re clearly written in your own voice, and that is great to see; I guess I mostly reacted to not being fully aligned with your views.
The issue with most of these articles is that they seem to demonize the technology, and systematically use demeaning language about all of its facets. This one raises a lot of important points about LLMs, but the only real conclusion it seems to make is "LLMs are bad! We should never build them!".
This is obviously unrealistic. The cat is out of the bag. And we're not _actually_ talking about nuclear weapons here. This technology is useful, and coding agents are just the first example of it. I can easily see a near future where everyone has a Jarvis-like secretary always available; it's only a cost and harness problem. And since this vision is very clear to most who have spent enough time with the latest agents, millions of people across the globe are trying to work towards this.
I do think that safety is important. I'm particularly concerned about vulnerable people and sycophantic behavior. But I think it's better not to be a luddite. I will give a positively biased view because the article already presents a strongly negative stance. Two remarks:
> Alignment is a Joke
True, but for a different reason. Modern LLMs clearly don't have a strong sense of direction or intrinsic goals. That's perfect for what we need to do with them! But when a group of people aligns one to their own interests, they may imprint a stance that other groups may not like (which this article confusingly calls an "unaligned model", even though it's perfectly aligned with its creators' intent). People unaligned with your values have always existed and will always exist. This is just another tool they can use. If they're truly against you, they'll develop it whether you want it or not. I guess I'm in the camp of people who have decided that those harmful capabilities are inevitable, as the article directly addresses.
> LLMs change the cost balance for malicious attackers, enabling new scales of sophisticated, targeted security attacks, fraud, and harassment. Models can produce text and imagery that is difficult for humans to bear; I expect an increased burden to fall on moderators.
What about the new scales of sophisticated defenses that they will enable? And for a simple solution to avoid the produced text and imagery: don't go online so much? We already all sort of agree that social media is bad for society. If we make it completely unusable, I think we all stand to gain from it. If the digital sphere stops having any value, perhaps we'll finally go back to valuing local communities and offline hobbies for children. What if this is our wakeup call?
> This one raises a lot of important points about LLMs, but the only real conclusion it seems to make is "LLMs are bad! We should never build them!".
I think the point was never to bring a solution or show any essence of reality. The point was being polemical and signalling savviness through cynicism.
Which LLMisms are you seeing in their post? Their grammar, word choice, thought flow, and markings all denote fully human authorship to me; so much so that I would confidently say they likely didn't even consult an LLM.
lol. I did use a lot of short sentences, that’s my bad. But please read through [1] and compare my text against it; it may enlighten you on how to actually spot LLM writing.
For the future, try to avoid prevaricating when you actually have a clear sense of what you want to argue. Instead of convincing me that you've weighed both options and found luddism wanting, you just come off as dishonest. If you think stridently, write stridently.
I’m not a native speaker and you may find my writing simplistic if your standard vocabulary includes three expressions I’ve had to look up (I don’t mean this as an insult, I was just genuinely stumped that I could barely understand your comment).
I may think stridently (debatable) but I generally believe it is best to always try to meet in the middle if the goal is genuine discussion. This is my attempt at that.
But meeting in the middle only works if you honestly believe the middle is a valuable place to be. I don't want to dissect your writing too much, but let's look at one example.
> The issue with most of these articles is that they seem to demonize the technology, and systematically use demeaning language about all of its facets.
This is very confident, strident language. You clearly believe that there is a faction of people demonizing technology, akin to luddites, who are not worthy of being taken seriously.
> This one raises a lot of important points about LLMs, but...
So here you go for the rhetorical device of weighing the opposing view. Except you don't weigh it at all. You are not at all specific about what those points are. It's just a way to signal that you're being thoughtful without having to actually engage with the opposing viewpoint.
> I do think that safety is important... But I think it's better not to be a luddite.
Again, the rhetoric of moderation but not at all moderate in content.
It was a clear mistake to think that this was LLM writing. But I suspect the reason I made this mistake is that AI writing influences people to mimic surface level aspects of its style. AI writing tends to actually do the "You might say A is true, but B has some valid points, however A is ultimately correct." Your writing seems like that if you aren't reading it closely, but underneath that is a very human self-assuredness with a thin veneer of charitability.
Some more serious critique of things I noticed within 30 seconds:
- Text isn't selectable on the page.
- The tooltip in the "day 1" to "day 14" cards gets cut off by the border (I see this mistake ALL the time with AI-generated frontends btw)
- It's sparse and very long. I think the information could be condensed in half the size, and it would improve the presentation. This is personal preference though.
- The playbooks' "mark complete" state is not persisted on reload or navigation.
All in all, it's functional and quite decent. I agree with the other people saying it looks generic, but I disagree on it being necessarily a bad thing for this kind of product.
I know nothing about pools so I can't comment on the accuracy of the playbooks. It's nice that there are so many of them, but given the LLM vibe of the text I'm slightly suspicious.
This is a pipe dream and I’m almost tempted to say a fever dream. The chemistry part seems somewhat sound, even though that’s outside of my field of expertise. But the entire readout process is questionable, and has clear signs of heavy AI writing.
The AFM mechanism described as “tier 1” (very strong LLMism, btw) is somewhat optimistic but realistic. The fields needed are large compared to usual values in solid state devices, but I’d guess achievable with an AFM. But “tier 2” is vague and completely speculative. Some random things I noted:
- handwaving that (not an exact quote) “the read result is cached by the controller; no need to read the same bit twice”. Cached with what?? If this miraculous technology can achieve 25 PB/s, what can possibly hope to cache it? More generally, it’s a strange thing to point out.
- some magic and completely handwaved MEMS array that converts an 8 µm spot-size laser beam into atomic-resolution 2D addressing? In my opinion this is the biggest sin of the manuscript. What I understood to be depicted is just fundamentally physically impossible.
- a general misunderstanding of integrated electronics, and dishonest benchmarking, comparing real memory technologies being sold at scale right now, vs theoretical physical bounds on an untested idea. Also no mention of existing magnetic tape as far as I can tell.
- constantly pulling out specific numbers or estimates with no citation and insufficient justification. Too many examples to even count.
I’m sorry for the harsh language, I wouldn’t use it for a usual review. But in my opinion this needs a very heavy toning down and a complete rewrite, and is unfit for a proper review. Final remark: electronics is, and will always fundamentally be, intrinsically denser than optics. Some techniques “described” here, if they were possible, would have been applied to existing optical tech (e.g. phase-change materials in Blu-ray).
Yes, this paper is insane. The actual quote about caching is:
> Once a region of tape has been read, the controller stores the result. Subsequent operations reference the cache rather than re-interrogating the physical medium. Re-reading a known bit is unnecessary; the controller already holds its state
However, earlier, the paper claims:
> The transformer architectures underpinning modern large language models are bandwidth-limited, not compute-limited [1–3]. The energy consumed moving data between DRAM, NAND flash, and processor cache already exceeds the energy consumed by arithmetic in datacenter AI accelerators [2]. This is not an optimization problem. It is a materials problem [emphasis mine].
as part of a longer rant about the AI "memory wall" in the very first section. If the paper opens with a long spiel about how memory is expensive in material cost and energy cost, and how this material is the solution, then what are we caching the read in? On that note, what kind of computer engineer thinks about cache on the order of individual bits on a medium?
And, as you point out, 25 PB/s is a lot. Around 1000x that of a typical on-die SRAM cache, I think.
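(As a rough scale check, using assumed figures rather than measured ones: if aggregate on-die SRAM bandwidth on a modern server part is on the order of 25 TB/s, the paper's claim sits about three orders of magnitude above it.)

```python
# Rough scale check; both figures here are assumptions, not measurements.
claimed_bw = 25e15   # 25 PB/s, as claimed in the paper
sram_bw = 25e12      # assumed ~25 TB/s aggregate on-die SRAM bandwidth
print(claimed_bw / sram_bw)  # -> 1000.0, i.e. roughly 1000x
```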
A while later, the author speaks of using atomic force microscopy to read the data back. The size of an AFM scan is, in practice, as I understand it, on the order of square micrometers. I think this whole paper is an AI-driven, as you put it, 'fever dream', enabling an author to put forth 60 pages of sciencey claims and sciencey math without -- as far as I can tell -- any concrete and novel scientific result of any kind. AI-driven reality warps are not new; the difference is that nowadays AIs are good enough at sounding smart to get past the barriers of a typical smart person who might want to be fooled or make a show of being open-minded. Later on, the author proposes using "shaped femtosecond IR pulses" -- without further elaboration -- to address single atoms! IR wavelengths are on the order of a micrometer at minimum!
The caching comment refers to the Tier 1 controller holding a bitmap of bits it has already scanned — standard practice in any scanning probe system. It's not competing with the storage medium for capacity.
Tier 2 is explicitly labeled speculative. The paper's validation target is Tier 1: one C-AFM scan, one voltage pulse, existing equipment.
The core contribution is not the architecture — it's the physics: a verified transition state for C-F pyramidal inversion at 4.6 eV (B3LYP) and 4.8 eV (CCSD(T)), one imaginary frequency, barrier below bond dissociation. That's standard computational chemistry, not handwaving. The architecture sections are forward-looking by design.
The fluorine passes between two carbon neighbors through a C-C gap of 2.64 Å at the transition state — not through any atom. This is pyramidal inversion, the same mechanism as ammonia, but with a 4.6 eV barrier instead of 0.25 eV.
Dude, you _have_ to write things in your own words if you want to be taken seriously. "The <x> is not <y> — it's <z>" will cause a bunch of people to disengage, and those people have high overlap with the people who may fund you.
"Dude, you _have_ to write things in your own words if you want to be taken seriously."
How is this lost on people? Everything that contains the slightest hint of "AI slop" is instantly panned anywhere it appears, and yet people such as Ilia Toli appear to be entirely oblivious to this.
It's tragic. There is at least a non-zero chance that this work is a world changing breakthrough. It's clear, based on his engagement with comments here, that he at least believes this. And yet the first thing the guy does with it is debase it all using a clanker.
It boggles the mind.
We're seeing this throughout academe, in courts with both lawyers and judges, and among lawmakers and journalists. Several times a week one or another of these makes another headline for misapplying "AI". It seems that the work for which we are all expected to have the highest regard is coming from people that are completely witless; both unaware of how transparent this is and unaware of the consequences.
You have to be deeply ensconced inside an impenetrable bubble to do that to yourself.
Yeah, I get that it can be amazing and be of superhuman intelligence and all that, but also it reads exactly like the slop article I saw yesterday that was giving baking instructions for “wood biscuits” (which are a method of joining in cabinetry and are not tasty at all): https://thehoneypotbakery.com/wood-biscuit-size-chart/
Do not match your communication style to nonsense articles.
> You have to be deeply ensconced inside an impenetrable bubble to do that to yourself.
I largely agree with your point, but I’m afraid you are the one in the bubble. Detecting AI writing is a rare skill, not the norm. It’s glaringly obvious to those of us who use AI a lot, but it’s not that obvious to the average person.
To the point of absurdity in cases – I’ve seen loads of people who hate AI complain about AI online, not realising that the account they are talking to is nothing but a simple spam bot.
Replying to myself, because iliatoli's reply to me was [dead] so fast I couldn't reply to it directly...
"The physics is mine — thirteen years of it, starting from the 2013 paper. I use AI for editing, as I use a calculator for arithmetic. The transition state, the barrier, the molecular model, the fluorine uniqueness argument — all computed on my workstation. The tone criticism is heard and will be addressed in revision. The calculations don't change with the prose."
This is NOT about "prose." You're missing the point. Badly. And damn that's frustrating.
Read carefully and inculcate: Do not use LLM to write anything you expect to be taken seriously. This is not negotiable. It doesn't matter if all your peers and colleagues are doing exactly that. It doesn't matter that this is your first experience with such a reaction: it's not a fluke. DO. NOT. DO. IT.
I believe what it destroyed was the strut holding the booster to the tank. When the strut burned through the assembly came apart and aerodynamic forces did the remainder of the destruction.
This is underselling the risks. On top of the many trajectories which push them into unrecoverable situations, leaving them stranded in orbit, there are trajectories where the moon gives a gravity assist strong enough to fling the spacecraft to escape velocity, fulfilling the OP's scenario.
In fact, the trajectory they chose for this mission exploited the opposite effect to yield a free return without propellant expense.
In the modern day, the chance of a math error being the root cause behind this failure mode is vanishingly small, but minor burn execution mistakes that do not require hundreds of extra pounds of propellant to correct are definitely plausible. They were extremely common in the early days of spaceflight and plagued most of the very first moon exploration attempts. Again, with modern RCS this is unlikely. But reentry is still incredibly tight and dangerous. Apollo famously had a ±1° safe entry corridor, and Orion is way heavier and coming in even faster. If their perigee was off they could’ve easily burned up or doubled their mission time, which they may not have been able to survive.
The amount of things that would have to go wrong for the craft to get an accidental gravity boost and be ejected would be significant.
I feel like the original claim paints the whole thing as on a knife edge, barely achieved by virtue of not making a single mistake. In today's age, with so many moon landing deniers and worse, I feel like we should be specific about what the actual dangers, challenges, and unknowns were here. In reality, the orbital mechanics are one of the simplest parts of the entire problem, at least when we're talking about a moon flyby.
Yes, this is a fair point. I agree that orbital mechanics is trivially easy compared to everything else. The chances of a math mistake in particular are essentially nil; these trajectories have all been calculated years in advance.
The moon's gravity turns out to be "lumpy" because its density is not constant. This was detected by the Apollo missions and caused them to make errors in orbit calculations. This source of error could have influenced the flyby.
Isn’t this what AGI is by design? People CAN learn to become good at videogames. Modern LLMs can’t; they have to be retrained from scratch (I consider pre-training to be a completely different process than learning). I also don’t necessarily agree that a grandma would fail. Give her enough motivation and a couple of days and she’ll manage these.
My main criticism would be that it doesn’t seem like this test allows online learning, which is what humans do (over the scale of days to years). So in practice it may still collapse to what you point out, but not because the task is unsuited to showing AGI.
What I'm saying is that this test is just another "out-of-distribution task" for an LLM. And it will be solved using the exact same methods we always use: it will end up in the pre-training data, and LLMs will crush it.
This has absolutely nothing to do with AGI. Once they beat these tests, new ones will pop up. They'll beat those, and people will invent the next batch.
The way I see it, the true formula for AGI is: [Brain] + [External Sensors] (World Receptors) + [Internal State Sensors] + [Survival Function] + [Memory].
I won't dive too deep into how each of these components has its own distinct traits and is deeply intertwined with the others (especially the survival function and memory). But on a fundamental level, my point is that we are not going to squeeze AGI out of LLMs just by throwing more tests and training cycles at them.
These current benchmarks aren't bringing us any closer to AGI. They merely prove that we've found a new layer of tasks that we simply haven't figured out how to train LLMs on yet.
P.S. A 2-year-old child is already an AGI in terms of its functional makeup and internal interaction architecture, even though it is far less equipped for survival than a kitten. The path to AGI isn't just endless task training—it's a shift toward a fundamentally different decision-making architecture.
Good post, but I disagree that a Survival Function is needed for AGI. Why do you think a Survival Function is needed?
The item I think you should add is a Mesolimbic System (Reward / Motivation). I think AGI needs motivation to direct its learning and tasks.
Also, I don't think the industry has just been training LLMs with more data to get advancements over the last 2 years. RAG / agent loops / skills / context mgmt are all just early forms of a Memory system. An LLM with an updatable working-set memory is a lot more capable than just an LLM.
Kids develop video game skills; grandmothers do not. Hypothetically, grandmothers develop baking skills that kids do not (perfectly golden-brown cookies). A human intelligence is generally capable of developing video game skills or baking skills, given enough motivation and experience to hone those skills. One test of AGI is whether the same system can develop video game skills and baking skills without having to rebuild the core models... this would demonstrate generalized intelligence.
Disagree on the last statement. Makie is tremendously superior to matplotlib. I love ggplot but it is slow, as all of R is. And my work isn’t so heavy on statistics anyway.
Makie has the best API I’ve seen (mostly matlab / matplotlib inspired), the easiest layout engine, the best system for live interactive plots (Observables are amazing), and the best performance for large data and exploration. It’s just a phenomenal visualization library for anything I do. I suggest everyone give it a try.
Matlab is the only one that comes close, but it has its own pros and cons. I could write about the topic in detail, as I’ve spent a lot of time trying almost everything that exists across the major languages.
I love Makie but for investigating our datasets Python is overall superior (I am not familiar enough with R), despite Julia having the superior array syntax and Makie having the better API. This is simply because of the brilliant library support available in scikit-learn, and because of the whole compilation overhead/TTFX issue. For these workflows it's a huge issue that restarting your interactive session takes minutes instead of seconds.
I recently used Makie to create an interactive tool for inspecting nodes of a search graph (dragging, hiding, expanding edges, custom graph layout), with floating windows of data and buttons. Yes, it's great for interactive plots (you can keep using the REPL to manipulate the plot, no freezing), yes Observables and GridLayout are great, and I was very impressed with Makie's plotting abilities from making the basics easy to the extremely advanced, but no, it was the wrong tool. Makie doesn't really do floating windows (subplots), and I had to jump through hoops to create my own float system which uses GridLayout for the GUI widgets inside them. I did get it to all work nearly flawlessly in the end, but I should probably have used a Julia imGUI wrapper instead: near instant start time!
Yes. And I did port my GUI layer to CimGui.jl. The rest of it is pretty intertwined with Makie, didn't do that yet. The Makie version does look better than ImGui though.
I tried some Julia plotting libraries a few years ago and they had APIs that were bad for interactively creating plots, as well as often being buggy. I don’t have performance problems with ggplot so that’s what I tend to lean towards. Matplotlib being bad isn’t much of a problem anymore, as LLMs can translate from ggplot to matplotlib for you.
Some quick napkin math: AI energy usage for a chat like that in the post (estimated ~100 Wh) is comparable to driving ~100 m in the average car, making 1 slice of toast, or bringing 1 liter of water to a boil.
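For anyone who wants to sanity-check those comparisons, here is the rough arithmetic; every input figure is a ballpark assumption (a ~100 Wh chat, an ~8 L/100 km gasoline car, a ~1 kW toaster running ~5 minutes), not a measurement:

```python
# Napkin math behind the comparisons above; all figures are rough assumptions.
chat_wh = 100                                   # assumed energy for the chat in the post

car_wh_per_100m = 8 / 100 * 34e6 * 0.1 / 3600   # ~8 L/100 km at ~34 MJ/L, over 100 m -> ~76 Wh
toast_wh = 1000 * 5 * 60 / 3600                 # ~1 kW toaster for ~5 minutes -> ~83 Wh
boil_wh = 4186 * 1.0 * 80 / 3600                # heating 1 kg of water by 80 K -> ~93 Wh

print(chat_wh, round(car_wh_per_100m), round(toast_wh), round(boil_wh))
```

All of these land in the 75–95 Wh range, i.e. the same ballpark as the assumed 100 Wh per chat.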
I’d wager the average American eats well over 20 dollars’ worth of meat per month overall, but let’s say they spend as much on beef as on an OpenAI subscription. If you truly believe in free markets, then the two have the same environmental impact. But which one has more externalities? Many supply chain analyses have been done, which you can look up. As one might expect, the numbers don’t look good for beef.