Be careful about how you interpret that paper. It looks really impressive -- real neurons in a petri dish seem to successfully (if amateurishly) murk a few imps.
But there's more to the setup than you might assume from a casual reading. Here's the code used for that demo: https://github.com/SeanCole02/doom-neuron
So there is an entire PyTorch stack wrapped around the mysterious little blob of neurons -- they aren't just wired straight into WASD. There is a conventional convnet-based encoder, running on a GPU, in the critical path. The README tries to argue that the "neurons are doing the learning", but to my dilettante, critical eye it really looks as though there is a hell of a lot of learning happening in the convnet as well.
Are the neurons learning to play Doom, or are they learning to inject ever so slightly more effective noise into the critical path? Would this work just as well if we replaced the neurons with some other non-Markovian sludge? The authors do ablation experiments to try to get to the bottom of this, but I can't really tell how compelling the results are (due to my own ignorance/stupidity, of course).
The whole point of the CNNs is to act like an autoencoder on the input side and a decoder on the output side. The only reason this is done in the first place is that the number of electrodes in the dish is pitiful and has no chance of describing something as complex as Doom. They are there to create a latent space that can be fed through the 60-odd electrodes, and to decode the neurons' latent space into button presses.
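To make that concrete, here is a minimal sketch of the kind of bottleneck being described: a convnet squeezing a frame down to one value per electrode, and a small decoder turning recorded activity back into button presses. The layer sizes, the 64-channel grid, and the 6-button count are illustrative assumptions, not the architecture in the linked repo.

```python
# Minimal sketch of the encoder/decoder bottleneck described above.
# Shapes and layer choices are illustrative only -- not the actual
# architecture used in the linked repo.
import torch
import torch.nn as nn

N_ELECTRODES = 64  # 8x8 stimulation grid (figure quoted in the thread)
N_BUTTONS = 6      # hypothetical: move/strafe/turn/shoot

class FrameEncoder(nn.Module):
    """Compress a 320x200 grayscale frame into one value per electrode."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=4), nn.ReLU(),
            nn.Flatten(),
            nn.LazyLinear(N_ELECTRODES),  # latent = stimulation pattern
            nn.Sigmoid(),                 # scale to [0, 1] stimulation intensity
        )

    def forward(self, frame):             # frame: (batch, 1, 200, 320)
        return self.net(frame)            # (batch, 64)

class SpikeDecoder(nn.Module):
    """Map recorded activity on the same 64 channels to button presses."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_ELECTRODES, 128), nn.ReLU(),
            nn.Linear(128, N_BUTTONS),
        )

    def forward(self, spikes):            # spikes: (batch, 64) firing rates
        return self.net(spikes)           # button logits

if __name__ == "__main__":
    enc, dec = FrameEncoder(), SpikeDecoder()
    stim = enc(torch.rand(1, 1, 200, 320))      # frame -> electrode drive
    # ... dish stimulation and recording would happen here ...
    fake_readout = torch.rand(1, N_ELECTRODES)  # stand-in for recorded spikes
    print(dec(fake_readout).shape)              # torch.Size([1, 6])
```

The interesting question in the thread is exactly how much of the "playing" ends up living in those two trainable networks versus in the dish between them.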
The Pong version of the game was the proof of concept that neurons can learn without a latent-space intermediate in either direction. Both the world state and the neuronal control were raw signals: https://pubmed.ncbi.nlm.nih.gov/36228614/
What I wanted to do after DishBrain Pong, but never had the budget for, was to use live animals as the computational substrate: use the visual cortex of one as the input, send the neural spikes to a second animal's frontal lobe for computation, and finally send those signals to a third animal's motor cortex to physically press buttons. It's a shame we never raised enough, because it wouldn't have cost more than $15m to build the hardware and do the biological proof of concept.
> using live animals as the computational substrate: use the visual cortex of one as the input, send the neural spikes to a second animal's frontal lobe for computation, and finally send those signals to a third animal's motor cortex to physically press buttons.
It does, but most of what we do to animals is terrifying. I can see why getting funding for this idea might not have been easy, though; "I want to mind-control three animals to play Doom" is certainly a pitch.
That is the fallacy of relative privation. The fact that most of what we do to animals is terrifying should be the motivating factor to NOT do more of it, such as the atrocity described above.
> The only reason why this is done in the first place is because the number of electrodes in the dish is pitiful and has no chance of describing something as complex as Doom.
This sounds a bit suspicious though. If we're confident that the neurons aren't complex enough to understand Doom, how can they be said to be complex enough to play it? "Playing a game" is a loose term, but it seems difficult to say that something is playing a game it can't comprehend or interact with. By analogy, if there were a CNN between me and a game of Doom, people would say "roenxi is cheating with an AI aim-bot", not "roenxi is playing Doom".
The whole thing is still pretty cool though. Hopefully the neurons are having fun, I'm sure we all wish them what happiness they can muster.
There aren't enough input electrodes to encode a Doom frame into the multi-electrode array without compression.
That's all the artificial neural networks are doing.
If we could have gotten an MEA with 320x200 electrodes, we wouldn't have used any encoding and would have just let the neurons figure it out. Instead it is an 8x8 grid.
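Back-of-the-envelope, that bottleneck looks like this (320x200 is classic Doom's framebuffer, 8x8 is the grid mentioned above; the rest is just arithmetic):

```python
# Rough size of the bottleneck described above.
frame_pixels = 320 * 200          # 64,000 values per frame
electrodes = 8 * 8                # 64 stimulation sites
print(frame_pixels / electrodes)  # 1000.0 -> roughly 1000:1 compression per frame
```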
We've got LLMs that seem to be smarter than anyone I'm talking to day-to-day, and one useful model of them is just "compression". Compression is turning out to be a pretty key operation in intelligence and understanding (in fact, it seems to be intelligence and understanding in key ways). If we compress Doom down to "shoot" and "press the buttons in the way most favourable to the player", then good compression could let a fair coin play Doom well if someone flips it fast enough.
I mean, maybe the ANN is just sampling the screen, in which case I'm not sure why we're talking about it as a "network". But the type of compression seems critical.
Have I watched any of the videos or read the code? No I have not.
Yes... quite a shame that we never made an amalgamated cyborg horror out of parts and pieces of several different animals. That's definitely not the plot of every sci-fi horror movie.
Great idea, intense pain does provide a stronger response in the neuronal substrate. The prisoners, or uhh, “research subjects”, won’t mind. It’s for science. /s
I would have been quite happy to use my own brain as the computational substrate and I had more than a few other people keen to be the input and output parts of the system.
It's rather unfortunate that in the West it is impossible to get elective brain surgery. The countries that will do it have at best a spotty record. I talked to someone who had it done in Brazil and their electrodes became dislodged after a few months.
I'm totally fine with consensual human experimentation that somehow threads the needle around exploitation of the poor - just not sure how we do the latter part short of requiring experimentees to pass a minimum net-worth threshold?
I think the closest would be: if anyone involved ever complains to authorities at all, everyone involved gets in trouble. If no one ever complains, no trouble. Everyone involved is forever at the mercy of everyone else involved.
Complaining to the authorities needs to come with a cost, otherwise people who don’t believe in the research or are looking for a payday will join just to complain
I consider that a feature of this idea: you have to believe in whatever you want to do human experimentation for, enough to select who you include very carefully, and ultimately still take that risk.
Or...at the mercy of a scary man with a big wrench. Every single post you've put in this thread is a volatile mix of idealistic, naive, and sociopathic. So, obviously you'll be a tech CEO in 10 years.
Haha, you made me laugh quite a bit, as if ethical due diligence was even a blip in the mental model of someone who talks like that about sentient life forms.
Yeah, "nature is brutal, therefore what gives if i raise the bar in the suffering we bring to this world", great logic right there mate, specially when we all know such experiments are without the slightest shred of doubt aimed to end up using humans neurons, because those are the most powerful.
Also worth mentioning that in the District of Columbia and a few other places it is illegal to sell live animals, including mice, so there is some effort to do better about our behavior towards other living beings, unlike you.
The fact that they get eaten doesn't justify torturing mice by hooking their brains up to an LLM to make a slightly better climate catastrophe, or making them play Doom.
Real talk, torturing animals is a hallmark of sociopaths. You should really seek professional help, and I say this 100% seriously.
>What I wanted to do after DishBrain Pong, but never had the budget for, was to use live animals as the computational substrate: use the visual cortex of one as the input, send the neural spikes to a second animal's frontal lobe for computation, and finally send those signals to a third animal's motor cortex to physically press buttons. It's a shame we never raised enough
It's amazing that someone would feel comfortable sharing this.
When we can turn off distress and pain in farm animals, we will have done more to improve well-being in the world than anyone alive today. Factory farms stop being an efficient evil and become the only moral way to produce meat.
And as a side effect we also get super intelligence on a substrate that is 10 orders of magnitude more energy efficient than silicon.
This is literal supervillain wire-heading type stuff. Poorly thought out ideas of a madman with no regard for the consequences; just vague claims about the definite super-good idea of direct brain-state manipulation.
Gosh, it's been years, but I think they did the dual-animal experiment with rats about a decade ago. I'm likely misremembering, but they tickled a rat in Japan, fed the impulses over the internet, and had another rat in maybe Brazil move its tail in response. From what I recall it did potentiate over time, implying learning at the reflex level. Sorry I can't find the link, though!
Reminds me of the Ship of Theseus-style thought experiment where they replace neurons with logic gates one by one and ask when exactly consciousness stops existing.
This reminds me of https://hackernews.hn/item?id=47897647, where a quantum computing demo worked equally well if you replaced the QC with an entropy source.
So far the wonders of claude/codex have been mostly constrained to applications that are built within the boundary conditions of existing libraries -- the models make direct use of the good work that humans have done to date to build Python, `requests`, `ffmpeg`, you name it.
But I'm excited for the (I think inevitable) stage where the shoggoth starts to reach outside those constraints -- rewriting, patching, renaming, rebuilding libraries, DLLs, binaries -- and we move into a regime where the libraries dissolve, the application floats on top of the shifting sands of an ever more efficient, secure, unified and totally inhuman technology stack.
Obviously this is a horrifying idea in some ways (interpretability, security, etc.), but it's also not obvious to me that it can't work, especially if there are dedicated, centralized efforts to do this. It's also not clear that interpretability is necessarily mutually exclusive with full slopification/machine rewrite of decades of foundational, incremental development.
Journalists are so funny, man. "Last week we told you that AI is fake and a fraud. But we just learned something fascinating. A few months after everyone else was talking about it, we uncovered an amazing scoop: the companies which raised a lot of capital are also doing tens of billions of dollars of revenue. So now we're starting to think it might not all be a scam! Tune in next week for when we say it's all 100% fraudulent again."
I continue to believe that Cerebras is one of the most underrated companies of our time. It's a dinner-plate-sized chip. It actually works. It's actually much faster than anything else for real workloads. Amazing.
Google is crushing them on inference. By TPUv9, they could be 4x more energy efficient and cheaper overall (even if Nvidia cuts their margins from 75% to 40%).
Cerebras will be substantially better for agentic workflows in terms of speed.
And if you don't care as much about speed and only cost and energy, Google will still crush Nvidia.
And Nvidia won't be cheaper for training new models either. The vast majority of chips will be used for inference by 2028 instead of training anyway.
Nvidia has no manufacturing reliability story. Anyone can buy TSMC's output.
Power is the bottleneck in the US (and everywhere besides China). By TPUv9 - Google is projected to be 4x more energy efficient. It's a no-brainer who you're going with starting with TPUv8 when Google lets you run on-prem.
These are GW scale data centers. You can't just build 4 large-scale nuclear power plants in a year in the US (or anywhere, even China). You can't just build 4 GW solar farms in a year in the US to power your less efficient data center. Maybe you could in China (if the economics were on your side, but they aren't). You sure as hell can't do it anywhere else (maybe India).
What am I missing? I don't understand how Nvidia could've been so far ahead and just let every part of the market slip away.
Which part of the market has slipped away, exactly?
Everything you wrote is supposition and extrapolation. Nvidia has a chokehold on the entire market. All other players still exist in the small pockets that Nvidia doesn’t have enough production capacity to serve.
And their dev ecosystem is still so far ahead of anyone else's. Which provider gets chosen to equip a 100k-chip data center goes far beyond raw chip power.
The nice thing about modern LLMs is that it's a relatively large, static use case. The compute is large and expensive enough that you can afford to just write custom kernels, to a degree. It's not like the typical CUDA setting, where you're running on 1, 2, or 8 GPUs, you need libraries that already do it all for you, and researchers are building lots of different models.
There aren't all that many different small components between all of the different transformer based LLMs out there.
Yeah, given that frontier model training has shrunk down to a handful of labs it seems like a very solvable problem to just build the stack directly without CUDA. LLMs are mechanically simple and these labs have access to as much engineering muscle as they need. Pretty small price to pay to access cheaper hardware given that model runs cost on the order of $100M and every lab is paying Nvidia many multiples over that to fill up their new datacenters.
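For a sense of how short that list of components really is, here is a schematic decoder block in plain NumPy: matmul, softmax, layernorm, and a pointwise nonlinearity cover essentially the whole kernel menu. This is a toy single-head block with no causal mask or multi-head split, not any particular lab's kernel set.

```python
# Schematic transformer decoder block: the handful of primitives a
# custom (non-CUDA) stack actually has to make fast.
import numpy as np

def layer_norm(x, eps=1e-5):
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def attention(x, wq, wk, wv, wo):
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (seq, seq)
    return scores @ v @ wo

def mlp(x, w1, w2):
    return np.maximum(x @ w1, 0) @ w2  # ReLU here; real models use GELU/SwiGLU

def block(x, params):
    x = x + attention(layer_norm(x), *params["attn"])
    x = x + mlp(layer_norm(x), *params["mlp"])
    return x

if __name__ == "__main__":
    d, seq = 64, 16
    rng = np.random.default_rng(0)
    params = {
        "attn": [rng.normal(size=(d, d)) for _ in range(4)],
        "mlp":  [rng.normal(size=(d, 4 * d)), rng.normal(size=(4 * d, d))],
    }
    print(block(rng.normal(size=(seq, d)), params).shape)  # (16, 64)
```

Everything else in a frontier model run is some fused or tiled variant of those few operations, which is what makes a bespoke hardware port look tractable for a well-funded lab.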
It's "dinner-plate sized" because it's just a full silicon wafer. It's nice to see that wafer-scale integration is now being used for real work but it's been researched for decades.
I'm fascinated by how the economy is catching up to demand for inference. The vast majority of today's capacity comes from silicon that merely happens to be good at inference, and it's clear that there's a lot of room for innovation when you design silicon for inference from the ground up.
With CapEx going crazy, I wonder where costs will stabilize and what OpEx will look like once these initial investments are paid back (or go bust). The common consensus seems to be that there will be a rug pull and frontier model inference costs will spike, but I'm not entirely convinced.
I suspect it largely comes down to how much more efficient custom silicon is compared to GPUs, as well as how accurately the supply chain is able to predict future demand relative to future efficiency gains. To me, it is not at all obvious what will happen. I don't see any reason why a rug pull is any more or less likely than today's supply chain over-estimating tomorrow's capacity needs, and creating a hardware (and maybe energy) surplus in 5-10 years.
If history has taught us anything, “engineered systems” (like mainframes & hyper converged infrastructure) emerge at the start of a new computing paradigm … but long-term, commodity compute wins the game.
Chips and RAM grew in capacity, but latency is mostly flat and interconnect power consumption grew a lot. So I think the paradigm changed, even with newer interconnects like NVLink.
For 28 years, Intel Xeon chips have come with massive L2/L3 caches. Nvidia is making bigger chips, with the latest being 2 big chips interconnected. Cerebras saw the pattern and took it to the next level.
And the technology is moving 3D towards stacking layers on the wafer so there is room to grow that way, too.
I think that was true when you could rely on good old Moore’s law to make the heavy iron quickly obsolete but I also think those days are coming to an end
Not for what they are using it for. It is $1m+/chip and they can fit 1 of them in a rack. Rack space in DCs is a premium asset. The density isn't there. AI models need tons of memory (this product announcement is a case in point) and they don't have it, nor do they have a way to get it since they are last in line at the fabs.
Their only chance is an acqui-hire, but Nvidia just spent $20b on Groq instead. Dead man walking.
Oh don't worry. Ever since the power issue started developing, rack space is no longer at a premium. Or at least, it's no longer the limiting factor. Power is.
The dirty secret is that there is plenty of power. But it isn't all in one place, and it is often stranded in DCs that can't do the density needed for AI compute.
Training models needs everything in one DC, inference doesn't.
I guess it depends what you mean by "perf". If you optimize everything for the absolutely lowest latency given your power budget, your throughput is going to suck - and vice versa. Throughput is ultimately what matters when everything about AI is so clearly power-constrained, latency is a distraction. So TPU-like custom chips are likely the better choice.
I disagree. Yes it does matter, but because the popular interface is via chat, streaming the results of inference feels better to the squishy messy gross human operating the chat, even if it ends up taking longer. You can give all the benchmark results you want, humans aren't robots. They aren't data driven, they have feelings, and they're going to go with what feels better. That isn't true for all uses, but time to first byte is ridiculously important for human-computer interaction.
You just have to change the "popular interface" to something else. Chat is OK for trivia or genuinely time-sensitive questions, everything else goes through via email or some sort of webmail-like interface where requests are submitted and replies come back asynchronously. (This is already how batch APIs work, but they only offer a 50% discount compared to interactive, which is not enough to really make a good case for them - especially not for agentic workloads.)
All 1T models are not equal. E.g.: how many active parameters? What's the native quantization? How long is the max context? Also, it's quite likely that some smaller models in common use are even sub-1T. If your model is light enough, the lower throughput doesn't necessarily hurt you all that much and you can enjoy the lightning-fast speed.
Just pick some reasonable values. Also, keep in mind that this hardware must still be useful 3 years from now. What’s going to happen to cerebras in 3 years? What about nvidia? Which one is a safer bet?
On the other hand, competition is good - nvidia can’t have the whole pie forever.
And that's the point - what's "reasonable" depends on the hardware and is far from fixed. Some users here are saying that this model is "blazing fast" but a bit weaker than expected, and one might've guessed as much.
> On the other hand, competition is good - nvidia can’t have the whole pie forever.
Sure, but arguably the closest thing to competition for Nvidia is TPUs and future custom ASICs that will likely save a lot on energy used per model inference, while not focusing all that much on being super fast.
Cerebras has effectively 100% yield on these chips. They have an internal structure made by just repeating the same small modular units over and over again. This means they can just fuse off the broken bits without affecting overall function. It's not like it is with a CPU.
I suggest reading their website; they explain pretty well how they manage good yield, though I'm not an expert in this field. It does make sense, and I would be surprised if they were caught lying.
Defects are best measured on a per-wafer basis, not per-chip. So if your chips are huge and you can only put 4 chips on a wafer, 1 defect can cut your yield by 25%. If they're smaller and you fit 100 chips on a wafer, then 1 defect on the wafer is only cutting yield by 1%. Of course, there's more to this when you start reading about "binning", fusing off cores, etc.
There's plenty of information out there about how CPU manufacturing works, why defects happen, and how they're handled. Suffice to say, the comment makes perfect sense.
That's why you typically fuse off defective sub-units and just have a slightly slower chip. GPU and CPU manufacturers have done this for at least 15 years now, that I'm aware of.
Sure it does. If it’s many small dies on a wafer, then imperfections don’t ruin the entire batch; you just bin those components. If the entire wafer is a single die, you have much less tolerance for errors.
They can be made from large wafers. A defect typically breaks whatever chip it's on, so one defect on a large wafer filled with many small chips will still just break one chip of the many on the wafer. If your chips are bigger, one defect still takes out a chip, but now you've lost more of the wafer area because the chip is bigger. So you get a super-linear scaling of loss from defects as the chips get bigger.
With careful design, you can tolerate some defects. A multi-core CPU might have the ability to disable a core that's affected by a defect, and then it can be sold as a different SKU with a lower core count. Cerebras uses an extreme version of this, where the wafer is divided up into about a million cores, and a routing system that can bypass defective cores.
There’s an expected amount of defects per wafer. If a chip has a defect, then it is lost (simplification). A wafer with 100 chips may lose 10 to defects, giving a yield of 90%. The same wafer but with 1000 smaller chips would still have lost only 10 of them, giving 99% yield.
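A toy model makes those numbers concrete. Assuming defects land randomly (Poisson) and any die with a defect is scrap, and using a made-up round defect density rather than a real fab figure:

```python
# Toy yield model: defects land randomly, a die with >= 1 defect is lost
# (no redundancy or fusing). All numbers are illustrative only.
import math

wafer_area_cm2 = math.pi * (30 / 2) ** 2   # 300 mm wafer ~ 707 cm^2
defect_density = 0.1                       # defects per cm^2 (made-up figure)

for die_area_cm2 in (1, 5, 50, 460):       # tiny die ... roughly wafer-scale
    dies_per_wafer = int(wafer_area_cm2 // die_area_cm2)
    yield_frac = math.exp(-defect_density * die_area_cm2)  # P(0 defects on die)
    print(f"{die_area_cm2:4} cm^2 die: ~{dies_per_wafer:3} dies/wafer, "
          f"yield without redundancy ~{yield_frac:.0%}")
```

At wafer scale the exp(-D*A) term collapses to essentially zero, so a defect-free monolithic die is off the table; hence the fuse-off-and-route-around approach described in this thread.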
As another comment in this thread states, Cerebras seems to have solved this by making their big chip out of a lot of much smaller cores that can be fused off if they have errors.
Indeed, the original comment you replied to actually made no sense in this case. But there seemed to be some confusion in the thread, so I tried to clear that up. I hope I'll get to talk with one of the Cerebras engineers one day; that chip is really one of a kind.
Technically, Cerebras's solution is really cool. However, I am skeptical that it will be economically useful for larger models, as the required number of racks scales with the size of the model in order to fit the weights in SRAM.
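Rough arithmetic behind that concern; the per-wafer SRAM figure is an assumption in the ballpark of what Cerebras has published, and the model size is the 1T example from this thread:

```python
# How many wafers does it take just to hold the weights in on-chip SRAM?
# Figures below are assumptions / round numbers, not vendor specs.
params = 1e12                  # 1T-parameter model (as discussed above)
bytes_per_param = 2            # fp16/bf16 weights
sram_per_wafer_gb = 44         # assumed on-chip SRAM per wafer-scale part

weights_gb = params * bytes_per_param / 1e9
wafers_needed = weights_gb / sram_per_wafer_gb
print(f"{weights_gb:.0f} GB of weights -> ~{wafers_needed:.0f} wafers "
      "just for weight storage, before KV cache or activations")
```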
At this point, tech investment and analysis is so divorced from any kind of reality that it's more akin to lemmings on the cliff than careful analysis of fundamentals.
Cerebras is a bit of a stunt like "datacenters in spaaaaace".
Terrible yield: one defect can ruin a whole wafer instead of just a chip region. Poor perf./cost (see above). Difficult to program. Little space for RAM.
I do like XYplorer as well and have a license for it too, but its startup time is just so slooooow that I can't reach for it like I reach for File Pilot.
I love this… have been thinking about exactly this technology for years but combined with phased array directional loudspeaker and shotgun mic. Deploy during major political speech, instantly shut down brain of speaker, would appear to be an internal malfunction
Google Maps is the mind killer. We all worry about social media controlling the way we think, feel, vote etc. but Google Maps literally manipulates where people physically go in real life, what they do on holiday, where they hang out, what they eat etc. I got so sick of feeling like a four point five star Google Maps automaton I had to mostly stop with it. In addition to OSM, personal recommendations etc. the best substitute for me for a 4.5 star review is my nose, eyes and ears
https://www.youtube.com/watch?v=yRV8fSw6HaE