Muon was invented by Keller Jordan (and then optimized by others) for the sake of this speedrunning competition. Even though it was invented less than a year ago, it has already been widely adopted as SOTA for model training.
This is the common belief, but it's not quite correct! The Muon update was proposed by Bernstein in a theoretical paper that suggested concrete realizations of the theory, and Keller implemented it and added the practical pieces needed to make it work well (AdamW for the input/output layers, aggressive Newton-Schulz coefficients, post-Nesterov, etc.).
I feel they deserve equal credit (as do the paper's co-authors!), and both put a lot of hard work into it. I tend to bring up Bernstein because he is pretty quiet about it himself.
(Source: I'm an experienced speedrunner who's been in these circles for a decent amount of time.)
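For readers who haven't seen the update itself: Muon keeps one momentum buffer per hidden weight matrix and replaces the raw update with an approximately orthogonalized version of it, computed with a few Newton-Schulz iterations. The pure-Python sketch below is illustrative only. It assumes the quintic coefficients (3.4445, -4.7750, 2.0315) from Keller's public implementation, a classic SGD-style momentum buffer with a Nesterov-style lookahead (the exact momentum variant has changed between versions of the code), and no shape-dependent learning-rate scaling. `newton_schulz` and `muon_step` are hypothetical names; real implementations run this on GPU tensors in bfloat16.

```python
import math
from typing import List

Matrix = List[List[float]]

# Quintic Newton-Schulz coefficients (assumed; taken from Keller Jordan's public Muon code).
A_COEF, B_COEF, C_COEF = 3.4445, -4.7750, 2.0315


def matmul(p: Matrix, q: Matrix) -> Matrix:
    """Nested-list matrix product p @ q."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q))) for j in range(len(q[0]))]
            for i in range(len(p))]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def newton_schulz(g: Matrix, steps: int = 5, eps: float = 1e-7) -> Matrix:
    """Push every singular value of g toward ~1 (approximate orthogonalization)."""
    tall = len(g) > len(g[0])
    x = transpose(g) if tall else [row[:] for row in g]  # work in the wide orientation
    # The Frobenius norm bounds the spectral norm, so all singular values land in [0, 1].
    norm = math.sqrt(sum(v * v for row in x for v in row))
    x = [[v / (norm + eps) for v in row] for row in x]
    for _ in range(steps):
        a = matmul(x, transpose(x))                       # A = X X^T
        aa = matmul(a, a)                                 # A @ A
        b = [[B_COEF * a[i][j] + C_COEF * aa[i][j] for j in range(len(a))]
             for i in range(len(a))]                      # B = b*A + c*A@A
        bx = matmul(b, x)
        x = [[A_COEF * x[i][j] + bx[i][j] for j in range(len(x[0]))]
             for i in range(len(x))]                      # X = a*X + B@X
    return transpose(x) if tall else x


def muon_step(weight: Matrix, grad: Matrix, buf: Matrix,
              lr: float = 0.02, momentum: float = 0.95) -> None:
    """One in-place Muon-style update on a 2D weight; `buf` is the only optimizer state."""
    rows, cols = len(weight), len(weight[0])
    for i in range(rows):
        for j in range(cols):
            buf[i][j] = momentum * buf[i][j] + grad[i][j]
    # Nesterov-style lookahead: orthogonalize grad + momentum * buf rather than buf itself.
    lookahead = [[grad[i][j] + momentum * buf[i][j] for j in range(cols)] for i in range(rows)]
    update = newton_schulz(lookahead)
    for i in range(rows):
        for j in range(cols):
            weight[i][j] -= lr * update[i][j]
```

The "aggressive" coefficients trade exactness for speed: after five steps the singular values oscillate roughly between 0.7 and 1.1 instead of converging to exactly 1, which is apparently good enough for training. Working in the wide orientation keeps the X X^T product as small as possible.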
I think it's good to bring up Bernstein & Newhouse as well as Yuchen Jin, Jiacheng You and the other speedrunners who helped iterate on Muon. But I think it's very fair to call Keller Jordan the main author of Muon in its current form. I'm also in the speedrunning community, though maybe not for as long as you have been.
The most exciting thing about Muon for me is that it requires half the optimizer state of Adam (one momentum buffer instead of two moment buffers) while having either equivalent or better performance. That's amazing if you are VRAM-limited! And just like Adam, you can also quantize it. I can get it to work relatively well at as low as 4 bits, which cuts the optimizer-state memory by a factor of 16 relative to full 32-bit Adam (and by a factor of 4 relative to 8-bit Adam)!
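The 16x and 4x figures follow from counting buffers times bits per element. A back-of-the-envelope sketch, assuming a hypothetical 1B parameters all handled by the optimizer in question; it counts optimizer state only (not weights, gradients, or activations), ignores the per-block scale factors that quantized optimizers usually store, and in practice only the hidden matrices run on Muon while the rest stay on AdamW:

```python
def optimizer_state_bytes(num_params: int, num_states: int, bits: int) -> float:
    """Bytes of optimizer state: `num_states` buffers, `bits` per element each."""
    return num_params * num_states * bits / 8


N = 1_000_000_000  # hypothetical parameter count
adam_fp32 = optimizer_state_bytes(N, num_states=2, bits=32)  # first + second moments
adam_8bit = optimizer_state_bytes(N, num_states=2, bits=8)
muon_4bit = optimizer_state_bytes(N, num_states=1, bits=4)   # single momentum buffer

for label, size in [("32-bit Adam", adam_fp32), ("8-bit Adam", adam_8bit), ("4-bit Muon", muon_4bit)]:
    print(f"{label}: {size / 2**30:.2f} GiB")

print(adam_fp32 / muon_4bit, adam_8bit / muon_4bit)  # → 16.0 4.0
```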
It's for hidden layers and not for every parameter:
From Keller's Muon GitHub page:
"Muon is an optimizer for the hidden weights of a neural network. Other parameters, such as embeddings, classifier heads, and hidden gains/biases should be optimized using standard AdamW."
And I just looked through this nanochat repo, and that's how it's used here too.
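The split described in the README quote can be sketched as a simple rule over parameter names and shapes: 2D hidden weight matrices go to Muon, everything else (embeddings, classifier head, gains, biases) goes to AdamW. The parameter names, shapes, and the `ADAMW_PREFIXES` tuple below are hypothetical stand-ins for a small GPT-style model; match them to your own model's naming.

```python
from typing import Dict, List, Tuple

# Hypothetical parameter names and shapes for a tiny GPT-style model.
PARAMS: Dict[str, Tuple[int, ...]] = {
    "embed.weight": (50304, 768),
    "blocks.0.attn.qkv.weight": (2304, 768),
    "blocks.0.attn.proj.weight": (768, 768),
    "blocks.0.mlp.fc.weight": (3072, 768),
    "blocks.0.norm.gain": (768,),
    "blocks.0.mlp.fc.bias": (3072,),
    "lm_head.weight": (50304, 768),
}

# Input/output layers stay on AdamW even though they are 2D (assumed naming).
ADAMW_PREFIXES = ("embed", "lm_head")


def split_param_groups(params: Dict[str, Tuple[int, ...]]) -> Tuple[List[str], List[str]]:
    """Return (muon_names, adamw_names) following the hidden-weights-only rule."""
    muon, adamw = [], []
    for name, shape in params.items():
        if len(shape) == 2 and not name.startswith(ADAMW_PREFIXES):
            muon.append(name)   # hidden weight matrix
        else:
            adamw.append(name)  # embedding, head, gain, or bias
    return muon, adamw


muon_names, adamw_names = split_param_groups(PARAMS)
print("Muon: ", muon_names)
print("AdamW:", adamw_names)
```

With PyTorch you would typically apply the same rule while iterating over `model.named_parameters()`, using `p.ndim` instead of a shape tuple, and pass the two lists to two separate optimizers.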
I don’t believe this is the case. Do you have a link for that?
Because, from TFA: “To work with ChatGPT Connectors or deep research (in ChatGPT or via API), your MCP server must implement two tools - search and fetch.”
Also, this page is actually the only docs site about MCP they have, and their help articles link to it too.
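To make the quoted requirement concrete, here is a minimal sketch of what the two tools do, over a hypothetical in-memory corpus: `search` returns lightweight hits and `fetch` returns the full document for an id that `search` produced. It deliberately does not use any MCP SDK, and the field names (`id`, `title`, `url`, `text`) are assumptions for illustration; check the OpenAI docs for the exact response schema ChatGPT expects.

```python
from typing import Dict, List

# Hypothetical corpus standing in for whatever the server actually indexes.
DOCS: Dict[str, Dict[str, str]] = {
    "doc-1": {"title": "Muon optimizer notes", "url": "https://example.com/muon",
              "text": "Muon orthogonalizes momentum updates for hidden weight matrices."},
    "doc-2": {"title": "uv script mode", "url": "https://example.com/uv",
              "text": "Single-file scripts declare dependencies in an inline metadata block."},
}


def search(query: str) -> Dict[str, List[Dict[str, str]]]:
    """`search` tool: return lightweight hits (id, title, url) matching any query term."""
    terms = query.lower().split()
    hits = [
        {"id": doc_id, "title": doc["title"], "url": doc["url"]}
        for doc_id, doc in DOCS.items()
        if any(t in (doc["title"] + " " + doc["text"]).lower() for t in terms)
    ]
    return {"results": hits}


def fetch(doc_id: str) -> Dict[str, str]:
    """`fetch` tool: return the full document for an id previously returned by search."""
    doc = DOCS[doc_id]
    return {"id": doc_id, "title": doc["title"], "text": doc["text"], "url": doc["url"]}


hit_ids = [r["id"] for r in search("muon")["results"]]
print(hit_ids, fetch(hit_ids[0])["title"])
```

The split keeps search responses small; the client only pays for full text when it decides a hit is worth reading.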
Claude Sonnet typically forgets about uv script syntax in my experience. I find myself having to paste in the docs almost every time. By default it wants to use uv project syntax.
I remember watching a video some years back in which a researcher thought he had developed such a test.
As I recall (it's been many years; likely over a decade since I saw it) he tested it with a woman who was believed to have tetrachromatic vision. She could reliably tell the difference.
As a control, he tested it with a man who was trained as a graphic artist.
He too could reliably tell the difference.
That result strongly implied the test did not work as expected.
Do you know anything about this previous work? I tried reading the paper but was immediately out of my depth.
This is so cool. For your figures, how did you decide the RGB colors of the 4D colorspace? Or did you convince ACM to print your paper with special inks? :)
Definitely not the latter as the paper mentions "The digits are faintly visible in this photograph, because the camera’s color response differs from a human’s."
To be honest, I haven't, because the "This model is extremely expensive" popup on Cursor makes me a bit anxious, but given the accolades here I'll have to give it a shot.
Isn't this survivorship bias? E.g., people who genuinely feel highly nihilistic (that there is no order, structure, meaning, etc.) are very unlikely to be successful, and also unlikely to continue choosing to be alive.
I don't see why believing that life has no inherent meaning would lead to not wanting to be alive. I think this is all the result of a random cosmic accident, yet I'm having plenty of fun.
Kurt Vonnegut said it best: “We are here on Earth to fart around, and don't let anybody tell you any different.”
I think they are both true and closely related. Typically and colloquially, when people talk about meaning they are talking about some state of affairs about what is good or bad with respect to the universe (if the universe includes things like God, a world of forms, ideas of perfection, etc).
I think it's very reasonable to believe that the universe does not have any of those properties and that life is random and has no inherent or universal meaning.
I guess there could be some kind of subjective meaning but I don't really see the utility of that idea.
In this particular case you would only have to push back the lack of meaning to the ~multiverse or whatever a sequence/family of child universes would be called.
I don't think Tegmark <IV had any simple parameters for goodness or meaning, and neither does logic or mathematics. We assemble our meanings out of more fundamental relationships, but I actually think those meanings concretely exist, as real as the matter in this universe, though more in the way that galaxies and other complex structures exist. Meaning is a property of complex self-reflective systems, so inherent meaning will probably always be tied inexorably to context and environment; in our case, meaning is tied specifically to our human nature.
E.g., I would find it fascinating if universes do evolve from progenitor universes and the guiding selection pressure is therefore "make more black holes/universes", but that isn't the same thing as the human concept of "good", since our nature isn't aligned with entire (families of) universes.
But, speaking precisely, there is no human nature at all. You and I have nothing fundamentally in common except that our atoms happen to be organized in a similar way. We have no nature in common except as a coincidence.
It is a coincidence that delights me, and I happen to feel quite a lot of bonhomie for my fellow human beings and lifeforms, but I don't see how it makes life meaningful in any universal sense.
Do you consider the relationship of two molecules of water to be similarly coincidental, or as lying along a continuum from e.g. the nature of two electrons all the way to how two universes might be similar? I figure fundamental particle nature is less coincidental than human nature, which is in turn less coincidental than the relationship between two heterogeneous dust clouds.
I don't see any reason to have a strong belief about why any fundamental constants are what they are. This is so far beyond what even our best physics can say anything meaningful about that I feel an obligation to studiously have no opinion about it.
I will say that I see no compelling reason to believe that the values of fundamental constants are NOT just random.
That's what the field of philosophy is about. I think, for instance, utilitarianism makes a lot more sense than "follow whatever your birth community historically does."
I dunno. Utilitarianism sounds nice on the surface—how can you be against the greatest good for the greatest number?—but it’s pretty under-specified (hedonic or preference? act or rule? do you discount future beings’ utils, and at what rate?) and if you take any particular specification seriously you get moral claims that are wildly counterintuitive, like “insect suffering is orders of magnitude more important than heart disease in humans” or “there may be quadrillions of sentient beings in the far future, and making their lives 1% better is a better use of resources than eradicating malaria now” or “it’s morally justified to steal billions of dollars of other people’s money to give to pandemic prevention and AI safety.” And maybe these are correct claims, but they definitely don’t align with many people’s moral intuitions, and it’d be a tall task to convince those people.
MacIntyre wrote in response to the failures of utilitarianism and deontology, and certain responses to that, so you'd better have arguments for utilitarianism that top those he knew about.
You've also clearly misread him. His argument is that morality is inherently social and good morals necessarily dependent on a good society. To show how deeply connected morals and society are he opts for the use of descriptions of historical societies, because superstition and fantasy alone, like math or thought experiments, just aren't good enough for him. In this he agrees with Marx and disagrees with parts of the analytic tradition in philosophy.
Specifically in After Virtue he also uses such examples to show that the ethics of a society might carry little moral weight, and that some historical societies were better at understanding and teaching morals than his own, in particular the philosophical ethics of the Enlightenment, i.e. deontology and utilitarianism.
If you actually have an argument for why the napkin-math morals and disregard of freedom at the center of utilitarianism would be the pinnacle of human ethics, I'd really like to hear it.