varunneal's comments

Highly AI generated article, including the whiteboard image.


Muon was invented by Keller Jordan (and then optimized by others) for the sake of this speedrunning competition. Even though it was invented less than a year ago, it has already been widely adopted as SOTA for model training.


This is the common belief but not quite correct! The Muon update was proposed by Bernstein as the result of a theoretical paper suggesting concrete realizations of the theory, and Keller implemented it and added practical things to get it to work well (input/output AdamW, aggressive coefficients, post-Nesterov, etc).

Both share equal credit I feel (also, the paper's co-authors!), both put in a lot of hard work for it, though I tend to bring up Bernstein since he tends to be pretty quiet about it himself.

(Source: am experienced speedrunner who's been in these circles for a decent amount of time)


I think it's good to bring up Bernstein & Newhouse as well as Yuchen Jin, Jiacheng You and the other speedrunners who helped iterate on Muon. But I think it's very fair to call Keller Jordan the main author of Muon in its current form. I'm also in the speedrunning community, though maybe not for as long as you have been.


Sharing some useful resources for learning Muon (since I'm also just catching up on it):

- https://x.com/leloykun/status/1846842883967692926

- https://www.yacinemahdid.com/p/muon-optimizer-explained-to-a...


This Simple Optimizer Is Revolutionizing How We Train AI [Muon]

https://www.youtube.com/watch?v=bO5nvE289ec

I found the above video to be a good introduction.


The most exciting thing about Muon for me is that it requires half the state of Adam while having either equivalent or better performance. That's amazing if you are VRAM limited! And just like Adam, you can also quantize it. I can get it to work relatively well as low as 4-bit, which essentially cuts down the memory requirements from full 32-bit Adam by a factor of 16x! (And by a factor of 4x vs 8-bit Adam).
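The ratios above can be checked with a few lines of arithmetic. This is only a sketch under stated assumptions: Adam is taken to keep two state values per parameter (first and second moment), Muon one (a momentum buffer), and quantization overhead such as per-block scales is ignored. The function name is made up for illustration.

```python
def optimizer_state_bits(states_per_param: int, bits_per_state: int) -> int:
    """Bits of optimizer state stored per model parameter (hypothetical helper)."""
    return states_per_param * bits_per_state


# Assumption: Adam stores two state values per parameter, Muon stores one.
adam_fp32 = optimizer_state_bits(states_per_param=2, bits_per_state=32)
adam_8bit = optimizer_state_bits(states_per_param=2, bits_per_state=8)
muon_4bit = optimizer_state_bits(states_per_param=1, bits_per_state=4)

ratio_vs_fp32_adam = adam_fp32 / muon_4bit  # 64 bits vs 4 bits
ratio_vs_8bit_adam = adam_8bit / muon_4bit  # 16 bits vs 4 bits

print(ratio_vs_fp32_adam, ratio_vs_8bit_adam)
```

Half of the 16x comes from the single state buffer and the rest from going 32-bit to 4-bit; real savings will be somewhat smaller once quantization metadata is counted.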


I haven't heard of this before. Has Muon dethroned Adam and AdamW as the standard general purpose optimizer for deep learning?


It's for hidden layers and not for every parameter: From Keller's Muon github page:

"Muon is an optimizer for the hidden weights of a neural network. Other parameters, such as embeddings, classifier heads, and hidden gains/biases should be optimized using standard AdamW."

And I just looked into this nanochat repo and it's also how it's used here.

https://github.com/karpathy/nanochat/blob/dd6ff9a1cc23b38ce6...
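To make the split in the quote concrete, here is a rough sketch of how parameters might be divided between the two optimizers. It is not the code from Keller's repo or from nanochat: the parameter names, shapes, and the name-prefix rule are all hypothetical, and plain tuples stand in for real tensors.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def split_params(params: Dict[str, Shape]) -> Tuple[List[str], List[str]]:
    """Return (muon_names, adamw_names) for a name -> shape mapping."""
    muon: List[str] = []
    adamw: List[str] = []
    for name, shape in params.items():
        is_matrix = len(shape) >= 2
        # Hypothetical naming convention for embeddings and the classifier head.
        is_embed_or_head = name.startswith(("embed", "lm_head"))
        if is_matrix and not is_embed_or_head:
            muon.append(name)   # hidden weight matrices
        else:
            adamw.append(name)  # embeddings, head, gains/biases
    return muon, adamw


# Made-up parameter names and shapes for a small transformer.
params = {
    "embed.weight": (50304, 768),
    "blocks.0.attn.qkv.weight": (2304, 768),
    "blocks.0.mlp.fc.weight": (3072, 768),
    "blocks.0.norm.gain": (768,),
    "lm_head.weight": (50304, 768),
}
muon_names, adamw_names = split_params(params)
print(muon_names)
print(adamw_names)
```

In a real training script the two name lists would become two optimizer instances (or parameter groups), one per update rule.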


I think this is a different construction. "And" in "try and find out" is acting as a conjunction, separating two different actions


This guide is just an example of how to build a single MCP (e.g. a vector store). ChatGPT connectors implement MCPs in general now.


I don’t believe this is the case. Do you have a link for that?

Because TFA says: “To work with ChatGPT Connectors or deep research (in ChatGPT or via API), your MCP server must implement two tools - search and fetch.”

Also, this page is actually the only docs site about MCP they have, and their help articles link to it too.


Claude Sonnet typically forgets about uv script syntax in my experience. I usually find myself having to paste in the docs every time. By default it wants to use uv project syntax.
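For reference, "uv script syntax" here presumably means inline script metadata (PEP 723): a comment block at the top of a single file, instead of a `pyproject.toml` as in a uv project. A minimal sketch, with a made-up file name `hello.py` and an empty dependency list so it runs anywhere:

```python
# /// script
# requires-python = ">=3.9"
# dependencies = []
# ///
"""Single-file script; the comment block above is its inline metadata."""
import sys


def greet(name: str) -> str:
    """Build the greeting printed by the script."""
    return f"hello, {name}"


if __name__ == "__main__":
    # Use the first CLI argument if given, otherwise a default.
    print(greet(sys.argv[1] if len(sys.argv) > 1 else "world"))
```

Running `uv run hello.py` reads the metadata block and sets up an environment for the script on the fly. Third-party packages would go in the `dependencies` list, either by hand or via `uv add --script hello.py <package>`.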


Not publicly, but a few people in Berkeley are working on it. Here is a paper from last year: https://imjal.github.io/theory-of-tetrachromacy. (Disclaimer: I am on this paper).

They've prototyped displays that can test for it as well.


I remember watching a video some years back where the researcher thought he had developed such a test.

As I recall (it's been many years; likely over a decade since I saw it) he tested it with a woman who was believed to have tetrachromatic vision. She could reliably tell the difference.

As a control, he tested it with a man who was trained as a graphic artist.

He too could reliably tell the difference.

That result strongly implied the test did not work as expected.

Do you know anything about this previous work? I tried reading the paper but was immediately out of my depth.


This is so cool. For your figures, how did you decide the RGB colors of the 4D colorspace? Or did you convince ACM to print your paper with special inks? :)


Definitely not the latter as the paper mentions "The digits are faintly visible in this photograph, because the camera’s color response differs from a human’s."


Have you tried o3 on those problems? I've found o3 to be much more impressive than Opus 4 for all of my use cases.


To be honest, I haven't, because the "This model is extremely expensive" popup on Cursor makes me a bit anxious - but given the accolades here I'll have to give it a shot.


Isn't this survivorship bias? E.g. people who genuinely feel highly nihilistic, that there is no order, structure, meaning, etc., are very unlikely to be successful, and also unlikely to continue choosing to be alive.


I don't see why believing that life has no inherent meaning would lead to not wanting to be alive. I think this is all the result of random cosmic accident yet I'm having plenty of fun.

Kurt Vonnegut said it best: “We are here on Earth to fart around, and don't let anybody tell you any different.”


Just one more thing you teleologists tell yourselves. I'm alive and successful, I just don't delude myself about the universe giving a shit about it.

It may be that people need to believe nonsense about the cosmos in order to "maximize productivity" but I do not think that is the case.


I see two different assertions

1. The universe doesn't care about you

2. Life has no inherent meaning

Do you mean to conflate these two? Do you find them merely agreeable, or do these propositions depend on each other?


I think they are both true and closely related. Typically and colloquially, when people talk about meaning they are talking about some state of affairs about what is good or bad with respect to the universe (if the universe includes things like God, a world of forms, ideas of perfection, etc).

I think it's very reasonable to believe that the universe does not have any of those properties and that life is random and has no inherent or universal meaning.

I guess there could be some kind of subjective meaning but I don't really see the utility of that idea.


In this particular case you would only have to push back the lack of meaning to the ~multiverse or whatever a sequence/family of child universes would be called.

I don't think Tegmark <IV had any simple parameters for goodness or meaning, and neither does logic or mathematics. We assemble our meanings out of more fundamental relationships but I actually think they concretely exist in a real way as real as the matter in this universe, but more in the way that galaxies and other complex structures exist. Meaning is a property of complex self-reflective systems and so inherent meaning will probably always be tied inexorably to context and environment, or in our case meaning is tied specifically to our human nature.

E.g. I will find it fascinating if universes do evolve from progenitor universes and therefore the guiding selection pressure is "make more black holes/universes" but that isn't the same thing as the human concept of "good" since our nature isn't aligned with entire (families of) universes.


But, speaking precisely, there is no human nature at all. You and I have nothing fundamentally in common except that our atoms happen to be organized in a similar way. We have no nature in common except as a coincidence.

It is a coincidence that delights me, and I happen to feel quite a lot of bonhomie for my fellow human beings and lifeforms, but I don't see how it makes life meaningful in any universal sense.


Do you consider the relationship of two molecules of water to be similarly coincidental, or along a continuum from e.g. the nature of two electrons all the way to how two universes might be similar? I figure fundamental particle nature is less coincidental than human nature, which is correspondingly less coincidentally related than two heterogeneous dust clouds.


I don't see any reason to have a strong belief about why any fundamental constants are what they are. This is so far beyond what even our best physics can say anything meaningful about that I feel an obligation to studiously have no opinion about it.

I will say that I see no compelling reason to believe that the values of fundamental constants are NOT just random.


That doesn't make you nihilistic, more of an absurdist.


Eh, potato potato.


How can reason give us shared values across cultures? Why do you suppose your moral clarity was generated from your brain and not your gut?


That's what the field of philosophy is about. I think, for instance, utilitarianism makes a lot more sense than "follow whatever your birth community historically does."


I dunno. Utilitarianism sounds nice on the surface—how can you be against the greatest good for the greatest number?—but it’s pretty under-specified (hedonic or preference? act or rule? do you discount future beings’ utils, and at what rate?) and if you take any particular specification seriously you get moral claims that are wildly counterintuitive, like “insect suffering is orders of magnitude more important than heart disease in humans” or “there may be quadrillions of sentient beings in the far future, and making their lives 1% better is a better use of resources than eradicating malaria now” or “it’s morally justified to steal billions of dollars of other people’s money to give to pandemic prevention and AI safety.” And maybe these are correct claims, but they definitely don’t align with many people’s moral intuitions, and it’d be a tall task to convince those people.


How did you come to this hedonic position?

MacIntyre wrote in response to the failures of utilitarianism and deontology, and certain responses to that, so you'd better have arguments for utilitarianism that top those he knew about.

You've also clearly misread him. His argument is that morality is inherently social and good morals necessarily dependent on a good society. To show how deeply connected morals and society are he opts for the use of descriptions of historical societies, because superstition and fantasy alone, like math or thought experiments, just aren't good enough for him. In this he agrees with Marx and disagrees with parts of the analytic tradition in philosophy.

Specifically in After Virtue he also uses such examples to show that the ethics of a society might carry little moral weight, and that some historical societies were better at understanding and teaching morals than his own, in particular the philosophical ethics of the Enlightenment, i.e. deontology and utilitarianism.

If you actually have an argument for why the napkin math morals and disregard of freedom at the center of utilitarianism would be the pinnacle of human ethics I'd really like to hear it.


Why is o3-mini there but not o3?


We should definitely add o3 - probably will soon. Also looking at testing the Qwen models.

