Hacker News | marcel-c13's comments

Dude now I want a hamburger :(


Do it. I recommend a flame-grilled Angus with blue cheese.


But isn't the maximum training efficiency naturally tied to the architecture? Meaning other architectures have a different training-efficiency landscape? I've said it elsewhere: it is not about "caring too much about new model architectures" but about having a balance between exploitation and exploration.


I didn't convey my thoughts very well. What I consider the actually valuable "more efficient ways of training" are paradigm shifts between things like pretraining for learning raw knowledge, fine-tuning for making a model behave in certain ways, and reinforcement learning for learning from an environment. Those are all agnostic to the model architecture, and while there could be better architectures that make pretraining 2x faster, that won't make pretraining replace the need for reinforcement learning. There isn't as much value in exploring the architecture space as there is in finding ways to train a model to be capable of something it wasn't before.


I think you misunderstood the article a bit by saying that the assertion is "that a new architecture will be the solution". That's not the assertion. It's simply a statement about the lack of balance between exploration and exploitation, and the desire to rebalance it. What's wrong with that?

