1. Can You Learn to See Without Images? Procedural Warm-Up for Vision Transformers (arxiv.org)
   2 points by andy12_ 3 months ago | 1 comment

2. From Memorization to Reasoning in the Spectrum of Loss Curvature (arxiv.org)
   65 points by andy12_ 4 months ago | 14 comments

3. Concrete "battery" developed at MIT now packs 10 times the power (news.mit.edu)
   3 points by andy12_ 6 months ago | 2 comments

4. Gauss, an Agent for Autoformalization (math.inc)
   6 points by andy12_ 6 months ago

5. Spurious Rewards: Rethinking Training Signals in RLVR (rethink-rlvr.notion.site)
   1 point by andy12_ 10 months ago

6. VR-CLI: Learning to Reason for Long-Form Story Generation (arxiv.org)
   2 points by andy12_ 10 months ago

7. Tokenformer: Rethinking transformer scaling with tokenized model parameters (arxiv.org)
   3 points by andy12_ on Oct 31, 2024 | 1 comment

8. Selective Attention Improves Transformer (arxiv.org)
   1 point by andy12_ on Oct 7, 2024 | 1 comment

9. The AdEMAMix Optimizer: Better, Faster, Older (arxiv.org)
   2 points by andy12_ on Sept 10, 2024