Yes it can! That's the whole point of RL! It generates slightly out-of-distribution rollouts and rewards the good ones, which shifts the distribution of the output.
That's not out of distribution, that's inside the distribution of the rollouts. If you don't create rollouts for the game of chess, then it doesn't know how to play chess, no matter how smart it is at the tasks you've created rollouts for. It's structurally stuck in its distribution.
I really don't see how this can be possible unless they're accepting abysmal recall? Perhaps I'm missing something fundamental here, but the idea that AI and non-AI assisted text can be separated with "nearly 0 false positives" just says to me that it's really just a filter for the weakest, most obvious AI generated text. Is that valuable?
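To make the recall question concrete, here is a rough sketch under made-up assumptions: detector scores for human text are modeled as N(0, 1) and scores for AI text as N(separation, 1). The Gaussian model, the `separation` parameter, and the function name `recall_at_fpr` are all hypothetical and not taken from any real detector. It fixes the threshold so the false positive rate is 0.1% and reports what fraction of AI text still gets caught.

```python
from statistics import NormalDist


def recall_at_fpr(separation: float, fpr: float) -> float:
    """Recall of a score-threshold detector at a fixed false positive rate.

    Assumption (illustrative only): human scores ~ N(0, 1),
    AI scores ~ N(separation, 1).
    """
    human = NormalDist(0.0, 1.0)
    ai = NormalDist(separation, 1.0)
    # Pick the threshold so only a fraction `fpr` of human text scores above it.
    threshold = human.inv_cdf(1.0 - fpr)
    # Recall is the fraction of AI text scoring above that same threshold.
    return 1.0 - ai.cdf(threshold)


if __name__ == "__main__":
    for separation in (1.0, 2.0, 4.0):
        recall = recall_at_fpr(separation, fpr=0.001)
        print(f"separation={separation}: recall={recall:.3f}")
```

In this toy model, recall at a near-zero false positive rate is tiny when the two score distributions overlap heavily and only becomes large when they are well separated, so the headline number says little without the recall figure next to it.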
Anthropic is very big (the biggest AI co?) in B2B, where you don't have ads. Also, if they end up creating a datacenter full of geniuses, ads won't make sense either.
1. You can definitely tell an S-curve apart from an exponential if you look at the derivative(s). AI progress does not seem to be close to the middle of the S-curve.
2. For example, a slowdown hasn't shown up in Moore's law yet.
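A minimal sketch of the derivative argument in point 1, assuming a textbook logistic curve as the S-curve (the rate `r`, the ceiling `cap`, and the function names are illustrative choices, not anything measured). The relative growth rate f'(t)/f(t) is constant for an exponential, while for a logistic it falls as r(1 - f/cap) and reaches r/2 at the midpoint.

```python
import math


def exponential(t: float, r: float = 1.0) -> float:
    """Pure exponential growth starting at 1."""
    return math.exp(r * t)


def logistic(t: float, r: float = 1.0, cap: float = 100.0) -> float:
    """Logistic S-curve starting at 1 and saturating at `cap` (hypothetical values)."""
    return cap / (1.0 + (cap - 1.0) * math.exp(-r * t))


def relative_growth(f, t: float, h: float = 1e-4) -> float:
    """Estimate f'(t) / f(t) with a central finite difference."""
    return (f(t + h) - f(t - h)) / (2.0 * h * f(t))


if __name__ == "__main__":
    midpoint = math.log(99.0)  # where the logistic above reaches cap / 2
    for t in (0.0, 2.0, midpoint, 8.0):
        exp_rate = relative_growth(exponential, t)
        log_rate = relative_growth(logistic, t)
        print(f"t={t:.2f}  exponential={exp_rate:.3f}  logistic={log_rate:.3f}")
```

Under these assumptions the two rates are nearly identical early on and only diverge clearly as the logistic approaches its midpoint, so a test like this needs clean data and enough of the curve to be informative.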
It is the case that Anthropic employees have no usage limits.
Some people run experiments where they spin up hundreds of Claude instances just to see if any of them succeed.