This seems like the sort of optimization that should happen in MongoDB itself: instead of acquiring the lock, loading the record into memory (if it's not already there), making the change, and then releasing the lock, the server could load the record into memory first and only then acquire the lock.
Have you spoken with any of the MongoDB developers about why it's currently the way it is, vs. a more efficient update path?
I think there are some possible timing issues with making that a general behavior in the server. 10gen did make it the default behavior on slaves, where the inserts are controlled by the oplog (http://jira.mongodb.org/browse/SERVER-1646).
For us, our DB abstraction layer made this behavior so simple to add that we didn't make much fuss about it.
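To make the idea concrete, here is a rough sketch of what a read-before-update wrapper in an abstraction layer might look like. It is not our actual code. `update_with_prefetch` and `FakeCollection` are made-up names for illustration, and the stand-in collection only mimics the `find_one` / `update_one` method names so the snippet runs without a database.

```python
class FakeCollection:
    """Hypothetical in-memory stand-in for a driver collection (illustration only)."""

    def __init__(self, docs):
        self.docs = {d["_id"]: dict(d) for d in docs}
        self.calls = []  # records the order of operations

    def find_one(self, query):
        self.calls.append("find_one")
        return self.docs.get(query["_id"])

    def update_one(self, query, change):
        self.calls.append("update_one")
        doc = self.docs.get(query["_id"])
        if doc is not None:
            doc.update(change.get("$set", {}))
        return doc is not None


def update_with_prefetch(collection, query, change):
    """Issue a read first, then the update.

    Assumption: against a real server the read pulls the document into
    memory, so the update that follows is less likely to wait on disk
    while it holds the write lock.
    """
    collection.find_one(query)  # result is discarded; the read is only a warm-up
    return collection.update_one(query, change)


coll = FakeCollection([{"_id": 1, "views": 0}])
update_with_prefetch(coll, {"_id": 1}, {"$set": {"views": 5}})
```

The trade-off is an extra round trip per update, so this only pays off when the document is likely to be out of memory.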