As opposed to using a simple or recursive mutex and forgoing concurrent reads, that is. In our case there was an order-of-magnitude difference in raw lock/unlock performance.
Obviously a read/write lock can still be useful in read-heavy systems where the lock has to be held for long stretches, but for our synchronized collections we switched to std::mutex, which was massively faster.
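For concreteness, here is a minimal C++17 sketch of the two strategies being compared: the same map guarded once by a plain std::mutex and once by a reader/writer lock (std::shared_mutex). The class and method names are illustrative, not from any real codebase.

```cpp
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>

// A collection guarded by a plain std::mutex: very cheap to
// lock/unlock, but readers exclude each other.
class MutexMap {
public:
    void put(const std::string& k, int v) {
        std::lock_guard<std::mutex> g(m_);
        data_[k] = v;
    }
    bool get(const std::string& k, int& out) const {
        std::lock_guard<std::mutex> g(m_);
        auto it = data_.find(k);
        if (it == data_.end()) return false;
        out = it->second;
        return true;
    }
private:
    mutable std::mutex m_;
    std::map<std::string, int> data_;
};

// The same collection guarded by a reader/writer lock: concurrent
// reads are allowed, but each lock/unlock operation is heavier.
class RWMap {
public:
    void put(const std::string& k, int v) {
        std::unique_lock<std::shared_mutex> g(m_);  // exclusive (writer)
        data_[k] = v;
    }
    bool get(const std::string& k, int& out) const {
        std::shared_lock<std::shared_mutex> g(m_);  // shared (reader)
        auto it = data_.find(k);
        if (it == data_.end()) return false;
        out = it->second;
        return true;
    }
private:
    mutable std::shared_mutex m_;
    std::map<std::string, int> data_;
};
```

The trade-off described above falls out of the structure: if critical sections are short and contention is modest, the cheaper mutex wins; the shared_mutex only pays off when readers genuinely overlap for long spans.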
Have you tried GCD barriers for this? A custom concurrent queue basically gives you a reader/writer lock, with dispatch_sync or dispatch_async acting as a reader lock, and dispatch_barrier_sync or dispatch_barrier_async acting as a writer lock. GCD has a heavy emphasis on performance, so I'd be interested to know how that approach compares in speed.
I recently reimplemented some of the caches within RestKit using a `NSMutableDictionary` guarded by a dispatch queue instead of a `NSCache`/`NSMutableDictionary` + `NSRecursiveLock`. The general idea is that you use a concurrent dispatch queue to provide concurrent read access and then use barriers to obtain an exclusive write lock on the resource. There were significant performance benefits from the change, as I was able to shift the cache updates on a miss into the background. The dispatch queue approach also outperformed `NSCache` significantly under the workloads I was testing -- it appears that some of its internal bookkeeping on add/remove can be quite costly if you are adding/removing rapidly.
It's also very flexible, as you can do `dispatch_sync` if you need a synchronous fetch of the resource, or use callback blocks to go as async as possible.
I've been thinking about this since yesterday, and it seems like a really good solution for some use cases, but isn't there quite a bit of overhead to using a dispatch queue for fine-grained operations? It seems like the overhead of cross-thread communication would far outweigh the cost of a simple synchronous lock call.
Then again, if you can do things like background non-essential operations, then the higher-level benefits can probably outweigh that.
Like I said, GCD has a heavy emphasis on performance. For example, a dispatch_sync will involve no cross-thread communication in the uncontended case, and the cost is comparable to taking a spinlock. A bit heavier, but not too much.
dispatch_async will necessarily be slower, but don't overestimate how much work is really going to happen in the cases where it matters.
As opposed to an ordinary lock, which can be held by only one thread at a time. Compared to that, a read/write lock can see significantly lower contention, since multiple readers can hold it at once.