Note: I'm being brief, not trying to be snarky -- I'd love to hear what's behind your statements.
> - It does not scale.
What does this mean? I'm not saying you're wrong, but without some context, that doesn't say a whole lot.
I recall one big consulting company dropping Plone (and ZODB) some five years ago because ZODB/Plone scaled to about 10,000 documents; using a custom index solution based on Lucene they managed to reach around 100,000 documents for their CMS -- but they still ended up needing something else for their biggest clients. I can't find the link or remember the company now (I believe it was a German design shop). But it's the only story I've heard about ZODB not scaling for its typical use case.
> - Poor support for replication and sharding.
Are you aware that ZRS is now free and open source?
> - It is slow. Really slow. Un-pickle every time you get something from cache.
You have to marshal the structures you load with any other backend too -- is this really something specific to ZODB? Are you saying unpickling is slower than other ways of marshalling Python objects?
> - It is error prone. Forget a commit or forget to handle conflict errors and you're in big trouble.
As opposed to not handling transaction errors with a postgres backend?
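To make the comparison concrete: forgetting an explicit commit bites you in any transactional store, not just ZODB. A minimal sketch with stdlib sqlite3 (the file path and table name are just illustrative):

```python
import os
import sqlite3
import tempfile

# A throwaway database file for the demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.db")

conn = sqlite3.connect(path)
conn.execute("CREATE TABLE kv (k TEXT, v TEXT)")
conn.commit()

conn.execute("INSERT INTO kv VALUES ('a', '1')")
conn.close()  # no commit() -- the open transaction is rolled back

conn2 = sqlite3.connect(path)
rows = conn2.execute("SELECT * FROM kv").fetchall()
print(rows)  # [] -- the uncommitted write is gone
conn2.close()
```

The failure mode is symmetric: `transaction.commit()` in ZODB and `connection.commit()` in a SQL backend are both easy to forget, and both lose your writes silently.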
> - No interoperability. Want to write a service in C++ to access your db? You're out of luck.
>
Well, it's an object database. The only other one I can think of off the top of my head that people are actually using is Gemstone. You could of course wrap ZODB in an XML/JSON API -- but yes, I don't think interop with other languages is a good fit for ZODB.
> - Some server side stuff needs to have your objects, e.g. if you want to do conflict resolution.
> - Migrations/change to schemas are painful, once you change your object you're no longer going to be able to de-serialize it.
This is a problem I'm constantly running into with Plone and a more or less well understood set of third party add-ons. I really think the Smalltalk image approach is better (if you have the "data" you also have the "behaviour" -- with ZODB you might have a serialization of a complex class, but not the class definition needed to unmarshal it).
I wasn't aware ZRS is now free/open source. I would need to look at what that brings to the table but it's unlikely to change my views. I'll check it out though and thanks for the heads up!
Ignoring ZRS: as your number of clients and transactions goes up you're still bottlenecked on a single server. That's what I mean by "doesn't scale". For various reasons (e.g. objects can refer to each other) you're basically stuck. A scalable database provides various means of growing as your load grows, and ZODB does not.
There are a few problems with pickling. First, it is slow; under some assumptions there are faster ways of marshalling in Python. Second, your granularity of access is the entire object -- you can't just get a certain field out of a large object. Third, because objects in the client cache are pickled, you spend a whole lot of time serializing/deserializing them when you don't really need to. In one of my applications that happens to account for 80% of the execution time.
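The granularity point can be shown with plain stdlib pickle, since a ZODB record is itself one pickle (the `Document` class and its fields here are purely illustrative):

```python
import pickle

class Document:
    def __init__(self):
        self.title = "hello"          # the one field we actually want
        self.body = "x" * 1_000_000   # dead weight we load anyway

# The whole object serializes as one blob.
blob = pickle.dumps(Document())
print(len(blob) > 1_000_000)  # True -- the record carries the full body

# There is no way to pull just .title out of the blob; you pay for the
# complete unpickle to read any single attribute.
doc = pickle.loads(blob)
print(doc.title)  # hello
```

This is what "granularity of access is the entire object" means in practice: the cost of reading the smallest field is the cost of deserializing the largest one.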
I haven't spent a lot of time with SqlAlchemy but I think an ORM that maps well to some performant database is a better approach in Python.
It's best to keep object records small when using ZODB. This does mean you need to do some planning about object structure. Objects that inherit from "persistent.Persistent" are kept as separate records in ZODB, so you can break up a large object into several smaller ones by attaching persistent attributes to another object. If you just make a big structure out of non-persistent objects, ZODB will indeed have all the downsides of plain old pickle (like slow loading times for large objects), but its entire purpose is to let you avoid doing that.
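A rough stdlib-only sketch of why splitting into separate records matters (the dict-of-pickles "store" and the oid names are stand-ins for what ZODB does with persistent.Persistent, not its actual API):

```python
import pickle

store = {}  # oid -> pickled record, an illustrative stand-in for the storage

# One big record: the whole structure is a single pickle, so any change
# rewrites (and reloads) everything.
big = {"meta": {"title": "t"}, "items": list(range(100_000))}
store["root"] = pickle.dumps(big)
one_record = len(store["root"])

# Split records: each sub-object gets its own oid and the parent holds
# references -- roughly what persistent attributes give you.
store["meta"] = pickle.dumps({"title": "t"})
store["items"] = pickle.dumps(list(range(100_000)))
store["root2"] = pickle.dumps({"meta": "meta", "items": "items"})  # refs by oid

# Changing the title now only rewrites the tiny "meta" record.
store["meta"] = pickle.dumps({"title": "t2"})
print(len(store["meta"]) < one_record)  # True
```

With separate records, a small write stays small, and a reader who only needs the metadata never touches the large record at all.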
It does not unpickle every time you get something from the cache. It maintains a first-level, in-memory, per-thread LRU cache holding the actual Python objects (not their pickled representation). In practice, that means you don't need to maintain your own results cache (with e.g. Redis or memcached) as you do with most SQL-based systems. If it's slow for you, it's probably not due to its cache.
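A minimal sketch of the object-cache behaviour described above -- `TinyConnection` is an invented toy, not ZODB's Connection class, but the shape is the same: unpickle once, then hand back the live object:

```python
import pickle

class TinyConnection:
    def __init__(self, records):
        self._records = records   # oid -> pickle bytes (the "database")
        self._cache = {}          # oid -> live Python object
        self.unpickles = 0        # count deserializations for the demo

    def get(self, oid):
        if oid not in self._cache:
            self._cache[oid] = pickle.loads(self._records[oid])
            self.unpickles += 1
        return self._cache[oid]

conn = TinyConnection({"obj": pickle.dumps({"n": 1})})
a = conn.get("obj")
b = conn.get("obj")
print(a is b, conn.unpickles)  # True 1 -- unpickled once, then cached
```

Repeated access is a dict lookup returning the identical object, which is why a separate memcached/Redis layer is usually redundant here.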
Write conflicts are indeed hairy to deal with. Using a BTree in places where you might otherwise use a dictionary usually makes this a lot better, because a BTree splits its contents across many small records, so concurrent writes to different keys usually touch different records.
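An illustrative simulation of that difference, without ZODB installed -- the `commit` helper below is a naive compare-the-serial check standing in for optimistic concurrency control, not ZODB's real conflict machinery:

```python
import pickle

def commit(base_serial, new_bytes, current_serial):
    # Naive optimistic check: fail if someone committed since we read.
    if base_serial != current_serial:
        raise RuntimeError("ConflictError")
    return new_bytes, current_serial + 1

# Whole-dict case: the mapping is one record, so two transactions writing
# *different* keys still rewrite the same record, and the second conflicts.
record, serial = pickle.dumps({"a": 1, "b": 2}), 0
d1 = pickle.loads(record); d1["a"] = 10
d2 = pickle.loads(record); d2["b"] = 20
record, serial = commit(0, pickle.dumps(d1), serial)  # first commit wins
conflicted = False
try:
    record, serial = commit(0, pickle.dumps(d2), serial)
except RuntimeError:
    conflicted = True
print(conflicted)  # True -- different keys, same record

# Bucketed case: each key lives in its own record (roughly what a BTree's
# buckets give you), so the same two writes land on different records.
buckets = {"a": (pickle.dumps(1), 0), "b": (pickle.dumps(2), 0)}
buckets["a"] = commit(0, pickle.dumps(10), buckets["a"][1])
buckets["b"] = commit(0, pickle.dumps(20), buckets["b"][1])
print(pickle.loads(buckets["a"][0]), pickle.loads(buckets["b"][0]))  # 10 20
```

Real BTrees conflict only when two transactions hit the same bucket (and some types support merge-style conflict resolution), but the record-granularity intuition is the useful part.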
It sounds like you got burned by using it without understanding it very well.
- It does not scale.
- Poor support for replication or sharding.
- It is slow. Really slow. Un-pickle every time you get something from cache.
- It is error prone. Forget a commit or forget to handle conflict errors and you're in big trouble.
- No interoperability. Want to write a service in C++ to access your db? You're out of luck.
- As your system grows you'll have conflicts all over the place.
- Some server side stuff needs to have your objects, e.g. if you want to do conflict resolution.
- Migrations/change to schemas are painful, once you change your object you're no longer going to be able to de-serialize it.
- You have to roll your own if you want change notifications.
So if you're just looking to persist some Python objects in a small system, great. Otherwise I'd stay away from this.