Does anyone have experience working with Titan, either on the side or in production? Is Titan production ready?
The flexible storage backends, clustering, and the open source license are all very enticing. I've been looking for a graph database for an upcoming project and have yet to find something that really matches what we're looking for.
Titan is used in production, and it's one piece of the Aurelius Graph Cluster (http://thinkaurelius.com/subscription/), which is some of the most impressive tech to hit the open-source scene in the last few years.
Matthias Broecheler (https://twitter.com/MBroecheler) is the original creator of Titan, and he is incredibly bright. When he finished his PhD, he linked up with Marko Rodriguez (https://twitter.com/twarko), the creator of Gremlin (https://github.com/tinkerpop/gremlin/wiki), and they formed Aurelius to focus on building the big-data graph ecosystem (like Cloudera for graphs -- in fact, the Aurelius Cluster integrates with Hadoop and Cloudera).
There are other distributed graph databases, but most of these are batch processing engines like Pregel. However, Titan is a real-time, transactional graph database backed by either Cassandra or HBase, and it provides fast, horizontally scalable write performance (10,000+ tps) that hasn't been available in an open-source graph database.
Combining this with Faunus for batch processing and the Aurelius Graph Cluster's integration with the Hadoop ecosystem makes for an incredibly powerful platform for building applications such as social startups.
We just started using Titan in production last week for shift.com, on a 3-node Cassandra cluster. We open-sourced our Object Graph Mapper library for Python here:
There are a few caveats that come with working with distributed databases, so it's important to know what you're getting into. Neo4j might be easier out of the box (since more people are using it), but if you want a robust solution that'll work for 50 or 50,000 users, Titan feels like the way to go.
We have various clients using Titan in production. Of course, like any project, there are always more desired features. The Titan/Faunus roadmap is greatly influenced by our clients.
I'm currently playing around with it for a large internal development project. My only concern (and why I'm leaning towards using Neo4j at least initially) is that I don't really yet have a good indication if I'm going to be dealing with enough data to warrant a big distributed solution.
Right now I'm actually mostly messing with TinkerGraph (an in-memory graph database that's part of the TinkerPop utilities that the Titan guys make).
With Titan/BerkeleyDB you will get blazing performance for a single-machine deployment. One of the wonderful innovations of Titan is vertex-centric indices, which are necessary even at single-machine scale.
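For anyone wondering what a vertex-centric index buys you, here is a toy sketch in plain Python. It is not Titan's API; the `Vertex` class, its methods, and the sample data are all made up for illustration. The idea it tries to show is that each vertex keeps its own edges sorted by label and a sort key, so a query on a vertex with many edges can jump to the matching range instead of scanning every edge.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class Vertex:
    """Toy vertex (hypothetical, not Titan's API) with a per-vertex edge index."""

    def __init__(self, name):
        self.name = name
        # label -> (sorted sort keys, neighbor names in the same order)
        self._index = defaultdict(lambda: ([], []))

    def add_edge(self, label, sort_key, neighbor):
        """Insert an edge so that the edges of each label stay sorted by sort_key."""
        keys, names = self._index[label]
        pos = bisect_right(keys, sort_key)
        keys.insert(pos, sort_key)
        names.insert(pos, neighbor)

    def neighbors(self, label, lo, hi):
        """Return neighbors over `label` edges with lo <= sort_key <= hi.

        Two binary searches locate the range; edges with other labels or
        sort keys outside the range are never touched.
        """
        keys, names = self._index[label]
        return names[bisect_left(keys, lo):bisect_right(keys, hi)]


alice = Vertex("alice")
for timestamp, friend in [(30, "carol"), (10, "bob"), (20, "dave")]:
    alice.add_edge("follows", timestamp, friend)
alice.add_edge("likes", 15, "post-1")

# Only "follows" edges with a timestamp between 15 and 30 are read.
recent = alice.neighbors("follows", 15, 30)
print(recent)
```

With a handful of edges this makes no difference, but on a vertex with a very large number of edges the lookup cost depends on the size of the matching range rather than on the vertex's total degree.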
Next, if you decide to scale horizontally, you can simply change the setting to storage.backend=cassandra and that's that (of course, you need to do a bulk data transfer from BerkeleyDB to Cassandra).
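To make the config switch concrete, here is a rough Python sketch that renders the two properties files. Only `storage.backend=cassandra` comes from the comment above; the `berkeleyje` value, the `storage.directory` and `storage.hostname` keys, the path, and the IP address are assumptions from memory of Titan's configuration docs, so check them against the version you run. The helper functions are hypothetical and the sketch does not move any data.

```python
def to_properties(settings):
    """Render a flat dict as Java-style .properties text, one key per line."""
    lines = [f"{key}={value}" for key, value in sorted(settings.items())]
    return "\n".join(lines) + "\n"


def switch_to_cassandra(settings, hostname):
    """Return a copy of `settings` pointing at Cassandra instead of a local store.

    Key names other than storage.backend are assumptions, not confirmed
    by the thread.
    """
    updated = {k: v for k, v in settings.items() if k != "storage.directory"}
    updated["storage.backend"] = "cassandra"
    updated["storage.hostname"] = hostname
    return updated


# Single-machine setup (assumed key names, hypothetical path).
single_machine = {
    "storage.backend": "berkeleyje",
    "storage.directory": "/tmp/titan",
}

# Clustered setup (hypothetical host address).
clustered = switch_to_cassandra(single_machine, "10.0.0.1")

print(to_properties(single_machine))
print(to_properties(clustered))
```

The application code that opens the graph stays the same; only the file it reads changes. The bulk transfer of existing data is a separate step that this does not cover.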
my team has made a ton of contributions to hermes - it's a pretty solid library. that said, i'm happy to see more traction for clojure + titan, especially from the clojurewerkz crew.
There is support for graph rewriting for a Titan-backed data-set using Faunus. Titan does not support global graph operations. Therefore, Faunus was created to allow you to perform offline graph operations much like the one you've described.