> There is a cookie setting you can use to get some features early on GitLab.com but I can't find a link right now.
Andrew from the Infrastructure team at GitLab here: non-team members are welcome to use our canary service, provided they understand that the service may at times be degraded (they can always switch back to production when this happens).
We're hoping to complete this by the end of the year. Once it's done, we'll be able to end our reliance on NFS, which should greatly improve performance and uptime on GitLab.com and other large GitLab instances. In fact, we're already seeing some big performance payoffs as we bring services online.
> Please please don't get complacent. Not yet.
I can confirm that this is not the case. We're focused and working really hard to improve performance, and we're also improving our metrics so that we can target optimizations where they'll deliver the greatest benefit.
It's also worth pointing out that routinely experiencing 5+ second render times when browsing a repo homepage is outside our 99th percentile latencies for that route. I'd be interested in digging into it further. Would you mind creating an issue at https://gitlab.com/gitlab-org/gitaly/issues/new (mark it as confidential if you wish) and pinging me `@andrewn`?
The 5+ second delay is a little above the average (I'm rarely frustrated enough to actually go off and get distracted by HN while I wait), but notable delays on loading repo trees are definitely common, if not usually quite so high.
Right, I meant more in a Q&A fashion with perpendicular discussion like with SA. But I'm happy to see there is an archive. I've had decade old Mozilla bug reports save me more than once.
Gitter co-founder here: we use neo4j for suggesting rooms.
So, if you're in room A and room B, and most people in those two rooms are also in room C, then we suggest that you join room C.
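For illustration only, here is a minimal Python sketch of that co-membership rule. It assumes memberships are available as a room → member-set dict (hypothetical names and shape; Gitter's real implementation runs this as a neo4j traversal):

```python
def suggest_rooms(user, memberships, threshold=0.5):
    """Suggest rooms that most of `user`'s co-members already belong to.

    memberships: dict mapping room name -> set of user names.
    A room is suggested when at least `threshold` of the people who
    share a room with `user` are members of it.
    """
    # Everyone who shares at least one room with the user.
    peers = set()
    for members in memberships.values():
        if user in members:
            peers |= members
    peers.discard(user)
    if not peers:
        return []

    scored = []
    for room, members in memberships.items():
        if user in members:
            continue  # already joined
        overlap = len(peers & members) / len(peers)
        if overlap >= threshold:
            scored.append((room, overlap))
    # Highest overlap first.
    return [room for room, _ in sorted(scored, key=lambda s: -s[1])]
```

So if you and your co-members from rooms A and B are mostly all in C, room C comes back first.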
As I mentioned elsewhere, we also use some non-neo4j based methods for suggestions (including your GitHub graph). One of the problems we've had with neo4j is that we haven't been able to make it scale. It frequently burns up from being overloaded.
This is almost certainly down to the way we use neo4j but at some point I'd like to ditch it for a clustered suggestion algo that uses batch processes to cluster rooms together.
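A batch version could be as simple as an offline job that scores room pairs by membership overlap (Jaccard similarity) and stores the results, so serving a suggestion becomes a plain lookup with no graph traversal at request time. A hedged sketch, again assuming a hypothetical room → member-set dict:

```python
from itertools import combinations

def room_similarity(memberships, min_jaccard=0.3):
    """Offline batch job: score every pair of rooms by membership overlap.

    memberships: dict mapping room name -> set of user names.
    Returns the room pairs whose Jaccard similarity clears `min_jaccard`;
    a nightly job could persist these for constant-time suggestion lookups.
    """
    pairs = {}
    for a, b in combinations(sorted(memberships), 2):
        ma, mb = memberships[a], memberships[b]
        union = len(ma | mb)
        if union == 0:
            continue
        j = len(ma & mb) / union  # Jaccard: |A ∩ B| / |A ∪ B|
        if j >= min_jaccard:
            pairs[(a, b)] = j
    return pairs
```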
Sounds like you're going to be doing a lot of writes, which neo4j isn't designed for. While I get why you're using it, I'd suggest that it's not really the right tool for that job, which is possibly why you're having problems with it.
Do you really need persistence at all? If the application crashes, everyone would be kicked out of the channels anyway, no? I think making an in-memory graph and ditching the graph DB entirely would be more helpful for that kind of suggestion tool.
Neo4j powers a 1.2-terabyte dataset with nearly three billion nodes and nine billion relationships to create the world's largest consumer identity graph. Neo4j's native graph clustering architecture is ideally suited to delivering real-time query performance at scale across massive customer graphs. Please see: https://neo4j.com/news/qualia-media-customers-experience-gre...
Unfortunately, our neo4j setup as it stands really doesn't scale. At the application level, we've built error handling to gracefully cope with the frequent outages we experience running it.
These are probably our fault rather than a failing in the product, but I've wanted for a while to replace the current neo4j-based suggestion algo with a new one that uses batching/clustering. As soon as I get a chance, I'd like to remove our dependency on neo4j.
It looks like you're still using Neo4j 2.3. Have you tried using newer versions of Neo4j? There have been some pretty substantial performance improvements.
Additionally, Neo4j does support clustering with their Enterprise edition. It also has much better tooling (better metrics/logging, backups, etc.). It is AGPL licensed, so there's no reason not to use it really.
You might want to reach out to Al Baker or Kendall Clark at Stardog. Their product is more RDF centric than for generic graphs like Neo4j, but is in my opinion a much better product. They have a community version, and have open sourced several pieces, but it is commercial. https://www.stardog.com/docs/
The big problem with the way we use neo4j is around huge rooms with ~100K users. The number of possible rooms that neo4j has to traverse (even in a shallow traversal) for a user in a big room is enormous.
Thanks for your recommendations. I'll definitely check them out.
Those huge rooms really don't add anything to recommendations (like "everyone buys milk"). If almost everyone is in a room, you can just leave it out (the same is true for rooms with very few users).
That's why in recommendations I usually filter them out by degree (which is a constant-time read).
MATCH (r:Room) WHERE size( (r)<-[:MEMBER]-() ) < 5000
...
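That same degree cut-off can also be applied as a pre-pass in application code before any traversal runs; a minimal sketch, assuming a hypothetical room → member-set dict:

```python
def small_rooms(memberships, max_degree=5000, min_degree=2):
    """Keep only rooms whose member count ('degree') is in a useful band.

    Mega-rooms add noise (everyone 'buys milk') and near-empty rooms
    carry no signal, so both ends are dropped before recommending.
    """
    return {room: members for room, members in memberships.items()
            if min_degree <= len(members) < max_degree}
```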
You're also not using directions in your queries, which will also help.
I looked at your other queries; they can also be improved easily. If you want, I can help you do that. I'd just need some sample data or read-only access to the db.
Just ping me at michael@neo4j.com.
Gitter is not intended to be a replacement for Mattermost, Slack or other team collaboration tools. We see Gitter as a community instead.
As such, we're not expecting to see a huge uptake of on-site installations, so the list of required services (Elasticsearch, neo4j, etc.) is big compared to other products focused on on-prem deployment.
We're hoping that our users will contribute to the main site, Gitter.im.
Obviously, we're also totally happy with users running their own Gitter installations but, while we would like it to be easy, ease-of-installation of a production instance is not a goal currently.
We have a docker-compose.yml for development environments and I'm currently in the process of simplifying this[1] so that Docker for Mac/Windows will be able to spin up an environment with little effort.
We use ansible for provisioning beta/staging and production. We have yet to open source the ansible repository but, since we're switching to GitLab CI/CD, the deployment process will soon be publicly accessible - even for production.
We use Ansible, Packer and Terraform. Open-sourcing this repo would be fairly complicated, so it may make more sense for us to publish a k8s Helm chart or a more complete, production-ready docker-compose.yml.
You're already using docker-compose.yml, so you should definitely do a production-ready Docker Swarm deployment. You'll only need to make incremental changes to your docker-compose file.
Details of how to toggle the canary environment can be found in our handbook: https://about.gitlab.com/handbook/engineering/#canary-testin...