> There is a cookie setting you can use to get some features early on GitLab.com but I can't find a link right now.
Andrew from the Infrastructure team at GitLab here: non-team members are welcome to use our canary service, provided they understand that the service may at times be degraded (they can always switch back to production when this happens).
We're hoping to complete this by the end of the year. Once it's done, we'll be able to end our reliance on NFS, which should greatly improve performance and uptime on GitLab.com and other large GitLab instances. In fact, we're already seeing some big performance payoffs as we bring services online.
> Please please don't get complacent. Not yet.
I can confirm that this is not the case. We're focused and working really hard to improve performance, and we're also improving our metrics so that we can target optimizations where they'll deliver the greatest benefit.
It's also worth pointing out that routinely experiencing 5+ second render times when browsing a repo homepage is outside our 99th percentile latencies for that route. I'd be interested in digging into it further. Would you mind creating an issue at https://gitlab.com/gitlab-org/gitaly/issues/new (mark it as confidential if you wish) and pinging me `@andrewn`?
The 5+ second delay is a little above the average (I'm rarely frustrated enough to actually go off and get distracted by HN while I wait), but notable delays on loading repo trees are definitely common, if not usually quite so high.
Right, I meant more in a Q&A fashion with perpendicular discussion like with SA. But I'm happy to see there is an archive. I've had decade old Mozilla bug reports save me more than once.
Gitter co-founder here: we use neo4j for suggesting rooms.
So, if you're in room A and room B, and most people in those two rooms are also in room C, then we suggest that you join room C.
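For illustration only, here is a minimal Python sketch of that co-membership rule. It assumes memberships are available as a room → member-set dict (hypothetical names and shape; Gitter's real implementation runs this as a neo4j traversal):

```python
def suggest_rooms(user, memberships, threshold=0.5):
    """Suggest rooms that most of `user`'s co-members already belong to.

    memberships: dict mapping room name -> set of user names.
    A room is suggested when at least `threshold` of the people who
    share a room with `user` are members of it.
    """
    # Everyone who shares at least one room with the user.
    peers = set()
    for members in memberships.values():
        if user in members:
            peers |= members
    peers.discard(user)
    if not peers:
        return []

    scored = []
    for room, members in memberships.items():
        if user in members:
            continue  # already joined
        overlap = len(peers & members) / len(peers)
        if overlap >= threshold:
            scored.append((room, overlap))
    # Highest overlap first.
    return [room for room, _ in sorted(scored, key=lambda s: -s[1])]
```

So if you and your co-members from rooms A and B are mostly all in C, room C comes back first.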
As I mentioned elsewhere, we also use some non-neo4j based methods for suggestions (including your GitHub graph). One of the problems we've had with neo4j is that we haven't been able to make it scale. It frequently burns up from being overloaded.
This is almost certainly down to the way we use neo4j but at some point I'd like to ditch it for a clustered suggestion algo that uses batch processes to cluster rooms together.
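A batch version could be as simple as an offline job that scores room pairs by membership overlap (Jaccard similarity) and stores the results, so serving a suggestion becomes a plain lookup with no graph traversal at request time. A hedged sketch, again assuming a hypothetical room → member-set dict:

```python
from itertools import combinations

def room_similarity(memberships, min_jaccard=0.3):
    """Offline batch job: score every pair of rooms by membership overlap.

    memberships: dict mapping room name -> set of user names.
    Returns the room pairs whose Jaccard similarity clears `min_jaccard`;
    a nightly job could persist these for constant-time suggestion lookups.
    """
    pairs = {}
    for a, b in combinations(sorted(memberships), 2):
        ma, mb = memberships[a], memberships[b]
        union = len(ma | mb)
        if union == 0:
            continue
        j = len(ma & mb) / union  # Jaccard: |A ∩ B| / |A ∪ B|
        if j >= min_jaccard:
            pairs[(a, b)] = j
    return pairs
```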
Sounds like you're going to be doing a lot of writes, which neo4j isn't designed for. While I get why you're using it, I'd suggest that it's not really the right tool for that job, which is possibly why you're having problems with it.
Do you really need persistence at all? If the application crashes, everyone would be kicked out of the channels anyway, no? I think making an in-memory graph and ditching the graph DB entirely would be more helpful for that kind of suggestion tool.
Neo4j powers a 1.2-terabyte dataset with nearly three billion nodes and nine billion relationships to create the world's largest consumer identity graph. Neo4j's native graph clustering architecture is ideally suited to delivering real-time query performance at scale across massive customer graphs. Please see: https://neo4j.com/news/qualia-media-customers-experience-gre...
Unfortunately, our neo4j setup as it stands really doesn't scale. At the application level, we've built error handling to gracefully cope with the frequent outages we experience running it.
These are probably our fault rather than a failing in the product, but I've wanted for a while to replace the current neo4j-based suggestion algo with a new one that uses batching/clustering. As soon as I get a chance, I'd like to remove our dependency on neo4j.
It looks like you're still using Neo4j 2.3. Have you tried using newer versions of Neo4j? There have been some pretty substantial performance improvements.
Additionally, Neo4j does support clustering with their Enterprise edition. It also has much better tooling (better metrics/logging, backups, etc.). It is AGPL licensed, so there's no reason not to use it really.
You might want to reach out to Al Baker or Kendall Clark at Stardog. Their product is more RDF centric than for generic graphs like Neo4j, but is in my opinion a much better product. They have a community version, and have open sourced several pieces, but it is commercial. https://www.stardog.com/docs/
The big problem with the way we use neo4j is around huge rooms with ~100K users. The number of possible rooms that neo4j has to traverse (even in a shallow traversal) for a user in a big room is enormous.
Thanks for your recommendations. I'll definitely check them out.
Those huge rooms really don't add anything to recommendations (like "everyone buys milk"). If almost everyone is in a room, you can just leave it out (the same is true for rooms with very few users).
That's why in recommendations I usually filter them out by degree (which is a constant-time read).
MATCH (r:Room) WHERE size( (r)<-[:MEMBER]-() ) < 5000
...
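That same degree cut-off can also be applied as a pre-pass in application code before any traversal runs; a minimal sketch, assuming a hypothetical room → member-set dict:

```python
def small_rooms(memberships, max_degree=5000, min_degree=2):
    """Keep only rooms whose member count ('degree') is in a useful band.

    Mega-rooms add noise (everyone 'buys milk') and near-empty rooms
    carry no signal, so both ends are dropped before recommending.
    """
    return {room: members for room, members in memberships.items()
            if min_degree <= len(members) < max_degree}
```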
You're also not using directions in your queries, which will also help.
I looked at your other queries; they can also be improved easily. If you want, I can help you do that. I'd just need some sample data or read-only access to the db.
Just ping me at michael@neo4j.com.
Gitter is not intended to be a replacement for Mattermost, Slack or other team collaboration tools. We see Gitter as a community instead.
As such, we're not expecting to see a huge uptake of on-site installations, so the list of required services (Elasticsearch, neo4j, etc.) is big compared to other products focused on on-prem deployment.
We're hoping that our users will contribute to the main site, Gitter.im.
Obviously, we're also totally happy with users running their own Gitter installations but, while we would like it to be easy, ease-of-installation of a production instance is not a goal currently.
We have a docker-compose.yml for development environments and I'm currently in the process of simplifying this[1] so that Docker for Mac/Windows will be able to spin up an environment with little effort.
We use ansible for provisioning beta/staging and production. We have yet to open source the ansible repository but, since we're switching to GitLab CI/CD, the deployment process will soon be publicly accessible - even for production.
We use Ansible, Packer and Terraform. Open-sourcing this repo would be fairly complicated, so it may make more sense for us to publish a k8s Helm chart or a more complete, production-ready docker-compose.yml.
You're already using docker-compose.yml, so you should definitely do a production-ready Docker Swarm deployment. You'll only need to make incremental changes to your docker-compose file.
Details of how to toggle the canary environment can be found in our handbook: https://about.gitlab.com/handbook/engineering/#canary-testin...