This is a pretty neat hack, but not really good for a true production deployment system. Rsync is a far superior alternative. That being said, git should definitely be incorporated into the workflow such that, for example, you have a "live" branch which always reflects what is to be on production frontend nodes. From there you do 1) git pull origin live, 2) rsync to the live servers, 3) build/configure/restart/etc. And set -e on that script, obviously...
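Something like this, roughly (the host name, paths and service name here are made up):

    #!/bin/sh
    set -e                                    # bail out on the first failure
    git pull origin live                      # 1) bring the local "live" branch up to date
    rsync -a --delete --exclude '.git' ./ deploy@web01:/srv/app/   # 2) sync to the live server
    ssh deploy@web01 'sudo systemctl restart myapp'                # 3) build/configure/restart step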
Edit: I should also mention, if you are stuck on something like restricted hosting with CPanel which severely limits your deployment options (some of my clients are in this boat), then http://ftploy.com/ is a really cool solution. But you should really get your ass off cpanel asap.
Double edit: Some of the replies below have made some good points that I had not considered which weaken my argument. So while I'm now more ambivalent than dismissive towards the idea of using git to deploy, there are several modifications that should be made to this particular system to make it production-ready. See avar's and mark_l_watson's comments below and mikegirouard's comment elsewhere for some ideas.
Both Git and rsync are inadequate deployment tools, for many reasons I will not go into in a comment, as there are quite a few articles expounding on the virtues of not deploying with your VCS. There are also numerous reasons why rsync is inappropriate (what if you push up a nasty bug and have to revert? Oops, better go revert to my tagged release and then rsync again. This is ugly in comparison to versioned releases combined with proper use of the operating system's dominant package manager, where you can upgrade/downgrade an application based on its version...)
I generally see four stages in the devops maturation of a programmer:
1. I rsync my code using pre-built commands in Fabric when I'm ready to push.
2. I write code and have hooks on the server to pull the repo when I tag a release in my VCS.
3. I use my language's package management system to build a source distribution that includes all of the necessary static assets, the web application, and any database migration code; I also use a sane versioning scheme to keep track of releases. When I want to push I use a build system that hooks into my continuous integration server and builds a distribution whenever the senior programmer tags a release. It is then made available to the production server in a deb or rpm repository where the senior programmer can then just run an update command (that updates with the new distribution and runs any necessary database migration or post-upgrade hook scripts).
4. You are so big that you've got a custom deployment system built on top of BitTorrent (à la Facebook) or something similar.
It should be obvious where I'm at - I progressed from being an adherent to VCS deployment, to rsync only, to a proper source distribution release system. I haven't managed the devops for a team/application the size of Facebook yet but I'm sure I will get there soon.
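To make stage 3 concrete, the build/upgrade side can look roughly like this (fpm is just one way to produce a package; the names and version here are made up):

    # on the CI box, triggered by the release tag:
    fpm -s dir -t deb -n myapp -v 1.4.2 --after-install postinstall.sh /srv/myapp
    # the resulting .deb goes into an internal apt repository; on the server the
    # upgrade/downgrade is then just the distro's own tooling:
    apt-get update && apt-get install myapp=1.4.2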
Benefits of versioned archives over VCS for deployments: they're easy to checksum and cryptographically sign; they integrate easily with existing distribution-specific package databases; you can deploy without requiring a VCS (and all its dependencies, including maintained and accessible VCS repo-hosting deployment infrastructure); and there are probable security and speed benefits to the resulting (i.e. minimalist) approach, both at the level of the host and the network.
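The checksum/signing part, for instance, is just stock tooling (file names and version made up):

    tar czf myapp-1.4.2.tar.gz myapp/
    sha256sum myapp-1.4.2.tar.gz > myapp-1.4.2.tar.gz.sha256
    gpg --armor --detach-sign myapp-1.4.2.tar.gz
    # the deploy host only needs tar, gpg and sha256sum to verify, no VCS at all:
    gpg --verify myapp-1.4.2.tar.gz.asc myapp-1.4.2.tar.gz
    sha256sum -c myapp-1.4.2.tar.gz.sha256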
Personally I use a combination of versioned archives and named and versioned target environments, each of which can be tested both individually and in combination (including regression tests). This works well for me.
There are many package management systems which suck at this.
I suppose then that you mean "rpms" or "debs" or the like, not the "language package management system" the previous poster mentioned, because I've yet to see one that truly supports more than tar xzf <list of deps>.
Even when they have signing support, none of the packages are signed anyway.
Anywhere that I have a say in the matter, FTP is disabled. I've been a fan of rsync for years and have a bunch of scripts that can make the whole process seamless. That said, I'm starting to be won over by git deploys.
The reason I've started to like git is deletes. You can handle them with rsync:
rsync --delete
The problem is that some projects have content uploaded in the same file tree (simple CMS installs). This might not be an issue if it was structured differently (symlink to another directory), but sometimes it's what I have. Using "rsync --delete" would remove newly uploaded user content. Yeah, I could use the "--exclude" option as well.
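In those cases it ends up looking something like this (directory names made up); the excluded path is skipped by the transfer and, by default, also left alone by --delete:

    rsync -a --delete --exclude 'uploads/' ./site/ deploy@web01:/var/www/site/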
With git, I can just "git rm ..." and the file will be removed on deploy. Content can be mixed in the same tree and hidden with a .gitignore file. File content can be managed separately with rsync, if that's the best way. Just not FTP. Please.
> Content can be mixed in the same tree and hidden with a .gitignore file
Note that rsync also allows fairly powerful in-tree tweaking of details: if you give it the "-F" option, it will look for ".rsync-filter" files (see man page for details).
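A minimal example (directory names made up): drop this at the top of the tree and run rsync with -F, and the listed paths are neither transferred nor removed by --delete (unless --delete-excluded is given):

    # .rsync-filter
    - uploads/
    - cache/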
We've been working on moving away from rsync for our code syncing to using Git where I work.
I'm not saying there aren't uses for rsync, but your dismissal of git as not being suitable for a "true production deployment system" isn't supported in any way. And stating that rsync was "specifically made for this kind of thing" without comparing any of the trade-offs involved is just appealing to authority.
Some things you may not have considered:
* rsync is meant to sync up *arbitrary filesystem trees*, whereas with Git you're snapshotting trees over time. When you transfer content between two Git repositories the two ends can pretty much go "my tree is at X, you have Y, give me X..Y please". You get that as a pack, then just unpack it in the receiving repository. Whereas with rsync, even if you don't checksum the files you still have to recursively walk the full depth of the tree at both ends (if you're doing updates), send that over the wire etc. before you even get to transferring files.
* Since syncing commits and actually checking them out are two different steps, you can push out commits (without checking them out!) to your production machines as they're pushed to your development branches. Then deploying is just sending a message saying "please check out such-and-such SHA1" and the content will already be there! (There's a rough sketch of this after the list.)
* You mentioned in another post here that rsync has --delay-updates; this is just like "git reset --hard" (but I'll bet Git's is more efficient). With Git you can do the transfer of the objects and the checking out of the objects as separate steps.
* It's way easier for compliance/validation reasons to not get the data out of Git, since you can validate with absolute certainty that what you have at a given commit is what you have deployed (just run "git show"). If you check the files out and then sync them with some out-of-band mechanism you're back to comparing files.
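To sketch what that two-step split can look like (the remote name, paths and variable here are made up): the production box holds a bare repository plus a separate work tree, the objects get pushed ahead of time, and the actual rollout is a purely local checkout:

    # ahead of time, whenever something lands on the development branch:
    git push prod-web01 master

    # at rollout time, on the production box (fast and local, the objects are already there):
    git --git-dir=/srv/app.git --work-tree=/srv/app checkout -f "$DEPLOY_SHA1"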
Edit: One thing I forgot, it's distributed. Which gives you a lot of benefits. Consider this problem: you have 1000 servers running your code and you've decided that you want to deploy now from a staging server.
Trying to rsync to 1000 servers at once from one box (the naïve implementation with rsync) would take forever and overload that one box, especially if you wanted to take advantage of pre-syncing things on every commit so the commit will already be there if you want to roll out (constant polling and/or pushing).
You can mitigate this by having intermediate servers you push to, but then you've just partitioned the problem: what if you need to swap out those boxes, or they go down, etc.?
With Git you can just configure each of the 1000 boxes to have 3 other boxes in the pool as a remote. Then you seed one of them with the commit you want to roll out. The content will trickle through the graph of machines, any one machine going down will be handled gracefully, and if you want to roll out you can just block on something that asks "do you have this SHA1 yet" returning true for all live machines before you "git reset --hard" to that SHA1 everywhere.
You've described some admirable utility that can be achieved by using Git. However, it can all be accomplished with other tools and without needing the entire deployment history stored on each production machine.
As for your comment about being "back to comparing files", that's all Git is doing internally anyway. You can do the same with other deployment tools and sha1 hashes etc.
Sure it can be accomplished with other tools, but if Git is sufficient, introducing other tools just increases the complexity of your stack, and the complexity of e.g. validating that a Git tag corresponds to what is actually rolled out as that tag.
> and without needing the entire deployment history stored on each production machine.
This is a constraint a lot of people seem to think they need but don't actually need. If someone gets your current checkout they'll have current code / passwords (if you accidentally checked in a password but removed it, you should change that password). Getting the code history will just satisfy historical curiosity. Hardly a pressing concern for an attacker.
> As for your comment about being "back to comparing files", that's all Git is doing internally anyway. You can do the same with other deployment tools and sha1 hashes etc.
Yes, but the point is that it gives you that for free, without you having to hack anything extra on top of your syncing mechanism.
You'd be pleasantly surprised how much checking/validation/syncing logic you have to write around e.g. rsync when syncing a Git repo just disappears entirely if you just use Git to sync the files.
Note that git could be used as the developer/ops-facing deploy interface, while under the hood you do something more complicated/robust like Capistrano or rsyncing to multiple machines, or whatever.
Maybe you start out on Heroku. Then you switch to your own machines and use this simple hack, or Dokku or something. Then something home grown. The complexity of deploy scripts can grow while the interface stays the same.
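As a rough sketch of that idea (the hook is standard git; the deploy script name is made up), the post-receive hook on the shared "deploy" remote just hands the new revision to whatever actually does the rollout, so the machinery behind it can change while everyone keeps running the same git push:

    #!/bin/sh
    # hooks/post-receive on the "deploy" remote
    while read oldrev newrev refname; do
        if [ "$refname" = "refs/heads/live" ]; then
            /usr/local/bin/roll-out "$newrev"   # made-up script: rsync, Capistrano, Dokku, whatever
        fi
    done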
I've been using this method of deployment for several production sites for a couple of years now. I don't really see why rsync is better, or how using git is meaningfully different from having a live branch that you rsync from. As long as you're checking out into a detached work tree, it is functionally identical to rsync.
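For anyone who hasn't seen the trick, the whole setup is roughly this (paths made up): a bare repo you push to, plus a one-line post-receive hook that checks the pushed branch out into the web root:

    git init --bare /home/git/site.git
    # in /home/git/site.git/hooks/post-receive:
    #     GIT_WORK_TREE=/var/www/site git checkout -f live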
Part of my reasoning is that rsync is specifically made for this kind of thing, whereas git is specifically made to synchronize coding among multiple developers. So my argument is more theoretical than practical.
But for an argument based in pragmatism, rsync has tools such as the --delay-updates flag, which allows your entire deployment procedure to become a pass-or-fail atomic operation. This kind of assurance slows my hair loss as a systems administrator. AFAIK git has no such tools, but I'm certainly open to being corrected.
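For reference, that looks like this (paths made up):

    rsync -a --delete --delay-updates ./build/ deploy@web01:/srv/app/

--delay-updates stages each updated file in a holding area and only renames everything into place at the end of the transfer, which is what makes the update feel close to all-or-nothing.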