This is a pretty neat hack, but not really good for a true production deployment system. Rsync is a far superior alternative. That being said, git should definitely be incorporated into the workflow such that, for example, you have a "live" branch which always reflects what is to be on production frontend nodes. From there you do 1) git pull origin live, 2) rsync to the live servers, 3) build/configure/restart/etc. And set -e on that script, obviously...
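Something like this, roughly (the host name, paths and service name here are made up):

    #!/bin/sh
    set -e                                    # bail out on the first failure
    git pull origin live                      # 1) bring the local "live" branch up to date
    rsync -a --delete --exclude '.git' ./ deploy@web01:/srv/app/   # 2) sync to the live server
    ssh deploy@web01 'sudo systemctl restart myapp'                # 3) build/configure/restart step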
Edit: I should also mention, if you are stuck on something like restricted hosting with CPanel which severely limits your deployment options (some of my clients are in this boat), then http://ftploy.com/ is a really cool solution. But you should really get your ass off cpanel asap.
Double edit: Some of the replies below have made some good points that I had not considered which weaken my argument. So while I'm now more ambivalent than dismissive towards the idea of using git to deploy, there are several modifications that should be made to this particular system to make it production-ready. See avar's and mark_l_watson's comments below and mikegirouard's comment elsewhere for some ideas.
Both Git and rsync are inadequate deployment tools, for many reasons I will not go into in a comment, as there are quite a few articles expounding on the virtues of not deploying with your VCS. There are also numerous reasons why rsync is inappropriate (what if you push up a nasty bug and have to revert? Oops, better go revert to my tagged release and then rsync again. This is ugly in comparison to versioned releases combined with proper use of the operating system's dominant package manager, where you can upgrade/downgrade an application based on its version...)
I generally see four stages in the devops maturation of a programmer:
1. I rsync my code using pre-built commands in Fabric when I'm ready to push.
2. I write code and have hooks on the server to pull the repo when I tag a release in my VCS.
3. I use my language's package management system to build a source distribution that includes all of the necessary static assets, the web application, and any database migration code; I also use a sane versioning scheme to keep track of releases. When I want to push I use a build system that hooks into my continuous integration server and builds a distribution whenever the senior programmer tags a release. It is then made available to the production server in a deb or rpm repository where the senior programmer can then just run an update command (that updates with the new distribution and runs any necessary database migration or post-upgrade hook scripts).
4. You are so big that you've got a custom deployment system built on top of BitTorrent (à la Facebook) or something similar.
It should be obvious where I'm at - I progressed from being an adherent to VCS deployment, to rsync only, to a proper source distribution release system. I haven't managed the devops for a team/application the size of Facebook yet but I'm sure I will get there soon.
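To make stage 3 concrete, the build/upgrade side can look roughly like this (fpm is just one way to produce a package; the names and version here are made up):

    # on the CI box, triggered by the release tag:
    fpm -s dir -t deb -n myapp -v 1.4.2 --after-install postinstall.sh /srv/myapp
    # the resulting .deb goes into an internal apt repository; on the server the
    # upgrade/downgrade is then just the distro's own tooling:
    apt-get update && apt-get install myapp=1.4.2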
Benefits of versioned archives over VCS for deployments: they're easy to checksum and cryptographically sign; they integrate easily with existing distribution-specific package databases; you can deploy without requiring a VCS (and all its dependencies, including maintained and accessible VCS repo-hosting deployment infrastructure); and there are probable security and speed benefits to the resulting (i.e. minimalist) approach, both at the level of the host and the network.
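The checksum/signing part, for instance, is just stock tooling (file names and version made up):

    tar czf myapp-1.4.2.tar.gz myapp/
    sha256sum myapp-1.4.2.tar.gz > myapp-1.4.2.tar.gz.sha256
    gpg --armor --detach-sign myapp-1.4.2.tar.gz
    # the deploy host only needs tar, gpg and sha256sum to verify, no VCS at all:
    gpg --verify myapp-1.4.2.tar.gz.asc myapp-1.4.2.tar.gz
    sha256sum -c myapp-1.4.2.tar.gz.sha256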
Personally I use a combination of versioned archives and named and versioned target environments, each of which can be tested both individually and in combination (including regression tests). This works well for me.
There are many package management systems which suck at this.
I suppose then that you mean "rpms" or "debs" or the like, not the "language package management system" the previous poster mentioned, because I've yet to see one that truly supports more than tar xzf <list of deps>.
Even when they have signing support, none of the packages are signed anyway.
Anywhere that I have a say in the matter, FTP is disabled. I've been a fan of rsync for years and have a bunch of scripts that can make the whole process seamless. That said, I'm starting to be won over by git deploys.
The reason I've started to like git is deletes. You can handle them with rsync:
rsync --delete
The problem is that some projects have content uploaded in the same file tree (simple CMS installs). This might not be an issue if it was structured differently (symlink to another directory), but sometimes it's what I have. Using "rsync --delete" would remove newly uploaded user content. Yeah, I could use the "--exclude" option as well.
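In those cases it ends up looking something like this (directory names made up); the excluded path is skipped by the transfer and, by default, also left alone by --delete:

    rsync -a --delete --exclude 'uploads/' ./site/ deploy@web01:/var/www/site/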
With git, I can just "git rm ..." and the file will be removed on deploy. Content can be mixed in the same tree and hidden with a .gitignore file. File content can be managed separately with rsync, if that's the best way. Just not FTP. Please.
> Content can be mixed in the same tree and hidden with a .gitignore file
Note that rsync also allows fairly powerful in-tree tweaking of details: if you give it the "-F" option, it will look for ".rsync-filter" files (see man page for details).
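A minimal example (directory names made up): drop this at the top of the tree and run rsync with -F, and the listed paths are neither transferred nor removed by --delete (unless --delete-excluded is given):

    # .rsync-filter
    - uploads/
    - cache/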
We've been working on moving away from rsync for our code syncing to using Git where I work.
I'm not saying there aren't uses for rsync, but your dismissal of git as not being suitable for a "true production deployment system" isn't supported in any way. And stating that rsync was "specifically made for this kind of thing" without comparing any of the trade-offs involved is just appealing to authority.
Some things you may not have considered:
* rsync is meant to sync up *arbitrary filesystem trees*, whereas with Git you're snapshotting trees over time. When you transfer content between two Git repositories the two ends can pretty much go "my tree is at X, you have Y, give me X..Y please". You get that as a pack, then just unpack it in the receiving repository. Whereas with rsync, even if you don't checksum the files you still have to recursively walk the full depth of the tree at both ends (if you're doing updates), send that over the wire etc. before you even get to transferring files.
* Since syncing commits and actually checking them out are two different steps, you can push out commits (without checking them out!) to your production machines as they're pushed to your development branches. Then deploying is just sending a message saying "please check out such-and-such SHA1" and the content will already be there! (There's a rough sketch of this after the list.)
* You mentioned in another post here that rsync has --delay-updates; this is just like "git reset --hard" (but I'll bet Git's is more efficient). With Git you can do the transfer of the objects and the checking out of the objects as separate steps.
* It's way easier for compliance/validation reasons to not get the data out of Git, since you can validate with absolute certainty that what you have at a given commit is what you have deployed (just run "git show"). If you check the files out and then sync them with some out-of-band mechanism you're back to comparing files.
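To sketch what that two-step split can look like (the remote name, paths and variable here are made up): the production box holds a bare repository plus a separate work tree, the objects get pushed ahead of time, and the actual rollout is a purely local checkout:

    # ahead of time, whenever something lands on the development branch:
    git push prod-web01 master

    # at rollout time, on the production box (fast and local, the objects are already there):
    git --git-dir=/srv/app.git --work-tree=/srv/app checkout -f "$DEPLOY_SHA1"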
Edit: One thing I forgot, it's distributed. Which gives you a lot of benefits. Consider this problem: you have 1000 servers running your code and you've decided that you want to deploy now from a staging server.
Trying to rsync to 1000 servers at once from one box (the naïve implementation with rsync) would take forever and overload that one box, especially if you wanted to take advantage of pre-syncing things on every commit so the commit will already be there if you want to roll out (constant polling and/or pushing).
You can mitigate this by having intermediate servers you push to, but then you've just partitioned the problem: what if you need to swap out those boxes, or they go down, etc.?
With Git you can just configure each of the 1000 boxes to have 3 other boxes in the pool as a remote. Then you seed one of them with the commit you want to roll out. The content will trickle through the graph of machines, any one machine going down will be handled gracefully, and if you want to roll out you can just block on something that asks "do you have this SHA1 yet" returning true for all live machines before you "git reset --hard" to that SHA1 everywhere.
You've described some admirable utility that can be achieved by using Git. However, it can all be accomplished with other tools and without needing the entire deployment history stored on each production machine.
As for your comment about being "back to comparing files", that's all Git is doing internally anyway. You can do the same with other deployment tools and sha1 hashes etc.
Sure it can be accomplished with other tools, but if Git is sufficient, introducing other tools just increases the complexity of your stack, and the complexity of e.g. validating that a Git tag corresponds to what is actually rolled out as that tag.
> and without needing the entire deployment history stored on each production machine.
This is a constraint a lot of people seem to think they need but don't actually need. If someone gets your current checkout they'll have current code / passwords (if you accidentally checked in a password but removed it, you should change that password). Getting the code history will just satisfy historical curiosity. Hardly a pressing concern for an attacker.
> As for your comment about being "back to comparing files", that's all Git is doing internally anyway. You can do the same with other deployment tools and sha1 hashes etc.
Yes, but the point is that it gives you that for free, without you having to hack anything extra on top of your syncing mechanism.
You'd be pleasantly surprised how much checking/validation/syncing logic you have to write around e.g. rsync when syncing a Git repo just disappears entirely if you just use Git to sync the files.
Note that git could be used as the developer/ops-facing deploy interface, while under the hood you do something more complicated/robust like Capistrano or rsyncing to multiple machines, or whatever.
Maybe you start out on Heroku. Then you switch to your own machines and use this simple hack, or Dokku or something. Then something home grown. The complexity of deploy scripts can grow while the interface stays the same.
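As a rough sketch of that idea (the hook is standard git; the deploy script name is made up), the post-receive hook on the shared "deploy" remote just hands the new revision to whatever actually does the rollout, so the machinery behind it can change while everyone keeps running the same git push:

    #!/bin/sh
    # hooks/post-receive on the "deploy" remote
    while read oldrev newrev refname; do
        if [ "$refname" = "refs/heads/live" ]; then
            /usr/local/bin/roll-out "$newrev"   # made-up script: rsync, Capistrano, Dokku, whatever
        fi
    done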
I've been using this method of deployment for several production sites for a couple of years now. I don't really see why rsync is better, or how using git is meaningfully different from having a live branch that you rsync from. As long as you're checking out into a detached work tree, it is functionally identical to rsync.
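For anyone who hasn't seen the trick, the whole setup is roughly this (paths made up): a bare repo you push to, plus a one-line post-receive hook that checks the pushed branch out into the web root:

    git init --bare /home/git/site.git
    # in /home/git/site.git/hooks/post-receive:
    #     GIT_WORK_TREE=/var/www/site git checkout -f live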
Part of my reasoning is that rsync is specifically made for this kind of thing, whereas git is specifically made to synchronize coding among multiple developers. So my argument is more theoretical than practical.
But for an argument based in pragmatism, rsync has tools such as the --delay-updates flag, which allows your entire deployment procedure to become a pass-or-fail atomic operation. This kind of assurance slows my hair loss as a systems administrator. AFAIK git has no such tools, but I'm certainly open to being corrected.
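For reference, that looks like this (paths made up):

    rsync -a --delete --delay-updates ./build/ deploy@web01:/srv/app/

--delay-updates stages each updated file in a holding area and only renames everything into place at the end of the transfer, which is what makes the update feel close to all-or-nothing.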