
This is why it's so important to take some simple steps to ensure that you don't bork your site. For my production sites, I usually have the following setup:

* Replicated database with one slave in a separate data center purely for disaster recovery (the separate data center isn't a performance measure). Heartbeat for the slave(s) in the same data center so that the database stays up.

* Nightly offsite backups. This used to be rsync, but recently I've been thinking of switching to tarsnap for ease of use (and because S3 keeps the data in several data centers).

* Files stored either in a MogileFS setup or S3 in multiple data centers.
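The nightly-offsite bullet above often reduces to a crontab fragment. Everything here -- schedule, paths, host -- is illustrative, not my actual setup:

```
# crontab fragment: dump at 02:00, push offsite at 03:00.
0 2 * * * mysqldump --all-databases | gzip > /backups/nightly.sql.gz
0 3 * * * rsync -az /backups/ backup@offsite.example.com:/backups/
```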

It doesn't give me instant failover to another data center, but the offsite DB slave should mean no data loss beyond a second or two. Ma.gnolia is a decent-sized site; maybe they did have decent infrastructure and it will only be a little while before they've gotten everything back. Of course, after the fiasco with the blog site that used RAID as its backup system, I've started to think that many people don't take data as seriously as it needs to be taken.



Reminds me of the Couchsurfing fiasco back in 2006.

http://www.techcrunch.com/2006/06/29/couchsurfing-deletes-it...

http://www.couchsurfing.com/crash_page.html

Couchsurfing had a backup process in place, but it failed due to human error. It's highly likely that Ma.gnolia had backups of some sort. Before we all start berating Ma.gnolia (and website owners in general) for not backing up their data, we need to know more about the real reason for this failure.

Couchsurfing was able to come back from the dead because of limited competition. In this case, unfortunately, it will be easier for users to look elsewhere.


Reminds me of LeafyHost.

(Summary: a small hosting company advertised multiple backups; their single server crashed and they had no backups at all. A long story involving terrible customer service ensued.)


You're right that we probably shouldn't jump to conclusions, but it doesn't look good.

And human error shouldn't really be possible if your backups are automated and you periodically do test restores.


I like the idea of the offsite slave. I was also reading about another idea: forcing a slave to lag behind the master so you can recover from oopses/corruption:

http://www.rustyrazorblade.com/2008/05/07/mysql-time-delayed...
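The linked post describes rigging the delay by hand; newer MySQL versions (5.6+) build it in via MASTER_DELAY. A sketch, with an arbitrary one-hour window:

```sql
-- On the delayed slave: apply relayed events no sooner than one hour
-- after they ran on the master, leaving a window to catch mistakes.
STOP SLAVE;
CHANGE MASTER TO MASTER_DELAY = 3600;
START SLAVE;
```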


I do something like that for my outgoing email ;)


"take some simple steps to ensure that you don't bork your site"

unborked[.com] is currently available.

A site dedicated to stopping all of the borking going on out there.


I think some cheap, simple SQL dumps from the DB, burned to an external HD and taken home, would've been a good first step -- your data would be a bit stale, but restorable nonetheless, without the need for off-site, replicated, heartbeat-monitored databases.
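That first step is nearly a one-liner. A minimal sketch, assuming credentials live in ~/.my.cnf; the file name is illustrative:

```shell
#!/bin/sh
# Nightly dump, compressed and date-stamped.
STAMP=$(date +%Y%m%d)
OUT="db-backup-$STAMP.sql.gz"

# --single-transaction gives a consistent snapshot of InnoDB tables
# without locking writers out for the duration of the dump.
if command -v mysqldump >/dev/null 2>&1; then
  mysqldump --all-databases --single-transaction | gzip > "$OUT"
fi
```

Copy the resulting file wherever you like; the point is that even this beats having nothing.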

IMHO you really need an expert database admin (which I am not) to get all of this up and keep it going; a caretaker whose only job in life is the maintenance of any and all persisted data.

On another note, backups are really useless unless you have actually tested the restores.


what does 'burning to an external harddrive' do ?

is this some new tech that I'm unfamiliar with ;) ?


I'd really like to store incrementals of my MySQL data at S3. Is anyone using something they really like? Or should I just break down and do full dumps each night?


Tarsnap.

Tarsnap lets you say: tarsnap -c -f backup01302009 mysql_dir/

And you can just adjust the date each day. It gives you the luxury of a full dump (anytime you want to restore, just reference backup01302009), but it only actually stores the deltas (making sure not to duplicate data that might be in backup01292009 or backup01282009 and so on). Tarsnap stores the data to S3 so that it's replicated in multiple data centers.

It costs a little more than S3 at 30 cents per GB, but it's metered out so that if you only use 1MB of storage, you'll only be charged 0.03 cents for that storage. You could try creating your own way of doing incrementals, but I doubt you'd get it as efficient as Colin (the math genius behind Tarsnap) and so I doubt you'd get it cheaper. Plus, this way you don't have to deal with it.
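A cron-driven version of the "adjust the date each day" scheme might look like this sketch (the directory name comes from the comment above; assume tarsnap is already configured with a key):

```shell
#!/bin/sh
# Daily Tarsnap run: the archive name carries the date, and tarsnap
# deduplicates against earlier archives, so only deltas are uploaded.
ARCHIVE="backup$(date +%m%d%Y)"

if command -v tarsnap >/dev/null 2>&1; then
  tarsnap -c -f "$ARCHIVE" mysql_dir/
fi
```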

And remember, it's hard to fill up a database.* As the Django Book notes: "LJWorld.com's database - including over half a million newspaper articles dating back to 1989 - is under 2GB." So, if they were using Tarsnap, they might be storing 5 or 10GB tops at a whopping $1.50-$3 per month plus whatever the transfer of their deltas was for the month. Oh, and tarsnap compresses the data too. So, maybe they'd be paying $1 or something lower.

* Clearly, if you hit it big time, you might not want to continue paying for tarsnap. However, if you become the next big thing, you can hire someone to deal with it for you.


This doesn't work. You can't just copy the files and expect them to be in a sane or consistent state.

You either need to a) use InnoDB hotbackup or b) use a slave, stop the slave, run the backup, and restart the slave to catch up.

At delicious we used B, plus a hot spare master, plus many slaves.

Additionally, every time a user modified his account, it would go on the queue for individual backup; the account itself (and it alone) would be snapped to a file (Perl Storable, IIRC), which only got regenerated when the account changed, so we weren't re-dumping inactive users. A little bit of history allowed us to respond to things like "oh my god, all my bookmarks are gone" and various other issues (which were usually due to API-based idiocy of some sort or another).
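The stop-the-slave approach (option b) can be sketched in shell. The service command and paths below are placeholders, not delicious's actual tooling:

```shell
#!/bin/sh
# On the slave: pause replication, shut mysqld down so the data files
# are consistent, copy them off, then restart and let it catch up.
DEST="/backup/mysql-$(date +%Y%m%d)/"

if command -v mysql >/dev/null 2>&1; then
  mysql -e "STOP SLAVE"
  /etc/init.d/mysql stop
  rsync -a /var/lib/mysql/ "$DEST"
  /etc/init.d/mysql start
  mysql -e "START SLAVE"
fi
```

Because the copy happens with mysqld stopped, the files on the backup side are a sane, restorable snapshot.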


Using a slave isn't foolproof either. If someone runs a malicious command, it gets replicated, and it could get backed up before being caught.


I didn't say that. Read what I wrote.

You use the slave so you can shutdown the database and get a consistent file snapshot. Then you do offline backup.


Yeah, it's true. I was a little simplistic. I usually use A, but I'm not dealing with the amount of data that delicious is.


Whenever Tarsnap is mentioned, I have to mention Duplicity which does the same thing, but is Free Software.

I use this for my personal backups, as well as backups of our work svn (fsfs) and git repositories. I use it against S3, and have found it incredibly reliable.

As a bonus, it encrypts everything but still does incremental backups. It's a really nice piece of software, and you don't have to pay anyone to use it.


...Duplicity which does the same thing...

Duplicity is not the same thing as tarsnap. Duplicity uses a full plus incrementals model compared to tarsnap's snapshot model, so with duplicity you're either going to be stuck paying to store extra versions you don't want or be stuck paying for multiple full backups. Moreover, tarsnap is considerably more secure than duplicity.

Before I started working on tarsnap, I considered using duplicity; but it simply didn't measure up.


How is tarsnap considerably more secure?


Some problems with duplicity off the top of my head -- I'm sure there are others (there always are):

1. Duplicity uses GnuPG. GnuPG has a long history of security flaws, up to and including arbitrary code execution. Yes, these specific bugs have been fixed; but the poor history doesn't inspire much confidence.

2. Duplicity uses librsync, which follows rsync's lead by making rather dubious use of hashes. In his thesis, Tridge touts the fact that 'a failed transfer is equivalent to "cracking" MD4' as a reason to trust rsync; but now that we know how weak MD4 is, it's possible to create files which rsync -- and thus Duplicity -- will never manage to back up properly.

3. When you try to restore a backup, the storage system you're using can give you your most recent backup... or it can decide to give you any previous backup you stored. Duplicity won't notice.

4. If you try to use the --sign-key option without also using the --encrypt-key option, duplicity will silently ignore --sign-key, leaving your archives unsigned. Based on comments in the duplicity source code, this seems to be intentional... but this doesn't seem to be documented anywhere, and it seems to me that this is an incredibly dumb thing to do.


EBS does deltas too. Is anyone else using it? I like the ability to mount a volume or clone a volume almost instantly and mount it on another machine.


EBS does deltas, but there are a few caveats, the most important being that you need to be using EC2. For many, $72/mo plus bandwidth might be a bit much if their workload can run on a 512MB Xen instance for under $40 with a few hundred gigs of transfer included.

Beyond that, drive snapshots aren't the easiest things to do. I know that RightScale tells its customers to freeze the drive so that no changes can occur until the backup is complete. With S3 performance around 20 MBytes/sec, backing up 1GB takes around a minute. That's not bad -- and since only deltas are copied, it's unlikely you'll have a huge amount to back up at any given time -- but it isn't exactly good either. With file-level backup, you can do a mysqldump and then just back up that file. Eh, maybe I'm just preferring the devil I know in this situation.

It's a little more complex to set up (doing file-level backups), but if you're going the volume route, you need to make sure you don't leave the drive in an inconsistent state.

All that said, EBS is awesome. If it fits what you're looking for, then go for it!


This is not totally accurate. EBS snapshots are basically instantaneous; it's just the copy to S3 that takes time, and Amazon performs that in the background. We use XFS on our EBS volumes (running MySQL 5 with InnoDB) and have a little Perl script (http://ec2-snapshot-xfs-mysql.notlong.com/) that does FLUSH TABLES WITH READ LOCK -> xfs_freeze -> snapshot -> xfs_thaw -> UNLOCK TABLES. The whole process takes a fraction of a second, and it also logs where in the binlog the snapshot was made (handy, since we create new slaves from snapshots, and it reduces how much data we shuttle around).
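A shell sketch of that sequence, with the volume ID and mount point as placeholders (the linked Perl script is the real thing). One way to keep the read lock held across the freeze is the mysql client's `system` command, which runs shell commands while the connection -- and therefore the lock -- stays open:

```shell
#!/bin/sh
# Placeholders: MOUNT is the XFS mount point, VOLUME the EBS volume ID.
MOUNT=/vol
VOLUME=vol-0123

# FLUSH TABLES WITH READ LOCK quiesces MySQL; xfs_freeze -f blocks
# filesystem writes; the snapshot call returns fast (the S3 copy is
# asynchronous); xfs_freeze -u thaws; UNLOCK TABLES releases the lock.
if command -v mysql >/dev/null 2>&1; then
  mysql <<EOF
FLUSH TABLES WITH READ LOCK;
SHOW MASTER STATUS;
system xfs_freeze -f $MOUNT
system ec2-create-snapshot $VOLUME
system xfs_freeze -u $MOUNT
UNLOCK TABLES;
EOF
fi
```

SHOW MASTER STATUS captures the binlog position at the moment of the snapshot, which is what makes seeding new slaves from snapshots possible.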

We snapshot a slave every 10 minutes and the master once a night (just in case something totally weird happens to the slave and the sync isn't right). This is a multi-gig DB and we've had no problems.

Here is a link to a full tutorial about running MySQL on EC2 with EBS: http://developer.amazonwebservices.com/connect/entry.jspa?ex...

I also wanted to point out that a live slave is NOT a backup scheme. If someone hacks your database and runs DROP ALL FROM PRODUCTION_DATABASE, you've now got a perfect copy of nothing.


Depends on the data and your budget, I guess.

Disk space is cheap -- I do a full dump of the database nightly (and a separate system dumps a few key tables every 15 minutes).


Then you can still lose a day's data. Why not rotate and ship the MySQL binary log every x minutes? The last backup plus all the logs gives you much better recoverability.
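A minimal log-shipping sketch, assuming cron and rsync; the host and paths are made up:

```shell
#!/bin/sh
# Close out the current binary log, then ship all closed, numbered
# logs offsite. Run from cron every x minutes.
LOGDIR=/var/log/mysql

if command -v mysql >/dev/null 2>&1; then
  mysql -e "FLUSH LOGS"
  rsync -a "$LOGDIR"/mysql-bin.[0-9]* backup@offsite:/backups/binlogs/
fi
```

Restoring means loading the last full backup, then replaying the shipped logs with mysqlbinlog up to the point just before the failure.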



