Make a Backup!
What are you protecting?
It is important to define some terms at the start.
A backup is a redundant, non-functional copy of
something important, made recently, that you might need
in a hurry.
A code repository is not a backup.
For web applications we have to consider two things:
the Database(s) and the Codebase.
These two things run together to provide the
application.
At its best, a backup is a point-in-time copy of a
functioning system... a snapshot.
This can be a difficult thing to get, especially from a
high-traffic site. As a system runs, the database is
constantly changing or is running slightly behind its
save-graph. The codebase changes less often (or not at
all), but to some extent it is temporally dependent on
the state of the database.
As this is a redundant copy that might be needed, it
just has to be useful, not necessarily perfect. Let's go
with hourly backups of the running DB and hourly copies
of the application code.
To get a copy of the database, the backup system needs
to do a dump. For MySQL there is a tool called
mysqldump we can call from a crontab script set to run
each hour.
#crontab
5 * * * * /usr/bin/mysqldump --quick -ureadonlyuser -p$PASS innodbDatabase > /data/db_backups/running-backup.sql
Now that we have a new copy each hour we need to put it
somewhere safe.
In the same crontab the running-backup.sql file is
copied to an Amazon S3 (Simple Storage Service) bucket.
This step is really important. If we lose access to the
server for any reason, we cannot get to the hourly
backup of the database. If the backup is in an S3
bucket, we can quickly copy it to a new server, load it
into a different MySQL server and carry on.
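Separate cron entries do not wait for each other, so one way to make sure the upload only happens after a successful dump is to chain both steps in a small script. What follows is a minimal sketch, not the setup used above: the S3 key, the temporary file name and the overridable variables are assumptions, and PASS is assumed to be present in the script's environment. The s3put argument order (key first, then local file) is copied from the crontab entries in this article.

```shell
#!/bin/sh
# Sketch: dump, sanity-check, then upload as one hourly job.
# Paths and the S3 key below are assumptions; adjust to your setup.
MYSQLDUMP="${MYSQLDUMP:-/usr/bin/mysqldump}"
S3PUT="${S3PUT:-/usr/bin/s3put}"
DUMP_FILE="${DUMP_FILE:-/data/db_backups/running-backup.sql}"
S3_KEY="${S3_KEY:-running-backups/database/running-backup.sql}"  # hypothetical key

dump_and_upload() {
    tmp="$DUMP_FILE.tmp"
    # Dump to a temporary file so a failed run never replaces the last good dump.
    if ! "$MYSQLDUMP" --quick -ureadonlyuser "-p$PASS" innodbDatabase > "$tmp"; then
        rm -f "$tmp"
        return 1
    fi
    # Refuse to keep or upload an empty dump.
    if [ ! -s "$tmp" ]; then
        rm -f "$tmp"
        return 1
    fi
    mv "$tmp" "$DUMP_FILE" || return 1
    # Key first, then local file, as in the s3put entries in this article.
    "$S3PUT" "$S3_KEY" "$DUMP_FILE"
}
```

Saved as a script that ends with a call to dump_and_upload, this could replace the hourly mysqldump crontab entry, so that the dump and the copy to S3 always run in order.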
Now the Codebase.
We need to mirror the files to another directory,
create a compressed archive of the copied filesystem,
then copy the archive to an S3 bucket.
To get a point-in-time copy of the running code we can
and should rely on a tool called rsync. Rsync makes a
copy of a filesystem and can follow rules about what to
mirror. Rsync is also extremely fast at refreshing the
copy. By default it skips files whose size and
modification time have not changed, and its
delta-transfer algorithm uses checksums of files and
parts of files to copy only the parts that differ
between the source and destination (for a purely local
copy like this one, rsync copies changed files whole
unless told otherwise). This speed is critical as we
need to tread lightly on the source code of the running
application.
#crontab
5 * * * * /usr/bin/rsync -av --delete /var/www/html /home/ec2-user/rsync-backup
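The "rules about what to mirror" can be sketched with rsync's --exclude option. The function below is only an illustration: the cache/ and *.log patterns are hypothetical examples of things a site might not want in its backup, and the RSYNC variable is an assumption added so the path can be overridden.

```shell
#!/bin/sh
# Sketch: mirror a site directory while skipping throwaway files.
RSYNC="${RSYNC:-/usr/bin/rsync}"

mirror_site() {
    # $1: source directory. With no trailing slash, the directory itself
    #     is created inside the destination (e.g. rsync-backup/html).
    # $2: destination directory.
    # --delete removes files from the mirror that no longer exist in the source.
    # The two --exclude patterns are hypothetical examples.
    "$RSYNC" -a --delete --exclude 'cache/' --exclude '*.log' "$1" "$2"
}
```

With the paths used in this article the call would be mirror_site /var/www/html /home/ec2-user/rsync-backup, which leaves the mirrored files under /home/ec2-user/rsync-backup/html.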
Now that we have our mirror, created at the same time as the database dump, we need to make a compressed archive using tar and copy it to the S3 bucket.
25 * * * * /bin/tar -pczf /home/ec2-user/site.filesystem.tar.gz /home/ec2-user/rsync-backup/
45 * * * * /usr/bin/s3put running-backups/codebase/microinstance-filesystem-backup.tar.gz /home/ec2-user/site.filesystem.tar.gz
For most purposes, this backup regime is probably good
enough. Disaster recovery would involve launching a new
LAMP server, loading running-backup.sql into MySQL, and
unpacking microinstance-filesystem-backup.tar.gz into
the Apache document root.
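The recovery steps above could look roughly like the following sketch, once both files have been fetched from S3 onto the new server. Several things here are assumptions: "restoreuser" is a hypothetical MySQL account with write privileges, the target database is assumed to exist already (a plain mysqldump of one database does not include a CREATE DATABASE statement), and the MYSQL and DOCROOT variables are only there so the paths can be overridden.

```shell
#!/bin/sh
# Sketch of the recovery steps on a fresh LAMP server.
MYSQL="${MYSQL:-/usr/bin/mysql}"
DOCROOT="${DOCROOT:-/var/www/html}"

# Load the SQL dump into an existing, empty database.
# "restoreuser" is a hypothetical account with write privileges.
restore_database() {
    # $1: path to running-backup.sql
    "$MYSQL" -urestoreuser "-p$PASS" innodbDatabase < "$1"
}

# Unpack the code archive into the document root.
restore_codebase() {
    # $1: path to the downloaded tar.gz
    # The archive was built from /home/ec2-user/rsync-backup/, so its
    # members start with home/ec2-user/rsync-backup/html/. Stripping
    # those four leading components puts the files directly in DOCROOT.
    tar -xpzf "$1" -C "$DOCROOT" --strip-components=4
}
```

The --strip-components value follows from the paths in the tar and rsync entries in this article; with a different mirror directory the number of leading path components to drop would change.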
There are a couple of failure modes that need to be
dealt with. If the S3 bucket gets a bad or empty
version of your backups just before the server fails,
it is possible you don't actually have a useful backup
when you need it. To guard against this, it is wise to
write each new backup file to S3 with some sort of
dynamic file name. I could use:
Monday.database.sql
Monday.codebase.tar.gz
Tuesday.database.sql
Tuesday.codebase.tar.gz
etc...
Each hourly run overwrites that day's file, so 24 times each day I get a fresh backup, and I keep a good backup for each of 6 yesterdays until the cycle repeats. However, in the above failure mode the bad upload replaces today's only copy, so I can only count on the backup from yesterday!
I can change this to:
Monday-13-00.database.sql
Monday-14-00.database.sql
etc...
Now I have the ability to step back up to 168 hours
into the past to find a good backup and recover with
that.
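The day-and-hour names can be generated with date. This is a small sketch in which the function names and the S3 prefix argument are assumptions. Keeping the date call inside a script also sidesteps a crontab quirk: a bare % in a crontab command is treated as a newline and would have to be written as \%.

```shell
#!/bin/sh
# Sketch: build rotating object names such as Monday-13-00.database.sql.

# LC_ALL=C keeps the weekday name in English regardless of server locale.
backup_stamp() {
    LC_ALL=C date +%A-%H-00
}

backup_key() {
    # $1: S3 prefix, $2: file suffix, $3: optional stamp (defaults to now)
    printf '%s/%s.%s\n' "$1" "${3:-$(backup_stamp)}" "$2"
}
```

An upload in the style of this article's s3put entries could then be written as s3put "$(backup_key running-backups/database database.sql)" /data/db_backups/running-backup.sql. Because the names repeat weekly, each file is overwritten after 7 days and storage stays bounded at 168 database dumps.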
If your application must stay as up-to-date as possible,
this higher-redundancy system is useful, but it will
result in higher storage fees.
This backup discussion uses AWS-specific parts, but it doesn't need to. As long as a recent copy of your application is automatically copied to a different computer somewhere in the world, you will have something to recover from.
chris.macdonald@doctrinaire.com.au