Many IaaS cloud providers offer a “Snapshot” feature on their block storage (e.g. AWS EBS). Generally speaking, a snapshot captures the state of a disk at a point in time so that you can revert to it later. A good use for a snapshot is when you are about to make a big hairy change to your server, VPS or VM that has many steps that are not easily rolled back. You take a snapshot *before* the change. If the change goes poorly, you can just restore the snapshot. Then you are back to the moment you took the snapshot. This feature has saved countless butts.
However, “Snapshots are not backups.” I often get puzzled looks when I say this to people in person. I hear deafeningly silent pauses when I say this on the phone. I can tell that this is a poorly understood distinction. I thought I should spend some time writing about this. Perhaps I can help clear this up.
How a snapshot works…
When you click the button to create a snapshot in most cloud hosting systems, this is what happens: First, a new file is created on the storage array of your host. This file contains everything that changes after that snapshot button is clicked. In fact, your “disk” was already a file containing everything that had changed since you created the VM from a template. The template is probably a whole copy of the OS. All of these parts are required for your VM to run. Your root “disk” is really the same template file that hundreds or thousands of other VMs are based on, plus the file that is everything that you have done to your VM since it first powered on, plus any snapshots you have made. Sounds precarious, doesn’t it? We take steps to mitigate the risk of corruption, etc… but yes, it might be more precarious than you realize.
The reason this is set up this way is for the convenience of being able to make instant snapshots, and to spin up new VMs in under a minute. These things would take much much longer to do if it meant copying several Gigabytes of data for each operation. It would also cost much more, as much more disk space would be consumed for all of these copies.
Your snapshots are not backups because they depend on these other chunks of data to make any sense. If the underlying template is corrupt, your snapshot is useless. You need a backup to make sure you have your data after some unforeseen event. Note: In the M5 Hosting Cloud, you can convert a snapshot to a template, which will cut this dependency chain.
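To make that dependency chain concrete, here is a minimal sketch in Python. It is a toy model, not how any particular hypervisor or storage array is implemented: the Layer class, the read_block function and the block contents are all made up for illustration. Each layer holds only the blocks written while it was the top layer, and a read falls through to the parent layers until it finds the block.

```python
class Layer:
    """One copy-on-write layer: holds only blocks written while it was on top."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.blocks = {}  # block number -> contents


def read_block(top, block_no):
    """Read a block by walking from the top layer down through its parents."""
    layer = top
    while layer is not None:
        if block_no in layer.blocks:
            return layer.blocks[block_no]
        layer = layer.parent
    return None  # nothing in the chain has this block


# The shared template (hypothetical contents).
template = Layer("template")
template.blocks.update({0: "bootloader", 1: "OS files"})

# Your VM's "disk": only what changed since it was created from the template.
vm_delta = Layer("vm-delta", parent=template)
vm_delta.blocks[2] = "customer data v1"

# Taking a snapshot: the old layer is frozen, new writes land in a new layer.
post_snapshot = Layer("post-snapshot", parent=vm_delta)
post_snapshot.blocks[2] = "customer data v2"

current_view = [read_block(post_snapshot, n) for n in range(3)]
snapshot_view = [read_block(vm_delta, n) for n in range(3)]

# Simulate losing the template: the snapshot can no longer produce the OS blocks.
template.blocks.clear()
broken_snapshot_view = [read_block(vm_delta, n) for n in range(3)]
```

In this model the snapshot itself never contained the OS blocks, so once the template is gone the snapshot can only return the one block it actually holds. That is the sense in which a snapshot depends on other chunks of data to make any sense.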
OK, now that I have described what a snapshot is, let me describe what makes a good backup and why a Snapshot is NOT a backup…
A backup is stored and operated on separate infrastructure. It’d be even better if the backup can’t be made unavailable by the people who can make your servers unavailable: systems administrators, data center or hosting vendors, etc. If something can happen to your servers, make darn sure it can’t happen to your backups in the same event or situation: if you don’t pay your bill, if the building catches fire, a disgruntled employee deletes data, ransomware, etc. Whatever wipes out your data or access to the primary infrastructure, make sure it can’t happen to the backup data too. If your primary data is at AWS, and your AWS account gets compromised or deleted for non-payment, then backups kept in that same AWS account weren’t an effective backup of it.
A backup is an offline, read-only, immutable, recoverable, archived full copy of the data. This means backups are not a RAID array (you should have RAID, but RAID is not a backup). A backup is not a replica, duplicate or mirror server that is kept up to date with the primary server. A backup is not a slightly-delayed replication setup either. These aren’t backups because they make it impossible to recover from human errors, which include obvious things like dropping the wrong table, but also less obvious things, like a subtle bug that corrupts/damages some records and may take days or weeks to notice. Your standbys/mirrors are going to copy both obvious and non-obvious things before you have a chance to stop them.
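Here is a toy illustration of that last point, sketched in Python. The “databases” are plain dictionaries and drop_table is a made-up helper, so treat this as a thought experiment rather than a model of any real replication product. The replica faithfully applies the same mistake as the primary, while the point-in-time copy taken earlier is untouched.

```python
import copy


def drop_table(db, table):
    """Hypothetical destructive operation, e.g. someone dropping the wrong table."""
    db.pop(table, None)


# Made-up primary "database".
primary = {"users": ["alice", "bob"], "orders": [101, 102]}

# A replica starts identical and then follows every change on the primary.
replica = copy.deepcopy(primary)

# A backup is a separate point-in-time copy that is never written to again.
nightly_backup = copy.deepcopy(primary)

# Human error: the wrong table is dropped, and replication repeats it.
for db in (primary, replica):
    drop_table(db, "orders")

replica_has_orders = "orders" in replica
backup_has_orders = "orders" in nightly_backup
```

After the mistake, replica_has_orders is False and backup_has_orders is True: only the copy that was not being kept up to date can give you the table back.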
Replicas and redundancies are for uptime and fault-tolerance. If one hardware or network component fails, things keep running and your users stay happy. However, replicas and redundancies often do not protect against human behavior (mistakes and malice) or against software errors that cause corruption and data loss.
A backup needs to be regularly verified by real-world restoration cases; backups can’t be trusted until they’re confirmed, at least on a recurring, periodic basis. Automated alarms and monitoring should be used to validate that the backup process/job ran, and that a file is present. Periodically a human should do a sanity check on the schedule, the file sizes, etc.
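The automated part of that checking might look something like the following Python sketch. The function name check_backup and the thresholds (26 hours, a minimum size in bytes) are assumptions chosen for illustration, and the path you pass in is whatever your backup job produces. Note that this only confirms a recent, plausibly sized file exists; it says nothing about whether the file can actually be restored, which is why the test restores described above are still needed.

```python
import os
import time


def check_backup(path, max_age_hours=26, min_bytes=1):
    """Return a list of problems found with a backup file (empty list = looks OK).

    max_age_hours and min_bytes are illustrative defaults, not recommendations.
    """
    if not os.path.isfile(path):
        return [f"missing: {path}"]

    problems = []
    info = os.stat(path)

    # Did the job run recently enough?
    age_hours = (time.time() - info.st_mtime) / 3600
    if age_hours > max_age_hours:
        problems.append(f"stale: {path} is {age_hours:.1f} hours old")

    # Is the file suspiciously small (e.g. an empty archive)?
    if info.st_size < min_bytes:
        problems.append(f"too small: {path} is {info.st_size} bytes")

    return problems
```

A monitoring job could call check_backup on each expected file and raise an alarm whenever the returned list is not empty.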
A backup is labeled and stored in such a way that when needed, it can be identified for what data it contains, as of what time and date. Labels like “server before last upgrade” are no good. What server? Does it include the OS or do you have to reinstall the OS and software to recover this data? Which last upgrade? Was it the last upgrade after that one over the holidays or the last one that we had to abort? Try “server01.m5hosting.com – All mount points – 02:00 2017-01-01” or something like that.
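If labels are generated by a script rather than typed by hand, they stay consistent. Here is a small Python sketch in the spirit of the example label above; the function name backup_label and the exact layout are my own choices for illustration, not a standard.

```python
from datetime import datetime


def backup_label(hostname, scope, taken_at):
    """Build a label that says which server, what data, and as of when."""
    return f"{hostname} - {scope} - {taken_at:%H:%M %Y-%m-%d}"


example_label = backup_label(
    "server01.m5hosting.com",
    "All mount points",
    datetime(2017, 1, 1, 2, 0),
)
```

Whatever layout you pick, the point is that the hostname, the scope of the data and the time of the backup are all in the label, so nobody has to guess later.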
If your backups are incremental, you need to have all the previous incrementals and the last full/Level 0. If they are differential, you need the most recent differential and the last full/Level 0… You need whatever you need to restore your backup… and whatever you need if your most recent backup is not recoverable.
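Working out “whatever you need” can be sketched in Python. The definitions here are assumptions stated for this example: a “full” backup stands alone, an “incr” backup holds changes since the previous backup of any kind, and a “diff” backup holds all changes since the last full. The function name restore_chain and the labels are made up.

```python
def restore_chain(backups):
    """Return the labels needed to restore the most recent backup, oldest first.

    backups is a chronological list of (label, kind) pairs, where kind is
    "full", "incr" (changes since the previous backup) or "diff" (changes
    since the last full). Raises ValueError if the chain has no full backup.
    """
    chain = []
    i = len(backups) - 1
    while i >= 0:
        label, kind = backups[i]
        chain.append(label)
        if kind == "full":
            return list(reversed(chain))
        if kind == "diff":
            # A differential only needs the last full: skip back to it.
            i -= 1
            while i >= 0 and backups[i][1] != "full":
                i -= 1
        elif kind == "incr":
            # An incremental needs the backup immediately before it.
            i -= 1
        else:
            raise ValueError(f"unknown backup kind: {kind}")
    raise ValueError("no full backup found; this chain cannot be restored")


incremental_week = [("sun", "full"), ("mon", "incr"), ("tue", "incr")]
differential_week = [("sun", "full"), ("mon", "diff"), ("tue", "diff")]

needed_incremental = restore_chain(incremental_week)
needed_differential = restore_chain(differential_week)
```

Under these assumptions the incremental week needs all three files, while the differential week needs only Sunday’s full and Tuesday’s differential. Keeping the earlier files as well gives you something to fall back on if the most recent backup turns out not to be recoverable.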
M5 Hosting offers backups that are off-site, monitored, organized, recoverable, easy to use, that allow recovery of single files, directories or whole system bare-metal restores.