What’s your current backup solution? What would happen if your datacentre burned down this weekend? What would happen if a junior administrator accidentally deleted a production SQL database or server?
I’ve been asked to do a webinar describing the features that Microsoft Azure provides for backup and recovery, so I started thinking about the state of disaster recovery that I’ve observed. A common problem I’ve seen with virtually every kind of customer is the challenge of preparing for a data loss incident – whether it’s a natural disaster, a freak accident involving a manager (à la BOFH), accidental deletions, or, more recently, the spectre of ransomware. Many clients are simply not backing up their data effectively.
tl;dr: Microsoft Azure provides several excellent disaster recovery solutions enabling both application recovery and data protection – Azure Site Recovery and Azure Backup remove a lot of the challenges of maintaining a reliable backup solution.
A huge part of this problem is simply the complexity of providing backup. This usually shows up early in the disaster recovery planning process – often as a major blocker to implementing a real solution. It can be hard to estimate the cost of configuring backup for a wide variety of applications, as they all have different requirements. BCDR solutions need to cover everything from physical servers to virtual servers; Windows, various flavours of Linux, BSD, and other operating systems; and multi-tier applications that need to be quiesced so they can be restored in a consistent state. There are plenty of excellent backup solutions available for most of the workloads you’ll encounter, but they require some combination of second datacentres and tapes, which may be cost-prohibitive for small and medium-sized businesses without large IT budgets, and which adds to the complexity.
Adding to the challenge is the size of the data to back up, and planning for its growth. Even if your company’s business isn’t growing rapidly, its data storage requirements probably are – whether it’s 20%, 40%, 60%, or more per year, that growth is hard to manage and even harder to predict. The second- and third-tier workloads in your environment are often left without sufficient backup, and the cost of that data loss only becomes apparent when it’s time to restore. Often this is the network file share that most companies still live with – and can’t live without – or the individual PCs that are difficult to back up because they aren’t reliably on the network (think about what executives keep on their laptops). Ransomware can hit these workloads and cause significant financial damage. Sometimes regulation requires organizations to retain data for extended periods – particularly in healthcare – and point-in-time backups of a very large data repository become extremely expensive to maintain. Because of the capital expense involved in backing up such large workloads regularly, they’re often given long, loose RPOs.
Beyond the simple cost of maintaining a good backup strategy, there’s often a significant amount of manual media management that needs to occur. Whether you’re shipping tapes to a storage facility or moving hard drives between locations, keeping track of your media is an expensive and time-consuming process. It’s vital that chain-of-custody be maintained to protect your confidential data, and this adds to the cost. There are numerous stories around the internet about administrators who dutifully ship tapes offsite on a regular basis, only to discover that the tapes were never actually swapped out of the tape drive and the off-site ‘backups’ are simply blank tapes.
That brings me to the third point, which is the ability to test your disaster recovery plan. Simply put – if you haven’t fully tested your restoration process from every point of your backup (local, off-site, cold storage, hot site, etc.), the untested portion of your DR plan may as well not exist. It’s only when you test it, live or as a drill, that the value of your data protection process is demonstrated. Because of the potential impact to production workloads (accidentally overwriting live data with last night’s backup does happen), the cost of secondary hardware, and the time investment required (BCDR drills often happen over weekends), it’s challenging to perform these tests regularly.
These problems are well known in the industry – so what does Azure change?
Well, for the first problem I highlighted – complexity – the answer is automation. In general, the Windows IT pro should be relying on automation more and more, especially with the advent of PowerShell. Of course, the *nix administrators are chuckling, as they have been writing effective scripts for these processes for many years, but the reality is that there’s usually still some manual process that’s challenging to automate away.
Azure Backup automates your off-site backups. Once a backup policy is configured, the backups run on schedule, and you can monitor them and confirm they’re actually happening using the Azure Backup Solution for OMS.
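To make that concrete, here’s a minimal PowerShell sketch of enabling backup for an Azure VM against a Recovery Services vault. It assumes the Az.RecoveryServices module, and the vault, resource group, and VM names (MyVault, MyResourceGroup, MyVM) are placeholders – treat it as a starting point to adapt, not a finished script.

```powershell
# Create a Recovery Services vault and make it the working context
$vault = New-AzRecoveryServicesVault -Name "MyVault" `
    -ResourceGroupName "MyResourceGroup" -Location "CanadaCentral"
Set-AzRecoveryServicesVaultContext -Vault $vault

# Use the built-in default policy, or clone and adjust it to match your RPO
$policy = Get-AzRecoveryServicesBackupProtectionPolicy -Name "DefaultPolicy"

# Enable protection for an existing Azure VM
Enable-AzRecoveryServicesBackupProtection -Policy $policy `
    -Name "MyVM" -ResourceGroupName "MyResourceGroup"

# Confirm that backup jobs are actually running
Get-AzRecoveryServicesBackupJob |
    Select-Object WorkloadName, Operation, Status, StartTime
```

Once protection is enabled, the backups themselves run on the policy’s schedule – the scripting is mostly useful for enrolling machines consistently and at scale.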
However, for a truly automated solution, Azure Site Recovery allows your applications to be replicated into Azure or to a second site, and performs fully tier-aware, orchestrated recovery of the whole solution. Azure Site Recovery is also testable: at any time, you can execute a test failover to your recovery environment (whether that’s a second site or Azure itself), validate your DR configuration, and, importantly, demonstrate the RTO and RPO you could achieve in a real disaster. I’m going to be writing another blog post about this (sometime this year, I swear!).
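As a rough sketch of what a scripted test failover can look like with the Az.RecoveryServices ASR cmdlets – the fabric, container, item, and network names below are placeholders for your own environment, so check the current cmdlet documentation before relying on it:

```powershell
# Point at the vault that holds the Site Recovery configuration
$vault = Get-AzRecoveryServicesVault -Name "MyVault" -ResourceGroupName "MyResourceGroup"
Set-AzRecoveryServicesAsrVaultContext -Vault $vault

# Drill down to the replication-protected item you want to exercise
$fabric    = Get-AzRecoveryServicesAsrFabric -FriendlyName "MyPrimarySite"
$container = Get-AzRecoveryServicesAsrProtectionContainer -Fabric $fabric
$item      = Get-AzRecoveryServicesAsrReplicationProtectedItem `
    -ProtectionContainer $container -FriendlyName "MyAppServer"

# Use an isolated test network so the drill can't touch production
$testNetworkId = (Get-AzVirtualNetwork -Name "TestVNet" `
    -ResourceGroupName "MyResourceGroup").Id

# Kick off the test failover and watch the job
$job = Start-AzRecoveryServicesAsrTestFailoverJob -ReplicationProtectedItem $item `
    -Direction PrimaryToRecovery -AzureVMNetworkId $testNetworkId
Get-AzRecoveryServicesAsrJob -Job $job

# Once the recovered VM has been validated, clean up the test resources
Start-AzRecoveryServicesAsrTestFailoverCleanupJob -ReplicationProtectedItem $item
```

For multi-tier applications, the same test failover can also be run against a recovery plan rather than a single protected item, so the tiers come up in the right order.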
For the size and scale of data, and the growing number of VMs, the obvious answer is the hyper-scale of the Azure cloud. You can simply scale your backup solution to meet your needs, without a massive capital investment, and back up all of your production workloads without buying a large storage solution up front. Media management ceases to be an issue – though Microsoft’s Data Protection Manager product lets you continue to maintain tape backups if policy or compliance requires it, while having the confidence that your data is available and secure in Azure.
Why would you select the native options provided by Azure rather than your existing backup vendor? Ultimately, that’s a question each company needs to answer for itself. If your existing solution works – if it’s testable, secure, reliable, and you’re comfortable with it – chances are you shouldn’t fix what isn’t broken. However, the cost of running a backup appliance in Azure can be significant, and it simply doesn’t scale the way Azure Backup can, as this image illustrates. Costs depend on many factors, so it’s very much an estimate.
I hope to put together some follow-up posts to further explore the business continuity and disaster recovery features that Microsoft Azure enables. In the meantime, I hope this post is helpful and informative if you’re exploring how the public cloud can provide an effective backup solution.