Recently we performed our annual Disaster Recovery test. Every year we have learned something valuable and adjusted our recovery plans accordingly, and this year was no different. Even with all the new technology, DR still seems to be a tricky undertaking.
The first year we tested our plan we found that we really, really, REALLY don’t want to restore Active Directory onto unlike hardware. By the following year we had gotten a node onto our MPLS cloud, which allowed us to have a replicated AD server at the DR site. This greatly reduced the problem of restoring AD. The year after that we tested the phone system portion of our DR plan and discovered that working with the telco in a DR situation will be challenging at the very least. In the two years since that second test there have been some major changes that removed our ability to have replicated AD, so we were back to square one on that front.
This year we thought we would do a restore of our two-year-old VMware environment. We decided to limit the scope to restoring only “Tier 0” services, the bare minimum: VMware ESX, Symantec BackupExec, vSphere, and AD. Time permitting, we planned to restore as many servers beyond Tier 0 as possible.
In the last year we had made the choice to purchase Symantec’s BackupExec Agent for VMware Virtual Infrastructure (AVVI). This is a BackupExec agent that allows you to back up VMware guest OS files directly through the ESX server and/or SAN. The idea was that our virtualized servers would be backed up to tape at the VMware file level, which would allow us to restore them directly back to ESX.
So here is what we thought would happen in our test. We would install ESX onto whatever hardware our DR vendor provided (it shouldn’t matter, right?!) while installing BackupExec directly onto one of the other servers we were given. Then we would inventory and catalog the AVVI backups from home, restore AD and vCenter, boot them up in ESX, and start rebuilding our environment from there. Easy peasy, right? Not so much.
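For anyone who likes to see a plan written down as dependencies, here is a minimal sketch of that intended sequence in Python. The step names are my own labels for the plan above, not anything from BackupExec or VMware, and the dependencies are simply how we assumed things would go. It uses the standard library's `graphlib` (Python 3.9+) to produce a valid order and to complain if the plan contains a circular dependency.

```python
from graphlib import TopologicalSorter

# Each step maps to the steps that must finish before it can start.
# Step names are illustrative labels for the plan described above,
# not identifiers from any real tool.
PLAN = {
    "install_esx": set(),
    "install_backupexec": set(),
    "inventory_and_catalog_tapes": {"install_backupexec"},
    "restore_ad_vm": {"install_esx", "inventory_and_catalog_tapes"},
    "restore_vcenter_vm": {"install_esx", "inventory_and_catalog_tapes"},
    "boot_ad_and_vcenter": {"restore_ad_vm", "restore_vcenter_vm"},
    "rebuild_remaining_servers": {"boot_ad_and_vcenter"},
}


def recovery_order(plan):
    """Return one valid execution order for the plan.

    Raises graphlib.CycleError if two steps depend on each other,
    directly or indirectly.
    """
    return list(TopologicalSorter(plan).static_order())


if __name__ == "__main__":
    for number, step in enumerate(recovery_order(PLAN), start=1):
        print(f"{number}. {step}")
```

Writing the plan this way makes the hidden assumptions visible: nothing in it says that a restore needs a domain or a vCenter server to already exist, which is exactly the sort of dependency a test tends to uncover.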
As I have come to expect, nothing goes right in a DR or DR test scenario. We were able to get ESX and BackupExec installed without any major obstacles. Our first problem came when cataloging our tapes, which took far longer than we had expected. Two hours in, with no indication of when it would finish, we canceled the job because the resources we really needed were already cataloged and available to restore. Not too major a problem, so we moved on.
A little context will help explain the state we were in. At this point we had a Windows 2003 server acting as our BE2010 server, sitting in a workgroup. This server also had the VMware VIC Client installed and was connected to ESX. The ESX server was on the same network, but was not yet managed by vCenter.
BE2010 has some features that allow you to discover the ESX server, pick the datastore to restore to, and so on. This seemed great, and we set up our restore job to match our current setup. We started by trying to restore our Active Directory VM to ESX. This is where things started to go awry. We got a credentials error and the job would not restore. We tried a few different variants, all of which ended in failed jobs. We suspect this was due to BE and ESX not being in a domain, and specifically to ESX not being controlled by vCenter. (Chicken or egg, right?!)