You only have to turn on the TV and watch footage of the destruction caused by the tsunami in Japan, or think back to the September 11 attacks and the destruction in New York City, to realize the importance of business continuity and disaster recovery planning.
The CISSP exam as well as the certification exams from the Disaster Recovery Institute International (ABCP-Associate Business Continuity Professional, CBCP-Certified Business Continuity Professional, and MBCP-Master Business Continuity Professional) all focus on the same issues, namely continuing business in the event of a disaster.
There are several definitions that you need to know for this domain:
BCP (Business Continuity Plan) – the overall organizational plan for “how-to” continue business.
COOP (Continuity of Operations Plan) – the plan for continuing to do business until the IT infrastructure can be restored.
DRP (Disaster Recovery Plan) – the plan for recovering from an IT disaster and having the IT infrastructure back in operation.
BRP (Business Resumption Plan) – the plan to move from the disaster recovery site back to your business environment or back to normal operations.
MTBF (Mean Time Between Failures) – the average time a piece of IT infrastructure is expected to keep working between one failure and the next.
MTTR (Mean Time to Repair) – the average time it takes to get a piece of hardware or software repaired and back online.
RPO (Recovery Point Objective) – the organization’s definition of acceptable data loss.
RTO (Recovery Time Objective) – the organization’s definition of the acceptable amount of time an IT system can be offline.
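MTBF and MTTR are often combined into a single availability figure using the standard steady-state formula availability = MTBF / (MTBF + MTTR). The sketch below only illustrates that arithmetic. The function names and the sample figures (a server with a 10,000-hour MTBF and an 8-hour MTTR) are assumptions made up for the example, not values from any vendor or from the exam.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time a component is expected to be up.

    Uses the steady-state formula MTBF / (MTBF + MTTR).
    """
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR must not be negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def expected_downtime_hours_per_year(
    mtbf_hours: float, mttr_hours: float, hours_per_year: float = 8760.0
) -> float:
    """Expected hours of downtime in a year of continuous operation."""
    return (1.0 - availability(mtbf_hours, mttr_hours)) * hours_per_year


if __name__ == "__main__":
    # Hypothetical figures, chosen only to show the calculation.
    mtbf, mttr = 10_000.0, 8.0
    print(f"Availability: {availability(mtbf, mttr):.4%}")
    print(f"Downtime per year: {expected_downtime_hours_per_year(mtbf, mttr):.1f} hours")
```

Comparing the expected downtime with the RTO that the business owner sets is one rough way to see whether a single component is good enough or whether redundancy is worth discussing.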
Let’s begin this domain by enumerating some tasks that need to be performed in order to be successful at business continuity and disaster recovery. The first thing an organization needs to do is complete a Business Impact Analysis (BIA). The BIA identifies all of the business functions, which then need to be evaluated to determine which ones are critical to the business and which ones aren’t. The BIA also records which IT assets and which supporting business functions each business function depends on. So in addition to the BIA, the organization needs an accurate IT asset inventory to support those functions.
Once those two pieces are complete, and while still in the BIA process, the owner of the business function needs to define the Recovery Point Objective and the Recovery Time Objective. The RPO will help IT determine what backup strategy is required. For example, let’s say the owner of the business function states they can afford to lose up to one day’s worth of entered data. Your choice in this case might be weekly full backups and daily incremental or differential backups.
You will need to understand the following terms related to backups: Full, Incremental, Differential, Electronic Vaulting, Remote Journaling, Database Shadowing and High Availability. Pay particular attention to how many tapes would be required to restore a system if it crashed mid-week and you were doing Full and Daily Incremental vs. Full and Daily Differential. An incremental restore needs the last full backup plus every incremental taken since then, while a differential restore needs only the last full backup plus the most recent differential.
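The tape-count question can be worked through with a small sketch. It assumes one backup set per tape, a weekly full backup, and one incremental or differential backup each day after it. The function name and the Sunday-to-Thursday scenario are illustrative assumptions, not part of any exam material.

```python
from typing import List


def backups_needed_for_restore(days_since_full: int, scheme: str) -> List[str]:
    """List the backup sets to restore, in order, after a crash.

    days_since_full: number of daily backups taken since the last full backup.
    scheme: "incremental" or "differential".
    """
    if days_since_full < 0:
        raise ValueError("days_since_full must not be negative")

    restore_order = ["full"]
    if scheme == "incremental":
        # Each incremental holds only that day's changes, so all are needed.
        restore_order += [f"incremental day {d}" for d in range(1, days_since_full + 1)]
    elif scheme == "differential":
        # Each differential holds every change since the full backup,
        # so only the most recent one is needed.
        if days_since_full > 0:
            restore_order.append(f"differential day {days_since_full}")
    else:
        raise ValueError("scheme must be 'incremental' or 'differential'")
    return restore_order


if __name__ == "__main__":
    # Hypothetical week: full backup on Sunday, crash on Thursday before that
    # night's backup, so Monday, Tuesday and Wednesday backups exist.
    for scheme in ("incremental", "differential"):
        sets = backups_needed_for_restore(3, scheme)
        print(f"{scheme}: {len(sets)} sets -> {sets}")
```

Under these assumptions the incremental scheme needs four sets and the differential scheme needs two. The trade-off is that each differential backup grows through the week, while each incremental stays small.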
Now that the owner of the business function has completed the BIA, there will likely need to be some negotiation with IT. For example, the BIA has an RTO of four hours and IT knows that it takes eight hours to rebuild the server. There are always options, and with a little give and take on both sides a workable one can be found. In this case it might make sense to change the RTO to eight hours or to purchase a second server and implement HA (High Availability) clustering.
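One way to find where that negotiation is needed is to compare each function’s RTO with IT’s recovery estimate. The sketch below is a minimal illustration. The BusinessFunction class, its field names and the "order entry" example are hypothetical, and the four-hour and eight-hour figures simply reuse the example above.

```python
from dataclasses import dataclass


@dataclass
class BusinessFunction:
    name: str
    rto_hours: float                # set by the business owner in the BIA
    recovery_estimate_hours: float  # IT's estimate of the time to restore


def rto_gap_hours(fn: BusinessFunction) -> float:
    """Hours by which IT's recovery estimate exceeds the RTO (0 if it fits)."""
    return max(0.0, fn.recovery_estimate_hours - fn.rto_hours)


def needs_negotiation(fn: BusinessFunction) -> bool:
    """True when the current recovery estimate cannot meet the RTO."""
    return rto_gap_hours(fn) > 0


if __name__ == "__main__":
    # Hypothetical function using the four-hour RTO and eight-hour rebuild.
    order_entry = BusinessFunction("order entry", rto_hours=4, recovery_estimate_hours=8)
    if needs_negotiation(order_entry):
        print(
            f"{order_entry.name}: recovery estimate exceeds RTO by "
            f"{rto_gap_hours(order_entry):.0f} hours; relax the RTO or add redundancy"
        )
```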
The next thing we want to look at is the COOP. The COOP is where the owner of the business function defines how they’re going to continue to do business while IT is restoring the systems that crashed. Pay particular attention to (HINT) documented procedures for manual processes. The COOP will also include things like succession planning, contacts with external authorities, and contact lists. Remember that in a disaster scenario people act differently, so when you put someone’s phone number down, don’t list only the office number, because the office may no longer be there. Put an alternate phone number down and include the area code and/or country code, because people will dial what they see, and if you have people in different area codes they need the full phone number.
Just a quick sidebar on preventive measures: surge protectors, UPSs, backup generators, dual but separate power feeds, and dual but separate ISP connections. OK, enough said, you get the picture. If you have a data center or just a server room, you need to consider everything that goes into supporting the infrastructure to “PREVENT” interruptions from occurring.
Now when you talk about your Disaster Recovery Plan (DRP), you need to know the different types of recovery sites or options, namely reciprocal agreement, cold site, warm site, hot site, redundant site and mobile site. Two things come to mind: the first is cost and the second is availability. Obviously it is more expensive to have a mirror-image redundant site, and it is debatable whether a reciprocal agreement will actually provide the facilities you need in the event of a disaster. One thing to consider, particularly in light of 9/11 and the recent tsunami, is how many businesses are using your same backup site and what happens if that backup site can’t support a major disaster. What’s your backup plan? Where’s your secondary site? And last, but not least, are your backups, tapes or whatever, protected from the same disaster? In other words, are they stored far enough away from your business that the disaster will not affect them?
Finally, but probably most important, is the testing of the plans, all of them: BCP, COOP, DRP, and BRP. You need to know the different types of testing, such as checklist, structured walkthrough/tabletop, simulation, parallel processing, and full business interruption testing. And of course, to go along with all this testing, don’t forget to train your recovery team members.
Now as a parting note there are a few documents and websites you need to become familiar with: