Business Continuity & Disaster Recovery
Business Impact Analysis (BIA)
This involves determining the operational and financial impact of a potential disaster or disruption, including loss of sales, credibility, compliance fines, legal fees, PR management, etc.
It also includes measuring how the financial and operational damage varies with the time of year. A risk assessment should be conducted as part of the BIA to determine which assets are actually at risk (people, property, critical infrastructure, IT systems, etc.) as well as the probability and significance of possible hazards (natural disasters, fires, mechanical problems, supply failure, cyber attacks, etc.).
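One common way to turn the probability and significance of each hazard into a ranked list is a simple probability-times-impact score. The sketch below assumes a 1 to 5 scale for both values; the scale, the `Hazard` class and the ratings are illustrative assumptions, not figures from a real assessment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hazard:
    name: str
    probability: int  # 1 (rare) to 5 (almost certain); hypothetical scale
    impact: int       # 1 (minor) to 5 (severe); hypothetical scale

    @property
    def score(self) -> int:
        # Simple risk score: likelihood multiplied by significance.
        return self.probability * self.impact


def rank_hazards(hazards):
    """Return hazards sorted from highest to lowest risk score."""
    return sorted(hazards, key=lambda h: h.score, reverse=True)


# Illustrative ratings only.
hazards = [
    Hazard("Flood", probability=1, impact=5),
    Hazard("Cyber attack", probability=4, impact=4),
    Hazard("Supply failure", probability=3, impact=2),
]

for hazard in rank_hazards(hazards):
    print(f"{hazard.name}: {hazard.score}")
```

A multiplicative score is only one possible convention; the point is to make the ranking explicit so it can be reviewed and challenged.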
Mapping out your business model and determining where the interdependencies lie between the different departments and vendors within your company is also part of the BIA. The larger the organization, the more challenging it will be to develop a successful business continuity and disaster recovery plan. Sometimes organizational restructuring and business process or workflow realignment are necessary not only to create a business continuity/disaster recovery plan, but also to maximize operational efficiency.
Ready.gov/business has a BIA worksheet available (PDF) to help you document and calculate the operational and financial impact of a potential disaster. It matches the timing and duration of an interruption with the loss of sales/income, broken down by department, service and process.
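The kind of calculation such a worksheet captures can be sketched in a few lines. This is a minimal illustration, not the worksheet's actual method: the department names, hourly revenue figures and the `seasonal_factor` multiplier are all hypothetical.

```python
def estimate_loss(hourly_revenue, outage_hours, seasonal_factor=1.0):
    """Estimate lost income for one department during an outage.

    seasonal_factor is a hypothetical multiplier for busy periods,
    for example 2.0 if the outage falls in peak season.
    """
    if outage_hours < 0:
        raise ValueError("outage_hours must be non-negative")
    return hourly_revenue * outage_hours * seasonal_factor


def impact_table(departments, durations_hours, seasonal_factor=1.0):
    """Map each department to its estimated loss for each outage duration."""
    return {
        name: [estimate_loss(rate, hours, seasonal_factor) for hours in durations_hours]
        for name, rate in departments.items()
    }


# Hypothetical hourly revenue per department, in dollars.
departments = {"Online sales": 1200.0, "Support": 150.0, "Billing": 400.0}

for name, losses in impact_table(departments, [1, 8, 24]).items():
    print(name, losses)
```

Running the same table with a higher `seasonal_factor` shows how the same interruption costs more at a busy time of year.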
Analyzing your company's most valuable data, that is, data that directly leads to revenue, is key when determining what you need to back up and restore as part of your information technology (IT) disaster recovery plan.
Create an inventory of documents, databases and systems that are used on a day-to-day basis to generate revenue, and then quantify and match income with those processes as part of your recovery strategy/business impact analysis.
Aside from IT, a recovery strategy also involves personnel, equipment, facilities, a communication strategy and more in order to effectively recover and restore business operations.
Using information derived from the business impact analysis in conjunction with the recovery strategies, establish a plan framework. Documenting an IT disaster recovery plan is part of this stage.
As can be seen from the multiple steps within business continuity planning, disaster recovery is a subset within a larger overarching plan to keep a business running. It involves restoring and recovering IT infrastructure, including servers, networks, devices, data and connectivity.
A data backup plan involves choosing the right hardware and software backup procedures for your company, scheduling and implementing backups, and testing the backups for accuracy.
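One simple way to check a backup copy for accuracy is to compare a checksum of the backup against the source. The sketch below uses SHA-256 from Python's standard library; the file names are made up, and the demo writes temporary files only so that it can run on its own.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large backups do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(source, backup):
    """True if the backup file has the same SHA-256 digest as the source."""
    return sha256_of(source) == sha256_of(backup)


# Demo with temporary files (hypothetical names).
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "orders.db"
    good = Path(tmp) / "orders.db.bak"
    bad = Path(tmp) / "orders.db.truncated"
    source.write_bytes(b"order data" * 1000)
    good.write_bytes(source.read_bytes())
    bad.write_bytes(source.read_bytes()[:-1])  # simulate a truncated copy
    good_ok = backup_matches(source, good)
    bad_ok = backup_matches(source, bad)

print(good_ok, bad_ok)
```

A matching checksum only shows the copy is intact. It does not replace a periodic test restore, which confirms the data can actually be brought back into service.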
Testing & Exercises
Develop a testing process to measure the efficiency and effectiveness of your plans, and decide how often to conduct tests. Part of this step involves establishing a training program and conducting training for your company/business continuity team.
Testing allows you to clearly define roles and responsibilities and improve communication within the team, as well as identify any weaknesses in the plans that require attention. This allows you to allocate resources as needed to fill the gaps and build up a stronger, more resilient plan.
As an integral part of business continuity plan development, creating an IT disaster recovery plan is essential to keep businesses running as they increasingly rely on IT infrastructure (networks, servers, systems, databases, devices, connectivity, power, etc.) to collect, process and store mission-critical data.
A disaster recovery plan is designed to restore IT operations at an alternate site after a major system disruption with long-term effects. After systems have been successfully transferred, the goal is to restore, recover and test the affected systems and put them back into operation.
Your IT infrastructure is, in most cases, the lifeblood of your organization. When websites are down or patient data is unavailable due to hacking, natural disasters, hardware failure or human error, businesses struggle to operate.
According to FEMA, a recovery strategy should be developed for each component:
- Physical environment in which data/servers are stored – data centers equipped with climate control, fire suppression systems, alarm systems, authorization and access security, etc.
- Hardware – Networks, servers, devices and peripherals.
- Connectivity – Fiber, cable, wireless, etc.
- Software applications – Email, data exchange, project management, electronic healthcare record systems, etc.
- Data and restoration
Identify the critical software applications and data, as well as the hardware required to run them. Additionally, determining your company’s custom recovery point and time objectives can prepare you for recovery success by creating guidelines around when data must be recovered.
Recovery Point and Time Objectives
Recovery Point Objective (RPO)
A recovery point objective (RPO) specifies the point in time to which data must be recoverable in order for business operations to resume; in other words, the maximum amount of recent data, measured in time, that the business can afford to lose. The RPO determines the minimum frequency at which backups need to occur, for example every hour or every 5 minutes.
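The relationship between the RPO and backup frequency can be sketched as a simple comparison. This is a simplified illustration: it assumes backups complete instantly, and the function names are hypothetical.

```python
from datetime import datetime, timedelta


def meets_rpo(backup_interval, rpo):
    """A schedule meets the RPO if backups run at least as often as the RPO."""
    return backup_interval <= rpo


def data_loss_window(last_backup, failure_time):
    """Span of data that is lost when restoring from the last backup."""
    if failure_time < last_backup:
        raise ValueError("failure_time must not be earlier than last_backup")
    return failure_time - last_backup


rpo = timedelta(hours=1)
print(meets_rpo(timedelta(minutes=5), rpo))   # backups every 5 minutes
print(meets_rpo(timedelta(hours=12), rpo))    # backups every 12 hours

lost = data_loss_window(datetime(2024, 1, 1, 0, 0), datetime(2024, 1, 1, 0, 45))
print(lost <= rpo)
```

In practice the time a backup takes to complete and transfer offsite also counts against the RPO, so the real interval usually needs some margin.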
Recovery Time Objective (RTO)
The recovery time objective (RTO) refers to the maximum length of time a system (or computer, network or application) can be down after a failure or disaster before the company is negatively impacted by the downtime. Determining the amount of lost revenue per amount of lost time can help determine which applications and systems are critical to business sustainability.
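For example, if your email server was down for only an hour, yet a large portion of your database was wiped out and you lost 12 hours' worth of email, how would that impact your business? Here the downtime, which is measured against the RTO, is short, but the data loss, which is measured against the RPO, is large. This is why the two objectives are set separately.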
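The email example can be checked against both objectives at once. In this sketch the one hour of downtime and 12 hours of lost email come from the example above, while the 4-hour RTO and 1-hour RPO are assumed targets chosen only for illustration.

```python
from datetime import timedelta


def assess_incident(downtime, data_lost, rto, rpo):
    """Compare an incident's downtime and data loss with the RTO and RPO."""
    return {
        "rto_met": downtime <= rto,
        "rpo_met": data_lost <= rpo,
    }


result = assess_incident(
    downtime=timedelta(hours=1),    # from the example above
    data_lost=timedelta(hours=12),  # from the example above
    rto=timedelta(hours=4),         # assumed target
    rpo=timedelta(hours=1),         # assumed target
)
print(result)
```

With these assumed targets the incident meets the RTO but misses the RPO, even though the server itself was back quickly.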
Designing for Recovery
High Availability Infrastructure
Strategic data center design involving high availability and redundancy can help support larger companies that rely on mission-critical (high-impact) applications. High availability is a design approach that takes into account the sum of all the parts including the application, all the hardware it is running on, power infrastructure, and the networking behind the hardware.
Using high availability architecture can reduce the risk of lost revenue and customers in the event of Internet connectivity or power loss. With high availability, you can perform maintenance without downtime, and the failure of a single firewall, switch, or PDU will not affect your availability. With this type of IT design, you can achieve 99.999% availability, which corresponds to no more than about 5.26 minutes of downtime per year.
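The downtime figure follows directly from the availability percentage. The sketch below shows the arithmetic, assuming an average year of 365.25 days; the function name is hypothetical.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # assumes an average year of 365.25 days


def downtime_minutes_per_year(availability_percent):
    """Maximum yearly downtime, in minutes, allowed by an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


for target in (99.9, 99.99, 99.999):
    print(f"{target}% -> {downtime_minutes_per_year(target):.2f} minutes/year")
```

Each additional "nine" cuts the allowed downtime by a factor of ten, which is why the higher targets require redundancy at every layer.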
High availability power means the primary power circuit should be provided by the primary UPS (Uninterruptible Power Supply) and be backed up by the primary generator. A secondary circuit should be provided by the secondary UPS, which is backed up by the secondary generator. This redundant design ensures that the failure of a single UPS or generator does not interrupt power in your environment.
For a high availability data center, look not only for a primary and secondary power feed, but also for a primary and secondary Internet uplink if you purchase Internet access from the data center provider. Also ensure that any hardware the provider supplies, such as firewalls or switches, is deployed redundantly.
If you use managed services and purchase a server from a data center, ensure all of the hardware is configured for high availability, including dual power supplies and dual NICs (network interface controllers). Ensure the server is also wired back to different switches, and that the switches are dual-homed to different access layer routing, so there is no single point of failure anywhere in the environment.
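The effect of removing single points of failure can be estimated with basic availability arithmetic. The sketch below assumes component failures are independent, which real environments only approximate, and the 99% per-component figure is a made-up illustration.

```python
import math


def parallel(availabilities):
    """Availability of redundant components where any one of them suffices."""
    return 1 - math.prod(1 - a for a in availabilities)


def series(availabilities):
    """Availability of a chain where every component must be working."""
    return math.prod(availabilities)


single = 0.99  # hypothetical availability of one power path, switch or NIC

# Three layers (power, network, server NIC), each with a single component.
print(series([single, single, single]))

# The same three layers, each built from a redundant pair.
pair = parallel([single, single])
print(series([pair, pair, pair]))
```

Redundant pairs raise the availability of each layer, but the layers still multiply together, so one non-redundant component limits the whole environment.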
Offsite backup and disaster recovery are still important, as high availability cannot help you recover from a natural disaster such as a flood or hurricane. Disaster recovery comes into play after high availability has completely failed and you must recover to a different geographical location.
Redundancy is another factor to consider when it comes to disaster recovery data center design. With a fully redundant data center design, automatic failover can ensure server uptime in the event that one provider experiences any connectivity issues.
This includes multiple Internet Service Providers (ISPs) and fully redundant Cisco networks with automatic failover. Pooled UPS (Uninterruptible Power Supply) units, batteries and generators can ensure a backup source of power in the event one provider fails.
Cold Site Disaster Recovery
A cold site is little more than an appropriately configured space in a building. Everything required to restore service to your users must be retrieved and delivered to the site before the process of recovery can begin. As you can imagine, the delay going from a cold backup site to full operation can be substantial.
Warm Site Disaster Recovery
A warm site is space leased from a data center provider or disaster recovery provider that already has power, cooling and network connectivity installed. It is also already stocked with hardware similar to that found in your data center, or primary site. To restore service, the latest backups must be retrieved from an offsite storage facility.
Hot Site Disaster Recovery
A hot site is the most expensive yet fastest way to get your servers back online in the event of an interruption. Hardware and operating systems are kept in sync and in place at a data center provider's facility in order to quickly restore operations. Real-time synchronization between the two sites may be used to completely mirror the data environment of the original site using wide area network links and specialized software. Following a disruption to the original site, the hot site allows the organization to relocate with minimal losses to normal operations. Ideally, a hot site will be up and running within a matter of hours or even less.
When you partner with a data center/disaster recovery provider, you're sharing the cost of the infrastructure, so it's not as expensive as running an entirely separate secondary data center of your own.
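The trade-off between cold, warm and hot sites can be framed as picking the least expensive option that still meets your RTO. The recovery times in this sketch are hypothetical placeholders; actual figures depend on your provider, hardware and backup logistics.

```python
from datetime import timedelta

# Hypothetical recovery times for illustration, ordered cheapest first.
SITE_TIERS = [
    ("cold", timedelta(days=7)),
    ("warm", timedelta(days=1)),
    ("hot", timedelta(hours=1)),
]


def cheapest_site_for(rto):
    """Return the cheapest site type whose assumed recovery time fits the RTO."""
    for name, recovery_time in SITE_TIERS:
        if recovery_time <= rto:
            return name
    return None  # no tier in the table is fast enough for this RTO


print(cheapest_site_for(timedelta(days=14)))
print(cheapest_site_for(timedelta(days=2)))
print(cheapest_site_for(timedelta(hours=4)))
```

When the function returns `None`, the RTO is tighter than any recovery site in the table can deliver, which points toward high availability design rather than site recovery alone.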
For more in-depth discussion of business continuity & disaster recovery, download the complete version of Online Tech's Disaster Recovery WhitePaper.