IT Continuity Planning
Today most organizations have committed resources, developed policies, procedures, and tools, and set up their organization and IT infrastructure to maintain their critical business processes (Business Continuity Plan) and return to normal activities (Disaster Recovery Plan) as quickly as possible after unforeseen circumstances and major outages.
Planning for these situations is not straightforward; the tasks are challenging and require a range of expertise and considerable effort.
In summary, the following details should be included in the IT continuity plan:
IT and business core process list
Gap analysis outcome, including the Recovery Time Objective and Recovery Point Objective for each process and component
Roles and responsibilities during contingencies and recovery
IT continuity procedures
IT recovery procedures
Invocation procedures (call tree)
Contact details (staff, vendors, stakeholders, rescue services, hospital, etc.)
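The gap analysis item in the list above can be recorded as recovery targets set against what the current arrangements can deliver. The following is a minimal sketch only: the process names, the hour values, and the names `ProcessObjective` and `find_gaps` are assumptions made for illustration and do not come from any standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessObjective:
    """Recovery targets and currently achievable values for one process, in hours."""
    name: str
    rto_target_h: float      # Recovery Time Objective: maximum tolerable downtime
    rpo_target_h: float      # Recovery Point Objective: maximum tolerable data loss
    rto_achievable_h: float  # downtime the current setup can deliver
    rpo_achievable_h: float  # data-loss window the current setup can deliver


def find_gaps(processes):
    """Return {process name: [objectives missed]} for processes that miss a target."""
    gaps = {}
    for p in processes:
        missed = []
        if p.rto_achievable_h > p.rto_target_h:
            missed.append("RTO")
        if p.rpo_achievable_h > p.rpo_target_h:
            missed.append("RPO")
        if missed:
            gaps[p.name] = missed
    return gaps


# Hypothetical sample data, for illustration only.
processes = [
    ProcessObjective("payroll", 24, 4, 12, 4),
    ProcessObjective("order entry", 4, 1, 8, 1),
    ProcessObjective("email", 8, 2, 12, 24),
]

print(find_gaps(processes))
# {'order entry': ['RTO'], 'email': ['RTO', 'RPO']}
```

Processes that appear in the result are the ones where the recovery strategy, or the target itself, needs to be revisited.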
The IT continuity plan includes four stages:
Initial response covers the first steps: notification and plan activation.
Relocation mainly covers staff relocation schedules, logistics, and transportation to the alternate site, and activation of the alternate site (IT equipment, telecoms, servers, etc.).
Recovery includes the damage assessment of primary facilities and the initiation and completion of recovery tasks.
Restoration requires verifying and confirming the readiness of primary facilities and infrastructure, scheduling staff relocation from the alternate site to the primary site, restoring business files, consolidating and archiving incident documentation, and returning to business as usual.
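The four stages above form an ordered sequence, which can be sketched as a simple enumeration with a task list per stage. This is only an illustrative model: the names `Stage`, `STAGE_TASKS`, and `next_stage` are assumptions, and the task wording is paraphrased from the list above.

```python
from enum import Enum


class Stage(Enum):
    INITIAL_RESPONSE = 1
    RELOCATION = 2
    RECOVERY = 3
    RESTORATION = 4


# Tasks per stage, paraphrased from the stage descriptions.
STAGE_TASKS = {
    Stage.INITIAL_RESPONSE: ["notification", "plan activation"],
    Stage.RELOCATION: ["staff relocation to alternate site", "alternate site activation"],
    Stage.RECOVERY: ["damage assessment of primary facilities", "recovery tasks"],
    Stage.RESTORATION: [
        "verify primary site readiness",
        "staff relocation back to primary site",
        "restore business files",
        "archive incident documentation",
    ],
}


def next_stage(current):
    """Return the stage that follows `current`, or None after restoration."""
    stages = list(Stage)  # Enum members iterate in definition order
    index = stages.index(current)
    return stages[index + 1] if index + 1 < len(stages) else None


stage = Stage.INITIAL_RESPONSE
while stage is not None:
    print(stage.name, "->", ", ".join(STAGE_TASKS[stage]))
    stage = next_stage(stage)
```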
In practice, how to build your plan (dos and don’ts)
You need to have a valid business case. Management commitment is probably the first and most important requirement to succeed and have a sustainable IT continuity plan.
Today most organizations have developed business continuity planning and set their IT infrastructure, processes, and business model to reduce the impact of natural disasters and outages they might face, but how many have an annual program to test their plan and identify the areas where improvements are needed?
Companies need to conduct a gap analysis to assess their plan against standards and best practices, identify weaknesses, and develop a roadmap that adds the missing elements and implements the right strategies step by step. That way they do not need to start from scratch, and they do not try to cover every aspect of the Business Continuity Plan at the same time.
Know your business! The IT continuity plan is a piece of the Business Continuity Plan, hence it needs to be aligned with business strategies and objectives. Wrong or incomplete solutions can waste time and money.
Perform a regular company risk assessment review to ensure all risks are covered, and set the plan accordingly. Get more flexibility by outsourcing some IT functions such as the help desk; the company will be less reliant on its own people during a contingency, because those tasks will be handled by external vendors under SLAs. This will help the company to focus on its core business processes.
As people are a key element in the IT continuity plan, creating a plan that depends on too few qualified people can threaten the overall plan. What if one of those people is unavailable for some reason? You need to identify a pool of employees who are capable of responding in an emergency, and initiate a set of best practices: job rotation, staff mobility in the job contract, a succession plan, and training, to ensure that people are ready to run the plan regardless of their positions or experience in the company.
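One way to spot a plan that depends on too few people is to list, for each emergency role, who is trained to perform it, and flag roles with thin cover. The sketch below is illustrative: the roster, the role names, the threshold of two people, and the function name `understaffed_roles` are all assumptions.

```python
def understaffed_roles(qualified_staff, minimum=2):
    """Return, sorted by name, the roles covered by fewer than `minimum` distinct people."""
    return sorted(
        role
        for role, people in qualified_staff.items()
        if len(set(people)) < minimum
    )


# Hypothetical roster: role -> people trained to perform it in an emergency.
roster = {
    "database restore": ["Amina", "Carlos"],
    "network failover": ["Carlos"],
    "plan activation": ["Amina", "Dana", "Lee"],
    "vendor liaison": [],
}

print(understaffed_roles(roster))
# ['network failover', 'vendor liaison']
```

Roles returned by the check are candidates for job rotation, succession planning, or additional training.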
The IT continuity plan requires a budget that should be included in the annual exercise and company plan. The key point here is to have a proactive approach so management will be aware of the fact that the organization might have to finance the IT continuity plan so appropriate action can be taken.
The BCP should not be an afterthought when preparing the budget. It has to be included in the company plan and discussed. As with the IT continuity plan, management must be aware that the BCP might have to be financed by the organization. External funds may be required.
New trends in technology such as virtualization, mobile devices, cloud computing, and social media need to be assessed.
Many new technologies introduce complexity, so maintaining the IT environment may require additional skills and resources. Reduce complexity: keep the environment simple for operational staff to run, and eliminate potential sources of human error.
To reduce costs of having to buy, rent, and maintain alternate facilities, a disaster recovery site, datacenters, etc., organizations should look for mutual agreements with other companies to share IT infrastructure and office desks in contingency situations.
Organizations should also consider leasing or procuring new IT infrastructure (including data communications) and arranging with suppliers to have them carry a contingency stock of IT equipment, software, etc., to be available at short notice.
In contingency situations, phone communication and the primary carrier might be down. You therefore have to plan for multiple communication options and make sure everyone knows the options and has the appropriate phone numbers, web addresses, and emergency contacts to get and stay in touch.
Password protection is a key goal of data security. IDs and passwords need to be stored in two geographically separate and secure locations, and more than one IT staff member should have access to all passwords and codes.
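Planning multiple communication options amounts to keeping, for each contact, an ordered list of channels and falling back to the next one when a channel is down. The sketch below shows the idea under stated assumptions: the `Channel` type, the `first_reachable` function, and the placeholder numbers and address are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Channel:
    kind: str     # e.g. "office phone", "mobile", "personal email"
    address: str  # phone number or address (placeholders below)


def first_reachable(channels, channels_down):
    """Return the first channel whose kind is not in `channels_down`, else None."""
    for channel in channels:
        if channel.kind not in channels_down:
            return channel
    return None


# Hypothetical contact entry, with channels in order of preference.
crisis_lead = [
    Channel("office phone", "555-0100"),
    Channel("mobile", "555-0101"),
    Channel("personal email", "lead@example.org"),
]

# Suppose an outage at the primary carrier takes out the office phone system.
print(first_reachable(crisis_lead, {"office phone"}))
```

A result of `None` means no listed channel is usable, which is itself a finding: that contact needs another option in the plan.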
Every major application enhancement, technology infrastructure change, or new service offering should have its own BIA (Business Impact Analysis) and risk management reviewed for applicability, along with its RTO (Recovery Time Objective) and RPO (Recovery Point Objective) to ensure that change management is embedded during the Business Continuity Plan lifecycle.
The Business Continuity Plan is an ongoing process that does not stop after testing. It has to be maintained and updated as required.
Tests will familiarize staff and IT teams with the continuity and recovery process. They will verify the effectiveness of the selected strategies and the readiness of the recovery site, and will identify improvements required to the process and infrastructure.
The recovery tests should be conducted at service level, and should avoid focusing on components such as hardware, systems, and applications. A particular service may require different servers, data on several local drives, or user network connectivity.
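Testing at service level means a service counts as recovered only when every component it depends on is working, not when individual servers pass in isolation. The following sketch illustrates this; the service names, the dependency map, the test results, and the function name `service_recovered` are assumptions.

```python
def service_recovered(service, dependencies, component_ok):
    """A service is recovered only if every component it needs passed its check.

    Components with no recorded result are treated as not recovered.
    """
    return all(component_ok.get(c, False) for c in dependencies[service])


# Hypothetical map of services to the components they rely on.
dependencies = {
    "online ordering": ["web server", "database server", "user network"],
    "payroll": ["application server", "database server"],
}

# Hypothetical component-level results from a recovery test.
component_ok = {
    "web server": True,
    "database server": True,
    "application server": True,
    "user network": False,
}

results = {s: service_recovered(s, dependencies, component_ok) for s in dependencies}
print(results)
# {'online ordering': False, 'payroll': True}
```

In this example every server passed, yet online ordering is still unavailable because user network connectivity failed, which a component-only test would not reveal.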
Organizations are urged to assign individuals and teams to lead, drive, and run the IT continuity plan. Authority should be given to a crisis management team to make the process effective and sustainable.
Auditing plans and procedures enables an impartial third-party review against regulations, laws, standards, and best practices, and provides recommendations.
Finally, the business’s perception of risk must be changed.
It’s no surprise that risk management and continuity planning often end up siloed into separate functional areas. Changing the perception and culture has to begin at the top level with a top-down approach to the following tasks: putting the organization in place; instituting reporting at the top level to avoid any conflict of interest; including continuity management on the board meeting agenda; ensuring that a continuity section is included in every corporate document; initiating policies and procedures to promote and develop internal control and compliance functions; conducting regular risk assessment to determine changes in the organization’s risk profile and assess performance; and proceeding with regular audits. “The boss knows best” philosophy must be avoided. Top management must listen to and accept others’ thoughts and ideas.
People must be educated through training and awareness programs, brainstorming sessions, and workshops. Use metrics and KPIs to assess performance and ensure compliance.
The challenge is to create a situation where people will instinctively look for risk and consider its impact before making a decision.
When you think about processes, setting up new systems, hiring new employees, contracting with vendors, and opening new accounts for customers, you need to think RISK.
IT continuity planning trends
Virtualization will make the plan easier by reducing the number of IT assets that need to be maintained, supported, and reviewed. We will have fewer devices to worry about, and the RTO can be reduced by switching virtual machines quickly from the live environment to backup.
Desktop virtualization enables staff to work off-site or at home through Citrix and VDI (virtual desktop infrastructure), which gives the organization the flexibility to recover quickly and get people on board without having to invest in alternate site areas, reducing the cost of maintaining a large alternate site for employees. This access needs to be secured through appropriate tunneling, with data leakage protection installed on the machine.
The deployment of virtual machines over the internet can be an alternative to allowing staff access through their personal home computers, making them more productive by using the environment they are familiar with during outages.
As applications (SaaS), platforms (PaaS), or infrastructure are delivered from the cloud, an organization can drastically reduce the risk of major or minor disruptions. The drawback for IT is the additional responsibility of managing third parties through an efficient problem management process and service level agreements, to ensure that third-party suppliers have the resources, failover systems, people, and processes in place to maintain the same level of service and guarantee data availability regardless of disruptions and outages at the supplier level.
This exercise can become more complicated in the future. As more and more companies outsource services to the cloud, the process will have to include several suppliers and services for maintaining the plan and proceeding with required testing and audit reviews.
Having more mobile devices in the workplace will improve business continuity strategies. It has become easier to communicate during a disaster through tablets, smartphones, and BlackBerry devices, which gives more flexibility for workforce recovery: staff can access corporate applications and communicate with coworkers, customers, and vendors from multiple remote locations. More software designed for mobile devices enables users to access the information needed during a crisis, such as the status of recovery, the recovery site location, the list of applications and services available, and emergency updates.
An article published by Forrester in July 2011, “It’s Time to Include Social Technology in Your Crisis Communication Strategy,” stressed the fact that subscribing to automated communication services is now common and widely used by many professionals. The proliferation of mobile devices and easy Internet access enable the use of social technologies such as Twitter, Facebook, and Skype as elements of business continuity and recovery strategies.
Organizations should leverage and assess technologies to make their response plan effective. They need to look at which platform is actually used by employees, customers, and vendors. These channels can be used for both communicating and getting information and help from external resources to improve the business continuity and recovery process. The drawback is more uncontrolled spreading of information outside, which can damage the organization’s reputation and make the crisis communication process more complicated.