Aligning Disaster Recovery and Cybersecurity

Introduction

Planning for disaster situations—understanding the various scenarios that could impact a business and implementing continuity and recovery plans—has long been a cornerstone of business risk management. In today’s environment, business continuity and disaster recovery plans should be updated to reflect modern threats to business operations, such as cyber attacks.

Disaster Recovery and Business Continuity

The terms disaster recovery and business continuity are commonly heard in the disaster planning field. While they may seem similar, these are two separate functions that inform and influence each other.

Disaster recovery is the process of recovering IT components, systems and data in a disaster situation. Disaster recovery is a technical process owned by IT.

Business continuity comprises the processes and procedures that allow the business to continue operating during and after a disaster situation. Business continuity is owned by the entire organisation, or sometimes by each department.

As an example, let’s take a hypothetical business that recently suffered a critical failure of the server that hosts its on-premises CRM solution. The business continuity plan, which is owned by the sales team, will determine how they continue working while the system is unavailable; for example, they may take notes with pen and paper, which they will manually input into the CRM once it’s back online. Based on the business impact of the CRM system being unavailable, the business continuity plan will also identify the criticality of the system. In this example, due to factors such as lost revenue and productivity, it is likely the CRM system would be classified as a highly critical system.

The disaster recovery plan, owned by the IT team, will dictate the steps that IT takes to return the CRM to operation, such as procuring new hardware from a vendor and restoring the CRM data from backups. The disaster recovery plan will likely prioritise the recovery of the CRM system above other systems because of its highly critical classification. However, the business continuity plan and disaster recovery plan need to inform and influence each other. In this example, IT’s estimate in the disaster recovery plan of how long it will take to return the CRM to operation may affect the sales team’s actions in their business continuity plan. If procuring new hardware will take days, then Sales may implement a more robust system of recording notes.

Similarly, the system criticality in the business continuity plan may influence the disaster recovery plan. Given the CRM system is business critical, it may be more appropriate to keep spare hardware in storage, ready to be used, and pay for a higher level of vendor support. It may even be relevant to revisit the system design and build more resiliency into the CRM system to reduce the likelihood of outages to such a critical system.

Aligning Disaster Recovery with Cybersecurity

In past decades, disaster recovery and business continuity plans covered scenarios such as an aeroplane crashing into a data centre or an office being unavailable due to a power outage. With the shift to cloud services and remote work, these scenarios are often less relevant. Modern disaster recovery and business continuity plans need to incorporate today’s most significant threats to business operations, including malware, ransomware, and other cyber attacks.

While some cyber threats can be challenging to plan a response to, as determined attackers will vary their attack techniques, many can be treated similarly to other disaster situations. Some cybersecurity disaster scenarios that should be included in disaster recovery plans are:

• A ransomware infection is moving through user and infrastructure systems

• An intentional or accidental data leak has occurred

• A social engineering attack and fraudulent invoice have resulted in a payment being sent to the incorrect account

In traditional IT disaster situations, recovery efforts can begin as soon as the issue is reported. With many cybersecurity disasters, however, containment and eradication must take priority, because systems restored while the threat is still active may simply be compromised again.

Roles and responsibilities also become particularly important when dealing with cybersecurity disaster situations. IT and security operations are separate functions, and it’s essential to understand which tasks fall to which teams and which can be done in parallel. Where security and IT operations are combined, management must provide clear guidance on which tasks to focus on as the team moves through containment, eradication, and recovery—particularly if eradication and recovery are being addressed concurrently.

Planning for the Worst

When planning for a disaster, several areas must be considered:

System Criticality

Start by ranking your organisation’s systems based on their importance to the business. Systems that integrate or work together to achieve a specific outcome should be grouped and ranked as a unit. To help with this conversation, consider the impact of a system being unavailable for an hour, a day, a week and a month.

It also helps to consider the importance of the data held within the system, i.e., what would the impact be if a system was brought back online, but some of the recent data was lost? Again, consider losing an hour, a day, a week and a month’s worth of data.

Once you know the criticality of each system, you can determine an order of recovery.
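As a rough illustration of this ranking step, the sketch below orders system groups by how little downtime and data loss the business can tolerate. The class name, field names and figures are all hypothetical placeholders rather than recommended values; a real assessment would use the answers gathered from the business.

```python
from dataclasses import dataclass


@dataclass
class SystemGroup:
    name: str
    # Longest outage the business can tolerate, in hours (illustrative figure).
    max_downtime_hours: float
    # Largest window of recent data the business can afford to lose, in hours.
    max_data_loss_hours: float


def recovery_order(groups):
    """Sort groups so the least outage-tolerant systems are recovered first.

    Ties on downtime tolerance are broken by data loss tolerance.
    """
    return sorted(
        groups,
        key=lambda g: (g.max_downtime_hours, g.max_data_loss_hours),
    )


# Hypothetical systems and tolerances, for illustration only.
systems = [
    SystemGroup("Intranet wiki", max_downtime_hours=168, max_data_loss_hours=24),
    SystemGroup("CRM", max_downtime_hours=4, max_data_loss_hours=1),
    SystemGroup("Email", max_downtime_hours=8, max_data_loss_hours=24),
]

for position, group in enumerate(recovery_order(systems), start=1):
    print(position, group.name)
```

Sorting on tolerable downtime is only one possible way to express criticality; some organisations may prefer explicit tiers agreed with the business.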

Disaster Scenarios

Next, brainstorm a list of possible disaster scenarios that may impact the business. These should cover a variety of areas, such as a natural disaster resulting in the loss of a data centre, network and link failures within the corporate network, widespread national internet failures resulting in the loss of external connectivity, loss of specific sites, cybersecurity issues, and failures of SaaS platforms and critical vendors.

While you should plan for as many specific disaster situations as possible, you can use a standard risk-based impact and probability matrix to prioritise them and determine which to prepare for first and in the most detail.
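One common form of such a matrix multiplies an impact rating by a probability rating to give a single score. The sketch below assumes 1 to 5 scales for both; the scales, names and ratings are illustrative assumptions, not assessed values.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    impact: int       # assumed scale: 1 (minor) to 5 (severe)
    probability: int  # assumed scale: 1 (rare) to 5 (almost certain)

    @property
    def risk_score(self) -> int:
        # Simple impact x probability score, as used in many risk matrices.
        return self.impact * self.probability


def prioritise(scenarios):
    """Return scenarios ordered from highest to lowest risk score."""
    return sorted(scenarios, key=lambda s: s.risk_score, reverse=True)


# Hypothetical ratings, for illustration only.
scenarios = [
    Scenario("Loss of data centre to natural disaster", impact=5, probability=1),
    Scenario("Ransomware infection", impact=5, probability=3),
    Scenario("SaaS vendor outage", impact=3, probability=3),
]

for scenario in prioritise(scenarios):
    print(scenario.risk_score, scenario.name)
```

The highest-scoring scenarios are the ones to prepare for first and in the most detail; lower-scoring scenarios can be covered in less depth.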

Once you’ve listed the disaster scenarios, you can develop technical recovery plans for each system and then adjust them for each disaster scenario. You can also use these plans to estimate recovery time and data loss; these metrics can be used by the business to begin planning how it would continue operating without each system for the relevant period of time (business continuity planning). If there are significant mismatches between recovery time/data loss and system criticality, you should log this on a risk register, and those responsible for IT architecture should be informed so costing estimates can be drawn up for system improvements or replacement to reduce the risk.
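The mismatch check described above can be sketched as a comparison between what the business can tolerate and what the technical recovery plan estimates. All names and figures below are hypothetical; the CRM figures loosely follow the earlier example, where procuring replacement hardware takes days, and the data loss estimate assumes a nightly backup.

```python
from dataclasses import dataclass


@dataclass
class RecoveryEstimate:
    system: str
    tolerable_downtime_hours: float   # from the business (criticality)
    estimated_recovery_hours: float   # from the technical recovery plan
    tolerable_data_loss_hours: float  # from the business (criticality)
    estimated_data_loss_hours: float  # e.g. time since the last usable backup


def risk_register_entries(estimates):
    """Return a description of every mismatch that should be logged as a risk."""
    entries = []
    for e in estimates:
        if e.estimated_recovery_hours > e.tolerable_downtime_hours:
            entries.append(
                f"{e.system}: estimated recovery of {e.estimated_recovery_hours}h "
                f"exceeds tolerable downtime of {e.tolerable_downtime_hours}h"
            )
        if e.estimated_data_loss_hours > e.tolerable_data_loss_hours:
            entries.append(
                f"{e.system}: estimated data loss of {e.estimated_data_loss_hours}h "
                f"exceeds tolerable data loss of {e.tolerable_data_loss_hours}h"
            )
    return entries


# Hypothetical figures, for illustration only.
estimates = [
    RecoveryEstimate("CRM", 4, 72, 1, 24),
    RecoveryEstimate("Email", 8, 2, 24, 24),
]

for entry in risk_register_entries(estimates):
    print(entry)
```

In this sketch only the CRM produces entries, which is the kind of gap that would prompt a conversation about spare hardware, higher vendor support levels or a more resilient design.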

For any systems that your organisation does not directly manage, for example, those provided by an external service provider or SaaS vendor, you should examine the provider’s contract for expected recovery time and disaster recovery obligations and exclusions.

The communication plan is the final element of disaster recovery and business continuity plans. Communications should be sent from a central management function throughout the disaster scenario to ensure stakeholders are informed without detracting from the recovery efforts.

Recovery Plans

Recovery plans should be based on systems or components, as each disaster scenario will impact a different group of systems. The recovery plan for each system should stand alone, so that when a disaster occurs, the appropriate set of recovery plans can be selected based on the specific impact of the disaster.
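A minimal sketch of this selection step follows, assuming one standalone plan per system and a recovery order that has already been agreed. The system names, plan descriptions and ordering are hypothetical.

```python
# One standalone recovery plan per system (descriptions are placeholders).
RECOVERY_PLANS = {
    "CRM": "Rebuild CRM server and restore data from backup",
    "Email": "Fail over to standby mail service",
    "File shares": "Restore file shares from backup",
    "Intranet wiki": "Redeploy wiki and restore content",
}

# Order of recovery derived from system criticality, most critical first.
RECOVERY_ORDER = ["CRM", "Email", "File shares", "Intranet wiki"]


def select_plans(impacted_systems):
    """Return (system, plan) pairs for the impacted systems, in recovery order."""
    impacted = set(impacted_systems)
    missing = impacted - set(RECOVERY_PLANS)
    if missing:
        # A system without a plan is a gap worth surfacing immediately.
        raise KeyError(f"No recovery plan for: {sorted(missing)}")
    return [
        (name, RECOVERY_PLANS[name])
        for name in RECOVERY_ORDER
        if name in impacted
    ]


# Example: a disaster that has impacted email and the CRM.
for system, plan in select_plans({"Email", "CRM"}):
    print(f"{system}: {plan}")
```

Because each plan is independent, the same set of plans can serve many disaster scenarios; only the selection changes.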

Testing the Disaster Recovery and Business Continuity Plans

There are three main ways to test a disaster recovery plan. Each has its own benefits, level of risk and cost to the business: paper-based testing, component testing, and scenario simulations.

A paper-based test is most useful for businesses new to disaster recovery or for new systems that have not been tested previously. As this type of test is run entirely on paper, no systems are affected, and business operations are not impacted or put at risk.

In a paper-based disaster recovery test, recovery team participants gather and present their actions to recover a particular system or respond to a specific disaster situation. The primary purpose is to ensure roles and responsibilities are understood and all relevant documentation is available and up to date.

These tests are an excellent way to begin disaster recovery testing with minimal risk to the business. They are also valuable tools to reduce risk ahead of more extensive, more intrusive disaster recovery testing methods.

Component disaster recovery tests are small-scale disaster recovery tests in which the recovery plan for a single system or component is tested. These are often used where recovery involves avoiding the issue rather than fixing it, such as for systems with built-in failover or cold standbys. Component disaster recovery tests help validate the recovery plans for a single system in a controlled manner.

The final type of disaster recovery test is the scenario simulation. A disaster scenario from the disaster recovery plan is chosen, and a method to simulate it is determined. This might include removing network routes to isolate different network segments or powering off infrastructure to simulate ransomware.

Scenario simulations are the most valuable type of test. They ensure that the proposed steps are effective in system recovery, identify unexpected blockers, and validate recovery time and data loss estimates. Unfortunately, scenario tests also represent the most risk to the business, as multiple systems are usually taken offline. While extensive planning can reduce this risk, it cannot be eliminated entirely. For this reason, scenario simulations are generally planned for weekends, often making them quite expensive to conduct.

Each type of disaster recovery test has its own benefits and risks to be considered. When disaster recovery is a new process for a business, the risk of failure during a scenario simulation is higher, and a paper-based test may represent the best risk/reward trade-off. Conversely, businesses that have completed paper-based and component tests will likely benefit most from simulating a disaster scenario and validating their recovery plans and time estimates.

Business continuity can be tested during or separately from technical disaster recovery. Just as disaster recovery can be tested on paper, so can business continuity, without the involvement of IT. A simple meeting-room walkthrough of the roles and responsibilities, as well as the various steps and tasks in the business continuity plans, can be extremely valuable.

Additionally, during component tests or scenario simulations, business stakeholders can and should be involved to test that recovered systems meet their requirements and fully enable their business continuity plans. This may take the form of unit and system testing, or may involve running the business on backup or failover systems for a period to ensure their viability.

The most important thing to remember is that any disaster recovery test aims to improve processes and response in the event of a real disaster. To help achieve that, a culture of openness around mistakes, omissions or missing documents/processes should be fostered. The more information gathered about any failures during testing, the more improvements can be made ready for the actual event.

Conclusion

Risk mitigation through business continuity and disaster recovery planning is critical for all businesses. The extent to which disasters are planned for and simulated may vary based on the size of the business. Still, even small businesses should have a basic plan to respond to various possible disaster situations. In today’s world, outages and data breaches caused by cybercriminals and other cybersecurity issues must be included in an effective disaster recovery or business continuity plan.
