When creating a recovery plan, organizations work through what-if scenarios. These should cover not only IT but all aspects of business operations.
If the company is located in a potential flood zone, the recovery plan must address that risk as well. Severe weather may need extra attention: what does it mean for the organization, and how will it deal with it?
If, for example, a lot of work is still done on-site, the next question is: what must employees absolutely be able to access? Are the necessary IT resources in place for that? How do you ensure that employees can continue to work from other locations in an emergency?
Many companies probably think: we have learned our lessons after the floods in Limburg and the lockdowns during the pandemic. However, our research shows that there are still significant gaps in the current recovery plans.
What is important in an IT recovery plan?
A hospital is an illustrative example. Should its recovery plan guarantee that the TVs in patient rooms always remain operational? Probably not: a patient being unable to watch TV is inconvenient, but not life-threatening.
What hospitals should focus on are the business-critical components, such as operating rooms, intensive care, and emergency services. This means that the plan must include, for example, an emergency power supply, and that business-critical systems must be implemented redundantly. Where a quarantine plays a role, this can even include an air extraction system.
The hospital must not go dark just because someone accidentally hits a cable during excavation work.
This principle applies to every company. Organizations must continually determine what is truly necessary to continue to exist. This starts with recognizing the business-critical components, that is, the components to which the organization owes its existence. An example: for trading banks, these are the IT resources used for trading. A failure of these resources leads to significant payment problems and loss of trust within a very short time, often within hours, and results in reputational damage.

The role of the cloud
Cloud providers can also experience partial or complete outages. This has direct consequences for the organizations that use that cloud: they become unreachable as well.
Fully switching to the cloud does not remove the need for redundancy. If everything is with one provider and that provider fails, the organization has a problem. The same applies if the organization communicates with that cloud provider via a single connection and that path fails.
Organizations must therefore ensure that they can access a service or services in multiple ways. This should also be included in the plan.
Compare it to the physical location of the company: if there is one bridge and it collapses, a problem arises. With two bridges, the location remains accessible. The same applies to IT resources. If organizations have two doors, they can always enter, provided they have the keys.
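As a minimal sketch of the two-bridges idea, the Python snippet below tries several access paths in turn and uses the first one that responds. The path names and the `simulated_probe` function are hypothetical stand-ins; a real plan would substitute its own connections and health checks.

```python
from typing import Callable, Iterable, Optional


def first_reachable(paths: Iterable[str],
                    probe: Callable[[str], bool]) -> Optional[str]:
    """Return the first access path for which probe() succeeds, else None."""
    for path in paths:
        try:
            if probe(path):
                return path
        except OSError:
            # A failed probe (timeout, refused connection) means:
            # try the next path instead of giving up.
            continue
    return None


# Hypothetical access paths to the same service (names are assumptions).
ACCESS_PATHS = ["primary-link", "secondary-link"]


def simulated_probe(path: str) -> bool:
    # Stand-in for a real health check; here the primary link is "down".
    if path == "primary-link":
        raise ConnectionError("primary link unavailable")
    return True


print(first_reachable(ACCESS_PATHS, simulated_probe))  # prints: secondary-link
```

With only one path in the list, the same call returns `None` as soon as that path fails, which is exactly the single-bridge situation.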
Key management
The management of those keys should also be included in the plan. If someone loses their house key and is locked out, they are faced with a closed door and need a crowbar to get in. The same applies in IT.
If an organization needs a certificate to access services, and that certificate expires and cannot be renewed or reissued, a problem arises: the organization no longer has the key to the service. There must always be a fallback scenario for that.
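One way to avoid being surprised by an expired certificate is to check expiry dates well in advance. The sketch below flags certificates that expire within a renewal window. The 30-day window, the certificate names, and the dates are illustrative assumptions, not values from any particular system.

```python
from datetime import datetime, timedelta, timezone

# Assumed renewal window; choose a value that fits your renewal process.
RENEWAL_WINDOW = timedelta(days=30)


def needs_renewal(not_after: datetime, now: datetime,
                  window: timedelta = RENEWAL_WINDOW) -> bool:
    """True if the certificate expires within the window (or has expired)."""
    return not_after - now <= window


# Hypothetical inventory: certificate name -> expiry date.
certificates = {
    "vpn-gateway": datetime(2024, 1, 15, tzinfo=timezone.utc),
    "customer-portal": datetime(2024, 6, 1, tzinfo=timezone.utc),
}

# Fixed reference date so the example is reproducible.
now = datetime(2024, 1, 1, tzinfo=timezone.utc)

due = [name for name, expiry in certificates.items()
       if needs_renewal(expiry, now)]
print(due)  # prints: ['vpn-gateway']
```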
This is especially crucial in the cloud. If cloud services are well secured, their data is encrypted. If the key is lost or destroyed, all that remains is a lot of ones and zeros, but no data and no information.
How long is the recovery time?
If things go wrong, organizations need to know how long recovery takes. How long does it take to prepare the recovery actions? How much time does the actual execution take? Which people are needed? What mandate do they need? Which suppliers are involved?
To prevent a crisis, these matters must be documented and agreed upon in advance. They therefore belong in the plan.
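The questions about preparation time, execution time, and the people involved can be recorded per recovery step and added up. The sketch below assumes the steps run one after another; the step names, owners, and durations are made-up illustrations.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Iterable


@dataclass
class RecoveryStep:
    name: str
    preparation: timedelta  # time to prepare the recovery action
    execution: timedelta    # time to actually carry it out
    owner: str              # person or supplier responsible

    @property
    def duration(self) -> timedelta:
        return self.preparation + self.execution


def total_recovery_time(steps: Iterable[RecoveryStep]) -> timedelta:
    """Sum the durations, assuming the steps are executed sequentially."""
    return sum((step.duration for step in steps), timedelta())


# Illustrative figures only.
steps = [
    RecoveryStep("network", timedelta(minutes=30), timedelta(minutes=60), "network team"),
    RecoveryStep("application", timedelta(minutes=15), timedelta(minutes=45), "application team"),
    RecoveryStep("data", timedelta(minutes=20), timedelta(minutes=90), "storage supplier"),
]

print(total_recovery_time(steps))  # prints: 4:20:00
```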
The order in which the recovery is executed is also important. First, the network must be operational again: without this layer, no data can be transported. After that come the application and data layers.
Why is this order important? If the layers are restored in a different order, problems can arise with timestamps. Once timestamps are no longer synchronized, applications can no longer communicate with each other, which leads to data loss.
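A recovery order like this can be derived from a dependency map rather than remembered. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later). The dependency map is a simplified assumption: both the application and data layers depend only on the network.

```python
from graphlib import TopologicalSorter

# Each layer maps to the set of layers that must be restored before it.
depends_on = {
    "network": set(),
    "application": {"network"},
    "data": {"network"},
}

# static_order() yields every layer after all of its dependencies.
recovery_order = list(TopologicalSorter(depends_on).static_order())
print(recovery_order)
```

The network always comes first in the result. The relative order of the application and data layers is not fixed by this map; if one must precede the other, add that dependency explicitly.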
Without a plan, no recovery
A good IT recovery plan must also include a good process description. Who is allowed to take which steps? Who communicates with whom? Where necessary, written work instructions are created that are foolproof. That way someone can always restore processes, even if the regular IT administrator is unavailable.
