Organizations need a comprehensive IT disaster recovery (DR) plan to stay operational in the face of today’s growing threats. Natural disasters, supply chain interruptions, and ransomware attacks all continue to rise. Previous posts have covered IT disaster recovery; this one places it in the context of IT dependency mapping.
The United Nations’ latest Global Assessment Report (GAR2022) predicts that by 2030 there will be 560 disaster events every year, or about 1.5 disasters every day. Meanwhile, the U.S., Israel, the Middle East, and North Africa will see an increase in hacker exploits against businesses, according to CrowdStrike’s 2022 Global Threat Report.
Every organization’s attack surface, which ranges from applications, workloads, and databases to devices and networks, continues to expand. This makes disaster recovery more challenging over time. Even the temporary loss of mission-critical resources can lead to major monetary and productivity losses. IBM has reported that the average cost of a data breach is now $4.24 million, and the average time to identify and contain a breach is 287 days.
Although a ransomware disaster recovery plan can provide both reactive and proactive defense against potential fallout, organizations must clearly understand how to create a plan that fits their business. This starts with a detailed understanding of disaster recovery planning for all aspects of the organization in every applicable scenario.
What is an IT Disaster Recovery Plan?
While DR enables a business to recover following a disaster, the broader plan can go beyond backend infrastructure, devices, networks, applications, workloads, and databases. In the broader picture, disaster recovery has historically required businesses to deal with the physical displacement of their workforce in the event of a natural disaster; this encompasses all equipment, communications, and a physical location from which to work.
In the post-pandemic era, the predominance of work from home has reduced how much a disaster affecting a central office disrupts most businesses. This may change as significant numbers of companies return employees to the office. Regardless, the recovery of IT infrastructure, such as networks, applications, workloads, databases, and compute power, is at the heart of DR.
DR is a subset of every organization’s business continuity plan (BCP), which encompasses more than a disaster recovery plan by providing approaches for the entire business. BCP focuses on the maintenance of business operations in the event of a disaster, while DR focuses on keeping IT systems and access functioning.
Challenges of DR and DR Planning
Creating a comprehensive IT disaster recovery plan is increasingly challenging in the digital age, as systems and the way people work are constantly changing. One major example is the mix of IT architectures that often coexist in an enterprise, from bare metal and physical servers to VMs and cloud systems that combine SaaS, IaaS, and PaaS.
The Need for Different Recovery Approaches
The hybrid world of on-premises data centers and cloud environments increases those challenges, with data centers using different operating systems and hypervisors that may require different recovery approaches. Applications and workloads also often have varied DR needs. Every organization will have mission-critical systems and data they must restore in minutes alongside non-critical assets and data they can restore in hours.
Complex Backup and Recovery Management
Managing backups and recovery can be complex since requirements can vary across architectures, applications, and environments. This can lead to the use of costly and complex backup and recovery tools to meet different recovery time objectives (RTOs) and recovery point objectives (RPOs) that call for multiple forms of backup and environments.
Constant Data, Application, and Workload Changes
IT infrastructure, applications, workloads, devices, databases, and other aspects of an IT organization are constantly changing. A DR plan tied to specific infrastructure struggles to keep up unless its recovery procedures are updated to reflect those changes. It is also hard to test how effective a plan would be in an actual disaster when the underlying systems, applications, databases, and devices are always changing.
Applications and Workloads Across Different Clouds
As more organizations operate in a multicloud world, IT teams have difficulty determining the shared responsibility rules of each cloud provider, as well as those of SaaS applications controlled by third-party providers, where guaranteed levels of recovery may vary.
Cybercriminals are targeting remote and hybrid work with more sophisticated attacks, while regulatory compliance is increasing the audit and testing requirements for data privacy. These factors require organizations to perform extensive IT audits as part of the foundation for an IT disaster recovery plan that is resilient and agile enough to meet tomorrow’s changes.
Creating a Disaster Recovery Plan and Best Practices
A DR plan can be a complex document tied to people, processes, technology, and procedures. Creating one starts with several factors that shape the recovery strategy, the first of which is identifying the personnel across business and IT departments who are responsible for developing and executing it.
IT Infrastructure and Application Mapping
The next step is to map IT infrastructure, applications, and asset dependencies. This requires a comprehensive mapping tool to identify all the hardware, software, devices, systems (owned, leased, or consumed as a service), and databases. Organizations must identify every asset’s physical and virtual location, version, and dependencies.
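To make the value of a dependency map concrete, here is a minimal Python sketch of what such a map enables once it exists. The asset names and the `DEPENDENCIES` inventory are hypothetical, and a real mapping tool would discover them automatically rather than rely on a hand-written dictionary. The sketch uses the standard library’s `graphlib.TopologicalSorter` (Python 3.9+) to derive one valid recovery order, and a simple traversal to list everything affected when a single asset fails.

```python
from graphlib import TopologicalSorter

# Hypothetical inventory: each asset maps to the assets it depends on.
DEPENDENCIES = {
    "web-frontend": {"order-api"},
    "order-api": {"orders-db", "auth-service"},
    "auth-service": {"directory-db"},
    "orders-db": set(),
    "directory-db": set(),
}


def recovery_order(dependencies):
    """Order assets so each one comes after everything it depends on."""
    return list(TopologicalSorter(dependencies).static_order())


def impacted_by(dependencies, failed):
    """Return every asset that directly or indirectly depends on `failed`."""
    impacted = set()
    frontier = [failed]
    while frontier:
        current = frontier.pop()
        for asset, deps in dependencies.items():
            if current in deps and asset not in impacted:
                impacted.add(asset)
                frontier.append(asset)
    return impacted


order = recovery_order(DEPENDENCIES)
print("Restore in this order:", order)
print("Affected if directory-db fails:",
      sorted(impacted_by(DEPENDENCIES, "directory-db")))
```

In this example, databases are restored before the services that rely on them, and a failure of `directory-db` is shown to reach the web frontend through two intermediate services. That is the kind of hidden dependency a map is meant to surface.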
Risk Assessments Based on Critical Business Processes
A risk assessment then helps identify likely threats to the organization, as well as the most likely disruptions to business processes and the critical systems underpinning them. This, in turn, supports the next step: a business impact analysis that determines how the organization uses each asset and ranks its impact on normal business operations as high, medium, or low.
Establishment of a Recovery Process
The next phase establishes the recovery process, setup, and tooling for all assets and the goals of disaster recovery. The business impact analysis and asset ranking will guide the determination of:
- Recovery time objective (RTO): How fast an organization must return an application, system, or process to normal operations after a point of failure or loss.
- Recovery point objective (RPO): The maximum amount of data loss, usually expressed as the time elapsed since the last recoverable copy, that an organization can tolerate before incurring real damage.
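As a rough illustration of how the impact ranking might translate into targets, the Python sketch below checks a backup schedule and a measured restore time against RTO and RPO. The target values in `TARGETS` are hypothetical placeholders, not recommendations. The sketch also assumes that the worst-case data loss equals the interval between backups, which ignores how long each backup takes to complete.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryTarget:
    rto: timedelta  # maximum tolerable time to restore service
    rpo: timedelta  # maximum tolerable window of lost data


# Hypothetical targets per impact ranking; every organization sets its own.
TARGETS = {
    "high": RecoveryTarget(rto=timedelta(minutes=15), rpo=timedelta(minutes=5)),
    "medium": RecoveryTarget(rto=timedelta(hours=4), rpo=timedelta(hours=1)),
    "low": RecoveryTarget(rto=timedelta(hours=24), rpo=timedelta(hours=24)),
}


def meets_targets(impact, backup_interval, measured_restore_time):
    """Compare a backup schedule and a measured restore time with the targets."""
    target = TARGETS[impact]
    return {
        # Assumption: worst-case data loss equals the backup interval.
        "rpo_met": backup_interval <= target.rpo,
        "rto_met": measured_restore_time <= target.rto,
    }


# A high-impact system backed up hourly and restored in 10 minutes during a drill.
result = meets_targets("high", timedelta(hours=1), timedelta(minutes=10))
print(result)
```

Here the restore time is within the RTO, but hourly backups cannot satisfy a five-minute RPO, which suggests that such a system would need more frequent backups or replication.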
Disaster Recovery Setup and Tools
The organization will then need to determine backup procedures and tools, which will include:
- A backup strategy that determines the backup sites and designates each as:
  - Hot (mission-critical assets for immediate recovery)
  - Warm (critical assets that can tolerate a longer recovery time)
  - Cold (low-priority data storage needed only for the long term)
The backup strategy should be based on the 3-2-1-1-0 gold standard. This requires maintaining three copies of business data on two different types of storage, with one copy stored offsite (preferably in the cloud) and one copy stored offline. Finally, backups must be verified so that they contain zero errors.
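The 3-2-1-1-0 rule lends itself to a simple automated check. The following Python sketch is one way to express it, under some assumptions: the `BackupCopy` fields and the example copies are hypothetical, the check counts whatever copies are passed in (including production data, if listed), and `verify_errors` stands in for whatever a real backup tool reports after verification or a test restore.

```python
from dataclasses import dataclass


@dataclass
class BackupCopy:
    name: str
    media_type: str     # e.g. "disk", "tape", "object-storage"
    offsite: bool       # stored away from the primary site
    offline: bool       # air-gapped or otherwise unreachable from the network
    verify_errors: int  # errors found by the last verification or test restore


def check_3_2_1_1_0(copies):
    """Report which parts of the 3-2-1-1-0 rule a set of copies satisfies."""
    return {
        "3_copies": len(copies) >= 3,
        "2_media_types": len({c.media_type for c in copies}) >= 2,
        "1_offsite": any(c.offsite for c in copies),
        "1_offline": any(c.offline for c in copies),
        "0_errors": all(c.verify_errors == 0 for c in copies),
    }


copies = [
    BackupCopy("production", "disk", offsite=False, offline=False, verify_errors=0),
    BackupCopy("cloud-replica", "object-storage", offsite=True, offline=False, verify_errors=0),
    BackupCopy("tape-vault", "tape", offsite=True, offline=True, verify_errors=0),
]
report = check_3_2_1_1_0(copies)
print(report)
```

Removing the tape copy from this example would fail both the copy count and the offline requirement, which shows how a single copy can carry more than one part of the rule.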
With cloud as the preferred method of DR today, organizations can use backup as a service (BaaS) and/or disaster recovery as a service (DRaaS) to reduce the complexity of managing backup and recovery. While both BaaS and DRaaS back up to a cloud environment, BaaS backs up only data, while DRaaS backs up data and infrastructure, using replication (continual copying of data changes) so that recovery or failover to the recovery environment reflects the latest state.
The budget for an IT disaster recovery plan and strategy depends on the unique needs and approach of each organization. This starts with understanding the organization’s compliance requirements based on its sector. IT then calculates the organization’s cost of downtime per hour based on revenue and productivity loss.
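A back-of-the-envelope version of the hourly downtime calculation can be sketched in Python. The formula here is a common simplification and an assumption on our part, not a standard: lost revenue is annual revenue spread evenly over operating hours, and lost productivity is the loaded hourly cost of the affected employees. All parameter names and figures are hypothetical.

```python
def downtime_cost_per_hour(annual_revenue, operating_hours_per_year,
                           affected_employees, hourly_employee_cost,
                           revenue_dependency=1.0, productivity_loss=1.0):
    """Estimate the cost of one hour of downtime.

    revenue_dependency: fraction of revenue that depends on the affected systems.
    productivity_loss: fraction of employee productivity lost during the outage.
    """
    revenue_loss = annual_revenue / operating_hours_per_year * revenue_dependency
    productivity_cost = affected_employees * hourly_employee_cost * productivity_loss
    return revenue_loss + productivity_cost


# Hypothetical business: $8.76M annual revenue earned around the clock,
# half of it dependent on the affected systems; 100 employees at $50 per
# hour who lose half their productivity during the outage.
cost = downtime_cost_per_hour(
    annual_revenue=8_760_000,
    operating_hours_per_year=8_760,
    affected_employees=100,
    hourly_employee_cost=50,
    revenue_dependency=0.5,
    productivity_loss=0.5,
)
print(cost)  # 3000.0
```

A real estimate would likely add factors this sketch leaves out, such as recovery labor, SLA penalties, and reputational damage.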
RPO and RTO have a major impact on revenue and productivity loss while also affecting the cost of the backup service, such as BaaS or DRaaS. Third-party DRaaS providers have SLAs for uptime and recovery; cloud providers also have uptime SLAs (based on a shared responsibility model). These SLAs help determine how much downtime, and therefore what share of downtime costs, the business should expect to absorb.
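To see how an uptime SLA relates to downtime exposure, the short Python sketch below converts an uptime percentage into the downtime it still permits per year and multiplies that by an hourly downtime cost. The function names and the example figures are hypothetical, and the sketch assumes the SLA is measured over a full 365-day year; real SLAs are often measured monthly and may exclude scheduled maintenance.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; ignores leap years


def allowed_downtime_hours(uptime_percent, period_hours=HOURS_PER_YEAR):
    """Downtime a provider can incur in the period while still meeting the SLA."""
    return period_hours * (1 - uptime_percent / 100)


def downtime_exposure(uptime_percent, cost_per_hour):
    """Cost the business could absorb from downtime that stays within the SLA."""
    return allowed_downtime_hours(uptime_percent) * cost_per_hour


for sla in (99.0, 99.9, 99.99):
    hours = allowed_downtime_hours(sla)
    exposure = downtime_exposure(sla, cost_per_hour=3_000)
    print(f"{sla}% uptime allows about {hours:.2f} h/year, "
          f"roughly ${exposure:,.0f} at $3,000 per hour")
```

A 99.9% uptime SLA, for example, still permits roughly 8.76 hours of downtime a year, so the difference between SLA tiers can be weighed against the extra cost of the higher tier.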
The Uptime Institute reported that 2021 saw over 60% of outages costing businesses at least $100,000. These statistics provide an indication of what’s at stake when it comes to comprehensive DR planning supported by IT infrastructure and application mapping.
Backup Testing and Restoration
A DR plan and strategy are only as good as the testing and restoration that ensure they work as intended. Both restoration and testing require that the workforce and DR team have logical processes in place. Many organizations will define restoration processes and procedures by the SLAs set up with a BaaS or DRaaS provider, if one exists. Organizations that manage all aspects of recovery themselves will need to define restoration step by step.
Testing the plan requires regular drills to ensure that the people, processes, and technology are working as expected. Most organizations should test every quarter, and larger enterprises every month, to confirm the plan reflects current needs, IT infrastructure, applications, and workloads.
Every organization can establish a solid disaster recovery foundation when it follows a universal set of best practices for DR planning. These practices also provide a way to reevaluate the plan, strategy, and testing to ensure that they continue to meet the needs of the organization. A good list of best practices includes:
- Continuously iterate the process to keep up with business and IT changes by performing regular IT infrastructure and application mapping.
- Maintain a readily accessible disaster recovery playbook so everyone knows their roles and what procedures will take place based on specific triggers. A DR plan must also be easily modifiable since it is subject to change with every iteration.
- Create a plan for backup work locations with secure system-access procedures for a hybrid workforce.
- Schedule DR plan testing each quarter at a minimum, with large enterprises performing monthly testing. The IT infrastructure and application mapping should always precede the test to have an updated view of all assets and dependencies.
- Develop comprehensive test reports that provide insights into the success and/or errors of each test as well as the test procedures.
- Institute regular employee training and drills for the organization to be certain everyone knows the processes and procedures in the event of any disaster.
- Integrate and/or coordinate DR planning with existing security and data protection solutions and processes.
Business continuity and disaster recovery (BCDR) should be part of a holistic approach to making sure every organization can remain operational regardless of the form of downtime or disaster. By implementing BCDR planning, organizations can align business functions to IT dependencies to reduce risk and ensure resilience, uptime, and profitability.
IT infrastructure, devices, systems, applications, workloads, databases, and all dependencies are at the heart of business productivity, second only to an organization’s people. Having a comprehensive and constantly updated map of these assets is critical to developing a resilient and agile disaster recovery plan and strategy.
Every successful DR plan relies on visibility into the IT infrastructure, application, and services environment across on-premises, the cloud, and beyond the network edge. Having a comprehensive, automated solution for mapping and discovery is the starting point to achieving an accurate view of ongoing environment changes and an effective BCDR strategy.
To see how Faddom helps organizations achieve proactive BCDR through fast, secure, and comprehensive asset and dependency mapping, start a free trial today!