Organizations need a comprehensive IT disaster recovery (DR) plan to stay operational in the face of today’s growing threats. Natural disasters, supply chain interruptions, and ransomware attacks all continue to rise. Although previous posts have covered IT disaster recovery, it’s important to place it in the context of IT dependency mapping.
The United Nations’ latest Global Assessment Report (GAR2022) predicts that by 2030 there will be 560 disaster events every year, or about 1.5 disasters every day. Meanwhile, the U.S., Israel, the Middle East, and North Africa will see an increase in hacker exploits against businesses, according to CrowdStrike’s 2022 Global Threat Report.
Although a ransomware disaster recovery plan can provide both reactive and proactive defense against potential fallout, organizations must clearly understand how to create a plan that fits their business. This starts with a detailed understanding of disaster recovery planning for all aspects of the organization in every applicable scenario.
What is an IT Disaster Recovery Plan?
DR enables a business to recover following a disaster, but a DR plan can go beyond backend infrastructure, devices, networks, applications, workloads, and databases. Historically, disaster recovery has also required businesses to deal with the physical displacement of their workforce after a natural disaster, which encompasses equipment, communications, and a physical location from which to work.
In the post-pandemic era, the predominance of work from home has reduced the impact of disasters that prevent employees from reaching a central office. This may change as significant numbers of companies return employees to the office under hybrid work arrangements. Regardless, the recovery of IT infrastructure, such as networks, applications, workloads, databases, and compute power, is at the heart of DR.
DR is a subset of an organization’s business continuity plan (BCP), which covers the entire business rather than IT alone. BCP focuses on maintaining business operations in the event of a disaster, while DR focuses on keeping IT systems and access functioning.
Challenges of DR and DR Planning
Creating a comprehensive IT disaster recovery plan is increasingly challenging in the digital age, as systems and the way people work are constantly changing. One major example is the multiple IT architectures that often coexist in an enterprise, from bare metal and physical servers to VMs and cloud systems that combine SaaS, IaaS, and PaaS.
The Need for Different Recovery Approaches
The hybrid world of on-premises data centers and cloud environments increases those challenges, with data centers using different operating systems and hypervisors that may require different recovery approaches. Applications and workloads also often have varied DR needs. Every organization will have mission-critical systems and data they must restore in minutes alongside non-critical assets and data they can restore in hours.
Complex Backup and Recovery Management
Managing backups and recovery can be complex since requirements can vary across architectures, applications, and environments. This can lead to the use of costly and complex backup and recovery tools to meet different recovery time objectives (RTOs) and recovery point objectives (RPOs) that call for multiple forms of backup and environments.
Constant Data, Application, and Workload Changes
IT infrastructure, applications, workloads, devices, databases, and other aspects of an IT organization are constantly changing. A DR plan that is tied to specific infrastructure falls out of date unless its recovery procedures are updated to reflect every change. Constant change also makes it hard to test whether the plan would actually work in a real disaster, because the systems a test validated may no longer match what is running.
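One way to picture this drift is to compare what is running today with what the DR plan actually covers. The following is a minimal sketch, not a definitive implementation: the function name `find_plan_drift` and the asset names are hypothetical, and it assumes both inventories are available as simple sets of asset identifiers.

```python
def find_plan_drift(discovered_assets: set, plan_assets: set) -> dict:
    """Compare the current inventory with the assets the DR plan covers."""
    return {
        # Running today, but missing from the DR plan.
        "unprotected": discovered_assets - plan_assets,
        # Still in the DR plan, but no longer running.
        "stale": plan_assets - discovered_assets,
    }


# Hypothetical inventories, for illustration only.
discovered = {"web-01", "db-01", "queue-01"}
planned = {"web-01", "db-01", "legacy-ftp"}

drift = find_plan_drift(discovered, planned)
print("Unprotected:", sorted(drift["unprotected"]))
print("Stale:", sorted(drift["stale"]))
```

Running a comparison like this after every mapping pass would flag assets that need recovery procedures and plan entries that can be retired.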
Applications and Workloads Across Different Clouds
As more organizations work in a multicloud world, IT teams have difficulty determining the shared responsibility rules of each cloud provider. The same applies to SaaS applications controlled by third-party providers, where guaranteed levels of recovery may vary.
Cybercriminals are targeting remote and hybrid workforces with more sophisticated attacks, while regulatory compliance is increasing audit and testing requirements to ensure data privacy. These factors require organizations to perform extensive IT audits as part of the foundation for an IT disaster recovery plan that is resilient and agile enough to meet tomorrow’s changes.
Creating a Disaster Recovery Plan and Best Practices
A DR plan can be a complex document tied to people, processes, technology, and procedures. Creating one starts with identifying the personnel across business and IT departments who are responsible for developing and executing it, and then considering the factors that shape the recovery strategy.
IT Infrastructure and Application Mapping
The next step is to map IT infrastructure, applications, and asset dependencies. This requires a comprehensive mapping tool to inventory all the hardware, software, devices, systems (owned, leased, or consumed as a service), and databases. Organizations must identify every asset’s physical and virtual location, versions, and dependencies.
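To illustrate why dependencies matter for recovery, a dependency map can be turned into a restore order in which every asset comes back only after the assets it relies on. This is a minimal sketch under stated assumptions: the asset names and the `recovery_order` helper are hypothetical, and the map is assumed to be a simple dictionary rather than the output of any particular mapping tool. It uses `graphlib.TopologicalSorter` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical map: each asset lists the assets it depends on.
dependencies = {
    "storefront-app": {"orders-db", "auth-service"},
    "auth-service": {"directory-db"},
    "orders-db": set(),
    "directory-db": set(),
}


def recovery_order(deps):
    """Return assets ordered so that every dependency is restored first."""
    # static_order() yields each node only after all of its predecessors,
    # and raises graphlib.CycleError if the map contains a circular dependency.
    return list(TopologicalSorter(deps).static_order())


order = recovery_order(dependencies)
print(order)
```

The exact order among independent assets may vary, but databases always precede the services that use them, and the storefront application comes last.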
Risk Assessments Based on Critical Business Processes
A risk assessment will then help discover likely threats to the organization, as well as the most likely service disruptions to business processes and the critical systems underpinning them. This, in turn, supports the next step: a business impact analysis that examines how the organization uses its assets and ranks each asset’s impact on normal business operations as high, medium, or low.
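As a rough illustration of asset ranking, each asset can inherit the highest impact level of any business process that relies on it. The sketch below is an assumption about how such a ranking could be derived, not a prescribed method; the process names, asset names, and the `rank_assets` function are hypothetical.

```python
IMPACT_RANK = {"low": 0, "medium": 1, "high": 2}

# Hypothetical business processes with their impact and supporting assets.
processes = {
    "online-checkout": {"impact": "high", "assets": ["storefront-app", "orders-db"]},
    "monthly-reporting": {"impact": "low", "assets": ["orders-db", "report-server"]},
}


def rank_assets(processes):
    """Give each asset the highest impact of any process that depends on it."""
    ranking = {}
    for process in processes.values():
        for asset in process["assets"]:
            current = ranking.get(asset, process["impact"])
            # Keep whichever level ranks higher.
            ranking[asset] = max(current, process["impact"], key=IMPACT_RANK.__getitem__)
    return ranking


asset_impact = rank_assets(processes)
print(asset_impact)
```

Here `orders-db` ends up ranked high because checkout depends on it, even though reporting alone would only rank it low.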
Establishment of a Recovery Process
The next phase establishes the recovery process, setup, and tooling for all assets and the goals of disaster recovery. The business impact analysis and asset ranking will guide the determination of:
- Recovery time objective (RTO): How fast an organization must return an application, system, or process to normal operations after a point of failure or loss.
- Recovery point objective (RPO): The maximum amount of data loss, usually expressed as the time elapsed since the last usable backup, that an organization can tolerate before incurring real damage.
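The two objectives above can be checked against the timestamps of a real incident or a drill. The following is a minimal sketch, assuming the last backup time, failure time, and restoration time are known; the `RecoveryObjectives` class, the `evaluate_recovery` function, and the example values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryObjectives:
    rto: timedelta  # maximum tolerated time to restore service
    rpo: timedelta  # maximum tolerated window of lost data


def evaluate_recovery(objectives, last_backup, failure, restored):
    """Compare one incident or drill against the RTO and RPO."""
    data_loss_window = failure - last_backup  # changes made since the last backup are lost
    downtime = restored - failure
    return {
        "data_loss_window": data_loss_window,
        "downtime": downtime,
        "rpo_met": data_loss_window <= objectives.rpo,
        "rto_met": downtime <= objectives.rto,
    }


# Hypothetical targets and timestamps for a mission-critical system.
objectives = RecoveryObjectives(rto=timedelta(hours=1), rpo=timedelta(minutes=15))
result = evaluate_recovery(
    objectives,
    last_backup=datetime(2024, 1, 10, 9, 50),
    failure=datetime(2024, 1, 10, 10, 0),
    restored=datetime(2024, 1, 10, 11, 30),
)
print(result["rpo_met"], result["rto_met"])
```

In this example the RPO is met, since only 10 minutes of data are lost, but the RTO is missed because service returns after 90 minutes.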
Disaster Recovery Setup and Tools
The organization will then need to determine backup procedures and tools, which will include:
- A backup strategy to determine backup sites and their designation as:
  - Hot (mission-critical assets for immediate recovery)
  - Medium (critical but more time allowed for needed recovery)
  - Cold (low-priority data storage with only long-term rather than immediate need)
The backup strategy should be based on the 3-2-1-1-0 gold standard: maintain three copies of business data on two different types of storage, with one copy stored offsite (preferably in the cloud) and one copy stored offline, and verify that the backups contain zero errors.
With cloud as the preferred method of DR today, organizations can use backup as a service (BaaS) and/or disaster recovery as a service (DRaaS) to reduce the complexity of managing backup and recovery. While both BaaS and DRaaS back up to a cloud environment, BaaS only backs up data, while DRaaS backs up data and infrastructure with replication (continual copying of data changes) so that recovery or failover to the recovery environment uses the latest data.
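The 3-2-1-1-0 rule lends itself to a simple checklist. The sketch below is one possible way to express it, with several assumptions: the `BackupCopy` fields and the `check_3_2_1_1_0` function are hypothetical, the offsite and offline requirements are evaluated independently, and the error count is assumed to come from whatever verification the backup tool performs.

```python
from dataclasses import dataclass


@dataclass
class BackupCopy:
    media_type: str      # e.g. "disk", "object-storage", "tape"
    offsite: bool        # stored away from the primary site
    offline: bool        # disconnected from the network
    verify_errors: int   # errors found by the last verification run


def check_3_2_1_1_0(copies):
    """Evaluate each part of the 3-2-1-1-0 rule for a list of copies."""
    return {
        "three_copies": len(copies) >= 3,
        "two_media_types": len({c.media_type for c in copies}) >= 2,
        "one_offsite": any(c.offsite for c in copies),
        "one_offline": any(c.offline for c in copies),
        "zero_errors": all(c.verify_errors == 0 for c in copies),
    }


# Hypothetical set of copies, for illustration only.
copies = [
    BackupCopy("disk", offsite=False, offline=False, verify_errors=0),
    BackupCopy("object-storage", offsite=True, offline=False, verify_errors=0),
    BackupCopy("tape", offsite=True, offline=True, verify_errors=0),
]
report = check_3_2_1_1_0(copies)
print(all(report.values()), report)
```

Returning one flag per requirement, rather than a single pass or fail, shows which part of the rule needs attention.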
Budget Considerations
The budget for an IT disaster recovery plan and strategy depends on the unique needs and approach of each organization. This starts with understanding the organization’s compliance requirements based on its sector. IT then calculates the organization’s cost of downtime per hour based on revenue and productivity loss.
RPO and RTO have a major impact on revenue and productivity loss, and they also affect the cost of the backup approach, such as BaaS or DRaaS. Third-party DRaaS providers have SLAs for uptime and recovery; cloud providers also have uptime SLAs (based on a shared responsibility model). These SLAs help determine how much of the cost of downtime the business will incur.
The Uptime Institute reported that in 2021, over 60% of outages cost businesses at least $100,000. These statistics indicate what’s at stake when it comes to comprehensive DR planning supported by IT infrastructure and application mapping.
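The cost of downtime per hour, and the way RTO scales it, can be estimated with simple arithmetic. The formula below is an illustrative assumption rather than a standard: it adds average hourly revenue to the cost of lost productivity, and every parameter name and figure is hypothetical.

```python
def downtime_cost_per_hour(
    annual_revenue,
    revenue_hours_per_year,
    affected_employees,
    loaded_hourly_wage,
    productivity_loss_fraction,
):
    """Estimate hourly downtime cost as lost revenue plus lost productivity."""
    revenue_loss = annual_revenue / revenue_hours_per_year
    productivity_loss = affected_employees * loaded_hourly_wage * productivity_loss_fraction
    return revenue_loss + productivity_loss


# Hypothetical figures for a business that earns revenue around the clock.
hourly_cost = downtime_cost_per_hour(
    annual_revenue=8_760_000,
    revenue_hours_per_year=8_760,
    affected_employees=50,
    loaded_hourly_wage=40,
    productivity_loss_fraction=0.5,
)

# An outage that lasts as long as a 4-hour RTO allows.
rto_hours = 4
outage_cost = hourly_cost * rto_hours
print(hourly_cost, outage_cost)
```

With these figures the estimate is 2,000 per hour, so an outage that runs the full 4-hour RTO costs about 8,000. Comparing that number with the price of a faster recovery tier is one way to frame the budget decision.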
Backup Testing and Restoration
A DR plan and strategy are only as good as the testing and restoration that ensure they work as intended. Both restoration and testing require that the workforce and DR team have logical processes in place. Many organizations will define restoration processes and procedures by the SLAs set up with a BaaS or DRaaS provider, if one exists. Others will need to define restoration step by step if the organization manages all aspects of recovery itself.
Testing the plan requires regular drills to ensure that the people, processes, and technology are working as expected. Most organizations should test every quarter, and larger enterprises should test monthly, to confirm the plan reflects current needs, IT infrastructure, applications, and workloads.
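A testing cadence like this is easy to track automatically. The following is a minimal sketch, assuming the date of the last completed drill is recorded somewhere; the `is_test_overdue` function and the day counts used to approximate a quarter and a month are assumptions.

```python
from datetime import date, timedelta

# Approximate interval lengths in days (an assumption, not a standard).
TEST_INTERVAL_DAYS = {"quarterly": 91, "monthly": 30}


def is_test_overdue(last_test, cadence, today):
    """Return True when more time has passed than the cadence allows."""
    return today - last_test > timedelta(days=TEST_INTERVAL_DAYS[cadence])


# Hypothetical dates: the last drill ran 51 days before "today".
last_drill = date(2024, 1, 10)
today = date(2024, 3, 1)

print(is_test_overdue(last_drill, "quarterly", today))
print(is_test_overdue(last_drill, "monthly", today))
```

The same drill date is still within a quarterly cadence but overdue for a monthly one, which is the distinction the guidance above draws between most organizations and larger enterprises.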
Lanir specializes in founding new tech companies for enterprise software: assembling and nurturing a great team, taking a company from early-stage funding to late-stage growth, from one design partner to hundreds of enterprise customers, from MVP to enterprise-grade product, from low-level kernel engineering to AI/ML and big data, and from one advisory board to a long list of shareholders and board members of the world’s largest VCs.
Tips from the Expert
In my experience, here are tips that can help you improve IT disaster recovery (DR) planning:
- Use dynamic asset and dependency mapping: Regularly update your DR plan by mapping your IT infrastructure and application dependencies to ensure no critical assets or relationships are overlooked during recovery.
- Tailor recovery approaches by priority: Prioritize assets by their criticality (hot, medium, cold) and ensure backup strategies (e.g., BaaS or DRaaS) match recovery time objectives (RTOs) and recovery point objectives (RPOs).
- Automate testing and monitoring: Automate DR testing and system monitoring to catch issues early and ensure that your recovery plan remains relevant to the latest infrastructure changes.
- Integrate DR with security: Align your DR plan with cybersecurity protocols and data protection strategies to handle threats like ransomware alongside other disasters.
- Review and refine regularly: Perform quarterly DR tests (or monthly for larger enterprises) and adjust your plan based on test outcomes, evolving business needs, and changing IT environments.
Best Practices
Every organization can establish a solid disaster recovery foundation by following a universal set of best practices for DR planning. These practices provide a way to reevaluate the plan, strategy, and testing so that they continue to meet the needs of the organization. A good list of best practices includes:
- Continuously iterate the process to keep up with business and IT changes by performing regular IT infrastructure and application mapping.
- Maintain a readily accessible disaster recovery playbook so everyone knows their roles and what procedures will take place based on specific triggers. A DR plan must also be easily modifiable since it is subject to change with every iteration.
- Create a plan for backup work locations with secure system-access procedures for a hybrid workforce.
- Schedule DR plan testing each quarter at a minimum, with large enterprises performing monthly testing. The IT infrastructure and application mapping should always precede the test to have an updated view of all assets and dependencies.
- Develop comprehensive test reports that provide insights into the success and/or errors of each test as well as the test procedures.
- Institute regular employee training and drills for the organization to be certain everyone knows the processes and procedures in the event of any disaster.
- Integrate and/or coordinate DR planning with existing security and data protection solutions and processes.
Meet Faddom
Business continuity and disaster recovery (BCDR) should be part of a holistic approach to making sure every organization can remain operational regardless of the form of downtime or disaster. By implementing BCDR planning, organizations can align business functions to IT dependencies to reduce risk and ensure resilience, uptime, and profitability.
IT infrastructure, devices, systems, applications, workloads, databases, and all their dependencies are at the heart of business productivity, second only to an organization’s people. Having a comprehensive and constantly updated map of these assets is critical to developing a resilient and agile disaster recovery plan and strategy.
Every successful DR plan relies on visibility into the IT infrastructure, application, and services environment across on-premises, the cloud, and beyond the network edge. Having a comprehensive, automated solution for mapping and discovery is the starting point to achieving an accurate view of ongoing environment changes and an effective BCDR strategy.
To see how Faddom helps organizations ensure comprehensive and proactive BCDR through fast, secure, and comprehensive asset and dependency mapping, just start a free trial today!