Voice, data, internet access and other network services often share the same network resources. A network disaster recovery (DR) plan ensures that, after an interruption, all resources and services that rely on the network are back up and running within a specified time frame.
Such a plan usually includes procedures for recovering an organization's local area networks (LANs), wide area networks (WANs) and wireless networks. It may cover network applications and services, servers, computers and other devices, along with the data at issue.
Network services are critical to ensuring uninterrupted internal and external communication and data sharing within an organization. A network infrastructure can be disrupted by any number of disasters, including fire, flood, earthquake, hurricane, carrier issues, hardware or software malfunction or failure, human error, and cybersecurity incidents and attacks.
Any interruption of network services can affect an organization's ability to access, collect or use data and communicate with staff, partners and customers. Interruptions put business continuity (BC) and data at risk and can result in huge customer service and public relations problems. A contingency plan for dealing with any sort of network interruption is vital to an organization's survival.
Some important caveats to consider when preparing a network disaster recovery plan include the following:
What to include in a plan
Network disaster recovery planning provides guidelines for restoring network services and normal operations following a disaster. The plan outlines resources needed to perform network recovery procedures, such as equipment suppliers and information on data storage. It describes how off-site backups are maintained, and it identifies key staff members and departments and outlines their responsibilities in an emergency. The plan spells out responses unique to specific types of worst-case scenarios, such as a fire, flood, earthquake, terrorist attack or cyberattack.
A network disaster recovery plan also identifies specific issues or threats related to an organization's network operations. These can include interruptions caused by loss of voice or data connectivity as a result of network provider problems or disasters caused by nature or human activities.
Like any other disaster recovery plan, this one should include information about contacting key staff members in case an emergency occurs after business hours, such as late at night or on weekends.
Some specific sections that should be included in a network disaster recovery plan include the following:
Who should be involved in creating/implementing it
An organization's network administrator works closely with network managers and other IT staff to create a network disaster recovery plan. Other IT staff, including IT operations, data center and data processing managers, should be involved early in the process.
Finance and budget managers should be looped into the process to ensure the financial implications of the plan are fully understood.
Business managers must be consulted to determine the recovery time objective (RTO) and recovery point objective (RPO) relevant to their part of the business. They can also contribute valuable information on how their staff work and communicate. That information could become critical in the event of a disaster. The needs of support staff must also be considered when creating a network disaster recovery plan.
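As a rough illustration of how these objectives can be checked against a proposed schedule, the sketch below compares a backup interval and an estimated restore time with a business unit's RPO and RTO. The class name, function name and all figures are hypothetical and chosen only for the example; it also assumes the worst case, in which a failure occurs just before the next backup runs.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryObjectives:
    """Targets agreed with one business unit (illustrative values only)."""
    unit: str
    rto: timedelta  # maximum tolerable downtime
    rpo: timedelta  # maximum tolerable window of lost data


def meets_objectives(obj, backup_interval, estimated_restore_time):
    """Return (rpo_ok, rto_ok) for a backup schedule and a restore estimate."""
    # Worst case: the failure happens just before the next backup,
    # so the data lost spans one full backup interval.
    rpo_ok = backup_interval <= obj.rpo
    rto_ok = estimated_restore_time <= obj.rto
    return rpo_ok, rto_ok


# Hypothetical objectives for a sales department.
sales = RecoveryObjectives("sales", rto=timedelta(hours=4), rpo=timedelta(hours=1))
result = meets_objectives(
    sales,
    backup_interval=timedelta(hours=6),
    estimated_restore_time=timedelta(hours=2),
)
print(result)  # (False, True): restore is fast enough, backups are too infrequent
```

In this made-up case the restore estimate fits within the RTO, but a six-hour backup interval cannot satisfy a one-hour RPO, which is the kind of gap the consultation with business managers is meant to surface.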
Outside vendors, service providers and suppliers should be consulted to understand how their operations might be affected by certain types of disasters. Will their local operations be functional in the event of a local disaster? What sorts of disaster recovery plans do they have in place? They can provide valuable information on how they can contribute to the organization's recovery.
Once a plan is drafted, it must be reviewed and approved by senior management. It's critical that all financial aspects of the plan be discussed at this point to minimize surprises in the middle of a disaster situation.
Common mistakes
Creating a network disaster recovery plan is a complex, time-consuming effort with lots of different pieces and people involved and many ways for it to go wrong. Among the common mistakes are:
Forgoing regular reviews. A network DR plan is not a one-and-done effort. Instead, it's a living document that must be reviewed and updated regularly to take into account changes in the organization, including more reliance on data and computers, new products and technologies in use, and changing processes and business objectives. The threats an organization faces also change over time and must be regularly reviewed.
Inadequate funding. Cutting budgetary corners in the planning process is a huge mistake. Taking time to educate senior management on the value of having a plan can help ensure adequate funds are allocated both for the planning process and for the implementation of the plan should it ever be needed.
Skipping the drills. Practicing the network DR plan is critical to its success. Staff members need to know where to go and what to do each step of the way before they have to do it in an emergency. This is another place where it's tempting to save money and time, but doing so could turn out to be a costly mistake in the long run.
Assuming new technology replaces DR planning. Vendors tout resiliency, high availability and cloud-based disaster recovery as technologies that cut back on the need for DR planning. However, these technologies are not the same as a DR plan, don't apply to the full scope of a network infrastructure and don't make business continuity planning irrelevant. The vendor hype is often just that: hype that won't help in a disaster situation.
Overlooking the details. The more detailed your network DR plan, the better. Documenting all network hardware, including model, serial numbers and vendor support contact information, will save time if replacements or repairs are needed. Include configuration settings for all the networking hardware in your data center as a backup in case imported settings don't work with replacement equipment after a disaster.
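One way to keep this level of detail consistent is to store each device as a structured record and flag records with gaps. The following is a minimal sketch, assuming a simple in-house inventory; the field names, the `missing_fields` helper and the sample device are hypothetical and not taken from any particular tool.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class NetworkDevice:
    """One inventory entry for the DR plan (field names are illustrative)."""
    hostname: str
    model: str
    serial_number: str
    vendor_support_contact: str
    config: dict = field(default_factory=dict)  # key configuration settings


def missing_fields(device):
    """List the fields that are still empty for this device."""
    return [name for name, value in asdict(device).items()
            if value in ("", None, {})]


# Hypothetical device with the vendor contact not yet recorded.
core = NetworkDevice(
    hostname="core-sw-01",
    model="ExampleSwitch 48",
    serial_number="SN-0001",
    vendor_support_contact="",
    config={"mgmt_vlan": 10},
)

print(missing_fields(core))             # ['vendor_support_contact']
print(json.dumps(asdict(core), indent=2))  # record ready to store off-site
```

Serializing the records to a plain format such as JSON makes it easy to keep a copy with the off-site backups, so the inventory survives the same disaster it is meant to help recover from.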
Backup types
The network disaster recovery plan doesn't exist in a vacuum, but rather is part of an organization's broader IT disaster recovery plan. Data backup is a key part of both the overall IT plan and the network plan, and information on an organization's backup policies and procedures should be included in DR planning.
The world has recently seen catastrophic natural disasters that caused the loss of hundreds of thousands of lives and destroyed millions of houses as well as the communication infrastructure in the affected regions [1]. Failure of communication and information exchange deepens the resulting humanitarian crises [2]. Recent tragic disasters, such as the Great East-Japan Earthquake in March 2011 [3] and Typhoon Haiyan in the Philippines in November 2013, show the limitations of current communication technologies in the event of disasters.
Safety information, including the number of wounded people, their locations and their real-time health status, is essential for rescue and crisis mitigation. People need to access the Internet to share their safety status with rescuers as soon as possible. Our experience from analyzing disaster recovery efforts suggests that the first 24 h represent the “golden time” for emergency relief. However, recovery of disaster-damaged communication infrastructure is complicated and prolonged, which makes it unsuitable for emergency response. Strategic approaches should be proposed considering the following essential requirements:
(R1) Quickly re-establish Internet connectivity: Immediately after a disaster occurs, users need Internet access to data such as information about the disaster, evacuation notifications and their families' status, using common Internet-based applications (e.g., email, web browsing, Skype). Quickly providing Internet connectivity is a hard but essential requirement that the proposed approach must satisfy.
(R2) Leverage commodity mobile devices: Commodity mobile devices (laptops, tablets, smart phones) carried by disaster victims should be leveraged in the process of re-establishing Internet connectivity when part of the network infrastructure is down. This feature becomes very useful, for example, in and around evacuation centers where people gather after a disaster.
(R3) Configure and extend the network in an easy way: The network must be configured easily, requiring no technical skills from the disaster victims. Ordinary users should access the Internet as easily as if they were connected to conventional WiFi access points (APs). Furthermore, once they join the network, users should automatically contribute to the extension of the network coverage as per (R2).
In order to satisfy these requirements, a wireless multihop communication approach to reach the still-alive Internet gateways (IGWs) or Internet connected WiFi APs is the best solution. This work aims at quickly extending Internet connectivity from surviving IGWs/APs to victims by leveraging their mobile devices to form multihop wireless access networks.
Existing multihop ad hoc network approaches face difficulties in real-world deployment, especially in emergency response situations since they require dedicated hardware (e.g., additional mesh routers or network interface cards—NICs), complicated routing protocols, and IP address allocation and network configuration mechanisms to be installed on each mobile node (MN) in advance. In addition, it is still too complicated for ordinary users to change their devices into ad hoc mode and configure ad hoc networks.
We are pursuing the idea of quickly setting up wireless multihop access networks for disaster recovery utilizing wireless virtualization techniques [4], [5], [6], [7]. Concretely, a novel approach to on-the-fly establishment of multihop wireless access networks, which extends Internet connectivity from surviving APs to disaster victims using their own mobile devices, has been proposed in [6]. The network is set up on demand using wireless virtualization to create virtual access points (VAPs) on mobile devices, which greedily form a tree-based topology to bridge far-apart victims with a surviving AP. Ordinary users can easily connect to the Internet through the established network as if they were connected through conventional APs; the users also contribute to increasing the network coverage, which is essential in emergency relief situations. A proof-of-concept prototype for this approach has been built and demonstrated in practice. However, this approach still lacks a high-level fundamental communication abstraction that can simplify network establishment and configuration, a more rigorous design, and a thorough analysis of its effectiveness in different real-life settings.
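To make the tree-forming idea concrete, the sketch below simulates nodes attaching to a tree rooted at a surviving AP. It is only an illustration under stated assumptions: the text says the VAPs form the tree greedily but does not give the selection rule, so the rule used here (join the in-range node with the fewest hops to the root, breaking ties by distance) is hypothetical, as are the function name, node names, coordinates and radio range.

```python
import math


def build_tree(root, positions, radio_range):
    """Greedily attach nodes to a tree rooted at `root`.

    positions: dict mapping node name -> (x, y) in metres.
    Returns (parent, hops): child -> parent, and node -> hop count from root.
    """
    hops = {root: 0}
    parent = {}
    changed = True
    while changed:  # repeat until no further node can join
        changed = False
        for node in positions:
            if node in hops:
                continue  # already part of the tree
            # Candidate parents: connected nodes (acting as AP or VAP) in range.
            candidates = [
                (hops[p], math.dist(positions[node], positions[p]), p)
                for p in hops
                if math.dist(positions[node], positions[p]) <= radio_range
            ]
            if candidates:
                # Assumed rule: fewest hops to the root, then shortest link.
                best_hops, _, best = min(candidates)
                parent[node] = best
                hops[node] = best_hops + 1
                changed = True
    return parent, hops


# Hypothetical layout: nodes 20 m apart in a line, one node far out of range.
positions = {"ap": (0, 0), "a": (20, 0), "b": (40, 0), "c": (60, 0), "d": (200, 0)}
parent, hops = build_tree("ap", positions, radio_range=25)
print(parent)  # {'a': 'ap', 'b': 'a', 'c': 'b'}
print(hops)    # {'ap': 0, 'a': 1, 'b': 2, 'c': 3}
```

Node `d` stays unattached because no connected node is within range, which mirrors the point that coverage grows only as more devices join and act as VAPs.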
This paper overcomes these drawbacks and presents the following main new contributions:
(i) The wireless multihop communication abstraction (WMCA) is devised as a fundamental communication concept for the design of a practical tree-based disaster recovery access network (TDRAN). This concept helps to hide the inherent complexity of multihop communication establishment: each node is aware only of its associated AP (or VAP), which it reaches using one of its virtual WiFi interfaces (WIFs), and serves as a VAP using another WIF.
(ii) A full system design and implementation of the TDRAN scheme, which details the new features in this work, as compared to those in [4], [5], [6]. We also propose the software-based WiFi access node (SAN) concept, which involves the software-based implementation of network functions that run on mobile devices without the need for additional hardware. In addition, we propose a mechanism for automatic reconfiguration after link failures to improve the usefulness of the proposed network establishment and configuration approach. This mechanism has been implemented to display the connectivity status table (CST) at each node.
(iii) A thorough feasibility and performance analysis based on medium-scale field experiments, including indoor and outdoor setups. The experiments were conducted at two different locations seriously affected by the Great East-Japan Earthquake, namely Iwate and Miyagi prefectures, Japan. The analysis of the results provides a comprehensive understanding of the effectiveness and feasibility of the proposed approach. These new experimental results reveal that the proposed network is capable of extending to 20 hops at 15 m per hop and 16 hops at 30 m per hop, which correspond to radii of 300 m and 480 m, respectively, or about 1 km in diameter in the larger case (much larger than the 7 hops obtained in previous experiments in Iwate prefecture [6]). This coverage is large enough for disaster recovery and evacuation centers.
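The coverage figures follow from multiplying the hop count by the per-hop distance. The short check below reproduces that arithmetic; the function name is hypothetical, and the calculation assumes the hops extend in a straight line from the surviving AP, which gives an upper bound on the radius.

```python
def coverage_radius_m(hops, hop_distance_m):
    """Radius reached by a straight chain of `hops` links of equal length."""
    return hops * hop_distance_m


# The two configurations reported in the experiments.
for hops, distance in [(20, 15), (16, 30)]:
    radius = coverage_radius_m(hops, distance)
    print(f"{hops} hops x {distance} m -> radius {radius} m, diameter {2 * radius} m")
# 20 hops x 15 m -> radius 300 m, diameter 600 m
# 16 hops x 30 m -> radius 480 m, diameter 960 m
```

The 960 m diameter of the second configuration is the source of the "about 1 km" figure.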
It is worth noting that the simple yet practical mechanisms for automatic IP address configuration and IP address conflict avoidance, as well as routing and DNS resolution in the tree-based networks [5], [6], are carefully integrated into the design of the TDRAN proposed in this paper.
The rest of the article is organized as follows: Section 2 reviews the related work revealing the necessity of this research. Section 3 presents the problem definition and introduces the WMCA concept. The details of the TDRAN design and implementation are described in Section 4. The field experiments and result analysis are described in Section 5, and Section 6 concludes this paper.