In this discussion, you are the IT manager for a four-year university in New Orleans, Louisiana. The IT department supports the computer network requirements for over 6,000 students, 300 faculty members, and 200 staff members – plus a conference network for distinguished guests. See Exploration 2 for tips on how to complete this discussion.
Instructions
Exploration 2:
The first task in risk management is to identify the possible risks based on location, use of Internet services, the sensitivity of the data, and the other business factors that affect the network.
Once you have a list of possible and probable emergencies, did you include a few that rarely happen, but are catastrophic to the business when they occur? These include hurricanes, tornadoes, floods due to plumbing failures or natural disasters, fire, ransomware, rootkit attacks, and more. Why do we care about location? We rarely experience hurricanes in Denver, Colorado, but we need to plan for them if our network is in Florida.
The relationship between the likelihood that a threat will occur and its impact is called Risk Exposure (RE). How often will these problems occur? When they occur, how much will it cost to address them? Our goal is to list the risks and classify them so we can prioritize them by their exposure. If it is hard to assess them using specific numbers, use a scale of low, moderate, or high for probability and for impact.
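The low/moderate/high classification can be turned into a simple ranking. The sketch below is only an illustration: the numeric scale (1 to 3), the choice to multiply probability by impact, and the sample ratings are all assumptions made for this example, not values given in the assignment.

```python
from dataclasses import dataclass

# Assumed mapping from qualitative ratings to numbers (illustrative only).
SCALE = {"low": 1, "moderate": 2, "high": 3}


@dataclass
class Risk:
    name: str
    probability: str  # "low", "moderate", or "high"
    impact: str       # "low", "moderate", or "high"

    @property
    def exposure(self) -> int:
        # Risk exposure approximated as probability score x impact score.
        return SCALE[self.probability] * SCALE[self.impact]


def prioritize(risks):
    """Return the risks sorted from highest to lowest exposure."""
    return sorted(risks, key=lambda r: r.exposure, reverse=True)


# Hypothetical ratings for a few of the risks named in the text.
risks = [
    Risk("hurricane", "moderate", "high"),
    Risk("ransomware", "high", "high"),
    Risk("plumbing flood", "low", "moderate"),
]

for risk in prioritize(risks):
    print(f"{risk.name}: exposure {risk.exposure}")
```

A real plan would replace the sample ratings with the team's own assessment, and could substitute estimated annual frequency and dollar cost where those numbers are available.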
The most likely candidates and the most costly problems feature prominently in our additional Emergency Action Plan sections. The plan may use brief scenarios to describe the situation, followed by what to do, who to contact, and how to evaluate the next steps to get the network back online. One solution might address several risks and support a variety of use case scenarios. For example, cloud storage backups help to ensure data integrity in the event of equipment failure, accidental misuse, intentional internal misuse, and cybersecurity threats, and help to avoid a loss of data during a mass disaster.
The process of documenting a risk management and IT Disaster Recovery Plan includes listing all of the existing equipment and where the resources are located. If we cannot find it, we will struggle to fix it. Next, do we have a data backup plan (DHS, n.d.)? How do we back up the data servers and where do we store the backups? How do we handle duplicate backups and where is our off-site storage? For data backup recovery strategies, how do we access the off-site storage and cloud resources?
In the event that the IT team is no longer able to respond, we need documentation that describes the equipment, the resources, their locations, and how to bring them back online.
A common problem that IT faced in the past was the inability to restore data from backups. It is wise to test the ability to restore data from backups and to periodically assess the quality of the backups. A periodic simulation of the possible problems, together with an audit of the Emergency Action Plan and the actions and countermeasures it recommends, increases the likelihood of recovering quickly from problems and restoring the network.
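One way to test a restore is to restore a backup into a scratch directory and compare it, file by file, with the data it was taken from. The sketch below assumes both copies are reachable as local directories; the function names `sha256_of` and `verify_restore` are hypothetical helpers written for this example, and a real backup product may provide its own verification command.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir: Path, restored_dir: Path) -> list[str]:
    """Return relative paths that are missing or differ in the restored copy."""
    problems = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        restored = restored_dir / rel
        if not restored.is_file() or sha256_of(src) != sha256_of(restored):
            problems.append(rel.as_posix())
    return problems
```

An empty list means every source file was found in the restored copy with identical contents. The check only covers files present in the source at the time of the comparison, so it is best run right after the backup is taken or against a known snapshot.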
Identifying who to contact helps us satisfy company and federal requirements for the protection and use of data. The EAP also supports later efforts to define the budget needed to prevent risks. Lastly, it is not feasible to prevent every risk, so the EAP also helps us to document how we plan to address risks when they occur.
Common questions that we ask involve use cases based on realistic scenarios. How do we shut down network services in the event of a fire or a flood without endangering our IT team? How do we safely restore communications and network services after a power outage? Who do we contact during a disaster?
A failure to plan for emergencies may lead to a costly disruption in business services and could put the organization out of business.
THAT IS EXPLORATION 2 ^ THERE IS NO NEED FOR MORE INFORMATION. YOU DON'T EVEN NEED EXPLORATION 2 TO BREAK DOWN THE ACTION PLAN. THIS IS ALL THE INFORMATION GIVEN ON THIS ASSIGNMENT; THERE IS NO NEED FOR MORE.
a). Major areas needed in an IT Emergency Action Plan: The EAP must be available to all relevant personnel in multiple forms: offline copies, printed versions, online versions, cloud backup versions, and so on. An IT EAP is required to ensure the safety, security, and privacy of all data stored by the university. This includes records going back years on students, faculty, alumni, and donors, as well as operations information, staff data, and scholarship records, all of which are crucial to the university's continued running.
1. IT emergencies and the risks they pose: Emergencies can be of three types: natural, environmental (on the premises), and man-made.
A. Natural disasters such as earthquakes and tornadoes are hard to predict in most cases and can cause immense damage to hardware, which risks wiping out all the data stored by the university. To mitigate this risk, alternatives that do not depend on on-site hardware should be maintained to ensure business continuity. One way to do so is to invest in cloud storage as a backup option. Because cloud storage is likely to be hosted in another physical location (e.g. AWS offers storage in other countries), a disaster at the university is unlikely to reach this secondary storage, which helps keep the data safe.
B. Environmental issues such as fires and accidents originating within the premises that house the hardware and software systems can also damage the systems and the data they hold. One way to mitigate this risk is to keep secondary/tertiary backups of data in different locations, e.g. if the university has 6 buildings, each can have a small server room that stores some of the data. If a fire occurs in one or more buildings, the others are likely to remain untouched, so only a smaller portion of the data might be lost.
C. Man-made emergencies can be caused by malicious attacks (physical or online), viruses, and internal/external leaks. To reduce their occurrence, strong antivirus protection and firewalls are needed to prevent online and external attacks. All data should be encrypted to minimise the chances of data theft. Server and computer rooms must be physically secured, for example with manned security for server rooms and CCTV in computer labs, to reduce the incidence of intentional physical damage.
2. Notification or alarm systems: Once an emergency is noticed, the observer, whether a person or an automated (AI) monitor, should notify the relevant people. If the IT system identifies a threat to its security, it should send an immediate notification to all key personnel, such as the IT manager and the on-call team, and proceed to shut down the points of entry (if possible). A person who notices a problem should notify the on-call team, the IT manager, or the duty manager.
3. Contact personnel and their roles and responsibilities: Knowing who to contact in an emergency is key. It is advisable to have one on-call team that is called for any emergency. If students or faculty are using the systems outside office hours, they should inform this team by email or phone if they perceive an anomaly or a risk of some kind. Having a single email address and perhaps 2 phone lines will make it easier for students and faculty to report issues and for the team to monitor them continuously. The on-call team should try to understand and resolve the risk and, if unable to do so, must inform the duty manager(s), who are the next level of escalation (the IT manager can be one of them). These managers can advise more drastic steps such as shutting down applications, rebooting the systems, or making ad-hoc cloud updates in the event of an unrecoverable disaster.
4. Risk responses: Responses can range from simple notification to automatically shutting down sections of the IT system. They can be pre-programmed for low-impact risks or left to the discretion of the duty manager where a system-wide impact is perceived.
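The split between pre-programmed responses and manager discretion can be written down as a small decision rule. This is a minimal sketch under stated assumptions: the `Impact` levels, the `choose_response` function, and the response labels are hypothetical names invented for this example, and routing everything in between to the on-call team is an assumption drawn from the escalation path in item 3.

```python
from enum import Enum


class Impact(Enum):
    LOW = 1
    MODERATE = 2
    HIGH = 3


def choose_response(impact: Impact, system_wide: bool) -> str:
    """Pick a response label for an incident (labels are illustrative)."""
    if system_wide:
        # System-wide impact is left to the duty manager's discretion.
        return "escalate_to_duty_manager"
    if impact is Impact.LOW:
        # Low-impact risks get the pre-programmed response plus a notification.
        return "auto_isolate_and_notify"
    # Everything else goes to the on-call team first (assumed default).
    return "notify_on_call_team"


print(choose_response(Impact.LOW, system_wide=False))
print(choose_response(Impact.HIGH, system_wide=True))
```

Writing the rule down in one place makes it easier to audit during the periodic EAP review than having the logic scattered across separate alerting tools.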
5. Daily risk mitigation activities:
Risk assessment should be done daily, or at least frequently, so that current risks are assessed and accounted for. Periodic checks such as system diagnostics runs, ad-hoc reviews of access roles and authorisation violations, scans for external threats, firewall checks, and checks of antivirus update status can help identify new risks that could affect the IT systems.
b). Yes, the location of the university will affect the natural and environmental risks that the IT system can face. Regional risks can play a big part in the creation of an EAP because a higher incidence of specific risks calls for stronger mitigation or prevention plans. Some locations are prone to heavy rainfall, which can uproot trees that fall on buildings or damage the fibre optics, so additional planning would be needed to protect the cables and key building sites. Force majeure incidents such as earthquakes, tsunamis, and hurricanes that the location might experience should be planned for in the EAP to ensure physical protection of the IT systems' hardware.
Each of the types of risk (A, B, C) should be categorised by probability of occurrence and potential impact. The higher the potential damage and the probability, the higher the priority of the risk and the more planning and mitigation it should receive. E.g. hurricanes are rarely experienced in Denver, so although the damage would be high, the probability is low and this risk will be a low priority there.
c). Three greatest threats:
- Cyber attacks: A malicious external attack can originate anywhere and may not be traced quickly enough to be stopped, which is why this is the biggest threat. The first mitigation is to use a strong firewall and antivirus protection. The system can be programmed to detect intrusions and notify the IT on-call team about the breach. The point of entry can be shut down by the system if the breach is big enough; otherwise the on-call team can try to resolve it. The IT manager can choose to bring the entire system down to prevent a further breach, at the cost of an outage for all users. If the breach is in the cloud storage, the service provider would be expected to take steps to stop it. In the event of a shutdown, a local backup of the day's data should be kept if possible to ensure no data is lost.
- Fire, accidents, or intentional damage in the server buildings: Server rooms should be housed on separate floors or in separate buildings with restricted access, manned security, and CCTV. This can catch potential issues early and allow for quicker resolution. Having a fire/accident response team on site will help to speed up the resolution.
- Natural disasters: Events such as hurricanes can damage all physical storage, so a daily cloud backup of data is a good way to mitigate region-specific emergencies.
Keeping a strong, current backup of data at all times will help with all kinds of threats. A mix of secondary, tertiary, and cloud backup solutions is best to cover all bases in case of cyber attacks and physical emergencies. Backups should be taken daily, and restores should be tested periodically to confirm that they can be performed when needed.