Custom Omega Disaster Recovery Plan essay paper sample
According to Rittinghouse and Ransome (2005), identifying the costs that the firm would incur if the system went down is a major part of developing an effective disaster recovery plan. Given the situation with the Omega information system, it was necessary to come up with a contingency plan to make sure that the problem does not repeat itself. The Business Impact Analysis (BIA) clears the way by making sure that the business knows the threats that surround its information systems. According to the BIA, Omega Research Inc. faces the following risks to its information system.
Loss of data:
Loss of data will have the following impacts on the firm's costs:
The data held in the information system across its remote offices is worth millions of dollars. Loss of data may be caused by power outages, which means that the information system must be protected against power outages. The BIA took into consideration that some of the offices lie in flood-prone areas, earthquake-prone areas and tornado pathways. All of these natural phenomena can cause power outages, and therefore the business information systems need to be protected from such disasters.
Loss of customers
The other impact the business may face if the SAP system is unavailable is the loss of customers. System downtime means that customer services will be delayed, and customers who are unhappy with the services they get are likely to leave. In the worst-case scenario, the loss of customer data may be a very big setback to customer relationships and may lead to further customer loss. The BIA found that a single outage might lead to the loss of as many as a hundred customers through undelivered services or lost customer information. Customer information is of the highest importance to Omega Research Inc. because the firm's ability to deliver depends on its ability to maintain and manage customer information and records, which are then used to meet the needs of the customers.
Loss of Human Resource Information
Another impact of the system being unavailable is that HR information may be lost. For a big firm such as Omega, HR data is crucial for the management of human resources. This information is important not just for managing human resources but also for other important decisions, such as financial decisions. HR data is needed for processing employees' wages and salaries; it should therefore be protected in every way. Losing this information may also be very detrimental to the motivation of employees, because lost employee information may mean that the firm is unable to handle HR functions such as promotions. This may delay HR development and indirectly slow the firm's growth.
A further impact of the SAP system being down is that the downtime or loss of data may lead to loss of payroll data. Omega Research Inc. depends on field marketers who go out into the market to promote the firm. Reliable payroll data depends on keeping the information about the field marketers in line with the payroll system. Marketers are paid based on billable hours, and these need to be captured and retained. The unavailability of the system may therefore lead to difficulties in managing the firm's workforce.
Apart from power outages, the other risk that the system faces is data loss, which any type of disaster could cause. The loss of data through natural or man-made disaster, or through errors in operation, will be detrimental to the system. Lost data will lead to the unavailability of services, because Omega depends on massive amounts of data and on the system to deliver its services to clients. Failure of the system would mean that clients would not be served as they should be. While it is not easy to put a price tag on data, the value of the data lost can be measured by the revenue the firm will lose from undelivered services and lost customers. At the same time, data loss will have further impacts on the firm, losses that may not be measurable in financial terms, such as the loss of goodwill from customers, loss of potential customers, deterioration of the firm's image and reduced employee motivation. According to Bajgoric (2009), most businesses will lose more from data loss than from the loss of hardware, because an organization's data has an intrinsic value that is not easy to put a price tag on.
As the world becomes more information driven, it is becoming very important for organizations to manage their information and the infrastructure that supports it, such as networks, systems and hardware (Ingersoll, 2004). Information is very important to the existence and performance of Omega Research. The information systems that store, process and deliver this information are therefore just as important, because without them the firm cannot deliver its services or perform correctly. In recognition of this, the firm has acquired a system that has the least possibility of degradation from power failures, sabotage or natural disaster. Even so, the firm cannot be guaranteed that the system will always run without failure. SAP is a very stable system that can handle a lot of challenges (Ralph, Reynolds & George, 2009). The hardware and network infrastructure on which the system runs is also well designed to overcome such challenges. However, failing to develop a good recovery plan in case anything ever happens would be allowing another Titanic Cruiser project, a recipe for trouble. For this reason, a comprehensive recovery plan has been developed, and this is a summary of that plan. The plan coordinates resources, procedures, policies and people (the recovery team) that will be launched once a critical condition requiring a recovery action is reached. For this process to be realistic and fruitful, the recovery plan is not just a coordination of resources; it is also guided by a number of policies that determine when a situation is critical enough to require a recovery procedure. The following is a list of the components of the recovery process:
The policy is the set of rules used to determine an emergency for which a recovery process will be invoked. This is important since not all system downtimes or problems require a system recovery procedure. Because the system recovery procedure is an intensive process that brings together many resources, it is good to have very clear boundaries and directions on when it should be invoked. Failing to do this may lead to much confusion, because temporary downtimes may trigger the recovery procedure even when it is not required. To avoid this, the recovery process has guidelines that make sure the recovery procedure is invoked only when there is a critical condition. These policies are as follows.
Expected down time:
The Recovery Time Objective (RTO) sets the downtime at which an outage is declared a disaster that requires a recovery process to be invoked. Several factors were considered in determining this number. One such factor is that Omega Research depends heavily on online resources, not only to connect the satellite offices to one another and to the head office, but also to connect customers to the WAN. For this reason, a downtime of less than half an hour was deemed acceptable, and the RTO was set at 0.5 hours. Any downtime lasting over half an hour will be considered a disaster situation and will need to be dealt with as an emergency.
The disaster recovery team will be selected and put in place. SAP will also provide a disaster recovery team that will work together with the in-house team at Omega Research.
There were a number of assumptions used in the development of the system. For example, it was assumed that the system is a high-impact system able to handle many of the problems that would easily knock down a normal system. The implication is that the disaster recovery policies were designed to deal only with very high-impact problems, such as disasters caused by earthquakes and floods. Simple issues that would affect a normal system, such as power surges caused by the loss and return of normal power, were assumed to fall well below the disaster recovery line, as the system would be expected to handle this kind of problem.
The recovery plan also assumed that the system has a redundant recovery site that would be used for the recovery of the system. The recovery plan will therefore work only if the remote redundant site is properly maintained. Because this is part of the system management for the SAP system, the site will be maintained at all times and a high-fidelity image of the data will be kept there. This makes it much easier to recover the system and data from the image at the remote recovery site than to try to recover them from the failed local system.
For this purpose, the San Diego site was chosen as the alternate site to be used for recovery or when a relocation of the system is needed. In choosing this site, it was assumed that the San Diego site would make it possible to recover the system within 30 minutes of downtime while being operated remotely. The recovery plan also assumed that, on the SAP end, there is a trained team that is able to handle the recovery procedure and to invoke the contingency plan from the remote office. It is important to note that the SAP recovery team is not made up of employees of Omega Research Inc. but of employees of SAP, the application provider. This is crucial because it means that the success of the recovery will depend on how well the in-house team is able to work with the team from the SAP end to deliver a well-coordinated recovery procedure.
To make sure that the recovery plan remains relevant and useful as time passes, it will be increasingly important for the in-house team to keep the SAP team aware of the system needs of Omega Research Inc. This means that the two teams will need to communicate continually and make sure that the system recovery plan is well understood on both sides.
It is also assumed that the disaster recovery team will be maintained even when employees leave, both on the SAP side and on the Omega Research side. This is important because employee turnover is common. If any member of the disaster recovery team leaves, the firm will need to replace that member immediately. This will make sure that the system recovery team remains relevant and ready for action.
Coverage of the recovery
It is good to note that the system recovery does not cover full business recovery. The system recovery is designed only to recover data, not to restore the business to its course. The business is a complex system that will require more than just data recovery to bring it back to operation in the event of a total disaster. To put the business back on its feet once such a disaster happens, the best plans will be the Business Continuity Plan (BCP) and the Continuity of Operations Plan (COOP). This recovery plan is therefore dedicated to the recovery of the system. However, it cannot be ignored that data and system recovery has a vital role to play in both the COOP and the BCP, because without data neither would be possible.
This disaster recovery plan also does not cover an employee evacuation plan in the event of a disaster such as floods or earthquakes. Although the disaster recovery plan considered these two factors, and even tornadoes, it did so only from a data and system recovery point of view and not from a human rescue point of view. This plan therefore assumes by default that the firm has a separate rescue plan for such events.
The three phases of the SAP ISCP are as follows:
The activation and notification
The activation and notification phase is the first part of the SAP ISCP. It includes the identification of a disaster situation that needs intervention, which Omega Research employees carry out by following the laid-out policies. Once a disaster recovery situation is identified, the disaster recovery team will be notified and the disaster recovery plan will be activated at the same time. This triggers the processes that tell the team members on both the Omega side and the SAP side to start the disaster recovery process. All the resources required for the recovery will then be made available and the recovery process will begin.
The following criteria will be used in activating the disaster recovery procedure:
- The type of outage indicates that SAP will be down for more than 0.5 hours.
- The facility housing SAP is damaged and may not be available within 2 hours.
- The outage causes a slowdown that is likely to considerably lower the speeds achieved by the system while the system is busy.
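The three criteria can be read as a simple "any of these" rule. The following is a minimal sketch under that reading, not the plan's actual tooling: the `OutageReport` record and its field names are hypothetical, and only the 0.5-hour and 2-hour thresholds come from the text.

```python
from dataclasses import dataclass

# Thresholds taken from the activation criteria in the plan.
RTO_HOURS = 0.5
FACILITY_LIMIT_HOURS = 2.0


@dataclass
class OutageReport:
    """Hypothetical summary of an outage; the field names are assumptions."""
    expected_downtime_hours: float
    facility_damaged: bool
    hours_until_facility_available: float
    severe_slowdown: bool
    system_busy: bool


def should_activate(report: OutageReport) -> bool:
    """Return True if any one of the three activation criteria is met."""
    long_outage = report.expected_downtime_hours > RTO_HOURS
    facility_lost = (
        report.facility_damaged
        and report.hours_until_facility_available > FACILITY_LIMIT_HOURS
    )
    slow_while_busy = report.severe_slowdown and report.system_busy
    return long_outage or facility_lost or slow_while_busy


brief_glitch = OutageReport(0.2, False, 0.0, False, False)
flooded_site = OutageReport(0.2, True, 6.0, False, False)
print(should_activate(brief_glitch))  # False
print(should_activate(flooded_site))  # True
```

Keeping each criterion in its own named variable makes it easy to log which rule triggered the activation.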
In case of a natural disaster such as a flood or an earthquake, it will first be determined whether it is safe to access the premises. If it is not, the recovery process will be carried out from the alternate site, the San Diego site, which will itself be assessed for safety. Once the recovery point is identified, the next step will be to inform the team about the problem and the site of recovery. The Omega team will be notified first, followed by the SAP disaster recovery team.
The contacts of the disaster recovery team will be compiled and stored at all sites where Omega has offices, to make sure that they can be retrieved from anywhere. The team will then be notified using these contacts. Notification will be done by phone call rather than by email or message, which makes instant feedback possible.
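The notification order described above (Omega team first, then the SAP team, all by phone) could be sketched as follows. The `Contact` record, the placeholder names and the phone numbers are illustrative assumptions, not data from the plan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contact:
    """Hypothetical contact entry; all values below are placeholders."""
    name: str
    team: str   # "Omega" or "SAP"
    phone: str


# The Omega team is called before the SAP team, as the plan states.
TEAM_ORDER = {"Omega": 0, "SAP": 1}


def call_order(contacts):
    """Return the contacts sorted so Omega members precede SAP members."""
    # sorted() is stable, so the listed order within each team is preserved.
    return sorted(contacts, key=lambda c: TEAM_ORDER[c.team])


roster = [
    Contact("SAP lead (placeholder)", "SAP", "555-0101"),
    Contact("Omega lead (placeholder)", "Omega", "555-0100"),
    Contact("Omega network admin (placeholder)", "Omega", "555-0102"),
]
for contact in call_order(roster):
    print(f"Call {contact.name} on {contact.phone}")
```

Since the same contact list is stored at every office, any site could produce the same call order from its local copy.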
Measuring the impact of an outage
The purpose of the disaster recovery plan is to make sure that data and system functions are restored as soon as possible. For this reason, once the disaster recovery team has been informed, the first thing it will do is assess the extent of the damage caused by the outage. The following factors will be important in determining the extent of the outage:
Cause of the outage
The cause of the outage will be fundamental in determining the extent of the outage. For example, if the outage was caused by a natural disaster like an earthquake, the outage may be much bigger and would need more focused attention.
The services affected
In any network, the system will have some processes that are more important than others are. By determining which services were affected, the team will be able to measure the damage caused by such an outage. If critical services were affected, this will mean a bigger damage and a bigger impact on the system as well as on the business.
The time of the outage also matters. The system is less active on holidays and weekends, so an outage during these quiet periods is likely to have a lower impact on the business. If the outage happens on a business day, the business will be affected even more.
To determine the total impact of the system outage, the above factors will be measured and then aggregated.
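One way to picture this aggregation is a small additive score over the three factors. The plan does not say how the factors are weighted or combined, so the numeric scores, the cause categories and the use of simple addition below are all assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical scores; the plan does not assign numeric values to causes.
CAUSE_SCORES = {"natural_disaster": 3, "hardware_failure": 2, "operator_error": 1}


def service_score(affected, critical):
    """Score the affected services, counting critical ones double."""
    return sum(2 if service in critical else 1 for service in affected)


def time_score(when, holidays=frozenset()):
    """Score 1 on weekends and holidays, 2 on business days."""
    quiet_day = when.weekday() >= 5 or when.date() in holidays
    return 1 if quiet_day else 2


def total_impact(cause, affected, critical, when, holidays=frozenset()):
    """Aggregate the three factors by simple addition (an assumption)."""
    return (
        CAUSE_SCORES[cause]
        + service_score(affected, critical)
        + time_score(when, holidays)
    )


monday = datetime(2024, 3, 4, 10, 0)
saturday = datetime(2024, 3, 2, 10, 0)
critical = {"payroll"}
print(total_impact("natural_disaster", ["payroll", "reporting"], critical, monday))    # 8
print(total_impact("natural_disaster", ["payroll", "reporting"], critical, saturday))  # 7
```

The same outage scores lower on a Saturday than on a Monday, which matches the point that timing changes the business impact.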
The actual recovery will take place after the activation process has been initiated, the appropriate teams have been summoned and the outage assessment has been done. At this point, the actual recovery work will start. The teams will be able to carry out effective recovery work using the data from the outage assessment done in the previous task. The recovery work will be geared towards recovering any data or network resource that may have been affected by the outage. It will also be geared towards repairing any physical damage and replacing any resources that have been damaged beyond repair. This means that more than one team may be needed for the different recovery work arising from such an outage.
The following activities occur during recovery of SAP:
- Identify recovery location (if not at original location);
- Determine if it is safe for the recovery personnel to go there;
- Identify required resources to perform recovery procedures;
- Identify the required teams to do the different recovery jobs;
- Retrieve backup and system installation media;
- Put in place any new hardware that may be needed to restore damaged hardware;
- Recover hardware and operating system (if required); and
- Recover system from backup and system installation media.
The following processes will be followed:
- Identify all damages
- Classify the damages into
- If any damages are irreparable, identify the replacement resources
- Repair or replace damaged resources
- Recover data from all available material in the local site where the damage happened
- If local data not recoverable, use remote mirror site
- Assess the recovery
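The repair-or-replace step and the local-then-remote fallback in the process above could be sketched as below. The two-way split into repairable and irreparable damage is an assumption drawn from the list items, and the `Damage` record and resource names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Damage:
    """Hypothetical damage record; a two-way classification is assumed."""
    resource: str
    repairable: bool


def plan_actions(damages):
    """Map each damaged resource to 'repair' or 'replace'."""
    return {d.resource: "repair" if d.repairable else "replace" for d in damages}


def choose_data_source(local_data_recoverable: bool) -> str:
    """Prefer the local site; fall back to the remote mirror site."""
    return "local site" if local_data_recoverable else "remote mirror site"


damages = [Damage("database server", False), Damage("core switch", True)]
print(plan_actions(damages))
print(choose_data_source(local_data_recoverable=False))
```

Trying the local site first reflects the order of the steps in the list, with the remote mirror used only when local recovery fails.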
Reconstitution of the recovered system
The recovered system will be tested to make sure that it has returned to its normal performance. To make sure that all the resources of the network and system are back in place, the recovery team will have a scheme for testing functionality. In such recovery procedures, it is easy to make the mistake of restoring only the services that are used the most and forgetting the resources that are dormant most of the time. Such a mistake remains unnoticed until a user needs the resource, and this can cause chaos if the resource is important. To make sure that the recovery team does a thorough job, the team will be given a catalogue of the system's resources that must be restored at the time of recovery. This will help the team make sure that everything is in order and that everything has been restored during recovery, rather than having to come back later to restore services.
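The catalogue check can be illustrated with a short comparison between what must be restored and what has been restored. This is only a sketch; the resource names are invented examples and the plan does not describe the catalogue's actual format.

```python
def missing_resources(catalogue, restored):
    """Return catalogue entries not yet restored, in catalogue order."""
    restored_set = set(restored)
    return [resource for resource in catalogue if resource not in restored_set]


# Invented example entries; the real catalogue would list Omega's resources.
catalogue = ["payroll", "HR records", "customer database", "archive reports"]
restored = ["payroll", "customer database"]
print(missing_resources(catalogue, restored))  # ['HR records', 'archive reports']
```

An empty result would indicate that every catalogued resource, including rarely used ones, has been restored.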