RAPID Thailand Project

Title: RAPID Thailand: Enabling continued operation of IT services and infrastructures during floods and other disasters
Principal researchers: Andréa Matsunaga (PI) and Dr. José A.B. Fortes (Co-PI)
Institution: ACIS, University of Florida
Status: Concluded (June 2012 - May 2014)
Award #: CNS-1240171
Proposed project: This RAPID project investigates how effectively virtualized Internet data centers can improve IT service continuity during and after a disaster, using virtual machine (VM) live migration and backup/checkpointing. These virtualization technologies help mitigate, and recover from, the impact of catastrophic events on IT infrastructures and the services they deliver. In collaboration with Thammasat University in Thailand (Thailand Research Fund: TRG5580020), the project will leverage Internet Data Centers (IDCs) as disaster recovery sites, where government and corporate data can be backed up and operational servers can be temporarily located in order to provide high availability and resiliency for the organizations' operations and services.

The 2011 Thailand flood raised many issues concerning the disruption of operations and services provided by various organizations, and the lessons learned can help in handling future catastrophic events. Several research needs have arisen:

  • The need to assess existing infrastructures, since suitable solutions depend on the type of disaster and the realities of the IT environments in the disaster locations, and
  • The need to address the challenges of migrating VMs across geographic locations, given that existing VM migration technologies have been developed under local area network assumptions that do not hold true in disaster recovery scenarios.

Thus, the work leverages existing infrastructure and experience in machine virtualization technologies and cloud computing deployments to conduct realistic experiments. These experiments assess how effectively the mechanisms offered by existing virtualization technologies maximize the availability of services and minimize the cost of maintaining and/or recovering all services during and after a disaster. Research efforts will be developed in the following thrusts:

  • Collection and analysis of data on IT services damaged by the 2011 Thailand flood;
  • Studies of the nature of the IT services and their infrastructure design;
  • Studies of the practicality and scalability of VM live migration and backup/checkpointing in wide-area settings; and
  • Investigation of virtualization-based resilient middleware architectures for service continuity.
Broader Impacts: This project will advance our understanding of how to provide robust middleware for the protection and recovery of IT infrastructure that performs well across different types of disasters. It will also inform policy-makers and IT managers in Thailand and the USA on how to evaluate and integrate emerging commercial virtualization solutions for backup and/or recovery-oriented computing systems under the extreme conditions found during and after a disaster. The research supports a female minority PI and a US graduate student. The project has been submitted for co-funding from the Thailand Research Fund (TRF); if successful, this would represent a precedent-setting partnership between the NSF and TRF that could serve as a model for future collaboration.
Project Outcomes: Catastrophic events, such as the 2011 Thailand flood, cause major disruption of the operations and services provided by various organizations in every aspect of society. It is expected that such events will continue to occur, potentially with increased frequency. The flood raised a number of issues and had devastating impacts not only in Thailand, but also in all countries that invest in or rely on products manufactured in Thailand. Organizations increasingly rely on Information Technology (IT) for tasks that range from routine operation to management of the organization's information, making it essential for such services to keep operating even during disaster events to avoid further economic and human loss. To minimize the impact of natural disasters on IT services and increase the continuity of services, we investigated the possibility of leveraging Internet Data Centers (IDCs) as disaster recovery sites, where government and corporate data can be backed up and operational servers can be temporarily located. As natural disasters tend to affect large geographical areas, we performed this investigation assuming the need to transfer large amounts of data and systems over Wide Area Networks (WANs).

A survey of educational, research and business institutions on the impact of the 2011 Thailand flood on IT resources shows that (a) IDCs in a flood-safe zone are available to be used as a target location to move IT services from the affected area, (b) the use of virtualization technology would be more economical and efficient than physically moving the existing hardware (the strategy used during the 2011 flood), (c) the hardware can be placed on a safer floor in the same building, (d) WAN infrastructure is in place to meet the migration requirements, and (e) floods and hurricanes offer good predictability of the time and location being affected (on the order of hours or even days), providing ample opportunity for even a complex IT service evacuation plan to be very effective.

A virtualization-based middleware architecture for IT service continuity was developed. At its heart is a controller that coordinates the migration of VMs residing on multiple hosts, between datacenters separated by WANs with varying performance. The controller is responsible for tuning the number of concurrent migrations so as to maximize the use of processing and network resources without congesting or oversubscribing them. Since the optimal number of concurrent migrations depends on the distance between datacenters (latency) and the available bandwidth (which varies over time, especially under disaster conditions), the adaptive controller can evacuate more VMs than a static solution.

Given that continuity of services depends on maintaining network access to those services, the networking support offered by cloud providers and the use of Software-Defined Networking (SDN) to automate network reconfiguration were surveyed. We conclude that complexity increases with the distance between the source and destination of VMs, and that SDNs are viable provided that additional research delivers solutions to their security issues.
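To make the idea of tuning the number of concurrent migrations more concrete, the following is a minimal sketch of one possible approach, a simple hill-climbing loop driven by measured aggregate throughput. It is not the project's actual controller, whose algorithm is not described here. The class `ConcurrencyController`, the toy link model `simulated_throughput`, the helper `run_demo`, and all numeric values (100 Mbps per migration stream, a 400 Mbps link, a 10% penalty per oversubscribing stream) are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ConcurrencyController:
    """Hypothetical hill-climbing tuner for the number of concurrent VM migrations."""
    min_parallel: int = 1
    max_parallel: int = 16
    tolerance: float = 0.05  # relative throughput change treated as noise
    parallel: int = 1        # current number of concurrent migrations
    _last_throughput: float = field(default=0.0, init=False)
    _direction: int = field(default=1, init=False)  # +1 = add a migration, -1 = remove one

    def update(self, throughput_mbps: float) -> int:
        """Take the aggregate throughput measured at the current level; return the next level."""
        previous = self._last_throughput
        self._last_throughput = throughput_mbps
        if previous <= 0:
            step = self._direction          # first sample: probe upward
        else:
            change = (throughput_mbps - previous) / previous
            if change > self.tolerance:
                step = self._direction      # last move helped: keep going
            elif change < -self.tolerance:
                self._direction = -self._direction
                step = self._direction      # last move hurt: back off
            else:
                step = 0                    # plateau: hold the current level
        self.parallel = max(self.min_parallel,
                            min(self.max_parallel, self.parallel + step))
        return self.parallel


def simulated_throughput(n: int, per_stream: float = 100.0,
                         capacity: float = 400.0, penalty: float = 0.1) -> float:
    """Toy WAN model (assumed): latency caps each stream; oversubscription degrades the link."""
    offered = n * per_stream
    if offered <= capacity:
        return offered
    excess_streams = (offered - capacity) / per_stream
    return capacity * max(0.0, 1.0 - penalty * excess_streams)


def run_demo(steps: int = 20) -> list[int]:
    """Drive the controller against the toy link and record the chosen levels."""
    controller = ConcurrencyController()
    history = []
    for _ in range(steps):
        measured = simulated_throughput(controller.parallel)
        history.append(controller.update(measured))
    return history
```

Under this toy model, `run_demo()` ramps up from one migration and then keeps probing between three and five concurrent migrations, around the level (four) that saturates the assumed link without oversubscribing it. A static setting would have to be chosen in advance and could not follow the link as its latency or available bandwidth changes, which is the motivation given above for an adaptive controller.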
This project represents a precedent-setting partnership between the NSF and the Thailand Research Fund (TRF), serving as a potential model for future collaboration.