Multi-stage migration of Capsim infrastructure to AWS
Capsim is an American company that has operated in the business training industry for more than 20 years, building realistic business games. Its core activity is delivering complex business simulations that help users improve their managerial competencies. Capsim serves universities, companies and corporations around the world. The desire to implement new solutions and to develop and scale existing tools made it necessary to change the infrastructure on which the client's application ran.
Infrastructure specifics
The designed and deployed Capsim infrastructure consists of two application layers (front-end and middle-tier), a backend (database) layer, a cache and an additional NFS layer. The front-end and middle-tier layers run in Auto Scaling Groups (ASGs), with traffic routed to them through Elastic Load Balancers. This approach lets resources scale flexibly to match the current traffic and the load that users generate.
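To illustrate what scaling resources to match the current load means, here is a minimal sketch of a proportional scaling rule. It is a simplified illustration, not the algorithm AWS uses internally and not Capsim's actual policy. The function name, the metric (average CPU utilization) and all numeric values are assumptions made for the example.

```python
import math


def desired_capacity(current_instances: int,
                     current_metric: float,
                     target_metric: float,
                     min_size: int,
                     max_size: int) -> int:
    """Return an instance count that would bring the metric back to its target.

    Simplified proportional rule (an assumption for illustration only):
    if the group runs at 75% CPU with a 50% target, it needs 1.5x the instances.
    The result is clamped to the group's minimum and maximum size.
    """
    if current_instances <= 0 or target_metric <= 0:
        raise ValueError("current_instances and target_metric must be positive")
    raw = math.ceil(current_instances * current_metric / target_metric)
    return max(min_size, min(max_size, raw))


# Hypothetical group: 4 instances, bounds 2..10, target of 50% average CPU.
scale_out = desired_capacity(4, 75.0, 50.0, min_size=2, max_size=10)
scale_in = desired_capacity(4, 20.0, 50.0, min_size=2, max_size=10)
capped = desired_capacity(4, 200.0, 50.0, min_size=2, max_size=10)
print(scale_out, scale_in, capped)
```

The clamp to the minimum and maximum size is what keeps such a rule from scaling a layer to zero during quiet periods or growing it without bound during a spike.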
Access to the infrastructure is not public and is possible exclusively through a site-to-site IPsec VPN tunnel. Instances can additionally be reached through AWS Session Manager from the AWS console.
For the migration, we also gained access to the client's development servers, hosted in an external server room and on physical premises owned by Capsim.
Front-end layer
Instances are launched within an ASG from prepared Launch Templates. The AMIs they use come with Adobe ColdFusion pre-installed and are built on dedicated EC2 instances. This layer uses Amazon ElastiCache (Redis) to store user sessions. In addition to ColdFusion, the servers host IIS, which proxies requests to the ColdFusion applications so that they are not directly accessible; this makes it possible to use additional security tools in this layer. The Node.js applications, which also belong to this layer, have been isolated and run in a separate ASG. This makes it possible to scale only the resources that are actually in use, without unnecessarily scaling the entire front-end layer.
Middle-tier layer
As in the front-end layer, instances are launched from an ASG and a Launch Template. In this layer we installed the Java components as well as PDFcreator, which was later moved to a separate instance and is publicly available to Capsim customers, with traffic routed through an AWS load balancer.
Backend layer
This layer runs the highly available (HA) database cluster, which is the data source for all applications.
Migration characteristics
Some elements of the application we were to migrate had been written many years ago and carried significant technical debt, which prevented us from using standard solutions. In addition, most of the application's components turned out to be inflexible. We also learned that only three service windows were available per year, each lasting at most 6 hours. Combined with the limited bandwidth in the client's server room, this made it impossible to complete the migration within a single service window: the application consists of dozens of SQL databases, some of which exceeded 800 GB in size.
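The constraint can be made concrete with a back-of-the-envelope calculation. The sketch below estimates how long it takes to copy a single 800 GB database over a network link. The 6-hour window and the 800 GB size come from the text above; the uplink speed (200 Mbit/s) and the 80% link efficiency are purely hypothetical assumptions, since the actual bandwidth of the client's server room is not stated.

```python
def transfer_hours(size_gb: float, bandwidth_mbps: float, efficiency: float = 0.8) -> float:
    """Hours needed to move size_gb gigabytes over a bandwidth_mbps (megabit/s) link.

    efficiency is the assumed fraction of the nominal bandwidth that is
    actually usable for the copy (protocol overhead, competing traffic).
    """
    size_megabits = size_gb * 8 * 1000  # decimal units: 1 GB = 8000 megabits
    seconds = size_megabits / (bandwidth_mbps * efficiency)
    return seconds / 3600


WINDOW_HOURS = 6           # maximum service window, from the text
LARGEST_DB_GB = 800        # size of the largest databases, from the text
ASSUMED_UPLINK_MBPS = 200  # hypothetical value, not from the source

hours = transfer_hours(LARGEST_DB_GB, ASSUMED_UPLINK_MBPS)
fits = hours <= WINDOW_HOURS
print(f"{hours:.1f} h needed, fits in the {WINDOW_HOURS} h window: {fits}")
```

Under these assumptions a single 800 GB database already needs about 11 hours, before counting the dozens of other databases, validation and cutover, which is why moving the bulk of the data ahead of the service window matters.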
The key problem in migrating the Capsim databases turned out to be data synchronization. Traditional replication was not an option because of the nature of the application and of the databases themselves. The solution was to split the process into two stages: the first was a lift and shift, and only later, in the second, did we move all the client's resources to Amazon RDS.
The first phase consisted of adding another server to the database cluster; through log shipping, changes were sent to this attached server at a specified interval. This gave us near-real-time data synchronization and bought time for updates. In the second phase, we migrated from the intermediate SQL server (on EC2) to Amazon RDS. According to our estimates, this intermediate step sped up data migration by about 70% compared to a direct migration to RDS, thanks to the AWS backbone network and to features that import data into RDS directly from S3, where the exported copy of the data was stored. Organizing the entire operation took us almost a year, while the migration itself took less than 6 hours, in line with the client's requirements.
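The idea behind log shipping can be sketched with a toy model: the secondary server is seeded once from a full backup, and afterwards only the transaction log accumulated since the previous shipment is copied and replayed. The classes and function names below are hypothetical and exist only for illustration. This is an in-memory simulation of the concept, not the database engine's API and not the tooling used in the project.

```python
from dataclasses import dataclass, field


@dataclass
class Primary:
    """Toy stand-in for the source database (hypothetical, not a real API)."""
    rows: dict = field(default_factory=dict)
    pending_log: list = field(default_factory=list)  # changes not yet shipped

    def write(self, key, value):
        self.rows[key] = value
        self.pending_log.append((key, value))

    def full_backup(self):
        # Snapshot of all data; in this toy model it also starts a fresh log chain.
        self.pending_log = []
        return dict(self.rows)

    def backup_log(self):
        # Cut a log backup: hand over the pending changes and truncate the log.
        backup, self.pending_log = self.pending_log, []
        return backup


@dataclass
class Secondary:
    """Toy stand-in for the attached server that receives shipped logs."""
    rows: dict = field(default_factory=dict)

    def restore_full(self, snapshot):
        self.rows = dict(snapshot)

    def restore_log(self, log_backup):
        # Replay changes in the order in which they were logged.
        for key, value in log_backup:
            self.rows[key] = value


def ship_logs(primary: Primary, secondary: Secondary) -> int:
    """One shipping cycle: back up the log, copy it, restore it. Returns change count."""
    log_backup = primary.backup_log()
    secondary.restore_log(log_backup)
    return len(log_backup)


primary, secondary = Primary(), Secondary()
primary.write("orders", 1)
secondary.restore_full(primary.full_backup())    # one-time seeding, done in advance

primary.write("orders", 2)
primary.write("users", 10)
shipped_first = ship_logs(primary, secondary)    # periodic cycle

primary.write("orders", 3)
lag_before_cutover = len(primary.pending_log)    # only the last interval is missing
shipped_final = ship_logs(primary, secondary)    # final cycle inside the service window
in_sync = secondary.rows == primary.rows
print(shipped_first, lag_before_cutover, shipped_final, in_sync)
```

The point of the model is that the expensive full copy happens ahead of time, so the work left for the service window is limited to the changes from the last shipping interval.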
Summary
For Capsim, we implemented a complex, multi-stage migration of an infrastructure burdened with technical debt. We supported the upgrade of the client's technology stack and managed to fit the entire implementation into just three windows of a few hours each, available over the course of a year. After the migration, the client has a modern, scalable and highly available infrastructure. We are currently working together on the next stage of its evolution: replacing the ASG-based blue-green deployment with a container-based solution.