
Migration to AWS: part 2

11.01.2013

In part 1 of our migration, we set up web servers in a Virtual Private Cloud in AWS's data centers, while leaving the persistent storage in the original data center.

This temporary solution let us experiment with the functional capabilities of the application hosted on AWS (does it work?) without touching any of the real traffic. However, it left several performance problems to be resolved, as the chatty exchanges between web servers and databases made HTTP responses very slow.

The next step was then to study how to also move the databases when switching to the new web servers. At a minimum, the whole of the front-end traffic had to run in the same data center as the database, while cron jobs and background processes could keep running in the old data center for a while, provided they connected to the new databases.

Need for a backup

Transitioning the database servers to AWS meant creating MySQL and MongoDB installations (primary plus secondary, plus the old data center's replicas) identical to the current ones: same OS, same version, and a very similar configuration apart from mount points, which can differ between physical and virtual machines. AWS offers the Relational Database Service, but you can only move to it after you already have your data inside an Amazon server. These new secondary servers in the VPC replicated asynchronously from the old data center (with less than a second of delay, notwithstanding the latency).
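
For context, attaching one of these new VPC secondaries to the primary still running in the old data center boils down to the standard MySQL replication commands; here is a minimal sketch, where the host name, credentials and binary log coordinates are hypothetical placeholders:

# Minimal sketch (hypothetical host and binlog coordinates): point a new
# VPC secondary at the primary in the old data center and start replicating.
mysql -u root -p <<'SQL'
CHANGE MASTER TO
  MASTER_HOST = 'primary.old-datacenter.example.com',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = 'xxxxxxxxx',
  MASTER_LOG_FILE = 'mysql-bin.000042',
  MASTER_LOG_POS = 4;
START SLAVE;
SQL

# The replication delay can then be watched through Seconds_Behind_Master:
mysql -u root -p -e 'SHOW SLAVE STATUS\G' | grep Seconds_Behind_Master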

The need arose to set up a backup strategy that would fit AWS's capabilities: mysqldump is not going to cut it for any non-toy database size.

Our system administrators set up a replica server, configured as a secondary, to be used specifically for this purpose. Since its mounted data partitions are hosted on Amazon Elastic Block Store, a snapshot of its volumes, frozen for the occasion, can be taken through AWS; photographing the state of the storage takes a total of seconds. The snapshots are stored on S3 and can be used to start up new volumes in case of failures, even in other data centers.

Here is the script that our administrators set up to take a consistent snapshot of the volumes (one without half-written data) using ec2-consistent-snapshot:

#!/bin/bash
# Takes a consistent EBS snapshot of the MySQL data volume through
# ec2-consistent-snapshot; optionally prunes old snapshots first.
source /etc/profile.d/java.sh
export EC2_HOME=/home/ec2
export PATH=/home/ec2/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games

INSTANCE_ID="i-12345ab"
AWS_ACCESS_KEY_ID=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
AWS_SECRET_ACCESS_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
EC2_REGION="eu-west-1"
DESCRIPTION="mysql snapshot of datadir for $(hostname) on $(date +%F__%H:%M)"
MOUNTPOINT="/mnt/mysql_data"
MYSQL_HOST="127.0.0.1"
MYSQL_USER="root"
MYSQL_PASSWORD="xxxxxxxxx"
MYSQL_MASTER_STATUS="master_status"

# Find the EBS volume ID backing the LVM physical volume of the data partition.
VOLUME_ID=$(ec2-describe-instances --aws-access-key $AWS_ACCESS_KEY_ID --aws-secret-key $AWS_SECRET_ACCESS_KEY --region $EC2_REGION $INSTANCE_ID | grep $(pvs | grep rightscale | awk '{print $1}') | awk '{print $3}')

# Optional retention argument: keep only the $1 most recent snapshots of this volume.
if [ -n "$1" ]
then
  retention=$1
  echo "===========================================================================================================" >> /mnt/mysql_old_snap_clean.log
  echo "============== snapshots removed on $(date +%F__%H:%M), keeping the last $retention snapshots" >> /mnt/mysql_old_snap_clean.log
  echo "===========================================================================================================" >> /mnt/mysql_old_snap_clean.log

  # Sort the snapshots newest first by their date field, skip the $retention
  # most recent ones and delete the rest.
  /usr/local/sbin/aws --region=eu dsnap | grep $VOLUME_ID | sort -r -k 8 | sed "1,${retention}d" | awk '{print "Deleting snapshot: " $2 " dated: " $8; system("/usr/local/sbin/aws --region=eu delsnap " $2)}' >> /mnt/mysql_old_snap_clean.log 2>&1
else
  echo "Missing retention parameter: skipping the cleanup of old snapshots. To enable it, pass an extra argument such as: $0 5 (preserves the last 5 snapshots)"
fi

# Installation:
#   apt-get install python-software-properties
#   add-apt-repository ppa:alestic && apt-get update && apt-get install -y ec2-consistent-snapshot
# Comment out rows 438-485 of ec2-consistent-snapshot to avoid being seen as a slave of another master.

ec2-consistent-snapshot -d --aws-access-key-id $AWS_ACCESS_KEY_ID --aws-secret-access-key $AWS_SECRET_ACCESS_KEY --region $EC2_REGION --description "$DESCRIPTION" --xfs-filesystem $MOUNTPOINT --mysql --mysql-username $MYSQL_USER --mysql-password $MYSQL_PASSWORD --mysql-master-status-file $MOUNTPOINT/$MYSQL_MASTER_STATUS $VOLUME_ID
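
A script like this lends itself to being scheduled via cron; a hypothetical nightly entry (script path and timing are placeholders, not from the original setup) keeping the last 5 snapshots could be:

# m h dom mon dow  command
30 3 *  *   *     /usr/local/sbin/mysql-ebs-snapshot.sh 5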


Master-master

Once we had a duplicated database infrastructure in our VPC, it became necessary to configure MySQL in a master-master configuration between the primary in Milan and the one in AWS. This causes no problems as long as all clients are writing to a single master. MongoDB wins here because it can swap its primary with a secondary in milliseconds, and it does so consistently, so that writes to the old primary are no longer accepted.
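
For comparison, here is a minimal sketch of the MongoDB side: rs.stepDown() demotes the current primary, which stops accepting writes while a secondary is elected (the host name below is a placeholder):

# Demote the current primary for 60 seconds, letting a secondary take over:
mongo --host current-primary.example.com --eval 'rs.stepDown(60)'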

To experiment with the migration, our administrators then tried changing the configuration of the MySQL master connection in the applications to point to the AWS instance... only for the two masters to act as two separate halves of the same brain and start inserting duplicate values in AUTO_INCREMENT columns for different rows.
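
As an aside, MySQL's standard mitigation for this particular symptom (it does not resolve the split brain itself, and it is not what we adopted) is to give each master a disjoint AUTO_INCREMENT sequence in my.cnf:

# On the Milan master:
[mysqld]
auto_increment_increment = 2
auto_increment_offset    = 1

# On the AWS master:
[mysqld]
auto_increment_increment = 2
auto_increment_offset    = 2

With these settings one master only generates odd ids and the other only even ones, so simultaneous inserts can no longer collide on the same key.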

In fact, in our MySQL infrastructure there was no way to switch the clients atomically from one master to the other. The following connections were still open from the web servers:

  • Apache PHP processes (especially with persistent connections around)
  • cron jobs
  • long-running processes launched from the command line

All of these processes have to be restarted at the same time to point to a single master when you want to change that configuration. Attempting to manually switch the configuration of each server while the processes were still running was doomed to fail, as part of them kept pointing to the old master through their open connections.
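
Before attempting any switch, it is at least possible to see which clients still hold connections to the old master; a quick check (assuming shell access to that machine) is:

# Count the connections still open towards this master, grouped by client host:
mysql -u root -p -e "SELECT SUBSTRING_INDEX(host, ':', 1) AS client, COUNT(*) AS connections FROM information_schema.processlist GROUP BY client;"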

To be concluded, in part 3

Thus we were able to duplicate the whole infrastructure, but not yet to move the MySQL master to AWS in an atomic way. We then started looking for solutions that would ensure the termination of all MySQL connections before the switch...

Published at DZone with permission of Giorgio Sironi, author and DZone MVB.
