One of the best features of RDS is the ability to upgrade a MySQL instance – whether it is a major or a minor version upgrade – with the click of a button.
This option, assuming that the application is fully compatible with the new version, is very useful for non-production deployments, but it gives little control over the execution time (and potential downtime) in a production deployment.
What about Multi AZ?
Having a Multi AZ RDS does not really help in the upgrade scenario. Actually, according to the AWS documentation, it does not even help for minor upgrades:
> Unless you specify otherwise, your DB Instance will automatically be upgraded to new MySQL minor versions as they are supported by Amazon RDS. This patching will occur during your scheduled maintenance window, and will be announced on the Amazon RDS Forum in advance. We schedule them so you can plan around them, because downtime is required to upgrade a DB engine version, even for Multi-AZ instances.
So choosing to upgrade while assuming that Multi AZ will still keep the downtime to 2-3 minutes is a step into the unknown: if you test the upgrade of a Multi AZ deployment, you will notice that AWS patches both instances at the same time.
So what are the steps to migrate an instance with non-negligible storage – about 0.5 TB (SSD) – from RDS MySQL 5.6.29 to MySQL 5.7.11?
First test with a snapshot
Let’s start playing with a backup to check how long the upgrade takes, restoring it on an m4.2xlarge and performing an upgrade to MySQL 5.7.
After restoring the backup to a new instance, to make the test realistic, I first ran a mysqldump to /dev/null to avoid the first-touch penalty I would otherwise pay after restoring a database from a snapshot.
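The warm-up is a plain mysqldump whose output is discarded; a sketch, where the endpoint and user are placeholders for your own test instance:

```shell
# Warm up the freshly restored instance by reading every table once,
# so EBS blocks are fetched from the snapshot before timing the upgrade.
# Endpoint and user below are placeholders; -p prompts for the password.
mysqldump --host=mydb-test.xxxxxxxx.eu-west-1.rds.amazonaws.com \
          --user=admin -p \
          --all-databases --single-transaction > /dev/null
```

`--single-transaction` keeps the dump from locking the InnoDB tables while it reads them.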
I followed up with a backup before starting the upgrade. Now I can proceed with the upgrade from the console, grab a cup of coffee and wait for the operation to complete. And check the logs:
```
13:37:16.180 DB instance shutdown
16:05:20.751 DB instance restarted
16:14:30.215 Updated to use DBParameterGroup default.mysql5.7
16:16:22.794 Database instance patched
```
What? Hours of upgrade and downtime?
I have waited so long that the coffee is now really cold. What other options do I have to minimize the outage window on a major version upgrade of a Multi-AZ RDS instance from 5.6 to 5.7?
Of course if your database is significantly smaller or the above downtime is not that critical for you, the “one click” approach is the easiest one and the way to go. And you can skip the rest of the post.
How can I reduce the downtime?
There are other approaches that can bring the downtime down to, ideally, a few minutes – not significantly longer than a normal RDS failover in Multi AZ.
None of them is a “one click” operation; below are the ones I considered.
- Increase (or introduce, if you do not have it) Provisioned IOPS during the upgrade and scale it down again afterwards if you do not need it. This approach requires significant time before and after the upgrade, while the production instance is in a “modifying” status, with decreased performance during the upgrade and increased complexity. Not ideal.
- Forget the comfort of AWS: run a read replica, stop the replication, upgrade the read replica to MySQL 5.7, re-enable replication, wait for it to catch up, promote the read replica to be the new master and enable Multi AZ again. But wait, why am I running RDS in the first place if I end up playing with native MySQL replication?
- Use the new AWS Database Migration Service (I will come back with a separate post on this interesting service)
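For reference, the replication control in the read-replica option above cannot use `STOP SLAVE` directly, because RDS does not grant the SUPER privilege; RDS exposes stored procedures instead. A minimal sketch, run on the replica:

```sql
-- On the read replica: pause replication before upgrading it.
CALL mysql.rds_stop_replication;

-- ... upgrade the replica to 5.7, then resume and let it catch up:
CALL mysql.rds_start_replication;
```

The final promotion to master is then done from the console or with the RDS `promote-read-replica` API.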
The magic solution
All the above helped me reduce the downtime in the test scenarios, but I was still puzzled by how long the MySQL upgrade itself was taking. After checking the logs of the MySQL upgrade, I finally ran into a post on the Percona blog that clarified the root cause:
> MySQL changed the temporal types in MySQL 5.6.4, and it introduced a new feature: microseconds resolution in the TIME, TIMESTAMP and DATETIME types.
>
> Can I upgrade to MySQL 5.7? Of course you can! But when mysql_upgrade is running it is going to convert the old fields into the new format by default. This basically means an alter table on every single table, which will contain one of the three types.
And indeed with the provided query:
```sql
SELECT CASE isc.mtype
         WHEN '6' THEN 'OLD'
         WHEN '3' THEN 'NEW'
       END FORMAT,
       count(*) TOTAL
FROM information_schema.tables AS t
INNER JOIN information_schema.columns AS c
  ON c.table_schema = t.table_schema AND c.table_name = t.table_name
LEFT OUTER JOIN information_schema.innodb_sys_tables AS ist
  ON ist.name = concat(t.table_schema, '/', t.table_name)
LEFT OUTER JOIN information_schema.innodb_sys_columns AS isc
  ON isc.table_id = ist.table_id AND isc.name = c.column_name
WHERE c.column_type IN ('time', 'timestamp', 'datetime')
  AND t.table_schema NOT IN ('mysql', 'information_schema', 'performance_schema')
  AND t.table_type = 'base table'
  AND (t.engine = 'innodb')
GROUP BY isc.mtype;
```
I was in that scenario:
```
+--------+-------+
| FORMAT | TOTAL |
+--------+-------+
| NULL   |     2 |
| NEW    |    47 |
| OLD    |    24 |
+--------+-------+
```
Those tables had indeed been created when the server was still running RDS MySQL 5.5, just a couple of years ago. I then applied an ALTER TABLE on all those tables before the upgrade, for example:
```sql
ALTER TABLE my_old_table FORCE;
```
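Rather than typing one statement per table, you can let MySQL generate them; a sketch, reusing the same information_schema joins as the detection query above to emit one `ALTER TABLE ... FORCE` per old-format table:

```sql
-- Generate an ALTER TABLE ... FORCE statement for every InnoDB table
-- that still contains old-format temporal columns (mtype = 6).
SELECT DISTINCT
       CONCAT('ALTER TABLE `', t.table_schema, '`.`', t.table_name, '` FORCE;')
FROM information_schema.tables AS t
INNER JOIN information_schema.columns AS c
  ON c.table_schema = t.table_schema AND c.table_name = t.table_name
LEFT OUTER JOIN information_schema.innodb_sys_tables AS ist
  ON ist.name = concat(t.table_schema, '/', t.table_name)
LEFT OUTER JOIN information_schema.innodb_sys_columns AS isc
  ON isc.table_id = ist.table_id AND isc.name = c.column_name
WHERE c.column_type IN ('time', 'timestamp', 'datetime')
  AND isc.mtype = '6'
  AND t.table_schema NOT IN ('mysql', 'information_schema', 'performance_schema')
  AND t.table_type = 'base table'
  AND t.engine = 'innodb';
```

You can then review the generated statements and run them one by one, off-peak.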
This of course blocked writes on those specific tables/functionalities of the product for a few minutes, and the overall performance of the RDS instance was not ideal for some time, but I considered that an acceptable trade-off to put the database in a significantly better state before the upgrade.
And indeed, when I started a new RDS instance from a snapshot with the simple “one click” approach, the upgrade time went from more than 2 hours to less than 10 minutes on an m4.xlarge instance.
```
6:49:36 Finished DB Instance backup
6:47:29 Backing up DB instance
6:47:27 Database instance patched
6:47:26 Finished DB Instance backup
6:46:29 Updated to use DBParameterGroup default.mysql5.7
6:45:53 DB instance restarted
6:44:34 DB instance shutdown
6:43:11 Backing up DB instance
6:42:55 DB instance shutdown
```
Now it takes a few minutes, not hours, and with no complex approach. An amazing improvement – thanks, Percona. Combining all the above solutions, I was able to upgrade the 500 GB server in the timeframe of a normal RDS failover (1-3 minutes).
To summarize, to minimize the downtime if you go for the simple “one click” RDS approach:
- be aware that Multi AZ does not help you
- make sure you are aware of changes in the InnoDB engine and optimizations or steps you can perform before the upgrade at MySQL level
- take a manual backup immediately before the “one click” upgrade to minimize the incremental backup time taken during the real upgrade
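The last point can be scripted with the AWS CLI; a sketch, where the instance and snapshot identifiers are placeholders and the CLI is assumed to be configured with the right credentials and region:

```shell
# Take a manual snapshot right before triggering the upgrade, so the
# automatic pre-upgrade backup only has to be a small incremental one.
# Identifiers below are placeholders.
aws rds create-db-snapshot \
    --db-instance-identifier mydb-production \
    --db-snapshot-identifier mydb-pre-57-upgrade

# Block until the snapshot is available before starting the upgrade.
aws rds wait db-snapshot-available \
    --db-snapshot-identifier mydb-pre-57-upgrade
```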
Enjoy your upgrade and if you have comments on the topic, get in touch!
Note: this article was posted in 2016 and many things quickly changed on AWS and RDS. Check out my latest posts or watch my presentation on the future of relational databases on the cloud.