Altering a big-sized table is never a fun task. It can cause the table to lock for anywhere between a couple of minutes to a couple of hours. If the table is an essential part of your production environment, this could pose a major challenge.
Couple of scenarios for such change can be:
1. Add column
2. Drop column
3. Add an index
4. Update an index
This situation happens to me every now and then and I came up with a very easy solution and while the solution I provide is not fully downtime-free, it minimizes downtime as much as possible. You can be up and running again within 5 minutes. It is also much simpler than many other solutions I Googled. If you cannot accept even 5 minutes of downtime, continue Googling…
In this post I will explain how to alter a table while the DB is hosted as an RDS instance on AWS. This process is very easy using AWS RDS but it can be done on any other cloud provider or on premise.
Step 1: Set the Binlog Format
Binlog is where each action in the DB is saved. You can control how it is saved by changing the “binlog format”.
If you don’t already have a separate parameter group for your DB, create one and assign it to your DB. If you are using the RDS default, just clone it and assign the clone to your instance.
We now need to set the binlog format.
Statement — It will save each query, I.E: “Update users set name=’tal’ where id = 1”.
Row — It will save the data that changed to the binlog
Mixed — Let mySQL choose what to save.
Usually the default is Mixed which is ok for most scenarios, but on some occasions, you may want to change it if Mixed is failing. You will see that your replication is failing once you start changing the table. Notice that after each change you must reset the DB which will cause additional downtime.
Step 2: Replication
Create a read replica of the DB. Make sure to make the replica to be the same as your production. I.E, if you are using Multi-AZ, your replica should be Multi-AZ as well.
Step 3: Change the parameter group of the replica
Clone the parameter group of the production server and create a new one.
Change the “read_only” parameter to 0.
You can now work on the replica.
Step 4: Change the table
On the replica server, change the table. If a large amount of data is going into the table from the production server, this step will lock the table and will block the whole replication.
Once it is finished (can also take days, depends on the size of the table), the replication should continue automatically, and you should give it time to catch up.
You can monitor the progress using the “show slave status” command on the replica server or through the RDS admin.
In the command, you are looking to monitor the “Last_Error” field to see if something interrupts the replication and the “Seconds_Behind_Master” in order to see that both servers are up to date and syncing (should be 0).
Step 5: Switch replica back to the original parameters group.
Once the replica is syncing again, return the production parameters group to the replica — this will active again the read only.
Step 6: 5 minutes of downtime
It is now the moment of truth. The following process can be done in 5 minutes:
1. Modify the production instance. Change the name of the instance to be something else. This will cause the endpoint of the instance to be changed and will shut down your code environment.
2. Once the change is finished, it means that no new data is coming into the DB. Select the replica and under actions choose “promote read replica”. This will cause the replica to become a new standalone instance, separate of the replication..
3. Modify the name of the replica instance to the original name of the production instance. Once finished, everything will return to normal and you will have a new production instance with the same data but with a modified database.
4. You can now go ahead and delete the old instance or keep it for couple of days just in case.