As part of a job opportunity, the interviewer assigned some homework. The expectations were as follows:
- Redundant webservers
- Redundant database servers
- Read-write splitting between the webservers and the database servers
And ~24 hours later, I have achieved this. This is spread across six virtual machines:
- HAProxy server
- Apache server 1
- Apache server 2
- ProxySQL server
- MySQL master
- MySQL slave
Setting up the webservers was simple enough (apt install apache2).
To setup HAProxy, I followed this guide: https://www.digitalocean.com/community/tutorials/how-to-use-haproxy-to-set-up-http-load-balancing-on-an-ubuntu-vps
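The heart of that guide is a backend definition like the following sketch. The server names and IPs here are placeholders for my two Apache boxes, not the guide's exact values:

```
frontend www
    bind *:80
    default_backend webservers

backend webservers
    mode http
    balance roundrobin
    # the "cookie insert" method: HAProxy sets a SERVERID cookie so a
    # returning browser keeps hitting the same backend (session stickiness)
    cookie SERVERID insert indirect nocache
    server web1 10.0.0.11:80 check cookie web1
    server web2 10.0.0.12:80 check cookie web2
```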
To set up master-slave replication, I used this guide: https://www.digitalocean.com/community/tutorials/how-to-set-up-master-slave-replication-in-mysql
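Roughly, that comes down to giving each server a distinct server-id, enabling binary logging on the master, and pointing the slave at the master's log coordinates. A sketch (IPs, database name, and the replication user are placeholders; the log file/position come from running SHOW MASTER STATUS on the master):

```
# /etc/mysql/my.cnf on the master
[mysqld]
server-id    = 1
log_bin      = /var/log/mysql/mysql-bin.log
binlog_do_db = wordpress

# /etc/mysql/my.cnf on the slave
[mysqld]
server-id    = 2
```

```
-- on the master: create a user the slave replicates as
CREATE USER 'repl'@'%' IDENTIFIED BY 'changeme';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';

-- on the slave: use the file/position reported by SHOW MASTER STATUS
CHANGE MASTER TO
  MASTER_HOST='10.0.0.21',
  MASTER_USER='repl',
  MASTER_PASSWORD='changeme',
  MASTER_LOG_FILE='mysql-bin.000001',
  MASTER_LOG_POS=107;
START SLAVE;
```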
To set up ProxySQL load balancing and read/write splitting, I followed this guide: https://proxysql.blogspot.tw/2015/09/proxysql-tutorial-setup-in-mysql.html and to a lesser extent http://www.proxysql.com/blog/configure-read-write-split
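The read/write split itself is done with ProxySQL's mysql_query_rules table, configured through its admin interface. A sketch along the lines of those guides; the hostgroup IDs (0 = writer/master, 1 = reader/slave) and IPs are my own placeholders:

```
-- run against the ProxySQL admin interface
-- (e.g. mysql -u admin -p -h 127.0.0.1 -P 6032)
INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (0,'10.0.0.21',3306);
INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (1,'10.0.0.22',3306);

-- SELECT ... FOR UPDATE must go to the writer; other SELECTs go to the reader;
-- anything unmatched falls through to the default hostgroup (the writer)
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (1, 1, '^SELECT.*FOR UPDATE$', 0, 1),
       (2, 1, '^SELECT', 1, 1);

LOAD MYSQL SERVERS TO RUNTIME;     SAVE MYSQL SERVERS TO DISK;
LOAD MYSQL QUERY RULES TO RUNTIME; SAVE MYSQL QUERY RULES TO DISK;
```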
I’m sure there’s still some tuning and ironing out to do to make things super smooth, but the mechanics work as expected.
http://ha.travnewmatic.com/ loads a page that shows which webserver you're using. If you refresh like crazy and the IP doesn't change, that's to be expected: session stickiness is implemented, so if you access the page again within a certain interval, you'll continue to talk to the same webserver.
http://ha.travnewmatic.com/session.php has more information about the cookie you're getting. Both this and the previous URL are products of the Digital Ocean guide on setting up HAProxy. It's set using the "cookie insert method."
http://ha.travnewmatic.com/wp/ is a vanilla WordPress install sending data through the ProxySQL server to both the master and slave databases.
This screencap shows the unbalanced send/receive traffic going to and from the master and slave MySQL servers:
There were a few F5s between issuing those two commands. Queries to the master went up by 14, while queries to the slave went up by 259.
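One way to take those before/after readings is MySQL's own status counters; this is a sketch of the kind of command I mean, run on each database server:

```
-- run on both the master and the slave, before and after refreshing the page;
-- Questions counts every statement that server has executed
SHOW GLOBAL STATUS LIKE 'Questions';
```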
Current limitations: in this topology, ProxySQL represents a single point of failure. This guide describes a setup involving multiple ProxySQL servers and a Galera database cluster. That was mentioned but not required for my assignment. The setup makes use of virtual IPs (which at the moment I do not fully understand). The idea is that you specify a single IP in the web app (i.e. the host that the web application looks to for database services), but depending on what's happening, that virtual IP can refer to different hosts. The goal is that, in the event of one ProxySQL server going down, the other one takes over, and the web app continues to have database access like nothing happened.
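As I understand it, the usual tool for that virtual IP is keepalived, which uses VRRP to float one address between the two ProxySQL boxes. A sketch, assuming an interface name and addresses that are purely placeholders:

```
# /etc/keepalived/keepalived.conf on the first ProxySQL box
vrrp_instance proxysql_vip {
    state MASTER            # BACKUP on the second ProxySQL box
    interface eth0
    virtual_router_id 51
    priority 100            # use a lower priority on the backup
    virtual_ipaddress {
        10.0.0.30           # the web app points only at this address
    }
}
```

If the box holding the virtual IP dies, the backup stops hearing its VRRP advertisements and claims 10.0.0.30 itself, so the web app's database host never changes.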
The other limitation involves the master/slave setup and how my ProxySQL server does query routing. I'm not sure if this can be made conditional (e.g. if the slave goes down, use the master for everything). As it stands, if either the master or the slave goes down, nothing works (having multiple slaves would partially mitigate this, but if the master goes down, I'm screwed). Galera seems to be an appropriate solution to this problem, though I haven't played with Galera yet. In my setup the slave is meant to be mainly read-only, and the master gets the writes. From what little I've read about Galera, every db server in the cluster is a master that can be used for both reads and writes.
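ProxySQL does have at least a partial answer to the conditional-routing question: its mysql_replication_hostgroups table tells it to monitor each server's read_only flag and move servers between a writer and a reader hostgroup automatically, which helps when a slave is promoted to master. A sketch, using hostgroup IDs 0 (writer) and 1 (reader) as my own convention; whether it fully covers the slave-down case depends on details I haven't tested:

```
-- run against the ProxySQL admin interface
INSERT INTO mysql_replication_hostgroups (writer_hostgroup, reader_hostgroup)
VALUES (0, 1);
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
```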
These are technologies I could use for myself, but considering that everything I have in Dallas is on the same physical server, investing a ton of time into HA for my own services would not be a very fruitful endeavor.
Replacing VMs with containers would drastically reduce space requirements and allow for easier backup in the event poo does hit the fan. Downtime for me isn't a huge deal; having to set stuff up again is a bigger deal. So far so good, though. My ZFS array is healthy and things have been ridiculously stable (knock on wood), so I'm not too worried about lack of availability.