The challenge of building a scalable system should rarely be the first thing on your mind as you take the first steps in developing a working product. Why expend the effort and resources on building a system to handle a million users when in reality your product may only ever attain a fraction of that scale? Instead, build for the now, with sensible decisions for the near-ish future. Build fast, deliver value, and if you hit the problem of scale, recognise it as the problem of success that it is.
With that being said, we need to understand how to tackle an increasing load of website visitors and API requests, because with that increase come degraded experience and potential outages. Alongside the technical problems that volume introduces are the business concerns: the financial implications of those technical issues. No one cared that the API was running a bit slow when only a few hundred people were using it, but when the service hits the limelight (the problem of success again) you can find yourself justifying every millisecond and proving the service's reliability.
Single Server Setup
So you are starting a project and you have no users. Let us kick things off with a single web server that serves your web/app clients.
This is how every project starts, and for good reason: it does the job, it is easy to set up, and it is cheap. Fundamentally, the client resolves the server's IP address via the Domain Name System (DNS), and then the web/app client fetches HTML or JSON content with an HTTP request.
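To make that flow concrete, here is a minimal Python sketch of the lookup-then-fetch cycle. The domain api.example.com and the /users/1 path are placeholders for illustration, not part of any real service.

```python
import socket
import urllib.request

# 1. The client resolves the domain name to an IP address via DNS.
#    (api.example.com is a hypothetical placeholder domain.)
ip_address = socket.gethostbyname("api.example.com")
print(f"api.example.com resolves to {ip_address}")

# 2. The client then fetches HTML or JSON content over HTTP.
with urllib.request.urlopen("http://api.example.com/users/1") as response:
    print(response.read().decode())
```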
That single server handles the web requests and hosts the database, which might sound like a bit of a risk, and it is. When you have a small number of users this risk is likely acceptable, but as the popularity of your service grows, this solution will not be performant, resilient or secure.
Split the Web and Data Tiers
Okay, so we are growing and we want to start thinking about scaling our system. The first thing to consider is separating the web server and the database server onto their own machines. This means our system can handle more load, as now we have two servers instead of one.
It is good to recognise the additional benefits that come with this change. Splitting the web server and database server introduces some partial protection against outage: if the database server is nuked, then at least the web server can still serve the website; likewise, if the web server is destroyed, the database is still intact (albeit not being used).
From the web server's perspective, the split is largely a configuration change: the application now connects to a database on a separate host rather than on the same machine.
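As a rough sketch, the web tier's database configuration is often the only thing that changes. The connection string format and host names below are illustrative assumptions, not a prescribed setup.

```python
import os

# Before the split, the web server talked to a database on the same machine,
# e.g. DATABASE_HOST=localhost. After the split, it points at a dedicated
# database server instead. The default below is a placeholder host name.
db_host = os.environ.get("DATABASE_HOST", "db.internal.example.com")

# A PostgreSQL-style connection string, purely for illustration.
db_url = f"postgresql://app_user@{db_host}:5432/app_db"
print(f"Web tier connecting to data tier at: {db_url}")
```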
Vertical vs Horizontal Scaling
Separate web and data tiers are kind of nice, but this is hardly a reliable system, since neither tier has any redundancy for when things go wrong… And what about when one server per tier is not enough?
If we were only concerned with handling more load, then we could scale vertically. Essentially, we upgrade the server to have more CPU / memory to handle the load. This can be very effective and easy to do; however, CPU / memory is a hardware-limited resource, and at some point you will be throwing more money at hardware to sustain your system than it is worth. On top of the expense, this solution does nothing to provide a fallback for when something crashes your server.
Horizontal scaling provides more combined CPU / memory for your system without hitting the excessive prices of higher-tier servers. It also provides redundancy, as you can give your web and data tiers multiple instances of the same app/data. Supporting this requires a little restructuring of our architecture.
Load Balancer for Web Tier
A load balancer evenly distributes traffic among a collection of web servers. The web servers are identical and process requests in the same way. In this setup, the client no longer communicates with a web server directly. The web servers can therefore be hosted on private IPs for enhanced security, since the load balancer is the only resource the client device accesses.
In this setup, imagine Server1 falls over. The load balancer would recognise the failing health signals from Server1 and stop directing traffic to it, instead sending all traffic to Server2.
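Here is a toy Python sketch of that behaviour, assuming two web servers on private IPs. A real load balancer (nginx, HAProxy, or a cloud offering) does this far more robustly, but the core idea is the same: rotate through the pool and skip anything failing its health checks.

```python
from itertools import cycle

# Health status per web server, keyed by (hypothetical) private IP.
servers = {
    "10.0.0.1": True,   # Server1: healthy
    "10.0.0.2": True,   # Server2: healthy
}

def next_healthy_server(rotation):
    """Round-robin over the pool, skipping servers that fail health checks."""
    for _ in range(len(servers)):
        candidate = next(rotation)
        if servers[candidate]:
            return candidate
    raise RuntimeError("No healthy servers available")

rotation = cycle(servers)
servers["10.0.0.1"] = False            # Server1 falls over...
print(next_healthy_server(rotation))   # ...so traffic goes to 10.0.0.2
```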
This is great: our web tier can now handle failover and scale horizontally with as many web servers as needed to meet traffic demands. So what about our data tier?
Replication for Data Tier
Web servers are pretty easy to manage, as their contents don't change unless a new version of the web app is deployed, which is a well-controlled process. To run multiple instances of a database, we need to consider how data is stored across them: how do we write and read information without it getting out of sync?
Generally, we do this by setting up a “master” database and several “slave” (a.k.a. “replica”) databases. The master database is where data is written, and the slave databases receive copies of that data from the master.
This setup can make a lot of sense for many use cases, as most applications read from databases far more often than they write to them. It allows the slaves to scale out as required and improves performance, since multiple read requests can be processed in parallel.
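To illustrate, here is a minimal sketch of read/write splitting at the application layer, assuming one master and two replicas. The host names are hypothetical, and in practice this routing usually lives in a database driver or proxy rather than hand-rolled code.

```python
import random

MASTER = "db-master.internal"
REPLICAS = ["db-replica-1.internal", "db-replica-2.internal"]

def route(query: str) -> str:
    """Send writes to the master; spread reads across the replicas."""
    is_write = query.lstrip().upper().startswith(("INSERT", "UPDATE", "DELETE"))
    return MASTER if is_write else random.choice(REPLICAS)

print(route("INSERT INTO users ..."))   # -> db-master.internal
print(route("SELECT * FROM users"))     # -> one of the replicas
```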
Just like load balancers for the web tier, multiple database servers with replicated data provide reliability if one of the instances falls over. Even if the master fails, one of the slaves can be promoted to master. In the real world this can be a tricky process, as you have to take care to handle any unsynced data from the old master.
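The sketch below heavily simplifies that promotion step: it assumes each replica reports how far it has replayed the master's write log, and picks the most caught-up one. Real failover tooling must also fence the old master and reconcile any writes the chosen replica never received.

```python
# Replication log position per replica (hypothetical host names and values).
replicas = {
    "db-replica-1.internal": 10_452,  # last replicated log position
    "db-replica-2.internal": 10_447,
}

# Promote the replica that is most caught up, to minimise data loss.
new_master = max(replicas, key=replicas.get)
print(f"Promoting {new_master} to master")
```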
