I’ve been having a great time helping organizations take their first steps into the cloud, whether deploying new apps or redeploying existing apps. During these “first step into the cloud” conversations, organizations tend to ask a lot of questions about performance and cost. Of course they want high performance at a low cost, which is fine, but you’ll want to be sure to quantify your users’ performance and cost expectations. Their performance requirements might involve web page load times with a certain number of concurrent users, or some measure of throughput like transactions per hour. Regardless, these performance requirements will come into play once you’re able to deploy your app to the cloud and execute performance tests.
It’s critical to design your app for high performance and scalability in the cloud. Unfortunately, many organizations design their apps “the way we’ve always done it,” which typically leads to apps that are less than optimal to scale for high performance and manageable costs. In such cases it can take some serious effort to achieve your performance goals. This blog post will present techniques for designing cloud native apps that provide flexibility in how you deploy and scale them. An optimal design will allow you to better control cost when scaling to meet your performance requirements.
Before we dive into the details let’s level set on some key terms and concepts. First of all, there are two types of scaling: horizontal and vertical, which are covered here. For the sake of this discussion, it is critical to design components so that they support running multiple instances, even many instances, without contention.
And then there are the very common cloud acronyms IaaS and PaaS, discussed in detail here. When it comes to scalability and cost, in most cases it’s preferable to consider PaaS options. Organizations moving to the cloud often enter the discussion with a focus on IaaS, but when we help them through a detailed comparison with PaaS they are thrilled with the obvious cost savings and easier management. For these reasons the majority of this blog post focuses on designing apps using PaaS services.
Designing Cloud Native Apps
Apps designed to use PaaS services often look different from in-house apps from an architecture perspective. The differences often grow more distinct when the apps are architected for very high scalability. For example, here is a fairly straightforward web app’s architecture diagram both on-prem and in the cloud.
Figure 1: A simple website architecture on-prem vs. using cloud PaaS services.
In this case, the cloud version looks simpler because the firewall and load balancer are built into the websites service. They provide fairly basic features like some denial-of-service protection. If needed, however, adding a more robust firewall such as ModSecurity can be fairly simple.
Here is a more interesting example: a background processing app that we recently migrated to the cloud. While this may be an unfamiliar architecture to some, it does a good job of helping point out many of the nuances in architecting apps for the cloud.
Figure 2: A background processing architecture
In this example, the original on-prem process begins by receiving tasks on a queue. Next, a few job processors pull messages off the queue, look up information in the database, and then write several updates to the database. Sounds pretty straightforward, and in fact it wasn’t too difficult to migrate to the cloud.
We migrated the app to Microsoft Azure using its PaaS services. We used Azure’s storage queue service, WebJobs for the job processors, and the SQL Database service. With the system in the cloud we now could scale the job processors both horizontally and vertically, and the database vertically. But it was when scaling for higher throughput scenarios that the limitations in the original architecture were revealed. The queue was no problem as it is a very robust, high performance, and even cheap service. We scaled the job processors from 3 to 10 instances and they ran just fine but they overwhelmed the database, which couldn’t keep up. We even scaled the SQL Database to a P1 instance, which is pretty beefy and costly, but it still couldn’t keep up. What’s up with that?
Some PaaS services are monitored and governed to stay within certain performance thresholds. For example, an S2 SQL Database instance in Microsoft Azure currently handles up to 2,570 transactions per minute. Scaling up to a P1 instance, however, will provide 6,300 transactions per minute. Once you exceed those thresholds the service is throttled and transactions start failing. That’s exactly what was happening – the job processors were flooding the database with too many transactions, so transactions failed and we were unable to achieve our performance targets. What to do about it?
There are a few common techniques that have been around for years, and that apply well to cloud native applications. Many of these techniques, and others, are discussed in this Microsoft Azure best practices document.
- Chunky, not Chatty
It is preferable to send fewer large messages between components than many small messages, mainly to avoid various types of overhead such as latency. For more details read here.
- Transient Fault Handling
The cloud platform is designed to handle and recover from failures. Your code should expect failures and handle them gracefully. Transient failures are the most common variety, and they are typically handled by retry logic. Most APIs have built-in retry logic, such as the Microsoft Azure SDK, but in some cases you might have to roll your own. More information here.
- Caching
No surprises here. When you need to access information frequently and quickly, caching makes a lot of sense. There are excellent caching options in the cloud such as Redis Cache.
- Decomposition into Microservices
It is common for organizations who build apps “the way we’ve always done it” to architect large, monolithic, apps that do everything related to a system. Monolithic apps limit your scaling options, whereas decomposing those into microservices can provide tremendous scaling flexibility because you can scale out each service independently as needed. There is a tradeoff with added complexity, however. Here is a very detailed discussion of Microservices.
- Decoupled Communications
There are a variety of ways for component services to communicate with each other, such as communicating directly via HTTP APIs, or decoupled via database tables, file storage, queues or service buses.
- Delayed Updates
If data doesn’t have to be immediately up to date then delayed batch updates become a viable option. Essentially, delaying updates enables them to be considered “chunky updates” instead of “chatty updates,” which can greatly reduce transactions and the likelihood of throttling.
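To make the transient fault handling item above a bit more concrete, here is a minimal sketch of rolling your own retry logic with exponential backoff. It is an illustration only, not the Azure SDK's built-in policy: `TransientError`, `call_with_retry`, and the default attempt count and delays are all hypothetical names and values chosen for the example.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (throttling, timeouts)."""


def call_with_retry(operation, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Run operation(), retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            # Double the wait on each attempt and add random jitter so that
            # many instances do not all retry at the same moment.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))
```

The jitter matters most when many instances run side by side: if they all hit a throttled service at once, retrying on an identical schedule would just recreate the same spike.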
The following sections discuss how some of these approaches were used to re-architect the background processing app in Figure 2.
Chatty Database Transactions
When revisiting the job processor code, it wasn’t surprising to find room for improvement in its database interactions. It was executing multiple queries to look up the information it needed to process each job, and then it was executing multiple updates to complete the job. We calculated that there were on average 6 database transactions executed to process each job. That’s “chatty” and we needed to move toward “chunky” database interactions.
The solution was to break up the overall architecture and use some additional services. To ease the database lookups we created a Redis Cache (another PaaS service) and used that as a temporary store for the lookup information. We query the cache first and then go to the database if the data isn’t cached yet. But once the cache is fully populated we no longer have to query the database at all. That greatly reduces transactions and also dramatically improves performance because cache queries are roughly ten times faster than database queries. A 1GB Redis Cache in Azure currently costs $102.68, but that’s about ¼ the price of upgrading the database from P1 to P2, so it’s really a bargain.
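The "query the cache first, then the database" flow described above is often called cache-aside. Here is a minimal sketch of it, not our production code. A plain dict stands in for Redis so the example runs on its own, and `LookupCache`, `load_from_database`, and the sample key are hypothetical names introduced for illustration.

```python
class LookupCache:
    """Cache-aside wrapper: check the cache first, fall back to the database."""

    def __init__(self, cache, load_from_database):
        self.cache = cache                        # a dict here, standing in for Redis
        self.load_from_database = load_from_database
        self.database_reads = 0                   # counted only to show the effect

    def get(self, key):
        if key in self.cache:
            return self.cache[key]                # hit: no database transaction
        value = self.load_from_database(key)      # miss: one database read
        self.database_reads += 1
        self.cache[key] = value                   # populate the cache for later jobs
        return value


# Hypothetical reference data playing the role of the database.
reference_data = {"customer:42": "Contoso"}
lookups = LookupCache({}, reference_data.__getitem__)

lookups.get("customer:42")     # first call misses and reads the database
lookups.get("customer:42")     # second call is served from the cache
print(lookups.database_reads)  # prints 1
```

Against a real Redis instance the dict operations would become network calls, and you would also want to decide how long cached entries stay valid, which this sketch ignores.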
Microservices and Delayed Batch Updates
Next we turned to the database updates. Our client stakeholders said that the data in the database didn’t need to be real time, but could be delayed by up to three minutes. That opened up the possibility of delayed updates, first accumulating a bunch of information to be updated, and then updating everything in bulk as a single transaction. To accomplish that we stripped out all of the database update logic from the job processor and moved the logic into two new microservices, which we call data aggregators. One data aggregator service manages updating one set of data in bulk, and the second service does the same for another dataset. The job processor now completes its updates by writing updated information to two queues. It writes the information related to updating the first dataset to one queue, and the second dataset’s updates to a second queue. Next, each data aggregator queries its respective queue’s metadata periodically to see how many items are in the queue. When there are about 1000 items in the queue (or three minutes have passed – whichever comes first) it calls a stored procedure passing the queued items via a table valued parameter. That’s a massive reduction in transactions, from the initial 6 database transactions per task to about 0.002!
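The "1000 items or three minutes, whichever comes first" rule can be sketched as follows. This is a simplified illustration under stated assumptions: it buffers items in memory rather than polling queue metadata as our aggregators do, and `DataAggregator`, `bulk_write`, and the injectable `clock` are hypothetical names. In the real system `bulk_write` would be the stored procedure call with a table valued parameter.

```python
import time


class DataAggregator:
    """Buffers updates and writes them in bulk by size or by age."""

    def __init__(self, bulk_write, max_items=1000, max_wait_seconds=180,
                 clock=time.monotonic):
        self.bulk_write = bulk_write
        self.max_items = max_items
        self.max_wait_seconds = max_wait_seconds
        self.clock = clock
        self.pending = []
        self.oldest = None          # arrival time of the oldest pending item

    def add(self, item):
        if not self.pending:
            self.oldest = self.clock()
        self.pending.append(item)
        self.flush_if_due()

    def flush_if_due(self):
        """Flush when the batch is full or its oldest item has waited too long."""
        if not self.pending:
            return False
        full = len(self.pending) >= self.max_items
        stale = self.clock() - self.oldest >= self.max_wait_seconds
        if full or stale:
            batch, self.pending = self.pending, []
            self.bulk_write(batch)  # one database transaction for the whole batch
            return True
        return False
```

A caller would invoke `flush_if_due()` on a timer as well as on every `add`, so that a slow trickle of items still reaches the database within the three-minute window.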
The new job processor architecture is diagrammed in Figure 3, below. (Note that the diagram simplifies things by ignoring the job processor-to-database interaction. Once the cache is populated the job processors no longer access the database.) As a result of these changes our performance tests are greatly improved. With 10 job processors running, our processing throughput has been increased 32X while database load has been reduced from 100% to 15%. Essentially that means we have plenty of database capacity to scale out further with more job processors if needed. It also means that a P1 database is more than adequate to handle the load, which is great because we no longer have to consider scaling up to a P2 database that costs twice as much.
Figure 3: The new decomposed job processor architecture
Another benefit of decomposing the job processor is that we now have more flexibility in scaling the various services to manage performance and costs. That is, we can scale the job processors separately from the data aggregators. As it stands we run many job processor instances for high throughput, but only a few data aggregators for redundancy.
Performance Testing and Instrumentation
The previous sections tell the short story of how we arrived at the new architecture for the job processing system. In fact those changes emerged incrementally over the course of a few months. One of the first things we did was to instrument the app with detailed metrics so we knew how long each part of the process actually took. Without those metrics we would have been playing a dangerous guessing game, and in the end we wouldn’t have been able to document actual, detailed performance improvement statistics. The instrumentation metrics proved highly valuable.
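As a rough idea of what that kind of instrumentation can look like, here is a minimal sketch that times named steps and summarizes them. It is not the tooling we used. `timed`, `summarize`, and the step names are hypothetical, and a real system would ship these numbers to a metrics service rather than hold them in memory.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

timings = defaultdict(list)   # step name -> list of elapsed seconds


@contextmanager
def timed(step):
    """Record how long the enclosed block takes under the given step name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[step].append(time.perf_counter() - start)


def summarize():
    """Return count, mean, and max duration for every recorded step."""
    return {
        step: {
            "count": len(values),
            "mean_seconds": sum(values) / len(values),
            "max_seconds": max(values),
        }
        for step, values in timings.items()
    }


# Usage: wrap each stage of job processing in its own timer.
with timed("lookup"):
    sum(range(1000))          # placeholder for the real lookup work
with timed("update"):
    sum(range(1000))          # placeholder for the real update work
```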
In addition we did a lot of performance testing. Our process over the course of the project involved:
- Running performance tests to gather detailed metrics as a baseline.
- Reviewing and discussing the various bottlenecks revealed in the collected metrics, as well as discussing approaches for addressing the bottlenecks.
- Making the proposed changes.
- Re-running the performance tests to gather new metrics.
- Assessing the performance test results to see if any other changes were needed.
Clearly the performance tests were critical to our efforts, but they also revealed some surprising results. For example, when running tests to find the optimal horizontal and vertical scaling we were surprised that medium compute instances performed twice as fast as small instances, even though the main difference is that medium instances have two cores instead of one. This was counterintuitive given that the apps weren’t written to take advantage of multiple cores at all. There were quite a few counterintuitive results that we would have missed without the detailed performance testing results.
For more detailed coverage of performance testing please refer to my prior blog post on that topic.
When Is It Done?
When are your app and its architecture sufficient to meet your needs? Well, as stated at the beginning of this post you need to know your performance requirements. In the case of the job processor we worked with our client to calculate that we needed to be able to process 150,000 requests per hour. All of our performance improvement efforts were focused on that goal, and we met that goal with the architecture discussed above. Without those performance requirements, however, we would have been shooting in the dark without any way of knowing when we were done.
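A quick back-of-the-envelope check shows why that target forced the redesign. The sketch below uses only figures quoted in this post (150,000 requests per hour, 6 and then about 0.002 database transactions per job, and the 6,300 transactions per minute P1 threshold). It assumes each of the job's database transactions counts as one transaction against that threshold, which is a simplification, and it ignores the database reads needed while the cache is still being populated.

```python
JOBS_PER_HOUR = 150_000        # throughput target from this post
P1_LIMIT_PER_MINUTE = 6_300    # P1 threshold quoted earlier in this post


def db_transactions_per_minute(jobs_per_hour, transactions_per_job):
    """Database transactions per minute implied by a given job throughput."""
    return jobs_per_hour / 60 * transactions_per_job


original = db_transactions_per_minute(JOBS_PER_HOUR, 6)        # chatty design
redesigned = db_transactions_per_minute(JOBS_PER_HOUR, 0.002)  # batched design

print(f"original:   {original:,.0f} per minute")
print(f"redesigned: {redesigned:,.0f} per minute")
print("original exceeds P1 threshold:", original > P1_LIMIT_PER_MINUTE)
```

At the target rate the original design would need about 15,000 transactions per minute, more than double the P1 threshold, while the batched design needs about 5.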
To Infinity and Beyond!
What if we wanted to scale much, much higher? Would this architecture work? As I mentioned earlier there are limitations to scaling both horizontally and vertically. Regardless, there are ways we could scale out the current job processors and data aggregators to a very large number of instances. We could also scale up to larger instances. But we would need to keep all components in mind, making sure that they all could handle the throughput. For example, at some point we would likely run into the same problem as before, that the database would be overwhelmed and throttled. Of course we could scale up to higher performance tiers, but those higher scales get expensive quickly and there are only two higher performance levels remaining anyway. Likewise we would need to make sure the queues and Redis Cache could keep up, and would need to address any limitations.
As mentioned previously, scaling up is often not as cost-effective as scaling out. So how do you scale out massively without scaling up to the higher-cost tiers? A very common answer to that is “sharding,” which is about splitting databases to improve performance. A detailed coverage of sharding is beyond the scope of this post, but we can apply it in a general sense to the other components as well. For example, we could “shard” across multiple queues to handle more requests, or shard across multiple Redis Caches as needed. At an even higher level of abstraction, we could actually shard across many Azure subscriptions, each subscription handling a distinct set of users/clients/customers. That would be amazing!
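To illustrate the idea of sharding across multiple queues, here is a minimal sketch that routes each client to one of several queues with a stable hash. The queue names, `shard_for`, and `queue_for` are hypothetical, and this is only one of several ways to pick a shard.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard index in a way that is stable across processes."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Hypothetical queue names, one per shard.
queue_names = [f"tasks-{i}" for i in range(4)]


def queue_for(client_id):
    """Pick the queue that handles all tasks for the given client."""
    return queue_names[shard_for(client_id, len(queue_names))]
```

A cryptographic hash is used here instead of Python's built-in `hash()` because the built-in is randomized per process for strings, so different job processors would disagree about where a client lives. Note that simple modulo routing moves most keys when the number of shards changes, so it suits a fixed shard count best.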
Still one more thought on achieving very high scalability. There are an overwhelming number of cloud technologies offered by the major cloud providers, and those technologies are changing constantly. It’s important to stay up to date with the latest cloud service offerings as the providers keep introducing new services that make it easier to compose very high scale systems. For example, Microsoft Azure recently introduced its Event Hubs and Stream Analytics services, which may be able to replace our backend processing services with excellent scalability at an even lower cost.
The cloud is all about achieving high performance and massive scaling at commodity prices. Those benefits are real, but it should come as no surprise that achieving those benefits requires some effort, and thinking differently. In this post we’ve discussed approaches to designing cloud native apps to provide flexibility in scaling those apps in order to achieve your performance goals at a low cost. For any developer/architect who loves tackling big architectural challenges this is fun stuff, and so I hope you are fortunate enough to enjoy applying some of these techniques in your future work.