There is an undeniable trend of increasing cloud usage, with the promise of cost-effective, high-performance, highly scalable apps in the cloud. That goal is entirely achievable – really! – but obviously it’s not guaranteed. So what can you do to guarantee that your apps will achieve that lofty goal? The answer is not a magic bullet, and it’s nothing new. It’s simply a combination of performance testing, early and often, along with an emergent approach to designing the application architecture to achieve your performance goals. If that makes total sense to you then perhaps you can stop reading right now; however, that’s a very loaded sentence. Read on, because there are many details still to come.
Let’s start with some definitions. Performance testing involves running tests on your system to see how well it performs, often starting with a light workload and increasing the load over subsequent tests to draw a profile of how it performs under various workloads. The goal of performance testing is to gather performance metrics and to determine whether those metrics meet your performance goals. It is perfectly normal for early rounds of performance testing to miss some performance goals. Each round of testing forms a baseline for comparison. You’ll make some improvements, run another round of testing, and then compare the results against the prior round’s results to assess whether the improvements achieved their goal.
Figure 1: Performance tuning is an incremental process
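The round-over-round comparison described above can be sketched in a few lines of Python. This is only an illustration: the metric names, the sample numbers, and the 5% tolerance are made up for the example, and it assumes every metric is a "lower is better" timing with a positive baseline value.

```python
def compare_rounds(baseline, current, tolerance=0.05):
    """Label each baseline metric as improved, regressed, unchanged, or missing.

    Assumes lower values are better and that baseline values are positive.
    The tolerance is an arbitrary noise band chosen for this sketch.
    """
    report = {}
    for name, before in baseline.items():
        after = current.get(name)
        if after is None:
            report[name] = "missing"
            continue
        change = (after - before) / before  # relative change vs. prior round
        if change < -tolerance:
            report[name] = "improved"
        elif change > tolerance:
            report[name] = "regressed"
        else:
            report[name] = "unchanged"
    return report


# Hypothetical results from two rounds of testing (seconds).
previous_round = {"login_page": 6.0, "report_page": 4.0, "dashboard": 2.0}
latest_round = {"login_page": 3.0, "report_page": 4.1, "dashboard": 2.5}

for metric, verdict in compare_rounds(previous_round, latest_round).items():
    print(f"{metric}: {verdict}")
```

Keeping each round's results in a simple structure like this makes it easy to see whether a tuning change helped the metric it targeted without quietly hurting another one.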
Please note that performance testing is not at all unique to the cloud. Most of this blog post applies equally well to on-premises apps. However, the marketing folks have promised us that we can build highly scalable apps in the cloud, and as mentioned above, that’s entirely true, but it’s not guaranteed. If you are building apps in the cloud with the expectation that your apps will scale and perform, then you will need to verify that your apps meet your performance expectations. Thorough performance testing provides the information you need to tune your app so you can be confident it will scale and perform, which is critical in building highly scalable cloud apps.
One other term mentioned above is emergent design. Emergent design means that you shouldn’t worry about creating a complete and polished application design up front. If you took that approach it would take a long time before you implemented any functional requirements and delivered working software to test. Rather, you should focus on maximizing value by building working software that meets requirements, allowing an optimal design to emerge and evolve over time. For the sake of this discussion, emergent design means that you shouldn’t focus on refactoring code to be highly performant before you even know whether the code is a performance bottleneck or not. Let the results of performance testing help focus your efforts. In fact, the tests themselves will almost certainly evolve over time as well, as the application changes or you need additional performance information.
One final term that’s used throughout this blog post is scale, or scaling. You can scale the various components of an application’s infrastructure in two ways: up and out. (See Figure 2.) Scaling a component up means that you’re making it bigger. For example, you could take a virtual machine from a single core with 1 GB of memory and scale up to 4 cores and 4 GB of memory. Scaling out means you’re creating more instances of a component. For example, you could take that same single-core virtual machine and scale out to 4 single-core virtual machines. You would scale a component up if it needed more horsepower to get its tasks done more quickly. You would scale out to get more tasks done in parallel.
Figure 2: Scaling up versus scaling out
The definition of performance testing mentioned performance goals. Do you even have performance goals? Of course you want your app to be fast, but can you be specific? And realistic? Some very common performance metrics are page load times, web service response times, query response times, and transaction throughput (e.g. transactions per minute). It all comes down to what’s really important for your application. Perhaps you have a suite of reports in your application and you need them to process vast amounts of data and still return results within a few seconds. Perhaps you have a web site with various dashboard pages, each backed by several queries that need to return their results in a fraction of a second so that the dashboards each load within a few seconds. Or you might need to process thousands of transactions per minute, or even per second. Your app could even include all those scenarios and still more. Optimizing performance can be very challenging, but as developers we all like to solve challenging problems, right?
The point is that you need to have concrete performance goals that cover the key scenarios in your application. That will allow you to test your app and know whether you’re achieving those goals. And remember that the performance goals need to be realistic. Overly aggressive performance goals can cost your team a lot of time, effort, and money. But if that level of performance is truly required then perhaps it’s time/effort/money well spent. Also keep in mind that some performance goals can compete with each other. For example it can be difficult to achieve high transaction throughput as well as fast query response times. It’s doable, but again, more time/effort/money.
Figure 3: Sample performance goals
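One way to keep goals concrete is to write them down as data that a test run can be checked against. The sketch below is a minimal illustration of that idea; the `Goal` class, the metric names, and the limits are all hypothetical placeholders, not recommended targets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Goal:
    """A single performance goal (hypothetical structure for this sketch)."""
    metric: str
    limit: float
    higher_is_better: bool = False  # True for throughput-style metrics

    def is_met(self, measured):
        if self.higher_is_better:
            return measured >= self.limit
        return measured <= self.limit


# Illustrative goals only; substitute the numbers that matter for your app.
GOALS = [
    Goal("page_load_seconds", 5.0),
    Goal("query_response_seconds", 0.5),
    Goal("transactions_per_minute", 1000, higher_is_better=True),
]


def unmet_goals(goals, measurements):
    """Return the names of goals that were missed or not measured at all."""
    return [
        g.metric
        for g in goals
        if g.metric not in measurements or not g.is_met(measurements[g.metric])
    ]


measured = {
    "page_load_seconds": 4.2,
    "query_response_seconds": 0.8,
    "transactions_per_minute": 1500,
}
print(unmet_goals(GOALS, measured))
```

Treating an unmeasured metric as unmet is a deliberate choice here: a goal nobody measured is a goal nobody has verified.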
You’ll want to start performance testing early enough in your development lifecycle so that if you uncover any issues you’ll have time to resolve them and improve performance. The earliest you can start is when one of your application’s key features is functionally complete and stable enough for testing. At that point you’ll need to decide on your approach to performance testing.
The most common targets for performance testing are web sites and background processes, but you can also directly target the various services that compose them, especially critical services such as databases. You’ll want to design your tests to answer some common questions:
- What throughput can you handle with your current architecture at small, medium, and large scale configurations?
- How do you scale your various components to achieve your anticipated throughput, and how much will that configuration cost?
- What are the bottlenecks in your architecture?
- Just how big can you scale before you need to significantly re-architect to go to massive scaling?
You’ll probably have more questions of your own.
Fortunately, the cloud makes it much easier to answer these questions compared to on-premises deployments. With the cloud you can configure small, medium, and large scale infrastructure in a matter of hours, if not minutes, and pay only for the time you use it. You can even create massively scaled infrastructure with servers in different regions around the world. That degree of flexibility and available power is one of the big advantages of the cloud.
Just one more thing before we jump into actual testing. There is more to your application’s architecture than web sites, databases, back end processes, etc. In the end it’s all about your data. You’ll want to scale up your data to match your test scenario. If you’re testing a small scale scenario then mock up a small set of users/customers, each with a small amount of realistic data. Testing massive? Then mock up a massive data set. Generating good, realistic test data quickly is a lengthy topic deserving a blog post of its own, but before you start testing it is very important that you configure realistic data that matches your test scenario.
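As a rough illustration of sizing data to the scenario, here is a small Python sketch that mocks up customers at a chosen scale. Everything in it is an assumption made for the example: the scale sizes, the customer and order fields, and the value ranges. Real test data should mirror the shape of your own domain.

```python
import random

# Hypothetical customer counts per test scenario.
SCALES = {"small": 10, "medium": 1_000, "large": 100_000}


def generate_customers(scale, orders_per_customer=5, seed=42):
    """Build a repeatable mock data set sized for the given scenario."""
    rng = random.Random(seed)  # fixed seed so every test round sees the same data
    customers = []
    for i in range(SCALES[scale]):
        orders = [
            {"order_id": f"{i}-{n}", "amount": round(rng.uniform(5, 500), 2)}
            for n in range(orders_per_customer)
        ]
        customers.append(
            {"customer_id": i, "name": f"customer-{i:06d}", "orders": orders}
        )
    return customers


small_data = generate_customers("small")
print(len(small_data), "customers,", len(small_data[0]["orders"]), "orders each")
```

Seeding the generator keeps the data identical between rounds, so a change in results can be attributed to the application rather than to different test data.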
Testing Web Sites
Finally we can start testing. Let’s start with testing a web site. For a recent project our small scale goal was to handle 25 concurrent users with page response times of 5 seconds or less. Pretty straightforward. There are various approaches and technologies to enable web testing, but the main point is to use some form of automated web tests. I’m a .NET developer, so Visual Studio is my tool of choice. As a quick initial test I created some web performance tests by recording various scenarios that correspond to common workflows in the web site. For example, I recorded a scenario where I logged in as an administrator, performed a full suite of administrative functions, and then logged out. I recorded another scenario in which I thoroughly exercised the various reporting pages and then logged out.
Using Visual Studio I bundled together the suite of web performance tests and in a matter of a few minutes was able to simulate 25 users executing those tests. When it completed it showed me all sorts of metrics regarding how well the site performed. The vast majority of pages met the 5 second goal for page load time, except for a few, such as the login page, which involves loading a complicated user profile. We implemented some caching of user profile-related data and that fixed the issue. Some reports occasionally surpassed the 5-second threshold, so we did some tuning there as well.
Figure 4: Visual Studio web performance test results
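For readers without Visual Studio, the core idea of simulating concurrent users and checking the results against a goal can be sketched in plain Python. This is not how Visual Studio runs its load tests; it is a minimal stand-in that uses one thread per simulated user. The `scenario` callable is a placeholder for whatever drives your site (a recorded workflow, a series of HTTP requests, and so on), and here it just sleeps.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor


def run_load_test(scenario, users=25, iterations=4):
    """Call `scenario` from `users` threads; return every duration in seconds."""

    def one_user(_user_index):
        durations = []
        for _ in range(iterations):
            start = time.perf_counter()
            scenario()
            durations.append(time.perf_counter() - start)
        return durations

    with ThreadPoolExecutor(max_workers=users) as pool:
        per_user = list(pool.map(one_user, range(users)))
    return [d for durations in per_user for d in durations]


def summarize(durations, goal_seconds=5.0):
    """Summarize durations against a response-time goal (nearest-rank p95)."""
    if not durations:
        raise ValueError("no durations to summarize")
    ordered = sorted(durations)
    rank = math.ceil(0.95 * len(ordered))
    return {
        "count": len(ordered),
        "p95": ordered[rank - 1],
        "slowest": ordered[-1],
        "over_goal": sum(1 for d in ordered if d > goal_seconds),
    }


def fake_page_visit():
    """Placeholder scenario; replace with code that exercises the real site."""
    time.sleep(0.001)


print(summarize(run_load_test(fake_page_visit, users=5, iterations=2)))
```

The `over_goal` count is the number to watch first: it tells you how many requests missed the goal, and the slowest entries point at the pages worth tuning.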
That very basic test took less than a day to pull together, which was a valuable quick win. But because I had recorded all those scenarios while running against our dev environment, they were hard-wired to dev and its data. As a result it would be impossible to run those same tests in another environment (e.g. the test or staging environments). Not very flexible. For a short term fix I simply re-recorded those same test scenarios but logged into other environments, then saved the web tests with different names. Eventually, however, to make the tests more flexible I changed them to read configuration and test data from files. Now we can change test environments quickly by swapping in different data files and it works quite well.
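The same idea, sketched in Python rather than in Visual Studio's own data-binding features: keep one small settings file per environment and have the tests load whichever one they are pointed at. The file layout, key names, and URL below are assumptions invented for this example.

```python
import json
from pathlib import Path

# Keys this sketch expects in every environment file (hypothetical).
REQUIRED_KEYS = ("base_url", "username", "data_file")


def load_test_config(config_dir, environment):
    """Load `<config_dir>/<environment>.json` and check that it is complete."""
    path = Path(config_dir) / f"{environment}.json"
    config = json.loads(path.read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"{path.name} is missing: {', '.join(missing)}")
    return config


def page_url(config, page_path):
    """Join the environment's base URL with a site-relative page path."""
    return config["base_url"].rstrip("/") + "/" + page_path.lstrip("/")
```

With a `staging.json` containing, say, `{"base_url": "https://staging.example.test", "username": "perf-admin", "data_file": "staging-users.csv"}`, calling `page_url(load_test_config("configs", "staging"), "/reports/summary")` would produce the staging address for that page, and switching environments means changing only the environment name.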
It’s important to note that testing the web site exercised more than just web servers in the cloud. Most of the pages in the site load data from the database, and so these tests stressed the database as well. For that reason, when it was time to ramp up the tests for significantly higher volume and throughput, we had to scale both the web sites and the database. We scaled the web sites to more instances (scaled out), and the database to a higher performance tier (scaled up). We also had to do testing to find a balance between the scaling of our web sites and the database. If we scaled out the web sites to handle a large number of concurrent users but failed to scale the database to match, the database would be unable to keep up with all of the requests, and we would see many database timeouts and deadlocks.
Testing Background Processes
Many cloud applications also include one or more background processes. Fortunately testing background processes is a simpler matter than testing web sites, because background processes require no automation to drive their processing. You will still need to generate realistic test data to be processed, and that’s typically the most difficult part – besides tuning. With one recent application the critical background process pulled data off of a queue, ran it through a workflow and made various updates to the database. It needed to process queued messages as fast as possible without stressing the database. Maximizing throughput was quite a challenge.
As mentioned above, generating test data – realistic test data – was difficult. As part of our efforts to performance test our message processor we wrote an application to generate test data and write it to the queue. It worked quite well for our initial testing. When we deployed the application to a beta group of customers, however, that was the first time we saw live, real world data. There were multiple surprises with the real world data that caught us off guard. As a result not only did we change the message processor to handle those surprises, but we also changed our test data generator to reproduce those scenarios.
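To make the shape of that setup concrete, here is a minimal Python sketch of a generator that writes test messages to a queue and a loop that drains it while measuring throughput. It uses an in-process `queue.Queue` as a stand-in for a real cloud queue, and the message fields and handler are hypothetical; it does not reproduce our actual message processor or its workflow.

```python
import queue
import time


def fill_queue(q, count):
    """Write `count` mock messages to the queue (message shape is made up)."""
    for i in range(count):
        q.put({"id": i, "payload": f"message-{i}"})


def drain_queue(q, handle):
    """Process every queued message; return (processed, messages per second)."""
    processed = 0
    start = time.perf_counter()
    while True:
        try:
            message = q.get_nowait()
        except queue.Empty:
            break  # queue is empty, so this test run is finished
        handle(message)
        processed += 1
    elapsed = time.perf_counter() - start
    rate = processed / elapsed if elapsed > 0 else float("inf")
    return processed, rate


test_queue = queue.Queue()
fill_queue(test_queue, 1000)
handled = []
count, per_second = drain_queue(test_queue, handled.append)
print(f"processed {count} messages at {per_second:.0f} per second")
```

When real-world data reveals a case the generator never produced, adding that case to `fill_queue` (or its equivalent) keeps the surprise from recurring unnoticed in later test rounds.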
Full System Performance Testing
As mentioned previously, when we executed web performance tests we were actually exercising both the web servers and the database. Likewise, when we performance tested the message processor we were exercising both the message processor and the database. It is useful to test individual components of the overall system in isolation in order to establish performance baselines. It is a painful lesson, however, to learn that when you execute those varied components together as a complete system, most likely they will not perform as well as they did when running in isolation. For that reason it is critical to performance test the system as a whole, running performance tests on all major components of the system simultaneously. That will reveal any bottlenecks caused by interactions between components.
In our case, when we performance tested the web site and the message processor simultaneously, performance dropped significantly. It turns out they were thrashing one particular database table, with the web site reading data and the message processor writing data. That kicked off an effort to tune each process to lessen their impact on that table (e.g. rewriting queries, tuning indexes), as well as a longer term effort to create a data mart for reporting purposes.
Performance Testing in Production
So let’s say you’ve performance tested your app under various workloads, and you’ve tuned it until it meets all of your performance goals. You’re all done, right? Well, you’re definitely in a good spot and you should feel confident about deploying to production. You should also have a solid grasp of your application’s performance trajectory through increasing workloads, as well as a good idea of how big you can scale before you might need significant architectural changes. Still, once you’ve deployed to production you will want to be sure that things are performing as expected, and so you’ll want to gather real-time metrics to assess production performance. In addition, it would be ideal to be notified automatically if performance strays outside of configured thresholds.
There are many options here. Your first option is to use the diagnostics made available by your cloud hosting provider. (See Figure 5.) Of course the diagnostic features vary between providers, and even between the various cloud services provided, but the diagnostics essentially come free with your subscription. If the native diagnostic features lack what you need, there are also third-party add-ons available for the various cloud providers, such as New Relic, AzureWatch, and others. These add-ons provide many more features than the native diagnostics, but require a subscription fee.
Figure 5: Microsoft Azure's web site monitoring
You can also use web analytics tools, such as Google Analytics, to track page response times, page hit counts, and other metrics. (See Figure 6.) These will involve some configuration and coding, but they provide truly valuable information for a relatively small effort.
Figure 6: Useful information from Google Analytics
You can even write your own custom background jobs to execute small performance tests periodically. For example, you could write a job to call some important web services and record their response times. If a response time exceeds a threshold the job could send an email to administrators notifying them about the issue. Some other examples: query a queue for its depth, execute a database query to measure its duration, or ping a web page to assess its response time.
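A job like that can be quite small. The Python sketch below shows the general shape: time each probe, compare it to its threshold, and notify someone when it is exceeded. The probe functions and thresholds are placeholders, and `notify` is any callable you supply (it might send an email in practice; here it just collects messages). Scheduling the job is left to whatever scheduler your platform provides.

```python
import time


def timed(probe):
    """Run a probe and return how long it took, in seconds."""
    start = time.perf_counter()
    probe()
    return time.perf_counter() - start


def run_checks(checks, notify):
    """Run each (probe, threshold) pair; notify and record any that are too slow."""
    alerts = []
    for name, (probe, threshold_seconds) in checks.items():
        elapsed = timed(probe)
        if elapsed > threshold_seconds:
            notify(f"{name} took {elapsed:.3f}s (threshold {threshold_seconds:.3f}s)")
            alerts.append(name)
    return alerts


# Placeholder probes; real ones would call a web service, run a query, etc.
checks = {
    "order_service": (lambda: None, 1.0),
    "report_query": (lambda: time.sleep(0.05), 0.01),
}

sent_messages = []
slow = run_checks(checks, sent_messages.append)
print(slow, sent_messages)
```

Passing `notify` in as a parameter keeps the check logic independent of how alerts are delivered, so the same job can email administrators in production and simply log during testing.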
One reason companies build and deploy applications in the cloud is to take advantage of the cloud’s scalability. There is much more to scaling an application than merely turning up the dial, however, and for that reason it is important to verify that your application scales as you expect. That’s where performance testing comes in. As we’ve discussed here, performance testing provides the information you need to find bottlenecks so you can fix them, and eventually arrive at a state where you feel confident in deploying your application to production, knowing that it will scale when needed.