7 reasons why application deployments fail

Aternity June 9, 2014


I have been involved with many pre-production load test engagements over the past half-decade or so.  Some of them have been wildly successful, some of them not so much.  This blog post will share my experiences with these activities with the hope of helping others avoid the same mistakes.

There are many articles out on the Internet about deployment problems that are related to change management issues.  While they are absolutely valid, this post focuses on deployment and launch failures due to performance issues.  It assumes, for example, that the correct code is deployed for testing, the code published to production is the same that was tested in pre-production, the actual publishing mechanism works, and there is adequate capacity in the load generation tool to achieve desired load.

Why would someone go through pre-production performance testing under load?  The answer is quite simple: to avoid performance problems in production.  A recent blog post from Riverbed’s own Clay James does a great job defining the problems with poor application performance, and many other posts on this blog also show why this is a good idea (see below in the related reading section).  In fact, Riverbed Professional Services has an Application Performance Testing service focused on this very issue.

This blog post will focus on seven reasons why application deployments fail: 1) not enough testing time, 2) not enough time to resolve findings, 3) poorly defined success criteria, 4) inadequate test setup, 5) lack of familiarity with environments, 6) inability to verify responses, and 7) failing to observe performance test results.

1. Not enough testing time

One could argue that everything can be solved if you give it enough time and resources.  The counterargument to this is always that you never have enough time and resources.  Fixing problems before they’re really problems should be one of those goals that gets time and resources allocated to it, though.

One thing that has stuck with me from high school physics class is a sign on the wall that said:

Now what does this quote from John Wooden have to do with physics?  Nothing, really, but that’s where the sign was.  Add to that question, ‘And how much is it going to cost you?’

There are several articles you can find via Googling about the cost of problems. Some suggest exponential increases in cost the longer an issue exists, while others suggest flatter cost curves are the norm with better feedback loops (read: production monitoring) in place. Those articles don’t necessarily take into account the cost of poor performance, though, which is what some other Riverbed blogs discuss in more detail.  The overall message is still the same: poor performance has a cost, so doing what you can to reduce poor performance before it’s a problem is imperative.

One of the key problems that leads to not having enough time is an aggressive launch schedule.  There are times when the launch date is immutable (launching a new retail store after your historically busiest sales period, for example, is probably not a good business plan). In those situations, there is only so much that can be done.

Because the Affordable Care Act Federal Exchange launch is such a rich source of material for examples on application performance, here’s a good article from NBC News.  Of particular relevance is the quote from Joanne Peters (spokeswoman for HHS) saying, ‘there’s no question we wish we had done more testing.’

2. Not enough time to resolve findings

Putting in a product lifecycle schedule with one or two weeks set aside for performance testing is really nothing more than performance validation and hoping service-level agreements are met. It gives no time to take any sort of corrective action if issues are found.

If performance is considered throughout the entire lifecycle, and if developers write code with performance as a priority, and if system administrators provision servers with plenty of capacity, and if product architects plan things from the start to be scalable and perform well, then pre-production load testing will essentially become pre-production load validation.  But there are a lot of ‘ifs’ here for that dream environment to be realized.  Until then, the schedule needs to be flexible enough to handle problems that occur (including, perhaps, delaying feature launches).

3. Poorly defined success criteria

The prior concepts were related to having enough time to meet your goals.  What if those goals aren’t properly defined? You may have great results in the performance test, but find that when the product goes live, performance is horrible.  Having well-defined and accurate goals to meet is critical.  This includes knowing both what load to expect (preferably with a comfortable margin, say, 150% of what’s expected) and which functions that load will hit.  Usually, this issue pops up when there is a significant difference between expected load and actual load, and sometimes that is simply unavoidable, as you can’t always predict the future.

Part of the problem may be translating business expectations into hard technical requirements that can be measured by the load test apparatus and performance monitoring tools.  One of the most common examples I run into for this is having a goal of N users arriving on the system at any given time.  That’s a perfectly valid requirement for business management, and can even be used as validation for whether or not a system can pass the load test.

But while doing performance analysis when something doesn’t pass, I don’t care how many users are ‘on the system.’ I care what they’re doing on the system at any given moment in time (Riverbed SteelCentral™ AppInternals  goes down to the one-second granularity, so that’s how I define a moment in time).  There may be 1,000 user sessions active in the application, but at any given second, only 50 are doing any work.  So is the requirement to support 50 users or 1,000? Is there even a difference in those two numbers, depending on how you define the requirement?
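To make the difference between "sessions on the system" and "sessions doing work" concrete, here is a minimal sketch that buckets requests into one-second intervals and counts how many distinct sessions are busy in each. The function name, the tuple layout, and the sample data are all made up for illustration; this is not how AppInternals computes anything, just one way to look at the same question from a request log.

```python
from collections import defaultdict
import math


def active_sessions_per_second(requests):
    """Count distinct sessions with a request in flight during each second.

    requests: iterable of (session_id, start_s, end_s), times in seconds.
    Returns {second: number_of_busy_sessions}, sorted by second.
    """
    buckets = defaultdict(set)
    for session_id, start, end in requests:
        first = math.floor(start)
        # A request ending exactly on a boundary does not occupy the next second.
        last = max(first, math.ceil(end) - 1)
        for second in range(first, last + 1):
            buckets[second].add(session_id)
    return {second: len(ids) for second, ids in sorted(buckets.items())}


# Hypothetical request log: three sessions, four requests.
sample = [
    ("a", 0.2, 0.9),
    ("b", 0.5, 2.1),
    ("c", 1.0, 1.4),
    ("a", 5.0, 5.3),
]

counts = active_sessions_per_second(sample)
total_sessions = len({session_id for session_id, _, _ in sample})
peak = max(counts.values())
print(f"{total_sessions} sessions seen, peak of {peak} busy in any one second")
```

Run against a real access log, the gap between the two numbers shows whether a requirement of "N users" describes logged-in sessions or simultaneous work.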

This topic reminds me of a situation from many years ago at a prior employer.  Due to a major TV event promotion, we were asked to support hundreds of millions of requests within a very short timeframe.  There was no actual load testing that could be done, as we had no load test infrastructure at all.  Some napkin math told us that in order for the load we were told to handle to occur, every man, woman, child, and dog in the country would have had to click Submit in the same one-minute time period.  We tried to point that out, but in the end, those were the numbers we had to meet.

At any rate, 100 additional servers were rapidly spun up, hoping that would be enough, because that’s all we could get in the time allotted.  Unfortunately (or perhaps fortunately), the actual traffic we received was in fact less than a typical peak workday, and all of those extra servers actually broke the content publishing system (requiring some poor administrator who now writes blog posts to run a manual content sync every few minutes whenever something was changed).  Rumor has it that if you were in one of the data centers near the racks of servers, you could hear them fighting each other for the next request.
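The napkin math from that story is easy to repeat for any load requirement you are handed. The sketch below is only an illustration: the request count, the time window, and the audience size are placeholder numbers I picked for the example, not the actual figures from that event.

```python
def napkin_check(total_requests, window_seconds, audience_size):
    """Turn a headline load number into a rate and a per-person figure."""
    requests_per_second = total_requests / window_seconds
    requests_per_person = total_requests / audience_size
    return requests_per_second, requests_per_person


# Placeholder inputs, chosen only to show the arithmetic.
rate, per_person = napkin_check(
    total_requests=300_000_000,
    window_seconds=60,
    audience_size=320_000_000,
)
print(f"{rate:,.0f} requests/second, {per_person:.2f} requests per person")
```

If the per-person figure comes out anywhere near 1 for the entire possible audience, the requirement deserves a second conversation before anyone orders hardware.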

4. Inadequate test setup

This problem is one of the more insidious ones.  Having plenty of time to test and well-defined success criteria do no good if you aren’t testing the right things.

This can present itself in a variety of ways.  Making sure function X is blistering fast when function X is rarely called, while not even testing the most frequently requested function Y, may give great performance test results, but they’re irrelevant.
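One way to catch the function X versus function Y problem is to compare the test plan against how often each function is actually hit in production. The following is a rough sketch with invented function names, hit counts, and a 5% threshold that is purely an assumption; in practice the counts would come from your own access logs or monitoring data.

```python
def coverage_gaps(production_hits, tested_functions, min_share=0.05):
    """List functions that carry a meaningful share of production traffic
    but are missing from the load test plan, busiest first."""
    total = sum(production_hits.values())
    gaps = []
    ranked = sorted(production_hits.items(), key=lambda kv: kv[1], reverse=True)
    for name, hits in ranked:
        share = hits / total
        if share >= min_share and name not in tested_functions:
            gaps.append((name, share))
    return gaps


# Hypothetical production hit counts and test plan.
production_hits = {"function_y": 9000, "function_x": 50, "search": 950}
tested = {"function_x"}

gaps = coverage_gaps(production_hits, tested)
for name, share in gaps:
    print(f"{name}: {share:.1%} of production traffic, not in the test plan")
```

With these made-up numbers, the only function under test accounts for half a percent of traffic, which is exactly the kind of great-but-irrelevant result described above.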

The test deployment architecture can also impact this, where something had to be stubbed out or otherwise bypassed, perhaps due to the complexities, security concerns, or size of the live application (e.g., an authentication system with 500,000 users defined, or a database with a few million rows).  While the data content may not match, the data quantity in the load test system should be similar to what is expected in production.  If there is a non-linear breaking point somewhere in there, pre-production load testing will never find it until you size the test system to be past where the breaking point is.

5. Lack of familiarity with environments

Knowing what’s going on both in production and the load test environment is critical, as is having a deep understanding of the actual architecture and sizing requirements of each component (including external dependencies).  One example of why this matters can be found from a dependency tier that normally would not be thought of – reverse proxies.  If they’re undersized in the load test environment, they may become the bottleneck that kills performance.  Replace reverse proxies with authentication servers or load balancers and the result is the same – a failed run due to incorrect or inadequate architecture that may or may not be directly related to the application being tested.

6. Inability to verify responses

Not being able to verify that the correct responses are being returned is another problem.  This can be as simple as not noticing every request returning an HTTP 404 or HTTP 500 (even if they do so at supersonic speeds), or a request returning an HTTP 200, but still not returning expected content.  Sometimes this is more functional in nature, which the warm-up smoke test can detect, or it could be a built-in, self-throttling mechanism that is not outwardly visible.  Reason #5 can come into play here, too, to know what to look out for in terms of verification.
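As a minimal sketch of what response verification can look like, the check below treats a response as good only if it has an HTTP 200 status and contains a piece of content the page is expected to return. The marker string and the sample responses are invented for the example, and most load generation tools have their own built-in way to express this kind of assertion.

```python
from collections import Counter


def verify_response(status, body, expected_marker):
    """Classify one response: status code first, then expected content."""
    if status != 200:
        return f"fail: HTTP {status}"
    if expected_marker not in body:
        return "fail: HTTP 200 without expected content"
    return "ok"


# Hypothetical responses captured during a test run: (status, body).
responses = [
    (200, "<h1>Order confirmed</h1>"),
    (200, "<h1>Please try again later</h1>"),
    (404, ""),
    (500, "Internal Server Error"),
]

summary = Counter(
    verify_response(status, body, "Order confirmed") for status, body in responses
)
for verdict, count in summary.items():
    print(f"{verdict}: {count}")
```

The second sample response is the sneaky case: it comes back quickly with an HTTP 200, so it looks healthy unless the content itself is checked.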

7. Failing to observe performance test results

This one shows up when enough testing has been done to clearly show the application won’t perform well in production, but it gets launched anyway.  This is worse than not doing any testing at all, as all of the effort to resolve the problems gets ignored, and all of the negative effects of having the problems will remain.  That’s an extra cost on both ends, and you’ll still have to fix the problem.

The only corrective action here is to pay attention to the performance test results, or be prepared to live with the consequences.

What have your experiences been with application performance load testing?

Related reading: