Some things never change: application performance issues
Flashback to twelve years ago when I was working at Mercury Interactive on their managed service for application performance management (APM). That year, Italy won the World Cup (again) and Enron went down in flames. It was cutting edge at the time to use synthetic monitoring to measure response time for the end-to-end transaction from fixed Points-of-Presence (PoPs) globally. We talked about things like moving from reactive to proactive modes when managing application performance issues and the difficulty in finding the needle in the haystack.
Fast forward to today and we are still talking about moving from reactive to proactive performance management only our jobs have gotten a lot tougher. The proverbial needle in the haystack has become the needle in thousands or even millions of haystacks. Consider the following:
- DevOps approaches and agile development cycles that have improved alignment with business initiatives have dramatically increased the rate of change and associated risk.
- Distributed, dynamic architectures that rely on microservices, containers, virtual machines, APIs, and cloud services make monitoring the application ecosystem, troubleshooting problems, and understanding dependencies ever more challenging. Components are constantly spun up and down or moved around, making it tough to diagnose problems especially intermittent or environmental ones.
- Establishing and tracking SLAs becomes more difficult since provider agreements don’t take into account the last mile.
Below I address five common application performance issues and how to proactively address them. They may sound familiar but the devil is in the details—new ways to approach old problems.
Issue 1: It takes too long to troubleshoot incidents in complex, distributed environments
Downtime and poor performance of critical applications have a direct impact on the business. When application issues occur, you’re on the hook to detect, isolate, and fix the problem quickly. But with siloed, domain-specific tools, it can take days—if not weeks—to isolate and fix performance problems. To resolve complex and intermittent problems, it is critical to have all the data at your fingertips. APM big data (i.e. all transactions and all user metadata) can provide the context that transaction sampling strategies can miss. It’s especially useful for troubleshooting in dynamic and distributed environments such as those that leverage containers, microservices, virtual machines, and public cloud services.
Issue 2: You find out about performance problems from your users
IT Ops and DevOps teams still usually find out about performance problems from end users (according to Gartner, up to 70% of the time), after the business has already been impacted. However, these teams need to detect problems proactively so that they are working on isolating and fixing the problem before the business is impacted. While internal users may become numb to these chronic problems, external users typically have a larger selection of vendors, sites, and applications and can just switch to the competition. For both internal and external users, these problems can have a significant impact on business. Problems are often only discovered after end-users report an issue. That is why it’s important to measure and monitor all your business-critical applications (web, mobile, etc.) from the end user’s perspective and capture every transaction on every type of device imaginable. That way, you can surface issues early before revenue streams or brand reputation is impacted.
Issue 2: You can’t get to the root cause of recurring or persistent performance problems
Intermittent and chronic performance problems negatively impact end-user satisfaction and productivity and can prevent DevOps teams from focusing on new initiatives. That’s where analytics—including machine learning and artificial intellience (AI)—come in. Unlike humans, they can identify anomalies and patterns across petabytes of APM big data to quickly surface the most important issues across extremely large data sets. And, new types of visualization can help you better understand underlying application dependencies so that your team can focus on the specific fixes that will have the greatest impact on your business. As discussed above, data is key since any analytics are only good as the underlying data (and metadata).
Issue 4: New IT initiatives, new application performance issues
When businesses virtualize, containerize, consolidate, or migrate their data centers to the cloud, they expect to improve flexibility, cost, and control. They don’t expect to negatively impact application performance. When rolling out new applications or expanding existing deployments, it’s critical to ensure that the performance required by the business will be delivered. It’s also critical to manage and predict the effects of such infrastructure or application changes on application performance. However, businesses often find themselves dealing with unforeseen performance problems and lack the ability to baseline performance and highlight deviations. That’s why it’s important to monitor end-to-end application performance before, during, and after your migration, in order to manage expectations and minimize any application performance issues that can arise when migrating to the cloud or adopting new technologies.
Issue 5: Silos vs collaboration
Application performance isn’t a single group’s responsibility and has broad implications across IT operations, application teams, and business owners. It’s imperative in this environment of heightened application focus that the team is able to communicate about application performance broadly and in languages tailored to technical and business audiences. Role-based reporting can provide the necessary level of granularity for each specific audience and provide a way to communicate application performance issues in business-friendly terminology. It’s about reporting “Which customers are impacted by the performance issue?” vs “my query is too slow.”
What are your top 5? Let me know what you would add to the list or check out everything you ever wanted to know about APM big data in our white paper “Why Big Data is Critical for APM“.