Using APM (Application Performance Management), NPM (Network Performance Management) and EUM (End User Monitoring) Tools.
Ensuring good application performance in organisations is becoming increasingly difficult. The consequences to organisations of bad application performance are also becoming more significant, whether they affect employee productivity, B2B communications or the consumer directly.
The performance problem resolution process tends to consume more and more resources and time, i.e. the cumulative time from initial identification of a problem to its resolution. The process typically involves the steps of detection, triage and diagnosis. The objective of using APM, NPM and EUM tools is generally to make these steps more efficient.
This is done by gaining a level of visibility into the underlying user experience, application components, infrastructure and networks. This visibility is usually provided by measuring different aspects of performance, gathering various metrics and parts of transaction flows as they traverse application components, also including some business context.
The measurements would then have thresholds associated with them, either manually defined or calculated from automatically determined baselines. When these thresholds are breached, alerts are triggered, notifying the relevant support teams, typically via a messaging approach and/or visually via appropriately located real-time dashboards. This drives the detection part of the process.
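As a hedged sketch of the threshold idea (the response-time values and the simple mean-plus-standard-deviations baseline below are hypothetical illustrations, not any particular tool's algorithm), a breach check against an automatically determined baseline might look like:

```python
from statistics import mean, stdev

def baseline_threshold(history, sigmas=3.0):
    """Derive an alert threshold from an automatically determined baseline:
    here, simply the historical mean plus a multiple of the standard deviation."""
    return mean(history) + sigmas * stdev(history)

def check_breach(history, latest, sigmas=3.0):
    """Return True if the latest sample breaches the threshold,
    i.e. an alert should be raised for the relevant support team."""
    return latest > baseline_threshold(history, sigmas)

# Hypothetical response times (ms) for one transaction over recent intervals
history = [120, 130, 125, 118, 132, 127, 121, 129]
print(check_breach(history, 200))  # well above baseline -> True (alert)
print(check_breach(history, 131))  # within normal variation -> False
```

Real APM/NPM products use considerably more sophisticated baselining (seasonality, percentiles, etc.), but the principle of comparing live measurements against learned normal behaviour is the same.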
The triage part of the process revolves around determining which area or areas the root cause of the problem probably stems from, sometimes referred to as isolation, together with an element of prioritisation, i.e. focusing on the more critical issues first and working through those with less impact later. This is done by drilling down with the tools from the dashboards and/or from the information in the alerts.
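The prioritisation element can be sketched as a simple ordering of open alerts by business impact (the alert fields and the ranking rule below are hypothetical; real tools often score impact from business context such as affected users or revenue criticality):

```python
# Hypothetical open alerts with a little business context attached
alerts = [
    {"app": "intranet",  "affected_users": 40,  "revenue_critical": False},
    {"app": "checkout",  "affected_users": 500, "revenue_critical": True},
    {"app": "reporting", "affected_users": 5,   "revenue_critical": False},
]

def priority(alert):
    # Revenue-critical issues first, then by number of affected users
    return (alert["revenue_critical"], alert["affected_users"])

# Work the most critical issues first, the lower-impact ones later
triage_order = sorted(alerts, key=priority, reverse=True)
print([a["app"] for a in triage_order])  # ['checkout', 'intranet', 'reporting']
```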
Finally, the diagnosis part of the process pulls together evidence, often from various sources, to confirm the definitive root cause of the problem. This step is very much like detective work, piecing together the parts of the puzzle. Sometimes additional sources of data are useful just to double-check a particular hypothesis: resolution steps can take significant resources to implement and be disruptive in themselves, so the more certainty around the cause of the problem, generally the better.
Using APM and NPM tools to help ensure good application performance by making the performance problem resolution process more efficient is relevant throughout the organisation and throughout the application lifecycle. This applies for example in production environments to maintain good application performance operationally and in pre-production environments such as development and QA/performance test to ensure high-quality performing applications are delivered ready for release to production.
Ongoing approach for success
How good the organisation’s application performance continues to be, and the actual effort involved in the performance problem resolution process, will be dictated by a number of factors. The choice of the actual APM and NPM software tools is only one of them, and in that regard the most important aspect is how good a fit the tools are for the technical and organisational environment.
The other very important factors are having the appropriate people involved in the problem resolution process and ensuring the process itself is efficient, employing the tools effectively. The quality of the initial implementation of the tools and their ongoing maintenance and refinement is also extremely important: this is how the information the tools provide continues to be trusted. Otherwise there is a high risk that the implementation will deteriorate and the tools will fall out of use, in which case you risk failing to obtain any significant value from the investment in the longer term.
In a typical organisation, there are usually a number of different monitoring tools, often used silo by silo, in that each team has its own specific tool: for example, DBAs have database tools and network administrators have network monitoring tools.
When parts of the IT function are outsourced to third parties, disparate monitoring tools become even more of a problem, as it is not necessarily in a third party’s interest to demonstrate that the area it is responsible for is the cause of an issue. Instead, it falls to the organisation, not the third party, to police the service levels that should be provided. Some organisations just use simple infrastructure monitoring solutions and try to infer application performance from the state of the underlying infrastructure it runs on. Whilst a reasonable approach many years ago, this is not practical given the distributed nature of applications nowadays.
With the recent focus on cloud and hybrid cloud deployments, microservices architectures, agile methodologies and DevOps approaches, together with the outsourcing or nearshoring of certain IT functions, applications are becoming more distributed and ownership of their components more fluid. This makes managing their overall performance increasingly difficult, and it becomes necessary to look at application performance more holistically. With APM, NPM and EUM tools it is possible to piece together the bigger picture relatively easily, especially if there is some level of integration between the tools, for example through shared application or transactional context.
Implementation and on-going use
The implementation of the chosen tools will need to be run as a project. Sometimes it can piggyback on other activities for efficiency: for example, introducing APM agents for applications alongside a new release, or introducing/connecting NPM appliances alongside network changes.
It is important to understand the environment as much as possible before implementation, as there are typically many technical and organisational dependencies. However, the process of implementation is also very likely to result in a much better understanding of the IT infrastructure generally, particularly how different parts of the infrastructure are used by various applications and corresponding services.
Any initial implementation of tools may well result in associated refinements in processes in the organisation and potentially for third parties. The right people need to be identified to carry out these refined processes and may quite possibly also need some relevant training.
The refined business as usual (BAU) processes will generally be split into 2 categories:
- Keeping the tools running, and,
- Effective performance problem resolution using the tools.
The first category of BAU processes will be new and related directly to the tools, whereas the second category is likely to augment existing performance problem resolution processes, generally making them more efficient with the better visibility provided.
The refined project related processes will also have 2 aspects:
- Ensuring the tools, following any infrastructure, application or network changes, continue to run and include all relevant existing and possibly new data, and,
- Ensuring any required new/refined monitoring artefacts are created, modified and work for the changes introduced with the project.
The first category of project-related processes are associated with checking the tools, whereas the second category is focused on adding and modifying monitoring artefacts such as dashboards in the tools, as necessary, e.g. for new applications, transactions, etc. The project-related process refinements should augment existing processes to ensure the additional few steps related to the tools require as little additional effort as possible.
APM vs NPMD – EUM common to both
The terms APM and NPM are used quite broadly, often meaning significantly different things. Of the analysts, Gartner perhaps does best in defining both of these areas, with its APM and NPMD (Network Performance Monitoring and Diagnostics) Magic Quadrant analyses respectively. The most recent APM MQ report can be found via the useful APMdigest website at http://apmdigest.com/gartner-2015-magic-quadrant-for-application-performance-monitoring-suites-is-published (click through to one of the vendors’ complimentary APM MQ reports), and a copy of the NPMD MQ report via our partner Riverbed at https://www.riverbed.com/gb/forms/Gartner-Magic-Quadrant-for-Network-Performance-Monitoring-and-Diagnostics.html.
The distinction generally comes down to data sources: if the data sources are network packet or flow data, then the tool generally sits in the NPMD domain, whereas if the data sources are agent based, then the tool is associated with the APM domain.
EUM tools can be considered the bridge between NPMD and APM domains and thus common to both. The EUM tools can be used to initially detect application problems users are experiencing from the client perspective and give an initial indication as to whether it is a network, application client or application server based issue. But it is the NPMD solutions that will be required to effectively triage and diagnose network-oriented problems, and APM solutions that will be required to efficiently triage and diagnose application-oriented issues.
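That first-pass indication from the client perspective can be sketched very simply (the timing fields and the crude "largest share wins" rule below are hypothetical; real EUM tools break down end-user response time with far more nuance):

```python
def isolate(network_ms, server_ms, client_ms):
    """Crude first-pass isolation from an EUM timing breakdown: point at the
    component consuming the largest share of the end-user response time."""
    parts = {
        "network": network_ms,
        "application server": server_ms,
        "application client": client_ms,
    }
    return max(parts, key=parts.get)

print(isolate(1200, 300, 150))  # 'network' -> triage/diagnose with NPMD tools
print(isolate(100, 2500, 200))  # 'application server' -> drill down with APM
```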
Choosing the right solution
Depending on a number of factors, it generally makes sense to take a top-down or bottom-up approach when starting or extending an implementation. With a bottom-up approach, concentrating on depth rather than breadth, it would be appropriate to focus on a critical application with reasonable complexity, perhaps with a history of performance problems, that has a good technical fit for an APM solution. This usually means a Java, .NET or similar technology based application. An APM solution would allow for improving all the steps throughout the problem resolution process for that critical application.
The alternative top-down approach, concentrating on breadth rather than depth, would focus on an EUM solution for a logical group of critical applications, allowing for good detection of performance problems. Covering a number of applications initially would allow for a more informed targeting of more in-depth implementations, such as APM on the more problematic applications later, or NPM if network issues seem to be more prominent than those internal to the application clients/servers. However, with the EUM first approach, there may be some frustration in only being able to detect problems without the ability to diagnose the problems easily at the outset.
How good is your critical application performance at your organisation?
Perhaps this is known already through an EUM solution, or maybe it is only partially known through feedback from some users explicitly contacting the support organisation. If the end user experience is not accurately measured, then there is obviously a risk of unknown application performance problems that can have a significant impact on the organisation’s performance. That might be direct revenue impact through poorly performing consumer/B2B applications, or indirect revenue/cost impacts driven by poor productivity through badly performing internal applications.
How efficient is the application performance problem resolution process at your organisation?
This is possibly a more difficult question to answer, although it becomes easier if the end user experience is already being measured with an EUM tool. There are 2 aspects to measuring this: one is mean time to resolution (MTTR) and the other mean effort to resolution (perhaps METR).
Some organisations may measure one or both of these metrics in some way. Measuring both is more meaningful, as with additional effort the time to resolution should normally be shorter, and conversely the time will be longer with less effort. Other organisations might only have a general understanding of how long it typically takes to resolve an issue and the effort involved. It is still common to learn of application performance problems that take weeks or months to resolve, expending significant expert effort.
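Both metrics are straightforward averages over an incident log. A minimal sketch, assuming a hypothetical log where each incident records elapsed hours and person-hours of effort:

```python
# Hypothetical incident log: elapsed time and effort expended per incident
incidents = [
    {"id": "INC-1", "hours_to_resolve": 4.0,  "person_hours": 6.0},
    {"id": "INC-2", "hours_to_resolve": 30.0, "person_hours": 12.0},
    {"id": "INC-3", "hours_to_resolve": 2.0,  "person_hours": 1.5},
]

def mttr(incidents):
    """Mean time to resolution, in elapsed hours."""
    return sum(i["hours_to_resolve"] for i in incidents) / len(incidents)

def metr(incidents):
    """Mean effort to resolution, in person-hours."""
    return sum(i["person_hours"] for i in incidents) / len(incidents)

print(f"MTTR: {mttr(incidents):.1f}h, METR: {metr(incidents):.1f} person-hours")
# MTTR: 12.0h, METR: 6.5 person-hours
```

Tracking both together shows whether improvements in resolution time are genuine efficiency gains or are simply being bought with extra effort.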
Using EUM, APM and NPM tools efficiently, it should typically be possible to reduce MTTR to minutes, or hours at worst, with limited, well-targeted effort. It should even be possible to largely eliminate performance problems through proactive monitoring: recognising developing problems before they have an impact in production, and using the tools effectively in pre-production to ensure quality before deployment.
How to ensure good application performance using APM, NPM and EUM tools?
It is clearly not easy to use APM, NPM and EUM tools effectively. Due to the complexities involved, quite specialist skills, both technical and soft, are required to implement and manage the use of the tools. The tool vendors can obviously provide resources to help, or a specialist consultant can be recruited for the period of implementation and initial use, although once that person leaves the organisation it can be a challenge to keep the implementation current.
However, another option is to turn to an organisation that can help implement the tools and then continue to help use them through a managed service, ensuring value is delivered from them on an ongoing basis. Teneo is an organisation that can provide such initial implementation services and ongoing managed services.
As the APM Practice Manager at Teneo, with over 12 years of experience using all the most prominent APM tools, I’ve worked on successful implementations ranging from small niche projects to complex enterprise-wide deployments across many different industries. I’m well placed to understand an organisation’s requirements as well as its environment, providing guidance in establishing and sustaining an appropriate solution to ensure good application performance on an ongoing basis.