Quality of Experience Management: The (Next) Holy Grail of IT Management, Part 2

By Michael Halperin

In our last installment, we introduced the concept of Quality of Experience (QoE) Management.

In this installment, we explore how QoE Management differs from traditional IT monitoring and how adding just one component can change everything.

Let me first say that the problem isn’t with existing IT monitoring tools and approaches. For over two decades, IT departments have relied on an incrementally improving set of monitoring tools to identify and understand the incidents and events that occur within the IT infrastructure. These tools provide visibility: the ability to see what’s going on. Every year, new tools come out to provide better visibility into the ever-changing array of devices, systems and services that make up the IT environment.

While this steady improvement in visibility tools is evolutionary, the advent of Quality of Experience (QoE) Management will be revolutionary. Yet it requires adding just one more component to the traditional monitoring approach. This powerful change, the difference between traditional monitoring and QoE Management, can be summed up in one word: context. By context, we mean a foundational understanding of how each component of the infrastructure is being used.

Context itself has two perspectives.  The first is the usage of the infrastructure over time by users performing various functions.  The second is the usage of the infrastructure at a particular moment in time (specifically, that moment when an event occurs).

But understanding context is the hard part.  Most IT departments will tell you they do understand their users.  But most likely, that knowledge is in the heads of the individuals within the IT department, not within the systems that monitor and manage the IT environment itself. And while that institutional knowledge can be vital to providing a good level of service, the true power of context comes when it is combined with traditional IT monitoring to enable a revolutionary change in how users experience IT.

Defining context within a systematic framework requires a structured approach. Three components together give us the three-dimensional view required to understand user activity as it relates to the IT infrastructure:

  • Business Services
  • Infrastructure Mapping
  • Benchmarking

We use the term Business Services as an umbrella for a wide variety of end-user-facing functions. A Business Service could be an application flow (e.g., all the hops in the transaction when a user accesses an ERP application, and its underlying database structures, from their desktop). It could be a service built on a specific protocol or technology, such as SIP, point-to-point VPN or VoIP. It could be a broader capability such as remote access. Or it could be a combination of all of the above, such as Unified Communications.

Infrastructure Mapping is just what it sounds like. Every Business Service relies on one or more components of the IT infrastructure, including end-user devices, WAN, LAN, security, servers, storage, applications and middleware. Once we know which Business Services we support, we must understand how each of them uses the underlying IT infrastructure. For instance, if a user is accessing an ERP application from home, then their local device, their internet connection, their VPN connection, the WAN link, the LAN, the app server, the database server and all the supporting storage are links in the end-to-end Business Service, and therefore part of the Infrastructure Map for that Business Service.
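One way to picture an Infrastructure Map is as a lookup from each Business Service to the components its end-to-end path depends on, which can then be inverted to answer "which services use this component?" The sketch below is illustrative only: the service and component names are hypothetical, and a real map would be populated from discovery or configuration data rather than written by hand.

```python
from collections import defaultdict

# Hypothetical Infrastructure Map: each Business Service lists the
# infrastructure components its end-to-end path depends on.
# All service and component names are illustrative assumptions.
INFRA_MAP: dict[str, set[str]] = {
    "erp_remote": {
        "endpoint", "home_internet", "vpn_gateway", "wan_link",
        "lan_core", "erp_app_server", "erp_db_server", "san_storage",
    },
    "voip": {"endpoint", "lan_core", "wan_link", "sip_proxy"},
    "portal_remote": {"endpoint", "home_internet", "vpn_gateway", "web_server"},
}


def services_by_component(infra_map: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert the map so each component points to the services that use it."""
    inverted: dict[str, set[str]] = defaultdict(set)
    for service, components in infra_map.items():
        for component in components:
            inverted[component].add(service)
    return dict(inverted)


if __name__ == "__main__":
    by_component = services_by_component(INFRA_MAP)
    print(sorted(by_component["vpn_gateway"]))  # ['erp_remote', 'portal_remote']
```

Keeping the inverted view alongside the original is what later lets a single misbehaving component be traced to every Business Service it touches.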

Benchmarking is a data collection function that answers the question “what does normal look like?”  By definition, it can’t be a snapshot, but requires monitoring over a period of time to understand the normal ebbs and flows of performance.  Benchmarks include performance of specific individual components (e.g. a server), interfaces between components (e.g. the response time of a database server to an ERP application) and transactions across multiple components (e.g. normal response time of the company portal to a remote user across the VPN).
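As a rough illustration of "what does normal look like?", the sketch below builds a separate baseline for each hour of the day from historical measurements, so that busy and quiet periods each get their own notion of normal. The choice of hourly buckets and a band of mean plus or minus k standard deviations is an assumption made for this example, not a method prescribed by the article; other baselining techniques would serve equally well.

```python
from collections import defaultdict
from statistics import mean, stdev


def build_benchmark(
    samples: list[tuple[int, float]], k: float = 3.0
) -> dict[int, tuple[float, float]]:
    """Build a per-hour "normal" band from (hour_of_day, value) samples.

    Assumption: the band is mean +/- k standard deviations of the values
    observed in that hour, capturing the normal ebbs and flows over a day.
    """
    by_hour: dict[int, list[float]] = defaultdict(list)
    for hour, value in samples:
        by_hour[hour].append(value)

    bands: dict[int, tuple[float, float]] = {}
    for hour, values in by_hour.items():
        if len(values) < 2:
            continue  # stdev() needs at least two points; not enough history yet
        mu, sigma = mean(values), stdev(values)
        bands[hour] = (mu - k * sigma, mu + k * sigma)
    return bands


if __name__ == "__main__":
    # Hypothetical response times (ms) for one component at 09:00 and 02:00.
    history = [(9, 100.0), (9, 110.0), (9, 120.0), (2, 20.0), (2, 22.0), (2, 24.0)]
    print(build_benchmark(history))  # {9: (80.0, 140.0), 2: (16.0, 28.0)}
```

The same function applies whether the samples describe a single component, an interface between components, or an end-to-end transaction; only the source of the measurements changes.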

Traditional IT monitoring tools can measure the performance of specific components of the infrastructure. But when all three components of context are in place, QoE Management compares performance against benchmarks and maps performance to specific Business Services. If a specific infrastructure component begins to behave outside its normal benchmark, a flag is raised and the Infrastructure Map identifies all impacted Business Services. We then have the information we need to warn or redirect users who are engaged with those Business Services, and who are therefore most likely to be impacted by the event.
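The flow described above (measure, compare against the benchmark, raise a flag, consult the Infrastructure Map) can be sketched in a few lines. Everything here is hypothetical: the benchmark bands, component names and service names are invented for illustration, and a real system would feed in live measurements and learned baselines rather than constants.

```python
# Minimal sketch of the QoE check. The benchmark bands, component names
# and service names below are hypothetical examples.

# Normal response-time band (low, high) in milliseconds for each component.
BENCHMARKS: dict[str, tuple[float, float]] = {
    "vpn_gateway": (5.0, 40.0),
    "erp_db_server": (2.0, 15.0),
}

# Infrastructure Map: Business Service -> components it depends on.
INFRA_MAP: dict[str, set[str]] = {
    "erp_remote": {"vpn_gateway", "wan_link", "erp_app_server", "erp_db_server"},
    "portal_remote": {"vpn_gateway", "web_server"},
    "voip": {"wan_link", "sip_proxy"},
}


def impacted_services(component: str, value: float) -> list[str]:
    """Return the Business Services to warn if `value` is outside the benchmark."""
    band = BENCHMARKS.get(component)
    if band is None:
        return []  # no benchmark yet, so "normal" is unknown and no flag is raised
    low, high = band
    if low <= value <= high:
        return []  # within the normal band
    # Flag raised: every service whose map includes this component is impacted.
    return sorted(s for s, comps in INFRA_MAP.items() if component in comps)


if __name__ == "__main__":
    print(impacted_services("vpn_gateway", 85.0))   # ['erp_remote', 'portal_remote']
    print(impacted_services("erp_db_server", 9.0))  # []
```

The returned list is the set of users' Business Services to warn or redirect; a component with no benchmark yet is deliberately left unflagged, since there is no established "normal" to compare against.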

Over time, IT is able to evaluate where variances from benchmarks are most prevalent, then use the Infrastructure Map to identify which of those are most critical to users.  That allows IT to focus resources on those “hot spots” most likely to result in future issues.
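A simple way to rank such "hot spots" is to count how often each component has strayed outside its benchmark and weight that count by how many Business Services depend on it. The scoring rule below (events multiplied by the number of dependent services) is an assumption chosen for illustration; an organization could just as well weight services by business criticality. The component and service names are hypothetical.

```python
from collections import Counter

# Hypothetical Infrastructure Map: Business Service -> components it uses.
INFRA_MAP: dict[str, set[str]] = {
    "erp_remote": {"wan_link", "vpn_gateway", "erp_db_server"},
    "voip": {"wan_link", "sip_proxy"},
    "portal_remote": {"vpn_gateway", "web_server"},
}


def rank_hot_spots(
    variance_events: list[str], infra_map: dict[str, set[str]], top_n: int = 3
) -> list[tuple[str, int]]:
    """Rank components by (out-of-benchmark events x dependent services).

    `variance_events` holds one component name per out-of-benchmark event.
    Ties are broken alphabetically so the ranking is deterministic.
    """
    counts = Counter(variance_events)
    scores: dict[str, int] = {}
    for component, n in counts.items():
        dependents = sum(1 for comps in infra_map.values() if component in comps)
        scores[component] = n * dependents
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


if __name__ == "__main__":
    events = ["wan_link"] * 4 + ["sip_proxy"] * 6 + ["vpn_gateway"] * 3
    print(rank_hot_spots(events, INFRA_MAP))
    # [('wan_link', 8), ('sip_proxy', 6), ('vpn_gateway', 6)]
```

Note how the WAN link tops the list despite fewer raw variances than the SIP proxy: it sits under two Business Services, so each of its excursions affects more users.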

The impact of all this on users, and on the Lines of Business those users support, is obvious. Also obvious is the question “why isn’t everybody doing this?” The answer will be provided in the third and final installment of this series.