Identify Performance Anomalies Before They Impact the Business
By Graham Gillen
Director of Marketing, Netuitive
If you were the Regional GM for a major wireless carrier, wouldn't you be upset to learn that, while you were on vacation, no phones were sold in your retail channel for four hours? That concern drove one global wireless carrier, with 94 million subscribers and more than 2,000 retail outlets, to expand its emphasis on IT Operations Analytics (ITOA) as the key component of its existing business performance monitoring architecture. More than ever, today's wireless providers depend on their Point-of-Sale (POS) systems to satisfy enormous demand for phones, devices, and new services through in-store, online, and channel sales partners. For this carrier, that growth represented a business opportunity, but it also posed a technical challenge.
The carrier's POS systems supporting its various sales channels had become so sophisticated, complex, and distributed that meeting, let alone improving on, its Service Level Agreements was becoming problematic. And when outages or degradations did occur, there was no way to understand how they affected the business. The challenge was to take a fresh approach to application performance monitoring (APM), one that leveraged advanced ITOA processes, technologies, and services.
Why traditional tools fell short in dealing with these challenges
The wireless carrier had deployed tools from leading APM vendors such as CA and Compuware. Its own business metrics, such as devices sold, accounts activated, and services added, also provided crucial information on how well it was performing. But while the carrier was collecting all of the data necessary for effective APM, the monitoring team had no way to make sense of it all. Nothing helped the team understand what constitutes normal behavior for components, infrastructure, or applications; which of the thousands of IT metrics matter most to business performance; or whether IT performance affects business metrics at all. The existing monitoring tools lacked the cross-platform visibility, automated analysis, and alarm accuracy required for proactive performance management. On one occasion, an outage in the POS system covering an entire state went undetected for four hours, until a business executive contacted IT to ask whether the total absence of sales activity was real.
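The article does not say how the ITOA platform learns what "normal behavior" looks like. One common, simple approach is a per-hour statistical baseline: compare each new reading with the mean and standard deviation of past readings for the same hour of day. The minimal Python sketch below assumes hourly sales counts as input; `learn_baseline`, `is_anomalous`, the 3-sigma threshold, and the `min_sigma` floor are hypothetical, illustrative choices, not Netuitive's method. Under a check like this, a state-wide hour with zero sales would stand out within the first hourly interval.

```python
from statistics import mean, stdev

def learn_baseline(history):
    """Build a per-hour baseline of normal behavior (hypothetical helper).

    history maps an hour of day (0-23) to sales counts observed in that
    hour on previous days. Returns {hour: (mean, standard deviation)}.
    """
    baseline = {}
    for hour, counts in history.items():
        mu = mean(counts)
        # stdev needs at least two samples; treat a single sample as flat.
        sigma = stdev(counts) if len(counts) > 1 else 0.0
        baseline[hour] = (mu, sigma)
    return baseline

def is_anomalous(baseline, hour, observed, z_threshold=3.0, min_sigma=1.0):
    """Return (flag, z) for an observed value against that hour's baseline.

    z_threshold and min_sigma are illustrative defaults, not tuned values.
    min_sigma keeps a perfectly flat history from causing division by zero.
    """
    mu, sigma = baseline[hour]
    z = (observed - mu) / max(sigma, min_sigma)
    return abs(z) > z_threshold, z

# Hypothetical history: phones sold between 14:00 and 15:00 on five prior days.
baseline = learn_baseline({14: [120, 135, 128, 140, 131]})
print(is_anomalous(baseline, 14, 0)[0])    # zero sales: True
print(is_anomalous(baseline, 14, 129)[0])  # typical hour: False
```

A static z-score ignores trends and weekly seasonality; a production system would likely need richer models, but the principle of comparing against learned norms rather than fixed thresholds is the same.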
The technical challenge, then, was to automatically analyze and correlate IT, customer experience, and business metrics in real time. Doing so would either exonerate IT as the cause of reported business deviations or, when IT issues were affecting the business, let IT alert business managers proactively instead of learning of the problem from irate users.
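Correlating an IT metric with a business metric can be illustrated with plain Pearson correlation over time-aligned series, scanning a few lags because a slowdown may depress sales only in a later interval. This is a minimal sketch, not the platform's actual algorithm; `pearson`, `best_lag`, the 5-minute sampling interval, and the sample data are all hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / sqrt(vx * vy)

def best_lag(it_metric, business_metric, max_lag=3):
    """Find the lag (in samples) at which an IT metric best tracks a business metric.

    A lag of k pairs IT sample i with business sample i + k, so positive lags
    model a slowdown that hurts sales in a later interval.
    Returns (lag, r) for the strongest absolute correlation.
    """
    if len(it_metric) != len(business_metric):
        raise ValueError("series must be time-aligned")
    if max_lag > len(it_metric) - 2:
        raise ValueError("max_lag too large for series length")
    best = (0, 0.0)
    for lag in range(max_lag + 1):
        xs = it_metric[:len(it_metric) - lag]
        ys = business_metric[lag:]
        r = pearson(xs, ys)
        if abs(r) > abs(best[1]):
            best = (lag, r)
    return best

# Hypothetical 5-minute samples: POS response time (ms) and devices sold.
response_ms = [200, 210, 205, 600, 650, 220, 215, 210]
devices_sold = [50, 52, 51, 49, 20, 18, 50, 51]
lag, r = best_lag(response_ms, devices_sold)
print(lag, round(r, 3))  # 1 -0.998
```

A strong negative correlation at lag 1 suggests (but does not prove) that the response-time spike preceded the sales drop; correlation alone cannot establish cause, which is why the article pairs it with a service model of the POS components.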
How ITOA technology helped overcome the challenges, and the resulting benefits
Following a successful pilot program with an advanced predictive IT analytics software platform, the company increased its emphasis on automated IT analytics to help overworked IT Operations staff make sense of the data. As one industry analyst puts it, IT Operations Analytics is a "technology or service that collects, stores, presents, and performs deductive and/or inductive inferences about large volumes of IT operations data."
The wireless carrier chose to feed key IT infrastructure, application, and business performance metrics from its existing monitoring tools into the ITOA solution, which was powered by the new predictive IT analytics platform. In just a few weeks, a service model of the POS system was developed that incorporated more than 6,000 components and tens of thousands of metrics. The holistic monitoring solution correlated the behavior of performance metrics from sources such as:
- Customer experience metrics incorporating response times for various transaction sets
- Mainframe performance metrics for client-side transaction queues and transaction gateway performance
- Application server and load-balanced cluster metrics (JavaServer Pages response times, app server and cluster resource consumption, etc.)
- Sales data (devices sold, smartphones sold, sales transaction amounts, etc.)
In one incident during the initial weeks of deployment, the predictive IT analytics platform detected a workload imbalance in the application servers supporting the POS system. The imbalance slowed response times and degraded key POS applications for about 45 minutes, and the correlated sales impact was a loss of tens of thousands of dollars. The predictive analytics platform has since been fully operationalized in one region as part of the ITOA solution and now provides advance warning that lets IT take proactive action and prevent business-impacting incidents. The company reports that the solution is:
- Bridging the gap in analysis between IT and business KPIs
- Transforming the business by achieving 99.99% service availability for their POS systems
- Proactively detecting and sending early warning of performance anomalies before they evolve into outages
- Improving customer experience and reducing business-impacting outages and degradations by at least 10%
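The workload-imbalance incident described earlier can be illustrated by comparing load across the servers of a load-balanced cluster. The article does not say how the platform detected it; the minimal sketch below simply assumes per-server request counts are available and uses the coefficient of variation (population standard deviation divided by mean). The function name `detect_imbalance`, the server names, the counts, and the 0.25 threshold are hypothetical and would need tuning against real baselines.

```python
from statistics import mean, pstdev

def detect_imbalance(load_by_server, cv_threshold=0.25):
    """Flag uneven load across a load-balanced cluster (hypothetical helper).

    load_by_server maps a server name to its request count for one interval.
    Uses the coefficient of variation (population std dev / mean); the 0.25
    threshold is an illustrative assumption, not a recommended value.
    Returns (flag, cv, busiest_server).
    """
    loads = list(load_by_server.values())
    mu = mean(loads)
    if mu == 0:
        return False, 0.0, None  # idle cluster: nothing to compare
    cv = pstdev(loads) / mu
    busiest = max(load_by_server, key=load_by_server.get)
    return cv > cv_threshold, cv, busiest

# Hypothetical per-minute request counts for four app servers.
skewed = {"app01": 410, "app02": 395, "app03": 1250, "app04": 405}
balanced = {"app01": 400, "app02": 410, "app03": 395, "app04": 405}
flag, cv, busiest = detect_imbalance(skewed)
print(flag, f"{cv:.2f}", busiest)     # True 0.60 app03
print(detect_imbalance(balanced)[0])  # False
```

The coefficient of variation is scale-free, so the same threshold can be applied to clusters of different sizes and traffic levels; an imbalance flag like this would then be correlated with response-time and sales metrics to judge its business impact.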
Can predictive analytics really help maintain higher performance and stay ahead of problems?
About Graham Gillen
Graham Gillen is the Director of Marketing at Netuitive. He has more than 10 years of experience in enterprise software, spanning IT systems management, application performance management, middleware, and IT security. Prior to Netuitive, Graham held product management and marketing positions with VeriSign and webMethods. He also writes a blog (www.blackbookninja.com) that offers lighthearted career guidance to young product management and marketing professionals. He believes life is too short to work with boring products or rude people.