Text Analytics for Incident Ticket Classification in IT Operations
By Vishnuteja Nanduri
IT Operations Analytics (ITOA) is a relatively new domain that has come of age over the last 3-4 years. ITOA deals with the analysis of server and application logs, sensor data from IoT devices, incident ticket data from IT service management systems, and other machine- and human-generated data. A typical IT system generates several gigabytes, and in some cases terabytes, of data in a single day. Many have described ITOA as the quintessential Big Data Analytics use case. This article focuses on the problem of IT incident ticket classification. According to the Information Technology Infrastructure Library (ITIL) [1], an incident is an event that is out of the ordinary course of operations in an IT system. Typical examples of incidents are a service not being available, a disk-usage threshold breach, an application log-in failure, etc. IT incidents occur on a continual basis within any IT infrastructure. A typical IT system may have thousands of incident tickets with different levels of service-impacting severity (from Severity 1 or Sev-1, the most critical, to Severity 4 or Sev-4, the least critical).
Brief Problem Description
The IT infrastructure of a large company may have hundreds of servers, sometimes even thousands. Such a system may produce thousands of unforeseen incidents (and resulting tickets) with different severities each month. In most IT incident management systems, incident tickets are handled as follows [2]. (Let's focus only on user-generated tickets and not auto-generated ones.) Support teams identify incidents and log them. The system then allows them to categorize the incidents by severity or priority and add an initial diagnosis. The incident is then escalated according to the service level agreements, which leads to follow-up investigation and diagnosis. This is a largely manual process. Once the incident is resolved, the support team enters a summary of the resolution applied into the tool and the ticket is closed.
Trends and Insights
When IT managers or other executives wish to get a pulse of their IT system to understand hotspots and see where the resources are spending their time, they use standard reporting techniques. Reporting using standard analytics techniques can usually only get them information on the number of tickets, severities, different owner or resolver groups, and other basic trends. How about the wealth of information hidden in the text comments of incident tickets? This information can and should be mined to obtain valuable insights that are otherwise lost in the quagmire of data.
Several IT giants have employed numerous in-depth text analytics techniques for extracting information from service management tickets. This is nothing new. In this article, I will very briefly discuss one such approach, topic modeling using latent Dirichlet allocation (LDA) [3], to extract insights from text information in the tickets. (The description below is at an admittedly high level. I have skipped some gory statistical details about LDA for the sake of brevity and ease of understanding.) Text analytics of information in ticket data can help IT managers understand what the frequently occurring issues are and where resources and time are being spent, so that broader preventative measures can be initiated, for example increased server capacity, better workload balancing, resource scheduling, IT resource refresh, and so on.
High-level description of the text analytics approach
The text fields within the incident ticket data are isolated and pre-processed by removing extraneous numbers, punctuation, and stopwords (a, an, the, for, is, as, was, before, after, etc.). This pre-processing is intended to make the analytics more meaningful and the results more understandable. Topic modeling is an unsupervised machine learning approach that identifies 'themes' within text data. For example, if a set of newspaper articles describing the travels of Pope Francis, the NFL, and the recent resignation of John Boehner were put through an LDA model, the model would group the articles thematically into those three topics. The output of the approach is a set of words (see Figure 1 below) for each topic, which subject matter experts can label later. The same approach can be applied to the summary text fields within incident tickets, and the resulting topics can then be labeled. This labeling can also be automated using a supervised learning approach (I will leave that for a later article).
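As a rough illustration of the two steps described above, here is a minimal Python sketch using only the standard library. Everything in it is an assumption made for illustration: the ticket texts are made up, the stopword list is deliberately tiny, and the helper names preprocess and fit_lda are hypothetical. The topic model is fitted with collapsed Gibbs sampling, a common way to fit LDA that differs from the variational inference used in the original paper by Blei, Ng, and Jordan. A production system would more likely rely on an established library.

```python
import random
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"a", "an", "the", "for", "is", "as", "was", "before", "after",
             "on", "in", "to", "of", "and", "not", "at"}

def preprocess(text):
    """Lowercase, keep only alphabetic runs (drops numbers and punctuation),
    then remove stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA on tokenized documents with collapsed Gibbs sampling."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]          # tokens per (doc, topic)
    topic_word = [Counter() for _ in range(num_topics)]   # tokens per (topic, word)
    topic_total = [0] * num_topics                        # tokens per topic
    assignments = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(zs)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic given all other assignments.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    # Smoothed topic proportions per document, and the top words per topic.
    theta = [
        [(n + alpha) / (len(doc) + num_topics * alpha) for n in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    top_words = [[w for w, c in tw.most_common(5) if c > 0] for tw in topic_word]
    return theta, top_words

# Made-up ticket summaries, for illustration only.
tickets = [
    "Disk usage on server 12 is at 95%, threshold breached",
    "User cannot log in to the payroll application, login failure",
    "Disk space threshold breach on database server",
    "Application login failure reported after password reset",
]
docs = [preprocess(t) for t in tickets]
theta, top_words = fit_lda(docs, num_topics=2)
for k, words in enumerate(top_words):
    print(f"Topic {k}: {', '.join(words)}")
```

The printed word lists are what a subject matter expert would look at when labeling each topic, and each row of theta gives one ticket's estimated mix of topics. With a corpus this small the topics are not expected to be stable; the sketch only shows the mechanics.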
So what's the big deal?
These topics would come in very handy in providing the IT managers and executives with a picture of hotspots within the system. Imagine a scenario with tens of thousands of tickets being categorized into 10 or 20 topics to be examined by subject matter experts and IT managers. Isn't that better than having to manually examine tons of text data or missing out entirely on insights by not analyzing the text data at all? Absolutely yes…
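To make the hotspot idea concrete, here is a small Python sketch that counts tickets by their dominant topic. The per-ticket topic weights and the topic labels are hypothetical stand-ins for the output of a topic model and the labels a subject matter expert might assign, and dominant_topic_counts is an illustrative helper name rather than part of any library.

```python
from collections import Counter

def dominant_topic_counts(doc_topic_weights, topic_labels):
    """Count tickets by the topic that carries the highest weight in each."""
    counts = Counter()
    for weights in doc_topic_weights:
        best = max(range(len(weights)), key=weights.__getitem__)
        counts[topic_labels[best]] += 1
    return counts

# Hypothetical per-ticket topic weights (one row per ticket).
ticket_weights = [
    [0.9, 0.1, 0.0],
    [0.2, 0.7, 0.1],
    [0.8, 0.1, 0.1],
    [0.1, 0.1, 0.8],
    [0.6, 0.3, 0.1],
]
# Hypothetical labels assigned by subject matter experts.
labels = ["disk capacity", "login failures", "network"]

hotspots = dominant_topic_counts(ticket_weights, labels)
for label, n in hotspots.most_common():
    print(f"{label}: {n} tickets")
```

Sorting the counts this way puts the most frequent themes at the top, which is the kind of summary an IT manager could scan instead of reading individual ticket comments.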
Figure 1: High-level view of text analytics on Incident tickets
Opinions expressed in the article are those of the author and not of his employer.
1. Information Technology Infrastructure Library (ITIL), http://www.itlibrary.org/, last accessed on September 21st, 2015.
2. ServiceNow Incident Management Procedure: http://wiki.servicenow.com/index.php?title=ITIL_Incident_Management#gsc.tab=0, last accessed on September 25th, 2015.
3. D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, January 2003.
About Vishnuteja Nanduri
Vishnuteja Nanduri is the Practice Leader for Data Science and Engineering in IT Operations Analytics at IBM. He currently leads a team of talented data scientists and data engineers in ITOA. He holds a Ph.D. in Industrial Engineering from the University of South Florida and has been an active researcher in the domain of analytics and machine learning for over a decade. Follow him on Twitter @drvnanduri.