The Challenges Of Text Analytics From Clients In The Trenches


Editor’s Note: Text analytics is a technology that is poised to cause great disruption (and create real, sustainable value) for many industries, especially MR. But despite all the potential, how are clients using it and what challenges are they facing? Today’s guest post by Alesia Siuchykava of the Text Analytics Summit West answers some of those questions with a series of interviews on the challenges that clients are facing today and their hopes for the future.

By Alesia Siuchykava

We recently sat down with some forward-thinking text analytics professionals speaking at the 12th Annual Text Analytics Summit West 2013 to see what challenges they are facing, how they are overcoming them and their thoughts on the industry as a whole. You can read their responses below.

What is the main challenge in your work with text analytics and how are you overcoming it?

Mark Pitts, Senior Vice President, Analytics, SourceHOV; Former Director, Data Science, Solutions & Strategy, UnitedHealth Group:
I would say the main challenge in my work with text analytics is in obtaining and understanding the textual data.  Obtaining the data can be a challenge since the data typically reside in source systems that were not created with analytics in mind – electronic health records are a great example of this.  Extracting the data can be a burden on the performance of the source system, and can take a long time. Understanding the data can also be a challenge, since many interesting data sources are rife with jargon, abbreviations, and specialized language.  You often need a subject matter expert to determine which terms are important, which are synonymous, and so on.

Gabor Szabo, Senior Data Scientist, Twitter:
The challenge in my work in text analytics is validating the algorithms: labeled data is usually hard to get, and without it you need human judges.

Rahul Saluja, Manager – Web Analytics, Home Depot:
In my opinion there isn’t a single challenge but rather a set of interconnected reasons contributing to the slow adoption of text mining.

-Lack of skilled resources: It is very hard to find skilled and experienced resources in the text mining area; the resources available tend to have either little or no experience, or deep experience that applies only to a specific business situation in one industry.

-Lack of appetite to digest text analytics outcomes: Text analytics requires a significant time commitment; many corporations choose the shorter route of basic reporting on text data rather than doing advanced data mining.

-Infrastructural constraints: Advanced text mining tools are still pricey, which deters many companies from investing.

Allen Thompson, SVP Corp & Comm Analytics & Reporting, Bank of America:
Most of the problems are around data:

-Where is it – so many departments have this data, and those departments can be spread across a large organization – so even knowing who to contact is a challenge. Also, a lot of this data can be outside the company.

-What are they collecting – many times there are no standards from department to department on what they collect from customers.

-Where do I put it – again, this type of information is very different from normal transaction-type data – so it needs to be stored in a way that makes it easy to use and cull information from.

-Whose is it – how do I ensure I am tying all the right data back to the specific customer it pertains to.

-How do I analyze it – the final point again is how do I take all of this data I have found, collected and stored.

Judy Pastor, Principal, Operations Research & Decision Sciences, American Airlines:
After deciding which software to purchase and use, the main challenge is convincing our business unit partners that using text mining to discern customer interests and attitudes about travel from social media posts is not an invasion of privacy.

Dave Tomala, Sr. Director of Analytics – Knowledge Solutions, Express Scripts Inc:
-Being a healthcare company, HIPAA compliance is a top priority.  We’ve therefore gotten really good at de-identifying structured data over the years.  Unstructured text data is a different animal.  Consider the ambiguous fragment “ROSE COLORED EYES.”  Is this an irrelevant statement about a patient named Rose, as in “Rose. Colored eyes.”?  If so, inclusion of this statement could render the entire dataset protected health information (PHI) and would place restrictions on the use of the data.  On the other hand, if this is the description of an anonymous patient presenting with ocular rosacea (“rose-colored eyes”), then this statement alone does not qualify the data as PHI, but millions of other such statements in the text corpus might.

-Some options for overcoming this problem exist today, for example, ontologies that include robust lists of names.  But today’s creative names and the vast diversity of traditional names render any of these lists far from perfect. You can get slightly better results by including syntactic and contextual elements (a single hyphen or period in our example would settle the issue). But the real-world nature of the data gathering process means that these cannot be relied on to always be there.  We’re confident that as progress continues in this area, this nut will be cracked.
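
To make the hyphen point concrete, here is a minimal Python sketch of a name-list check that also looks at one piece of syntactic context. It is an illustration only, not the speaker's method and not a HIPAA compliance tool: the tiny `KNOWN_FIRST_NAMES` set and the `may_contain_name` function are hypothetical, and a real system would rely on a far larger ontology and richer context.

```python
import re

# Hypothetical, deliberately tiny name list; a real ontology would be far larger.
KNOWN_FIRST_NAMES = {"rose", "mark", "judy", "dave"}


def may_contain_name(fragment: str) -> bool:
    """Return True if the fragment might mention a person by name.

    Heuristic: a token found in the name list counts as a possible name
    unless it is joined to the next word by a hyphen (as in "rose-colored"),
    which marks it as part of a compound adjective.
    """
    for match in re.finditer(r"[A-Za-z]+", fragment):
        token = match.group().lower()
        if token not in KNOWN_FIRST_NAMES:
            continue
        next_char = fragment[match.end():match.end() + 1]
        if next_char == "-":
            continue  # hyphenated compound: treat as a descriptor, not a name
        return True  # no disambiguating hyphen: flag conservatively
    return False


# The unpunctuated fragment stays ambiguous, so it is flagged for review.
ambiguous = may_contain_name("ROSE COLORED EYES")
# A single hyphen settles the issue in favor of the descriptor reading.
hyphenated = may_contain_name("rose-colored eyes")
```

The sketch errs on the side of flagging: when the punctuation is missing, as it often is in real-world data, the fragment is treated as possibly identifying.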

Janine Johnson, Director of Analytics, ISO: 
-Educating business leaders about opportunities and challenges of using unstructured data. There is not a great understanding of what text can add to a process or how to make this happen. It is constantly necessary to keep evangelizing the topic and promoting the benefits of text.

-Stemming from the first point, it continues to be a challenge to demonstrate a clear ROI for text related projects. If the project does not include an explicit cost savings, but is reliant upon increased revenue, then the business case continues to be difficult to make. 

What are the major trends currently in text analytics?

Mark Pitts, Senior Vice President, Analytics, SourceHOV; Former Director, Data Science, Solutions & Strategy, UnitedHealth Group:

-The advent of hardware/software that enables rapid, parallelized extraction and loading of data with little impact on data sources.

-The advent of high-performance analytics, particularly the capability to calculate Singular Value Decompositions (SVDs) on large term-document matrices in a distributed fashion.

-The advent of high-performance machine learning algorithms that enable rapid training over very large, high-dimensional vector spaces.
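
For readers unfamiliar with the SVD mentioned above, the following is a small single-machine Python sketch that approximates the largest singular value of a toy term-document matrix by power iteration. It is not the distributed approach the speaker refers to; the function name `top_singular_triplet` and the matrix values are made up for illustration.

```python
import math


def top_singular_triplet(matrix, iterations=200):
    """Approximate the largest singular value and its vectors by power iteration.

    `matrix` is a list of rows (terms) by columns (documents).
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols  # unit start vector over documents
    for _ in range(iterations):
        # u = A v
        u = [sum(row[j] * v[j] for j in range(n_cols)) for row in matrix]
        # w = A^T u, i.e. one multiplication by A^T A overall
        w = [sum(matrix[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    u = [sum(row[j] * v[j] for j in range(n_cols)) for row in matrix]
    sigma = math.sqrt(sum(x * x for x in u))
    if sigma > 0.0:
        u = [x / sigma for x in u]
    return sigma, u, v


# Toy counts (made up): rows are terms, columns are documents.
term_doc = [
    [2, 1, 0, 0],  # "claim"
    [1, 2, 0, 0],  # "policy"
    [0, 0, 1, 1],  # "flight"
    [0, 0, 1, 1],  # "delay"
]
sigma, term_vector, doc_vector = top_singular_triplet(term_doc)
# sigma is approximately 3.0; the dominant "topic" loads on the first two documents.
```

On matrices with millions of terms and documents, the same matrix-vector products are what get spread across machines, which is why distributed implementations matter.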

Gabor Szabo, Senior Data Scientist, Twitter:
-Algorithmic understanding of social media comments

-Finding trends and trending events in text streams and news

-Identifying population sentiments about given topics in social media

Rahul Saluja, Manager – Web Analytics, Home Depot:
-Companies are slowly recognizing the importance of analyzing text data to understand customer sentiment; however, this is currently limited to creating high-level reports.
-Several big-name analytics tool providers have come up with text mining engines, pointing to a future in which low-cost text analytics tools are available.
-In recent years, text analytics has gained a significant footing in social media analytics, impacting brand management and viral marketing promotions.

Allen Thompson, SVP Corp & Comm Analytics & Reporting, Bank of America:

-Government – the government has gotten into text mining in a big way, analyzing unstructured data for national security.
-Uncovering new trends – historical data is good for helping to predict future things – as long as the environment is consistent.  Text analytics help me uncover new trends.
-Customer sentiment – how do my customers really feel?

Judy Pastor, Principal, Operations Research & Decision Sciences, American Airlines:
-Opinion versus sentiment analysis
-Mining intent to purchase
-Combining topics and sentiment

Dave Tomala, Sr. Director of Analytics – Knowledge Solutions, Express Scripts Inc:
-Fusion of text with structured data to achieve results that cannot be achieved by either data set alone.  For example, we see increasing use of ensembles that combine text predictors (e.g., naïve Bayes classifiers) with predictors based on structured data – often with superior results.  We also see structured data being used to enhance the unstructured analysis.  A good example of this is cell phones that interpret ambiguous-sounding names by referencing the structured contacts list.

-There is a proliferation of domain-specific ontologies.  Naturally, problems solved in different industries attach different meanings to the same labels.  Think of the vastly different meaning of the word “short” to an options trader and a pediatrician respectively.  There is performance to be gained by creating domain-specific ontologies.  But there’s a cost.  As people get more comfortable with these techniques they will naturally look to collaborate on cross-industry projects.  

-Text data scientists are rare and increasingly in high demand.  Certainly supply will grow, but the trends in text data accumulation and analysis will probably continue to outstrip supply for some time.  If you are lucky enough to have some good text data scientists, treat them well!
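
As a rough illustration of the fusion idea in the first point above, the Python sketch below averages a naive Bayes text score with a score from a structured field. Everything here is hypothetical and chosen only for demonstration: the toy notes and labels, the `days_late` feature, the coefficients in `structured_score`, and the equal weighting. It is not a description of any speaker's production system.

```python
import math
from collections import Counter


class NaiveBayesText:
    """Two-class (0/1) multinomial naive Bayes with add-one smoothing."""

    def fit(self, docs, labels):
        self.word_counts = {0: Counter(), 1: Counter()}
        self.class_counts = Counter(labels)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def prob_positive(self, doc):
        total_docs = sum(self.class_counts.values())
        log_scores = {}
        for c in (0, 1):
            total_words = sum(self.word_counts[c].values())
            score = math.log(self.class_counts[c] / total_docs)
            for word in doc.lower().split():
                if word in self.vocab:  # ignore words never seen in training
                    count = self.word_counts[c][word] + 1
                    score += math.log(count / (total_words + len(self.vocab)))
            log_scores[c] = score
        top = max(log_scores.values())  # subtract the max for numerical stability
        e0 = math.exp(log_scores[0] - top)
        e1 = math.exp(log_scores[1] - top)
        return e1 / (e0 + e1)


def structured_score(days_late):
    """Hypothetical logistic score from one structured field."""
    return 1.0 / (1.0 + math.exp(-(0.5 * days_late - 1.0)))


def ensemble_score(text_model, doc, days_late, weight=0.5):
    """Weighted average of the text-based and structured predictions."""
    return weight * text_model.prob_positive(doc) + (1 - weight) * structured_score(days_late)


# Toy training notes (made up); label 1 marks a hypothetical "at risk" case.
notes = ["refill late missed dose", "missed refill again",
         "on time refill", "dose taken on time"]
labels = [1, 1, 0, 0]
model = NaiveBayesText().fit(notes, labels)
combined = ensemble_score(model, "missed dose", days_late=2)
```

A simple average is only one way to combine the two signals; stacking or feeding the text score into the structured model as an extra feature are common alternatives.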

Janine Johnson, Director of Analytics, ISO: 

-Additional and more sophisticated use of social media to understand the voice of the customer

-Better visual tools for understanding relationships in text

-More automated methods for extracting insight from text
