The research & insights industry — that’s market research and consumer insights — is having a hard time coming to grips with social media: chaotic, unreliable, hard to quantify… and yet an incredibly rich source of unscripted conversation. As a researcher (or a research client), how do you make sense of social, particularly when you’re accustomed to methods that allow you to ask direct questions (via surveys) and guide conversations (in focus groups) and observe and measure reactions in controlled settings? We have yet to crack construction of scientific samples of social-platform users, lacking which we can’t report statistically significant findings.
Nonetheless, research & insights professionals are working to modernize methods, to accommodate social insights. TNS data scientist Preriit Souda — 2011 ESOMAR Young Researcher of the Year — is on the front lines of this work.
Preriit graciously submitted to an interview — hard for him to find time, given a grueling schedule — in the run-up to the LT-Accelerate conference, taking place November 23-24 in Brussels. Preriit and other insights, customer experience, media & publishing, and technology leaders will be presenting on applications of language technologies — text, sentiment, and social analytics — to meet everyday business challenges.
Here, then, is Preriit Souda’s explanation on how to obtain —
Deeper Insights from Networks PLUS Content
Seth Grimes> You have remarked that too much of today’s social media analytics relies on antiquated methods, on little more than counting. So you have advocated studying networks and content in order to derive deeper insights. Let’s explore these topics.
To start, could you please describe your social-conversation mapping work, the goals and the techniques you use, the insights gained and how you (and your clients) act on them?
Preriit Souda> Networks give structure to the conversation while content mining gives meaning to that structure.
People talk about structures of conversation styles based on network analysis. I have used networks to better understand conversations on Twitter, Facebook, Tumblr, Twitter + YouTube, Weibo, etc. These are good analyses, but if you look only at a graph, the patterns that form often don’t make sense. Unless you add content mining to understand these structures, you get wrong interpretations. When you use content analysis to guide network analysis, a complete picture emerges.
In addition, clients get excited when seeing the networks (because they look cool), but then they ask why/what/how. To answer, you need content mining. For any significant insight, you need both.
For example, I worked on a campaign analysis. The campaign was handled by a big ad agency and its success was reported in a big advertising magazine. The network graph showed a decent amount of volume. But certain patterns raised questions about the conversations between certain tweeters. We looked at our text-mined data and found that these guys were artificially inflating the tweets and hence the impressions. Using network and text mining together helped us uncover that the actual volumes were much lower than reported.
Further, we use text mining to understand sources of negativity or positivity. We use text mining to measure the volume of brand imagery and how perceptions change over time, and then use network graphing to see how they spread.
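To make the campaign example concrete, here is a minimal Python sketch of how a content signal (the same message posted repeatedly) and a network signal (accounts that mostly reply to each other) might be combined. It is an illustration only, not the method used at TNS: the record keys `author`, `text`, and `reply_to`, the `min_copies` threshold, and the sample tweets are all assumptions made up for this sketch.

```python
import re
from collections import Counter, defaultdict


def normalize(text):
    """Lowercase and drop URLs, @mentions, and punctuation so near-copies collide."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return " ".join(re.findall(r"[a-z0-9#]+", text))


def inflation_report(tweets, min_copies=3):
    """tweets: list of dicts with the hypothetical keys 'author', 'text', 'reply_to'."""
    copies = defaultdict(list)  # normalized text -> authors who posted it
    edges = Counter()           # (author, target) -> number of interactions
    for t in tweets:
        copies[normalize(t["text"])].append(t["author"])
        if t.get("reply_to"):
            edges[(t["author"], t["reply_to"])] += 1

    # Content signal: the same message posted at least min_copies times.
    suspects = {
        author
        for authors in copies.values()
        if len(authors) >= min_copies
        for author in authors
    }

    # Network signal: suspect accounts that interact with each other in both directions.
    reciprocal = sorted({
        tuple(sorted(pair))
        for pair in edges
        if pair[0] in suspects and pair[1] in suspects and (pair[1], pair[0]) in edges
    })

    return {
        "raw_volume": len(tweets),
        "distinct_messages": len(copies),
        "suspect_accounts": sorted(suspects),
        "reciprocal_pairs": reciprocal,
    }


# Made-up sample data, for illustration only.
sample_tweets = [
    {"author": "a", "text": "Love the new #BrandX ad! http://t.co/1", "reply_to": "b"},
    {"author": "b", "text": "love the new #brandx ad!! http://t.co/2", "reply_to": "a"},
    {"author": "a", "text": "Love the new #BrandX ad", "reply_to": None},
    {"author": "c", "text": "Saw the BrandX ad, not convinced.", "reply_to": None},
]
report = inflation_report(sample_tweets)
```

Neither signal is conclusive alone: repeated text can be ordinary retweeting, and reciprocal replies can be a genuine conversation. The gap between `raw_volume` and `distinct_messages`, read alongside the reciprocal pairs, is what would prompt a closer look.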
Seth> Alright, so networks plus content. Any other insight ingredients?
Preriit> Apart from studying networks and content together, using social metadata alongside them is quite important. Also important, but missing today, is the idea of analysing different social networks differently (because each has a different character) and then merging the “findings.”
Finally, clients need to use social data in conjunction with other sources of insights — survey, CRM, store data, e-commerce etc. — to get the complete picture. When social is understood in conjunction with all these pieces of the jigsaw puzzle, true impact is realized. Social media analytics needs to up its game to be a part of a larger overall picture.
We need insight-oriented analytics and not simply counting of likes and shares.
Seth> You referred to “sources of negativity or positivity.” What role does sentiment analysis play for you and for TNS clients?
Preriit> I will try to answer this question using a broader term, content analysis, and then delve into opinion mining. (I like calling it tonality analysis.)
Content mining is the most important part of any social media analysis we do. If you do the conversion of unstructured data accurately and insightfully, subsequent analyses will make more sense and be quite robust. Else, if your content mining is crap, all your following analysis is better not done! The basic pillar of any analysis is data. Unstructured data can’t be used directly. It has to be converted into structured data, and hence your text-mined data becomes the data feeding your models. Nowadays, I have seen people in analytical/consulting firms building econometric models based on social data. When I question them on their content mining, I realize that I can’t rely much on their analysis because the very conversion of unstructured to structured data is faulty.
If you don’t spend time being creative, insightful, comprehensive and accurate at this stage, I doubt your analysis.
Coming to your question on sentiment analysis: We look at sentiment as a part of content analysis. In some cases, clients need simple +/-. In others, clients are more insight-focused and need to understand different shades of opinion with respect to different entities (brand, product, services, etc.). Some want to go further and understand how those shades link to different attributes and imagery.
We create customized opinion mining algorithms for every project, client, and sector because every situation is different. Machines can’t understand the difference between someone speaking about nuclear topics from a political angle vs. a scientific angle vs. an educational angle.
Clients expect insights as robust as those from traditional research methods like surveys and focus groups. While in a survey or focus group you are explicitly asking people questions, in social you are mining what people say in a natural environment. So we have to understand context and how what people say can be linked to answer explicit questions otherwise answered via a survey. For example, in a survey people are asked questions like “Do you associate Brand X with trustworthiness?” while in social no one will use that lingo. So I have to find out how people refer to such concepts, and then link that up to quantify opinions. So for us opinions are not simply +/- but much more than that. These things make our life difficult but also exciting.
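One way to picture the step from unstructured posts to survey-style measures is a cue lexicon that maps everyday phrases to a survey attribute and a polarity, producing structured rows that can then be counted. The Python sketch below is an assumption-laden illustration, not the approach described in the interview: the cue phrases, the brand list, and the sample posts are all hypothetical, and a real project would build and refine such resources with analysts and subject matter experts.

```python
import re
from collections import Counter

# Hypothetical cue phrases -> (survey attribute, polarity).
ATTRIBUTE_CUES = {
    "never let me down": ("trustworthiness", +1),
    "always delivers": ("trustworthiness", +1),
    "ripped me off": ("trustworthiness", -1),
    "hidden fees": ("trustworthiness", -1),
}
BRANDS = ("brand x",)  # hypothetical brand list


def to_rows(posts):
    """Turn free-text posts into structured rows: one row per (post, brand, cue) hit."""
    rows = []
    for post_id, text in enumerate(posts):
        lowered = re.sub(r"\s+", " ", text.lower())
        for brand in BRANDS:
            if brand not in lowered:
                continue
            for cue, (attribute, polarity) in ATTRIBUTE_CUES.items():
                if cue in lowered:
                    rows.append({
                        "post_id": post_id,
                        "brand": brand,
                        "attribute": attribute,
                        "polarity": polarity,
                    })
    return rows


def attribute_scores(rows):
    """Net polarity per (brand, attribute), a rough stand-in for a survey measure."""
    totals = Counter()
    for row in rows:
        totals[(row["brand"], row["attribute"])] += row["polarity"]
    return dict(totals)


# Made-up posts, for illustration only.
posts = [
    "Brand X has never let me down in ten years.",
    "Watch out, Brand X ripped me off with hidden fees",
    "Nice weather today",
]
rows = to_rows(posts)
scores = attribute_scores(rows)
```

Plain substring matching is the weakest part of this sketch: it ignores negation, sarcasm, and which brand a cue actually refers to, which is exactly where the context work described above comes in.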
Seth> You advocate use of text mining for meaning discovery, to get at explicit, implicit, and contextual meaning in customer conversations. Could you please give an example of each type?
Preriit> Well, different people use these words in different ways. Some people might disagree with my definitions, or call them something different, but what I am referring to is as follows.
Explicit meaning: Say, people using the word Barclays and talking about its bad service.
Implicit meaning may be broken out as —
- Referential Implicit: People don’t use the word Barclays but share a URL (about Barclays) and express their opinion with respect to Barclays.
- Operational implicit: Saying something after seeing a YouTube video or in reaction to a Facebook post.
- Conversational implicit: Talking to people who have a very high probability of being linked only to the topic you are mining for. They might not use the words you are looking for, but there is a very high probability that they are talking about things of your interest.
- Using images to express: Sharing pictures with minimal words to express their opinion.
Contextual meaning may also be broken out —
- By Geography: Certain words mean different things in different geographies, and hence the importance you give to them, in order to understand intensity, varies. Plus, we often need to tweak our algorithms to take into consideration the different lingo of people from different origins within a given geography.
- By Sector: Certain phrases or words mean different things in different subjects and contexts. When interpreting those words or phrases, context has to be properly understood by our algorithms.
- By time: Meanings of certain words/phrases change over time or are influenced by ongoing events. So an algorithm that is right at certain times can be wrong at others. For example, when people say positive things about Lufthansa airline staff, that translates to goodwill for the airline. But during adverse times, when most negativity is directed at management or the brand as a whole, staff may be misperceived negatively.
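The sector case above can be sketched as a base lexicon with sector-specific overrides, so that the same word scores differently depending on where it is used. This is a minimal Python illustration under stated assumptions: the words, polarities, and sector names are invented for the example and are not taken from any real lexicon.

```python
import re

# Hypothetical base polarities and sector-specific overrides.
BASE_LEXICON = {"unpredictable": -1, "cheap": -1, "reliable": +1}
SECTOR_OVERRIDES = {
    "film": {"unpredictable": +1},   # an unpredictable plot is usually praise
    "retail": {"cheap": +1},         # a cheap price is usually praise
}


def polarity(word, sector):
    """Look up a word, preferring the sector override over the base lexicon."""
    return SECTOR_OVERRIDES.get(sector, {}).get(word, BASE_LEXICON.get(word, 0))


def score(text, sector):
    """Sum word polarities for a text, interpreted in the given sector."""
    return sum(polarity(word, sector) for word in re.findall(r"[a-z']+", text.lower()))


film_score = score("The plot is unpredictable", "film")
car_score = score("The steering is unpredictable", "automotive")
```

The same override idea could be keyed by geography or by time window (for example, a period of industrial action), though deciding when such a window starts and ends is a judgment call the code cannot make.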
Seth> What text analytics techniques should forward-looking researchers master, whether for social or survey research or media analysis?
Preriit> I think I am using up a lot of your time, so I will try to keep it short. Without going into any technical details, I think linguistic-library-based techniques are useful along with machine learning techniques. So someone trying to enter this area should be aware of both and be ready to use both. I feel that nowadays a lot of people have a bias towards ML, which is right in some cases, but in other cases I don’t feel that it gives the desired results. So I believe that a more combined approach should be used.
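As one possible reading of a combined approach, the Python sketch below lets a hand-built lexicon decide when it has something to say and falls back to a learned model otherwise. Everything here is an assumption for illustration: the lexicon entries, the training sentences, and the tiny Naive Bayes classifier (written out with the standard library so the example is self-contained) are stand-ins, not the tools or rules used by the interviewee.

```python
import math
import re
from collections import Counter

LEXICON = {"excellent": +1, "awful": -1}  # hypothetical hand-built linguistic library


def tokens(text):
    return re.findall(r"[a-z']+", text.lower())


class TinyNaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing; a toy stand-in for an ML model."""

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in set(labels)}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokens(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.word_counts):  # sorted so ties resolve deterministically
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            log_prob = math.log(self.doc_counts[label] / total_docs)
            for word in tokens(text):
                if word in self.vocab:  # words never seen in training are ignored
                    log_prob += math.log((counts[word] + 1) / denom)
            if log_prob > best_score:
                best_label, best_score = label, log_prob
        return best_label


def hybrid_label(text, model):
    """Use the lexicon when it fires; otherwise defer to the learned model."""
    lexicon_score = sum(LEXICON.get(word, 0) for word in tokens(text))
    if lexicon_score != 0:
        return ("pos" if lexicon_score > 0 else "neg", "lexicon")
    return (model.predict(text), "model")


# Made-up training data, for illustration only.
model = TinyNaiveBayes().fit(
    ["great service", "love it", "terrible service", "hate it"],
    ["pos", "pos", "neg", "neg"],
)
by_lexicon = hybrid_label("An excellent branch visit", model)
by_model = hybrid_label("I hate the queues", model)
```

Returning which component made the call ("lexicon" or "model") is a deliberate choice in this sketch: it makes it easier to audit where each approach is carrying the result and where it is failing.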
Seth> What best practices can you share for balancing or tempering automated natural language processing, including sentiment analysis, with human judgment?
Preriit> Different people look at this problem in different ways. I can talk about certain overarching steps that involve humans at different stages to improve results.
Start with good desk research by the content analyst, followed by inputs from a subject matter expert. At both stages, create and refine your mining resources. Bring in social data and then refine further. Create your model and get it checked by a linguist along with the subject matter expert. Both will give their own perspectives, and sometimes the differences between them can help you refine your model. Test with new data across different times. (Social data is often influenced by events, some known and some unknown.) Monitor your performance until you reach around 70-90% accuracy on agreed model outputs.
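The testing and monitoring steps above can be sketched as a comparison of model labels against human-reviewed labels, broken out by time period. The Python below is a hedged illustration: the record layout, the period names, and the sample labels are made up, the 0.70 floor is simply the lower end of the 70-90% range mentioned above, and Cohen's kappa is an addition of this sketch (a standard chance-corrected agreement statistic), not something named in the interview.

```python
from collections import Counter, defaultdict


def accuracy(model_labels, human_labels):
    """Share of items where the model matches the human-reviewed label."""
    matches = sum(m == h for m, h in zip(model_labels, human_labels))
    return matches / len(human_labels)


def cohen_kappa(labels_a, labels_b):
    """Agreement corrected for chance; undefined if chance agreement equals 1."""
    n = len(labels_a)
    observed = accuracy(labels_a, labels_b)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(
        counts_a[label] * counts_b[label] for label in set(labels_a) | set(labels_b)
    ) / (n * n)
    return (observed - expected) / (1 - expected)


def accuracy_by_period(records):
    """records: iterable of (period, model_label, human_label) tuples."""
    grouped = defaultdict(lambda: ([], []))
    for period, model_label, human_label in records:
        grouped[period][0].append(model_label)
        grouped[period][1].append(human_label)
    return {period: accuracy(m, h) for period, (m, h) in grouped.items()}


def periods_below(records, floor=0.70):
    """Periods where accuracy drops under the floor, e.g. after an unexpected event."""
    return sorted(p for p, acc in accuracy_by_period(records).items() if acc < floor)


# Made-up review data, for illustration only.
records = [
    ("week1", "pos", "pos"), ("week1", "neg", "neg"),
    ("week1", "neg", "pos"), ("week1", "pos", "pos"),
    ("week2", "neg", "neg"), ("week2", "pos", "neg"),
    ("week2", "pos", "neg"), ("week2", "neg", "neg"),
]
per_period = accuracy_by_period(records)
flagged = periods_below(records)
week1_kappa = cohen_kappa(["pos", "neg", "neg", "pos"], ["pos", "neg", "pos", "pos"])
```

Breaking the score out by period matters because a model that looks fine on pooled data can still fail badly in the weeks when an event shifts how people talk.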
Seth> You’ll be speaking at the LT-Accelerate conference, topic “Impact and Insight: Surveys vs. Social Media.” What are the key challenges your presentation will address, and could you hint at key take-aways?
Preriit> I have been using social media data alongside surveys for almost three years. It’s been a challenging ride and continues to present new challenges.
I will talk about some of the things that I have covered in the questions above. I will talk about my personal experiences using social to answer client questions and the solutions that I have found to work nicely in my context. I will also talk about some of the problems I face. I will try to use examples while protecting client privacy.
People can look at my past work to get a sense of my approaches and challenge me or make suggestions. My talk will be informal and I would prefer the audience be open in sharing thoughts.
Seth> Finally, what’s on your personal agenda to learn next?
Preriit> Learning econometric modeling and sharpening my skills in certain scripting languages.
Again, meet and hear from TNS researcher Preriit Souda — and research/insights leaders from Ipsos, DigitalMR, Deloitte, Xerox, and other organizations — at the LT-Accelerate conference, 23-24 November in Brussels.