Editor’s Note: This post is part of our Big Ideas series, a column highlighting the innovative thinking and thought leadership at IIeX events around the world.
Let’s face it: no captured text, be it from a survey form or from social media, can be analyzed with 100% objectivity. Still, analyzing text quantitatively is clearly useful, and market researchers have long used text as input because of its versatility and breadth. But we cannot pretend that any text analysis is free of ambiguity.
The reasons for this uncertainty are:
- The text itself doesn’t contain the full information or context, or
- The person or AI tool analyzing the text is biased or inconsistent
The Source of Issues
Often, these issues are interconnected and occur together: the lack of context in short texts makes biases in the analysis more apparent. For example, in a telecommunications context, the statement “Good service” could mean “Good customer service” or “Good network service”. A system or person that always assigns “Good customer service” would be consistent but highly biased. That bias shifts the analysis results in a specific direction and, in turn, leads the research buyer to think that customer service is more important than network service. Recently, AI-based automated systems have emerged that are, at least in principle, able to analyze text more consistently, as they don’t get tired or distracted.
When evaluating the accuracy of such automated systems, market researchers often compare them against manual coding, which is the current gold standard in text analysis. However, they tend to forget that manual coding is also biased and inconsistent, especially when coders need to keep track of hundreds of codes, some of which are notoriously difficult or even impossible to distinguish. We compared the results from different professional coders using the exact same codebook on the exact same data and found surprisingly low agreement across a variety of studies.
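One common way to put a number on agreement between two coders is Cohen's kappa, which corrects the raw share of matching codes for the agreement expected by chance. The sketch below is only an illustration: the function name, the code labels and the six example responses are made up, and it is not necessarily the measure used in our studies.

```python
from collections import Counter


def cohens_kappa(codes_a, codes_b):
    """Chance-corrected agreement between two coders on the same responses."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("Both coders must code the same non-empty set of responses")

    n = len(codes_a)

    # Share of responses where both coders assigned the same code.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n

    # Agreement expected if each coder assigned codes at random
    # according to their own code frequencies.
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)

    if expected == 1.0:
        # Both coders used one identical code throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical codes from two coders for the same six responses.
coder_a = ["customer", "customer", "network", "price", "customer", "network"]
coder_b = ["customer", "network", "network", "price", "network", "network"]

kappa = cohens_kappa(coder_a, coder_b)
print(f"Cohen's kappa: {kappa:.2f}")
```

In this made-up example the two coders agree on four of six responses, yet kappa comes out at only 0.5 once chance agreement is removed. With hundreds of codes instead of three, raw agreement can look very different from chance-corrected agreement, which is why it helps to report both.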
Keeping it Up to Code
Our evidence is anecdotal, but it suggests that consistency can be greatly improved by a good and concise codebook. Bias, on the other hand, can be reduced in an intuitive way: let many different coders work through the same data and then average the results. However, this is very tedious and prohibitively expensive. I would argue that a better, much faster and cheaper option is to use an AI system that has learned from as many different manual coders as possible. AI systems are well known to be biased, especially when trained on a single data source, but by learning from a diverse set of coders with different biases, the AI can learn to act as an “average coder”. The result is an analysis with reduced bias compared to a full analysis by a single coder.
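To make the idea of averaging over coders concrete, here is a minimal sketch. It assumes every response has been coded by several coders and turns their codes into shares per response, then averages those shares across responses. All names and data are hypothetical, and this shows only the averaging step, not how an AI system would be trained on it.

```python
from collections import Counter


def code_distribution(labels):
    """Share of coders who assigned each code to a single response."""
    if not labels:
        raise ValueError("Each response needs at least one coder label")
    counts = Counter(labels)
    return {code: count / len(labels) for code, count in counts.items()}


def average_code_shares(labels_per_response):
    """Average the per-response code shares over all responses."""
    totals = Counter()
    for labels in labels_per_response:
        for code, share in code_distribution(labels).items():
            totals[code] += share
    n = len(labels_per_response)
    return {code: total / n for code, total in totals.items()}


# Hypothetical data: two responses, each coded by four coders.
# The first is the ambiguous "Good service" example from above.
labels_per_response = [
    ["customer", "customer", "network", "network"],
    ["price", "price", "price", "price"],
]

shares = average_code_shares(labels_per_response)
print(shares)
```

In this made-up example, a single coder who always reads “Good service” as customer service would report 50% customer service and 0% network service. Averaged over the four coders, the ambiguous response is split evenly, giving 25% each. Per-response shares like these could also serve as training targets for a system that is meant to behave like an “average coder”.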
Join our talk at IIeX North America to find out how we compared human coders and different AI-based systems for a large-scale study in Latin America and discuss novel ways to improve quantitative text analysis.