In a fictional letter to her grandmother, Natacha Dagneaud explains the complex topic of artificial intelligence. She whittles the nuances of AI down to an understandable and enjoyable read - easy enough for even a grandmother.
Editor’s Note: In a fictitious letter to her grandmother, Natacha Dagneaud, founder of Séissmo, explains the use and benefits of artificial intelligence in the analysis of texts. We recently announced an informal agreement with Marktforschung in Germany to share blog posts with each other. Below is the second in our series from Germany.
You know that I have such a funny job: I watch people for hours, I ask questions and then interpret the answers – paying as much attention to the silences as to the loud laughter … Because most people are lying. You taught me that early.
Imagine: Today there are computers that allow us to take a fresh look at how we do our work. In these long interviews, a lot of words are spoken. For example, we had 27 women in the midst of menopause keep an electronic diary (via an online blog, imagine such a thing) for seven days, and suddenly had to evaluate 189 full pages of answers. That was 92,669 words! By the way, the counting of words is not rocket science, it can be done using the program Microsoft Word – this should not be confused with Artificial Intelligence.
Yes, you taught me to read early on (do you remember the forbidden loan of Boris Vian’s novel “I’ll Spit On Your Graves” before my 15th birthday?). But to read so much and dissect everything? And in a few days?
Or like the other day: we interviewed 40 people who do their grocery shopping in different supermarkets, using the “cognitive interview” method, which we cheekily stole from the police. In these interviews, a person recounts for over an hour, very freely (but always focused, immersed in his world as if he were going through the episode again), something he himself experienced as a very concrete event. This is a scientific way to improve recall and get more accurate testimony. I’ll do that with you when you’re looking for your keys again.
Anyway, on average, we have 4,500 words per transcript – that’s 180,000 words (with no superfluous characters!) that my brain needs to process. It is clear that I filter a lot – of course, completely unconsciously. I notice certain aspects better because they trigger something in me. I know that I have to be very vigilant. My colleagues too – nevertheless we are humans and not robots.
That’s why, a year ago, we sought contact with providers of “smart systems”, and for more than eight months now we have been working on specific projects with a company called Synomia. They are based in Paris but can handle many languages in their system (we currently use English, German and French). Until now, they have worked mainly with our quantitative colleagues, who process simpler sentences from more people. For us qualitative researchers it is the other way around: we interview fewer people, who produce more text.
We work very closely with their R&D to clarify our needs and requirements. For example, the context from which each word and each sentence originated was very important to us. They built an extra feature for us that lets us go back to a consumer’s original transcript and easily see the preceding sentence. We get access to a very private platform where our texts – our subject interviews – are “stored” until we “code” them, that is, assign them to a meaningful topic. We can sort, search, group and categorize topics and sensations. The chaos can be mastered.
But why should it be interesting to count words at all? After all, not all people talk the same way, and with this system there is a risk that the talkative will be heard more. But we usually don’t have much to say when a topic leaves us cold. Do we not instinctively know that we talk a lot about the things that matter to us? It hardly matters what people actually think about an advertisement, as long as they argue about it fiercely and at length, because while they “say a word about it”, they are repeating the message once more!
That is why we have to respect the “crowd” – especially as qualitative researchers.
I would like to tell you how the introduction of this software (it is called SaaS, sort of like your subscription to Reader’s Digest, only virtual) changes how I do my work. It is a bit like the washing machine when it first became widely used: you have to sort the laundry carefully and set the right temperature yourself. The machine does not replace your thinking, but it does the dirty work and can be quite precise if you control it correctly. But if you put the wrong thing in, nonsense comes out.
It’s the same with Artificial Intelligence and syntax software (that just means the computer understands the sentence by its grammatical structure). The clean analysis of verbs, adjectives, subordinate clauses, pronouns, syntagmas … makes me smile: those Latin lessons were not in vain. Above all, this presents us with some big tasks:
According to which principles do we sort the “laundry”, i.e. the texts? May we include the sentences of the interviewers and moderators, or will they “rub off” on the results? Because some interviewers talk more than others – sometimes too much.
How do we get from audio recordings to electronic transcripts? This costs time and money, and requires a great deal of training for the transcribers.
How do you insert punctuation marks? The machine treats a sentence or phrase as one unit and “cuts” the text at the punctuation. When a person describes an experience (“and … mmmh … then …”), is it all one sentence, one unified thing? Or may our transcribers (and should they) put full stops or semicolons in between? At the end of the day, this affects the total number of verbatim comments we use to weight a topic. Not an insignificant issue at all!
How do we explain to our customers and clients that the percentages that now appear refer to words, not to human beings? There can’t be any talk of statistical relevance there.
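For the curious, the punctuation question can be made concrete with a small sketch. It is only an illustration under simple assumptions, not how any real analysis platform works: the function `split_into_units` is hypothetical, it cuts at full stops, exclamation marks, question marks and semicolons, and both sample transcripts are invented.

```python
import re


def split_into_units(transcript: str) -> list[str]:
    """Cut a transcript into units at sentence-ending punctuation (. ! ? ;).

    Assumption: this stands in for the "cutting" step; a real system
    will use more refined rules.
    """
    parts = re.split(r"[.!?;]+", transcript)
    return [part.strip() for part in parts if part.strip()]


# The same spoken passage, written down by two imaginary transcribers.
as_one_sentence = (
    "and mmmh then I went to the bakery and I smelled the bread "
    "and I took two loaves"
)
with_more_dots = (
    "And mmmh. Then I went to the bakery. I smelled the bread; "
    "I took two loaves."
)

units_a = split_into_units(as_one_sentence)
units_b = split_into_units(with_more_dots)

# Identical content, but a different number of countable units.
print(len(units_a), len(units_b))
```

The words are the same in both versions; only the transcriber’s punctuation changes how many units end up being counted for the topic.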
Through this machine process, words are recognized, classified and counted. Up until now, we’ve done this with our heads: we’ve identified meaningful units, written them out, and then put them into reports that usually consisted of long sentences. The evidence for our conclusions was the consumer verbatims and our interpretation. No one would have thought to count individual words or phrases. It would have been completely insane to do that by hand. And that’s the interesting thing: our previous kinds of analysis are preserved and are still necessary, but the machine opens up further possibilities for processing our raw material and allows us to look deeper. For example, I personally love focusing on the verbs while analyzing because they betray the intention of the person.
How do we solve the possible unequal treatment of talkative and taciturn respondents? By comparing topics with one another, so that everything stays relative. If, in the test of a new natural hair coloring product, the themes of “henna scent” or “farm scents” suddenly and spontaneously appear in the accounts, they turn up in the same proportions among both types, the quiet and the loud interlocutors. This confirms the legitimacy of these topics and their relative importance in the overall picture.
Finally, I have to tell you a few funny anecdotes about things I discovered thanks to the AI, things my own IQ had not captured.
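Here is a minimal sketch of what “everything stays relative” can mean in practice, assuming we measure a topic as a share of each respondent’s own words rather than as a raw count. The function `topic_share`, the keyword list and both sample answers are invented for illustration and are not taken from the study.

```python
def topic_share(text: str, topic_words: set[str]) -> float:
    """Return the fraction of a respondent's words that belong to a topic."""
    words = [w.strip(".,;:!?\"'").lower() for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in topic_words)
    return hits / len(words)


# Hypothetical keyword list for the "scent" topic.
TOPIC = {"henna", "farm"}

# Two invented respondents: one quiet (10 words), one talkative (20 words).
quiet = "It smells of henna and I really like the colour"
talkative = (
    "When I opened the tube the henna scent reminded me of a farm "
    "and honestly I was surprised by that"
)

# Raw counts favour the talkative respondent (2 mentions against 1),
# but relative to everything each person said, the share is the same.
print(topic_share(quiet, TOPIC), topic_share(talkative, TOPIC))
```

Seen this way, the talkative respondent does not drown out the quiet one: the topic takes up the same place in what each of them had to say.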
Once it was about the meaning of bread. It was the eighth most common word in our recent food purchasing study. Of course, I knew that bread was very common in shopping, and is very important in a French-speaking country. Nobody needs an expensive computer to figure that out. And yet it once again put the importance of this product category in the right light, especially if this information could be coupled with the sensory impressions in the store. Because then it was clear: Without the smell of freshly baked bread, there is no joy in the store!
On another occasion, I was amazed when I discovered the value of a negative. I tend to pay more attention to statements about what a person has done, chosen, taken in hand … But the program is so clever that it picks up meaning from the negatives that occur in a sentence. Because of that, I was able to find out in which shops the shoppers showed some kind of avoidance behavior. In the more expensive supermarkets, they practiced a kind of self-restraint, so they reported disproportionately often “do not buy / do not take / do not put in my cart”.
In a study of an anti-dandruff shampoo, there was one formula with an orange-shimmering color and one with a white, creamy consistency. The manufacturer wanted to know if the orange color was a problem. Instead of asking people directly, we let them share their hair-washing experiences and immediately saw that product color was a remarkable pillar of product identity, with strong associations.
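A rough idea of how such avoidance phrases can be tallied per type of shop is sketched below. It is a crude keyword pattern, not the grammatical analysis the program actually performs, and the pattern `NEGATED_PURCHASE`, the shop labels and the statements are all invented for illustration.

```python
import re
from collections import Counter

# Assumption: a negation directly followed by one of three purchase verbs.
NEGATED_PURCHASE = re.compile(
    r"\b(?:do not|don't|did not|didn't)\s+(buy|take|put)\b",
    re.IGNORECASE,
)

# Invented statements, each labelled with a hypothetical shop type.
statements = [
    ("premium", "I look at the cheese but I do not buy it."),
    ("premium", "I don't take the big pack, and I did not put the wine in my cart."),
    ("discount", "I take two loaves and I buy the milk."),
    ("discount", "I did not take a basket."),
]

# Count negated purchase verbs per shop type.
avoidance = Counter()
for shop_type, sentence in statements:
    avoidance[shop_type] += len(NEGATED_PURCHASE.findall(sentence))

print(dict(avoidance))
```

Even this simple tally shows the idea: what people say they did not do can be counted and compared between shops just like what they did.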
Voilà, Grandma, there is still much to say and I promise to keep you informed – but again, in this letter, the words are numbered.