For background, I am a marketing science and analytics consultant working primarily, but not entirely, in marketing research. All of my work is tailored to specific client needs, and I do not use standardized proprietary methods I have developed or licensed. Customized analytics is by no means rare and, in that respect, I am not an oddball. “Why Customize?” explains why decision makers frequently need customized analytics. The methods page of my company website will give you some details about the sort of work I do and the statistical and machine learning tools I use.
There are at least six stages in most of my projects, and I would like to take you through each of them to see which aspects of my work can be automated.
Defining the Problem: This is the most important and usually most challenging part of my job. Requests I receive, either directly from clients or via other marketing research agencies or data science consultancies, are typically vague. Who will be using the results of my analytics, how they will be used and when they will be used are frequently unclear. At times, requests are at the other extreme and consist of a very narrow question such as “How much do you charge for conjoint?” (To an analytics person, this can be likened to walking into a restaurant and asking, “How much do you charge for dinner?”)
Defining the problem is essential for me to determine if there is even a role for me in the project. It often requires considerable patience and tact, even when I work with my fellow Americans, let alone when language and culture are barriers. How can this part of my work be automated? If it can, then clients will also have been automated.
Deciding on the Analytics Method: It is not at all unusual for me to use many statistical methods in sequence or in combination for one project. In fact, this is typical. Once I have a good grasp of why the research or analytics is being proposed, I can then think more specifically about the methods I will use. The method or methods may not be what the client originally had in mind, but through experience I have learned to focus on decisions, not technology. This part of my job cannot be automated either.
Deciding on the Data: The required data may already exist, but this is rare. More commonly, we need to explore the interrelationships among many variables and, because of this, there will be gaps in existing data. What we have may also be too old to be useful, or we may have no relevant data at all! Customized analytics usually involves collecting new data suited to specific business objectives or assembling data of various types from several sources. I am normally heavily involved in this step, including sample and questionnaire design when they are applicable. Automating this part of the process is only possible when we are repeating a project that has been successfully completed. It cannot be done for the benchmark. Machines 0 – Me 3.
Data Cleaning and Setup: Exploratory analysis is an opportunity to kill several birds with one stone. When we are setting up the data for analysis, we also are usually cleaning it, recoding it and exploring it. We are learning from the data. Once again, the first time around, this step is not possible to automate, nor would it be wise to attempt to do so.
Data Analysis: Statisticians normally have expended at least 70% of their time budget before reaching this point. Though obviously undesirable, in some cases we may have already exceeded 100%! Analysis is typically squeezed, as are interpretation and reporting, and part of the reason is unrealistic planning or misunderstanding of the project’s objectives. Here, once again, only when the analytics are being repeated with new data is automation feasible.
I have heard it claimed that methods such as cluster analysis lend themselves to automation, but nothing could be further from the truth. There are a gigantic number of clustering methods (for example) and different implementations of the same method. There is no single measure of “fit” that will tell me (or AI) which method, implementation or options are best, let alone which variables to consider using in the first place. (In an “ideal” cluster analysis, each clustering variable’s score would differ in just one cluster and thus each variable would have low overall “importance.”)
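To make the point about “fit” concrete, here is a minimal sketch in Python. The data are hypothetical toy points (two parallel chains), and the two criteria are simply two common choices I picked for illustration: the within-cluster sum of squares that k-means minimizes, and the smallest gap between clusters that single-linkage methods favor. Neither the data nor the function names come from a real project.

```python
from math import dist

# Hypothetical toy data: two parallel "chains" of ten points each.
points = [(float(x), 0.0) for x in range(10)] + [(float(x), 3.0) for x in range(10)]

# Two candidate two-cluster solutions for the same data.
by_row = [0 if y == 0.0 else 1 for _, y in points]   # one cluster per chain
by_side = [0 if x < 5.0 else 1 for x, _ in points]   # left half vs. right half


def within_ss(points, labels):
    """Total within-cluster sum of squares (the k-means objective); lower looks better."""
    total = 0.0
    for k in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == k]
        cx = sum(p[0] for p in members) / len(members)
        cy = sum(p[1] for p in members) / len(members)
        total += sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 for p in members)
    return total


def min_gap(points, labels):
    """Smallest distance between points in different clusters; higher looks better."""
    return min(
        dist(p, q)
        for p, lp in zip(points, labels)
        for q, lq in zip(points, labels)
        if lp != lq
    )


for name, labels in [("by_row", by_row), ("by_side", by_side)]:
    print(name, within_ss(points, labels), min_gap(points, labels))
```

On this toy data the within-cluster sum of squares prefers the left/right split (85.0 versus 165.0), while the separation criterion prefers one cluster per chain (a gap of 3.0 versus 1.0). Which solution is “right” depends on what the clusters are for, and that is not something either number can tell us.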
In multivariate analysis, it is more the rule than the exception for competing models to provide nearly equal fit to the data but suggest very different courses of action to decision makers. More fundamentally, any model provides a simplified representation of the process or processes that gave rise to the data – “Essentially, all models are wrong, but some are useful” in the immortal words of legendary statistician George Box.
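A small sketch of how this can happen, assuming made-up numbers: two predictors that move almost in lockstep, and an outcome driven equally by both. The variable names (ad_spend, promo_spend, sales) are hypothetical and chosen only for illustration, and the one-predictor models are a deliberate simplification of what a real multivariate analysis would involve.

```python
def r_squared(x, y):
    """R-squared of a one-predictor least-squares fit of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Hypothetical data: promotions track advertising almost exactly.
ad_spend = [float(i) for i in range(1, 13)]
wobble = [0.3 if i % 2 == 0 else -0.3 for i in range(12)]
promo_spend = [a + w for a, w in zip(ad_spend, wobble)]

# By construction, sales depend equally on both, plus a little noise.
noise = [0.5, -0.5, -0.5, 0.5] * 3
sales = [a + p + e for a, p, e in zip(ad_spend, promo_spend, noise)]

r2_ads = r_squared(ad_spend, sales)      # Model A: "advertising drives sales"
r2_promo = r_squared(promo_spend, sales) # Model B: "promotions drive sales"
print(round(r2_ads, 3), round(r2_promo, 3))
```

Both models fit almost equally well and both fits are close to perfect, yet one says to put the budget into advertising and the other says to put it into promotions. Fit statistics alone cannot settle which story to tell the decision maker.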
Greater variety of data and an explosion in the number of analytics tools now available have actually made automation more difficult, not easier. In his excellent book Statistical Rethinking, Richard McElreath of the Max Planck Institute makes a very important observation: “…statisticians do not in general exactly agree on how to analyze anything but the simplest of problems. The fact that statistical inference uses mathematics does not imply that there is only one reasonable or useful way to conduct an analysis. Engineering uses math as well, but there are many ways to build a bridge.”
So, who programs these tools? How do they decide which procedures or options are best? The more choices there are, the more complex the programming task becomes. AI cannot decide how to program itself. There is also a heightened risk of bugs. Yes, AI does have bugs.
If the project is going to be repeated again and again, and once we have decided which data and analytic methods are sufficient, then it definitely makes sense to try to automate these stages as much as possible. But not the first time around, and human checks will still be needed at least periodically. Machines 0 – Me 5, and my future is starting to look bright.
Interpretation and Reporting: This is the final step and raison d’être of most projects. There is also implementation and assessment post-implementation, but these are big subjects and worthy of their own article. (Hint: The human stuff gets very hairy…) Unless the project is very simple, with a limited number of variables and very basic analytics, this final step is very hard to automate.
Tracking studies in which software can pick out basic trends lend themselves to automated reporting, but the opinions of a human analyst can greatly enhance the final deliverable. The vast bulk of my work, however, is customized and also too complicated for this to be feasible. An AI would need to be developed and trained for each of my projects. Let’s not forget that, at its core, AI is computer programs, not magic.
So, at best, my job could only be partly automated. Let’s be charitable to the machines and make the final score Machines 1 – Me 5.
I’m safe, for now.