DATA, AI & the Population

[Big Ideas Series] For data collection to positively and effectively advance an AI-enabled world, data suppliers must be transparent about their work, and the public must be informed about the data-handling process.

Editor’s Note: This post is part of our Big Ideas series, a column highlighting the innovative thinking and thought leadership at IIeX events around the world. Preriit Souda will be speaking at IIeX North America (June 11-13 in Atlanta). If you liked this article, you’ll LOVE IIeX North America.

Over the last few weeks, I have repeatedly been asked about the Facebook scandal, the implications of the new European regulation on data privacy (GDPR) [1][2][3] and similar topics. While I will not discuss the widely covered Zuckerberg hearings [4], Cambridge Analytica or the confusion around GDPR, I will share my thoughts and observations on data in general, its importance to the success of AI (artificial intelligence), and how both relate to the wider public.

We are at a unique juncture. On one side is Europe, which has taken a cautious path by placing restrictive checks on data usage. On the other side is China, which is trying to link up all available data sources to control its population [5][6][7]. The US sits in the middle: it is unwilling to go as far as China, but it is also not moving in the European direction, owing to its preference for minimal regulation and to the commercial interests of its home-grown tech giants and their lobbies.

As a practitioner of data science, I would consider the Chinese scenario the best way forward. However, from a citizen’s point of view, I feel that the values of free speech, equality and democracy outweigh any possible advantages. My fear is that AI-enabled Chinese products and services, powered by vast amounts of rich data, might gain an advantage over those of democracies that lack access to such rich datasets. While bigger economic powers such as the US, EU, Japan, Russia and India might limit the entry of such Chinese products into their markets, smaller economies might not be able to resist. In that scenario, Chinese products could end up occupying a large part of the global digital ecosystem and create problems in the future.

Coming back to the non-Chinese part of the world, I feel that the general population and regulators hold many misconceptions, both because data collection and processing lack transparency and because they have a limited understanding of the processes used in data science. While this is a constantly evolving area, I have jotted down a few steps that I feel can help AI lead to a better tomorrow while still being commercially lucrative.

Data

To start, data is the key to any AI-enabled analysis or product. While data is often called the new oil, I believe the value of this oil depends on the refinery that processes it: it is the creativity of the analyst that can make data more precious than gold, or turn it into a loss-making, storage-hogging liability. The market clearly lacks this understanding, as some enterprises overprice access to their data assets while others do not even realize the potential of theirs. Data assets need proper valuation, supported by appropriate pricing mechanisms.

Much of the available data is disjointed, and with the new regulations in Europe, linking datasets will become more difficult. In addition, the degree of rawness on offer varies by data source. For example, one can get raw tweet text along with several metadata attributes, which enables a variety of analyses; at the other extreme, one can get only aggregated, indexed insights for search topics from, say, Google Trends. Given these differences in accessibility across data sources, new thinking is needed around research design and analysis. Some have started down this path, but such efforts are still rare.

Returning to data itself, there is an urgent need to create data markets and exchanges that are transparent (in cost and access) and that grant access in accordance with the relevant legislation. These markets should be open to companies, big or small, that want to buy data, and to those that want to sell it.

As for websites and services that sell data, the service holding the data needs to be transparent with its users and crystal clear about what it collects. Such services should offer users an easy way to opt out and, for users who do not opt out, clarity on what they get in return. Each service can then sell either raw data or aggregated insights, depending on its legal and reputational liabilities and its commercial interests.

AI & Data Science

Turning to processes, I feel that the market is flooded with over-optimism about the possibilities of AI, which may lead to many disappointments. One of the biggest issues is the belief that AI will solve every problem. We need to recognize that technology is a tool for achieving real results, not an end in itself. Human ingenuity and robust procedures are just as important to real problem-solving.

The general population and regulators need to be educated about the possibilities of AI. Such education requires a nuanced, practical and honest approach, one that makes it easier for the general population to understand both the benefits and the drawbacks of AI-enabled products and services. In this endeavor, global and regional organizations focused on technology, research, analytics and advertising will need to come together to raise awareness and dispel misunderstandings.

Building on my previous point, there is a dire need to educate regulators in particular. I stress regulators because the recent Zuckerberg hearings (in the US) and several similar events in other countries have revealed that many of our regulators have no idea about the basics of this new data-enabled world. If they are not educated, they may end up creating regulations that cause massive damage to industries and to society in general.

In conclusion, while Europeans tackle the confusion around GDPR implementation and Americans come to grips with the Facebook scandal, this is the right time for organizations and individuals to turn this period of uncertainty into a moment of course correction for a better tomorrow.

Disclaimer: The above blog post is my personal opinion and does not reflect the views of any of my past, present or future employers, clients or affiliations.

References:

[1] https://ico.org.uk/for-organisations/guide-to-the-general-data-protection-regulation-gdpr/

[2] http://www.wired.co.uk/article/what-is-gdpr-uk-eu-legislation-compliance-summary-fines-2018

[3] https://iapp.org/news/a/how-gdpr-changes-the-rules-for-research/

[4] https://energycommerce.house.gov/hearings/facebook-transparency-use-consumer-data/

[5] https://www.independent.co.uk/life-style/gadgets-and-tech/china-social-credit-system-punishments-rewards-explained-a8297486.html

[6] http://foreignpolicy.com/2018/04/03/life-inside-chinas-social-credit-laboratory/

[7] https://www.volkskrant.nl/buitenland/china-rates-its-own-citizens-including-online-behaviour~a3979668/

Preriit Souda

Consultant: Data Science