
We Should Adopt Open Data, With Caution


By Neil Seeman

A protean army of computer scientists, hackers and citizen researchers thinks we are living in an era of data access prohibition. February 22 was heralded as International Open Data Day. The unofficial mission objective was to ‘liberate data sets’, a phrase popular with those whom we call ‘data absolutists’.

Their argument is compelling. Unshackled data, such as government-owned datasets on rates of illness in discrete populations, are rich repositories of hidden insights that citizen researchers, provided they have access to the data, can mine in order to relieve human suffering. Philosophically, openness fulfills the grand vision of the Web, which has always been to break down information hierarchies: to make all information elegantly structured, equal, free and useful.

‘Data absolutists’ believe that citizen researchers, not just university academics with special access to taxpayer-funded data sets, can infuse their wisdom into the data, and thereby ‘mash up’ information on, say, hospital patient safety records, using location-based data, or ‘patient experience stories’ on open-access blogs. We side with the data absolutists, not the ‘restrictionists’, but only to a point.

Open data has as its mission a twin goal: to not only enable disenfranchised citizen researchers to legally reuse and redistribute the data, but also to enable researchers to access that data in easily manipulated file formats for analysis. Combining ease of access with free access drives greater participation.

England is leading the way. On October 31, 2013, David Cameron, the UK Prime Minister, hosted the Open Government Partnership London Summit, with 61 representatives of member states. Open data, he said, is not a “nice to have” but is “absolutely fundamental to a nation’s potential success in the 21st century…it is a vital part of any country’s plan for prosperity.”

Data absolutists champion a kind of virtuous feedback loop. First, the analysis and ‘mash ups’ of these data sets offer social and commercial value to all citizens. Second, increased awareness of this value leads to improved participation and engagement by citizens who then demand more openness.

In 2009, Sir Tim Berners-Lee, inventor of the World Wide Web, put out a plea for raw data. The armies of open data enthusiasts are stepping into the breach and advocating for sweeping change across the globe. The Open Government Partnership now has 63 countries signed on and is “committed to making their governments more open, accountable, and responsive to citizens.”

But pay attention to the data restrictionists, who want to limit access to data. Why so? They are well-intentioned. Consider the dangers of a rogue citizen researcher potentially de-anonymizing data sets, or manipulating data such that it is possible to publish online information about who suffers from chronic illness in tiny communities. More than 40 per cent of Americans and more than 40 per cent of British citizens are very concerned about how their personal data is used, according to new insights from the Global Business Research Network. Yet there is large variation in what people consider sensitive personal data; some think the past websites that they visited are sensitive data; others do not.

As the Global Business Research Network data, discussed recently at the IIeX Amsterdam data conference, suggest, there is a need for permission-based explicit consent, anonymization (“the right to be forgotten”), and transparency in how any public data will be used (e.g., data linkage). Yet the risks of linked data sets, we believe, can be solved through rigorous encrypted de-identification.
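To make the idea of de-identification concrete, here is a minimal sketch in Python. It is an illustration under our own assumptions, not a description of any particular agency's method: the key, the function names and the threshold of five are all hypothetical. It shows two common ingredients: replacing a personal identifier with a keyed hash, so that two data sets can still be linked without exposing the raw identifier, and suppressing counts so small that they could point to individuals in tiny communities.

```python
import hashlib
import hmac
from typing import Dict, Optional

# Hypothetical secret held only by the data custodian (an assumption of this sketch).
SECRET_KEY = b"replace-with-a-long-random-secret"


def pseudonymize(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed hash (HMAC-SHA256) of an identifier.

    The same identifier always maps to the same token, so records can be
    linked across data sets, but the token cannot be recomputed without the key.
    """
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def suppress_small_counts(
    counts: Dict[str, int], threshold: int = 5
) -> Dict[str, Optional[int]]:
    """Replace counts below the threshold with None before publication."""
    return {
        region: (n if n >= threshold else None)
        for region, n in counts.items()
    }


# Two records about the same (hypothetical) person receive the same token.
token_a = pseudonymize("health-card-0001")
token_b = pseudonymize("health-card-0001")

# Illustrative counts of a chronic illness by community; the values are made up.
published = suppress_small_counts({"Town A": 3, "Town B": 12})
```

A keyed hash on its own does not guarantee anonymity: combinations of age, postal code and similar fields can still single people out, which is why a step such as small-count suppression is usually paired with it.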

The existence of grey areas of dispute does not melt away the strong arguments of those who are lobbying for more open data. Using data from the UK government’s open data website, researchers were able to analyze family physician prescription patterns, which may help guide decision makers in identifying cost-saving measures. When the earthquake hit Haiti in 2010, people collaborated on geospatial data sets for risk assessment. At, locations of health facilities and Cholera Treatment Centers surfaced in files that could be easily mapped. That helped aid organizations collaborate and use their resources and donations to ensure maximum impact.

We believe making data free and open needs to be guided to ensure high impact and meaningful engagement. Guided engagement can play a part in defining a critical set of questions that need to be answered. For example, global pandemic surveillance data, perhaps the most closed database in the world whilst an epidemic is emergent, needs interpretation guides from expert public health authorities who can point out to citizen researchers the potential use and abuse of such data. To save lives and relieve human suffering, we need people not only to use the data, but also to suggest improvements to the data sets, to collaborate, and, through collaboration and refinement, to get that data quickly into the hands of decision-makers, such as the WHO or the CDC.

For International Open Data Day there were 194 ‘hackathons’. Many of the host cities have municipal or provincial engagement. There have also been national hackathons such as the Canadian Open Data Experience (CODE). Hackathons are bringing together governments, designers, hackers, and citizens to generate ideas of how the data can be used. But focus and collaboration are critical.

The Sunlight Foundation has been using open government data to create “technology to enable more complete, equitable and effective democratic participation.” Careful planning of the hackathons has amplified the potential impact of the data by focusing people’s energy on key issues.

Another example of guided engagement was the development of DSM-V, the manual for diagnosing mental illness. The American Psychiatric Association sought out views of the public, including patients, researchers, and physicians, to update the definitions of certain illness categories and incorporate into new diagnostic criteria the range of patient and caregiver insights that had been historically ignored.

Successful open data initiatives show that artfully “guided advice” by researchers on how to use the data is important. We cannot let “open data hype” get in the way of the real goal: engagement and mashing up data to deliver high ROI. Serendipitous discovery does occasionally happen when the data “hang open,” yet serendipity can be accelerated; it can be gently shaped to ensure the right confluence of players communicates better to solve the wickedest problems of our time.

This article was co-authored with Sabrina Tang, a Junior Fellow at Massey College in the University of Toronto and a graduate student in biomedical engineering at the University of Toronto. It was originally published on the Huffington Post and is reposted here with the permission of the author.

2 responses to “We Should Adopt Open Data, With Caution”

  1. Neil, the nice thing about open data enthusiasts is their unflagging optimism. They fully believe the world is good, data belongs to all (no matter who collects it), and that ordinary citizens can only benefit by having access to anything and everything. Would that this were true. Until a new method of data security is created that ABSOLUTELY guarantees anonymity, both in reality and perceptually, lots of people are not interested in sharing. Those who push for open data are creating a chilling effect on legitimate data collection efforts that I think will have bigger repercussions for our industry than we probably realize now. The extremists in the open data movement speak as if they have a right to this data. Not true and not likely in the foreseeable future.

  2. This article also indirectly raises the age-old question of who owns the data: the client or the research firm. I have always believed that since the client pays, the client owns the data. However, in situations where research companies are using proprietary techniques or have invested money in the development of certain procedures, the research firm should, through contract, make sure the client understands and agrees that the techniques and procedures still belong to the research firm after the project. In cases where data is made public, there is also an obligation through the codes of ethics of AAPOR and CASRO to respond to inquiries about method, questionnaire and data sets, so others can examine, use and attempt to duplicate the results. I like the idea of giving access to the public and others, but permission needs to be obtained first from the client or research firm before data is used or quoted by others.

Neil Seeman

Founder & Chief Executive Officer, The RIWI Corporation