A Reality Check For Online Data Quality Best Practices

Melanie Courtright

Over the last few months, I’ve been reading some of the online data quality posts that have appeared here. At times, I’ve been genuinely shocked by the naïveté, while at other times I’ve found myself screaming at my computer, railing against the rhetoric and misinformation. At the end of the day, every conversation about data quality is a good conversation. But too often, rather than helping to address a noble concern, inadequate conversations creep in when people are trying to deflect from a product deficiency or compete with each other. These conversations use fear-based language to create pain points (real or phantom) in an effort to scare people into reactive choices.

The truth is, solving quality issues is, and has always been, a major part of our industry.  All modes of data collection have their own biases. In-person surveys have interviewer bias, mail surveys have non-response bias, phone surveys are biased due to issues surrounding interviewer quality, online surveys are biased due to internet penetration, and mobile surveys are biased due to smartphone usage rates. And you can mix and match their respective biases in a multitude of ways.

Sustainable data quality isn’t about cool new techniques (although some of them are increasingly useful) or the latest technology (although that too can help). When done right, data quality is an end-to-end monitoring and vigilance process, with initiatives and metrics all along the 5 Rs of sample: Recruitment, Registration, Respondent Management, Research, and Rewards. It requires checks and measurements at every point in the research lifecycle. If anyone is talking about quality, and they’re only talking about one area of quality, they are doing you (and the industry) a real disservice. Great companies understand that quality is a huge investment. You need to partner with companies that aren’t trying to sell you the “flavor of the day” quality story.


Recruitment

  1. Recruitment Source Long-Range Planning: A formal recruiting strategy that ensures consistency over time. Look for companies with the structure and expertise to ensure reliable, scalable recruitment that results in sustainable feasibility.
  2. Traffic metrics: Volume of traffic coming from every source, by demos, ensuring predictable volumes. Watch for shifts in data quality, minority representation, or technology ownership, to name a few.
  3. Partner Comparisons: Brands, web sites and memberships should and will have unique characteristics. Look at the unique attributes of each traffic source, and figure out what that means to your sample frame and the resulting data.
  4. Blending Strategy: Combining all the data sources into a single panel. Using all of the information above, decide: what is your strategy for blending data from these multiple sources, and how do you ensure consistency over time?
  5. Diversity & Breadth: Offset bias and increase representativeness. It’s crucial to have a broad set of sources that drive people from all walks of life so you must think beyond just demographics to psychographics. You won’t find everyone you need on a single site or a few sites. They’re in remote corners of the web, and you have to reach them where they prefer to live.
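
As a rough illustration of the traffic-metrics idea above, the sketch below compares each recruitment source's share of traffic between a baseline period and the current period and reports sources whose share moved by more than a chosen amount. The record layout (a `source` key), the function names, and the 5-point threshold are all assumptions made for this example, not an established industry rule.

```python
from collections import Counter

def source_shares(recruits):
    """Return each source's share of total recruits as a fraction (0..1)."""
    counts = Counter(r["source"] for r in recruits)
    total = sum(counts.values())
    return {src: n / total for src, n in counts.items()}

def share_shifts(baseline, current, threshold=0.05):
    """Report sources whose traffic share changed by more than `threshold`.

    `threshold` is an absolute change in share (0.05 = 5 points); it is a
    hypothetical cutoff and should be tuned to your own traffic.
    """
    base, curr = source_shares(baseline), source_shares(current)
    shifts = {}
    for src in sorted(set(base) | set(curr)):
        delta = curr.get(src, 0.0) - base.get(src, 0.0)
        if abs(delta) > threshold:
            shifts[src] = round(delta, 3)
    return shifts
```

The same comparison could be run within demographic cells (for example, share by source among a given age group) to catch shifts in representation rather than only in raw volume.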


Registration

  1. Fraud Prevention: Tools that require human eyes and fingers to answer questions, along with interpretation and logic, to participate in research communities.
  2. Digital Fingerprinting/Geo-IP/Proxy Detection: Tools that look at computer identities, and the network path they came from, that reach beyond deletable cookies and survey tags.
  3. Email/Username/Password Scans: Accounts with the same or similar email addresses or passwords are a red flag for fraudulent accounts.
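
A minimal sketch of the email scan described above, assuming accounts arrive as `(account_id, email)` pairs. It only normalizes case, surrounding whitespace, and "+tag" suffixes before grouping; real systems would likely add fuzzier matching. The function names are hypothetical, and password comparison is deliberately left out because it depends on how credentials are stored.

```python
from collections import defaultdict

def normalize_email(email):
    """Lowercase the address and drop any '+tag' suffix from the local part."""
    local, _, domain = email.strip().lower().partition("@")
    local = local.split("+", 1)[0]
    return f"{local}@{domain}"

def find_duplicate_accounts(accounts):
    """Group account ids by normalized email; keep groups with 2+ accounts."""
    groups = defaultdict(list)
    for account_id, email in accounts:
        groups[normalize_email(email)].append(account_id)
    return {email: ids for email, ids in groups.items() if len(ids) > 1}
```

Any group returned here is a red flag to review, not proof of fraud; a shared household address is one innocent explanation.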

Respondent Management

  1. Profile Traps and Consistency Checks: Do people overstate illnesses or list too many ethnicities in an attempt to qualify, or provide data that is inconsistent with previous questions or visits? Are they paying attention and being truthful?
  2. Length of Interview Scans: Watch the speed in your own surveys, and have clients send speed information back to you so that repeat offenders can be flagged.
  3. Client Survey Invalid Rules & Scans: For all clients who use data cleansing and traps, request as much information back as possible, and in real time where feasible. Any time you see a daily increase in invalids, investigate immediately.
  4. Automated Data Quality Practices: By now, every sample company should have trap questions built into the system using randomization and intelligence. It shouldn’t be manual and it shouldn’t be predictable, or it won’t work.
  5. Sampling Protocols/Rules: One of the most important, and overlooked, steps in the process is setting rules and standards for the actual sampling. How are the invitations selected? Is there consistency between Project Managers? Between waves of surveys?
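
The length-of-interview scan above could be sketched roughly as follows. Each survey is assumed to be a list of `(member_id, seconds)` completes; a complete is flagged when it is faster than a fraction of that survey's median time, and members flagged in several surveys are reported as repeat offenders. The one-third fraction and the two-flag minimum are illustrative assumptions, not recommended standards.

```python
from collections import Counter
from statistics import median

def flag_speeders(completes, fraction=0.33):
    """Return member ids whose time is below `fraction` of the survey median."""
    if not completes:
        return []
    cutoff = median(seconds for _, seconds in completes) * fraction
    return [member for member, seconds in completes if seconds < cutoff]

def repeat_offenders(surveys, min_flags=2, fraction=0.33):
    """Return members flagged as speeders in at least `min_flags` surveys."""
    flags = Counter()
    for completes in surveys:
        flags.update(flag_speeders(completes, fraction))
    return sorted(m for m, n in flags.items() if n >= min_flags)
```

Using the median of each survey, rather than a fixed number of seconds, keeps the cutoff comparable across short and long questionnaires.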

Research Management

  1. Design Partnership with the Client: We are in this together, so let’s work as a team to reduce survey lengths, increase engagement, and make the process work better. An important part of data quality is keeping the members who provide meaningful data.
  2. Member Services Approach to Problems and Complaints: Track every interaction with members, and handle their complaints quickly. Look for problem themes and use them to improve the systems and the surveys. Watch for frequent complainers, and use that as a red flag.
  3. Replicable Survey Assignment Process: When using a router, be sure that the routing system doesn’t introduce bias. Routing should have a strong element of randomization, which leads to higher replicability.
  4. Device Compatibility: Understand consumers so well that you can anticipate their device practices and design surveys that don’t create instrument bias.
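
To make the routing point concrete, here is a small sketch of a router that picks uniformly at random among the surveys a respondent qualifies for, instead of always sending them to the first match. The survey dictionaries (`id`, `open`, `qualifies`) and the `route` function are hypothetical names for this example; production routers typically also weigh quotas and priorities.

```python
import random

def route(respondent, surveys, rng=random):
    """Pick one eligible survey id uniformly at random, or None if none fit.

    `surveys` is a list of dicts with hypothetical keys:
      "id"        - survey identifier
      "open"      - True while the survey still needs completes
      "qualifies" - function(respondent) -> bool
    """
    eligible = [s for s in surveys if s["open"] and s["qualifies"](respondent)]
    if not eligible:
        return None
    return rng.choice(eligible)["id"]
```

Passing a seeded `random.Random` instance as `rng` makes an assignment run reproducible for audits, while the default keeps live routing unpredictable.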


Rewards

  1. Reward Relevance: Use a variety of rewards that require strong identity validation and that motivate people for all of the reasons that they would participate in surveys: to give back, to get back, and to get a pat on the back.
  2. Community Aspect and Sharing Survey Results: When possible, share survey results with members as part of their reward. Reinforce the importance of their participation and their response quality. Help them become passionate about their involvement, and make them feel part of a community of valued people.
  3. Reporting of redemption anomalies: Watch reward redemptions for unexpected changes. Shifts in incentive choices or a sudden increase in redemption can be important indicators of a potential threat.
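
One simple way to watch for the redemption anomalies described above is to compare each day's redemption count against the trailing week. The sketch below flags days that sit more than a chosen number of standard deviations from the trailing mean. The seven-day window and the threshold of 3 are assumptions for illustration, and the function name is hypothetical.

```python
from statistics import mean, stdev

def redemption_anomalies(daily_counts, window=7, z_threshold=3.0):
    """Return indexes of days that deviate sharply from the trailing window.

    `window` must be at least 2 so a standard deviation can be computed.
    """
    anomalies = []
    for i in range(window, len(daily_counts)):
        history = daily_counts[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # Perfectly flat history: any change at all is worth a look.
            if daily_counts[i] != mu:
                anomalies.append(i)
        elif abs(daily_counts[i] - mu) / sigma > z_threshold:
            anomalies.append(i)
    return anomalies
```

The same check can be applied per reward type to catch shifts in incentive choices, not just in total volume.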

Regardless of the recruitment or sampling method, whether it’s river, router, or programmatic, you have to do it all and watch it all.  When you do, you will notice shifts as they happen, rather than after they’ve impacted the data. You will be able to intervene and make changes. That’s where technology steps in — to implement gates and solutions. Watch and react every day. Stay watchful at every point in the survey and respondent process. You can’t rest. You can’t get comfortable. What worked yesterday won’t work tomorrow.

So the next time someone discusses their data quality initiative and elaborates on a couple of the supposedly innovative things they have in place, invite them to have a real conversation about research quality.  I’d be happy to have that discussion with anyone who asks.

7 responses to “A Reality Check For Online Data Quality Best Practices”

  1. Thank you, Melanie. In some countries the situation is worse than in others, and there have been times when I’ve needed to set aside half the sample when building a statistical model. Doing things on the cheap doesn’t pay.

  2. Thanks for reminding us that data quality is far more than “Stick a red herring in there and delete anyone who picks it.” Data quality is a three? four? five? way street among the vendor, the client, the responder, and anyone else involved. Each person needs to do their part to generate great quality data.

  3. Great post, thanks for sharing. Because of the rising importance of data-driven decision making, having a strong data governance team is an important part of the equation, and will be one of the key factors in changing the future of business. There is so much great work being done with data quality tools in various industries such as financial services and health care. It will be interesting to see the impact of these changes down the road.

    Linda Boudreau 
