Researchers focus a great deal of attention on the quality of the data being used in their companies, and for good reason. Reliable data is essential in supporting informed decisions. It’s critical that all research is built upon a solid data foundation. The foundation becomes shaky when data quality suffers, or when stakeholders lose confidence in the data that underpins the decisions they make. That means researchers often play both offense and defense: safeguarding the quality of the data they use, ensuring it is fit for purpose, and making sure they can adequately contextualize and defend the data (as well as the methodologies) they rely on.
As more companies move to agile workflows that require a greater volume of rapid feedback for day-to-day decision-making, researchers are finding ways to conduct agile research that meets these needs while optimizing the quality of the data that is produced. With the right tools and processes in place, it’s possible to effectively conduct research that’s quick, but not dirty.
The advantage of online data is that it can be generated quickly and at scale. But let’s be honest: ensuring that quality data is collected and delivered when conducting online research is a serious challenge. In fact, issues surrounding online data quality are by no means unique to research. Data quality concerns are essentially the cost of doing anything online. For example, in the ad tech world, there is a continuous struggle to validate that the people clicking through on ads (and costing ad buyers money) are legitimate, and not bots or bad actors. The online survey world has similar, ever-evolving challenges to address, ranging from sophisticated fraud aimed at generating survey incentives at scale to survey participants who just aren’t paying much attention. Threats to data quality are manifold no matter the sample source, and no sample provider is immune.
This sounds scary and it’s true that it would cause chaos if insufficient measures were taken to address these issues. In some cases, incomplete data quality programs do allow for a problematic level of bad data to seep into what is delivered. Even with rigorous application of data quality measures, there will be some level of noise in the data from online sources.
One of the facts of conducting research is that researchers deal with trade-offs every time they choose a methodology for a study. For example, a researcher could reduce the risk of poor-quality data by conducting face-to-face interviews, which are much less prone to fraud and related issues. However, conducting hundreds or thousands of interviews takes an exhausting amount of time and is extremely costly.
Online research enables speed, reach, and scale that can’t be accomplished with other methods. Researchers realize that achieving that speed, reach, and scale involves a trade-off: constant vigilance about data quality, along with a tolerance for some level of less-than-ideal data.
It’s worth taking a moment to reflect on the fact that while challenges in conducting online research abound, the vast majority of survey traffic is legitimate. Most people taking surveys are who they say they are and genuinely want to provide their feedback. It’s not a sea of bad actors from which we’re looking to fish merely a few worth keeping. Rather, when we cast our nets, we find that most are worth keeping, though some should certainly be thrown back. This means that while it’s easy to find a participant here and there who provided poor responses, or what appears to be a bot that made its way into a dataset, one shouldn’t think of tossing out the whole sample.
No online sample is perfect, nor will one ever be. That simply means researchers must work with tools that help them ensure the best possible data.
At Feedback Loop, we have a data quality program that we’re continually investing in and improving. It’s not easy, but it’s rewarding. Just as technology creates challenges, it provides tools to combat these challenges. We employ both technology and human-based solutions to ensure data is collected at scale and with speed while maintaining a high standard for quality that ensures researchers can rely on the data we provide.
At Feedback Loop, we think of the data quality process as being like a funnel through which data passes, using a series of screens to sift out what’s not wanted with finer and finer detail along the way. Point solutions aimed solely at specific data quality challenges almost always fall short. Because of this, we take a holistic approach, starting with how we work with our audience suppliers all the way through how data is delivered to users on our platform. Our approach focuses on four key areas: sample supply, targeting and screening, questionnaire design and participant experience, and data cleaning and review.
The top of the data quality funnel starts with our relationships with the companies that provide us with research participants. Our supply-related objectives are threefold: 1) obtain potential research participants (“sample”) as fast as possible by using efficient direct API connections, 2) connect to broad and diverse supply sources to reach as many people as possible, and 3) be selective in engaging with only vetted and trusted suppliers. In order to best achieve these objectives, we partner with Lucid and Cint, the world’s two largest online sample marketplaces. Respected as industry leaders in fueling technology-driven research, both companies are focused on enabling speed and efficiency.
Moreover, working with them allows us to reach participants through a network of diverse suppliers, ranging from open and closed recruitment double opt-in panels to non-traditional mobile app based panels. Diversity in supply is incredibly important to consider when sourcing audiences. All discrete supply sources are, by nature, biased because of how they recruit, maintain, and engage with their audiences. Thus, sourcing participants from a diverse group of quality suppliers reduces bias. And on the most basic level, relying solely on one supply source will only allow you to reach a fraction of the people that can be reached with an extensive supply network. However, our goal is not to work with just any supplier we can find.
Feedback Loop selects suppliers that meet the quality standards set by our supply partners, such as those who prove to be reputable through their participation in Lucid’s quality program. While both Cint and Lucid maintain standards in their exchange/marketplace, we add our own layer of due diligence and continually monitor individual suppliers. We make decisions to exclude certain suppliers because of data quality concerns, or accept new suppliers who have been proven to meet our standards.
Our decisions on which suppliers to engage with, as well as which participants we accept into our studies, are informed by our proprietary tools as well as a critical partnership with SampleChain. SampleChain’s technology platform tracks participant activity across the market research ecosystem (not just within one supplier or one buyer) to eliminate fraud, bots, and low quality survey participants. SampleChain combines digital fingerprinting technology with machine learning, regression techniques, and third-party fraud prevention services to identify and eliminate bad participants before they ever answer one of our questions. As an additional check, the Feedback Loop platform collects data that allows us to dedupe internally and confirm that survey participants only enter a study once, even if they attempt to take the same survey from a different account.
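To make the internal dedupe idea concrete, here is a minimal sketch of how repeat entries might be caught. It is an illustration only, not Feedback Loop’s or SampleChain’s actual implementation: the `Entry` fields and the `dedupe_entries` function are hypothetical names, and the sketch assumes some fingerprint value is already available for each participant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    account_id: str   # hypothetical supplier-side account identifier
    fingerprint: str  # hypothetical device/browser fingerprint hash
    study_id: str     # hypothetical identifier of the study being entered


def dedupe_entries(entries):
    """Keep the first entry per (study, fingerprint); reject later repeats.

    Keying on the fingerprint rather than the account means a second
    attempt from a different account on the same device is still caught.
    """
    seen = set()
    accepted, rejected = [], []
    for entry in entries:
        key = (entry.study_id, entry.fingerprint)
        if key in seen:
            rejected.append(entry)
        else:
            seen.add(key)
            accepted.append(entry)
    return accepted, rejected
```

In this sketch the same person may still take part in a different study, since the key includes the study identifier; only repeat entries into the same study are rejected.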
The next step in the data quality funnel addresses finding the right people to participate in specific research studies. Just as products are targeted at specific consumer groups or target markets, research is often focused on people who meet certain relevant criteria. In many cases, feedback from a truly “general population” audience is sufficient. But oftentimes we need to focus on people who meet more specific criteria. For example, let’s say you are developing a product targeted at millennial renters. You’d want to ensure that only people who are in a certain age bracket who rent a house or apartment are asked for their feedback.
We target test participants using predefined criteria maintained by our supply partners so that we can efficiently and accurately reach the right people. If, for example, we know that we want to reach only women between 25 and 55 who are employed full time and make over $50,000 per year, we can use the profiling parameters stored directly by our partners to ensure that only people meeting these criteria are invited to participate. To use another fishing metaphor, we could cast any old net into the sea and hope we come up with the people that we need, but undoubtedly we’d end up with many that we don’t want and some that we’re not quite sure about. Then we would waste our efforts and the participants’ valuable time by screening them for no reason. Our ability to programmatically target groups based on known demographic and behavioral characteristics increases efficiency and allows us to use screeners more effectively.
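The example criteria above can be written as a simple filter over stored profile data. This is only a sketch under stated assumptions: the `Profile` fields, their value encodings, and the `is_targeted` function are hypothetical and do not reflect any supply partner’s real profiling schema or API.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    # Hypothetical profiling fields; real supplier schemas will differ.
    gender: str
    age: int
    employment: str
    annual_income: int  # in US dollars


def is_targeted(profile):
    """Women aged 25 to 55, employed full time, earning over $50,000."""
    return (
        profile.gender == "female"
        and 25 <= profile.age <= 55
        and profile.employment == "full_time"
        and profile.annual_income > 50_000
    )


def select_invitees(profiles):
    """Return only the profiles that meet every targeting criterion."""
    return [p for p in profiles if is_targeted(p)]
```

Filtering on known attributes before anyone is invited is what keeps out-of-target participants from being screened for no reason.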
Targeting oftentimes only gets you part of the way to the people you want to engage with. Let’s say you don’t just want feedback from people who meet specific demographic criteria, but you also need people who have used online banking in the past 30 days. You would want to ask potential participants this question directly in a screener. When asking any screening question, the Feedback Loop team vets the approach to ensure we aren’t asking the question in a way that leads participants. You simply won’t find us asking something like “Have you used online banking in the past 30 days?” with “yes” and “no” answer options. Our research team employs industry best practices in properly asking questions and masking answer choices to ensure we are doing our best to weed out anyone who might be trying to “game the system” to qualify for a study, as well as anyone who simply isn’t paying attention. On top of screening questions, our team may employ other data quality check questions that serve to weed out problematic participants before they ever answer the actual questions we want their feedback on.
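One common way to mask a screener is to hide the qualifying behavior in a multi-select list alongside other plausible activities and an implausible decoy. The sketch below illustrates that general idea; the option wording, the decoy, and the `qualifies` function are all assumptions made for illustration and are not Feedback Loop’s actual screener logic.

```python
# Hypothetical masked screener: "Which of the following have you done
# in the past 30 days? Select all that apply."
QUALIFYING = "Used online banking"
OPTIONS = [
    "Used online banking",
    "Shopped for groceries online",
    "Streamed a movie",
    "Booked travel online",
]
# An implausible decoy; selecting it suggests inattention or box-ticking.
DECOY = "Purchased a ticket for commercial space travel"
NONE = "None of the above"


def qualifies(selected):
    """Return True only for a consistent answer that includes the target."""
    chosen = set(selected)
    if DECOY in chosen:
        return False  # decoy selected: likely inattentive or gaming
    if NONE in chosen and len(chosen) > 1:
        return False  # "none" combined with other choices is contradictory
    return QUALIFYING in chosen
```

Because the participant cannot tell which option qualifies them, selecting everything to get in backfires: the decoy disqualifies them.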
Ensuring that we find the right people at this point in the funnel involves using technology, reducing question burden for participants, and asking the right questions. But it’s not just asking the right screening questions that helps maintain quality. That’s still just the beginning.
The initial two steps of the data quality funnel let us target appropriately and ask effective screening questions. Step three is the point in the research process where the fun really begins. After all, we’re conducting research not just to find the right people, but to get feedback from them. However, if we find the right people and ask the wrong questions, we’ve put in a lot of work only to come out of the process without the information we need. And if people don’t want to and/or can’t complete our surveys, we’ve failed both survey participants and anyone counting on the data we’ve set out to collect. This means it’s critical to focus on two things: 1) asking the right questions and 2) ensuring a positive participant experience. In essence, we want to make it as easy as possible for people to provide the feedback that our platform’s users need.
All good questionnaire design begins with a focused objective. At Feedback Loop, our tests are laser-focused, containing under 10 questions. With an agile research platform purpose-built for iteration, you’re given the ability to continually test and learn by conducting multiple, focused tests. This relieves the all-too-familiar pressure to cram a bunch of tenuously related questions into a 25-minute “Frankensurvey” that is unwieldy to design and painful for a participant to take. Sorry, we’ve taken too many burdensome surveys ourselves, so we just don’t do that here.
Knowing that we’re intentionally narrow in the scope of any one test, it’s easy to pay proper attention not just to asking the right questions, but to asking them in the right way. Poorly phrased, biased, and off-target questions can be the bane of anyone conducting research because they yield unreliable data. Thus, our research team follows industry best practices when writing questions, reducing bias by communicating clearly and making it easy for people to provide feedback.
Asking questions in an easy-to-answer manner is critical, of course, but we take it a step further by focusing on the participant experience within our survey software. Acknowledging that over 60% of people who take our surveys do so on a mobile device, we take a mobile-first approach to both how we ask questions and how they are displayed to participants. To be clear, designing with a mobile-first mindset does not mean excluding desktop participants. Rather, simplicity and streamlined design ensure that every participant has a positive experience. Does this mean that we flat out won’t support a gigantic, intimidating 20-row grid question? Yes, that’s true, and research participants everywhere can rejoice.
Ultimately, it’s critical not to lose sight of one basic fact: real people provide real feedback and it’s important to be respectful of and grateful for the time and effort they spend telling us how they feel and what they think. We should strive to make participating in research as easy and engaging as possible. Real people like you underpin the research we do at Feedback Loop. Ensuring they have a positive experience helps immensely in ensuring that researchers and those who rely on research data also have a positive experience.
Now we’ve arrived at the final step of our data quality funnel. After data has been collected, it is often simply shuttled directly over to researchers. Researchers then spend time doing additional data cleaning, determining which records to keep, which to filter, and which to throw out. It’s understood that researchers will have different criteria on how much cleaning is required depending on how the data will be used. However, from an agile research perspective, putting undue strain on researchers to do massive cleaning after data is delivered simply isn’t right. In addition to taking the aforementioned steps, we reduce this burden by applying a combination of automated and human-based checks to ensure that data quality meets our standards.
At Feedback Loop, we use our technology to make automated decisions on which records to remove based on behaviors like speeding and copy/pasting. Additionally, we use proprietary tools built by our data science team to clean and remove problematic open-ended data. We filter out the usual suspects like gibberish and profanity, and then take it a step further by filtering out off-topic or out-of-context responses with our AI-based tool. Before our users receive any data, our research team performs a final review of the filtered data and makes any needed adjustments to the automated filtering to further enhance the quality of our data. This last step also happens to create the feedback loop (see what we did there!) that our machine learning model needs to continually improve.
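For readers curious what automated checks like these can look like, here is a deliberately crude sketch of two of them: flagging speeders relative to the median completion time, and flagging open-ended answers that look like keyboard mashing. The thresholds, function names, and heuristics are assumptions chosen for illustration; they are not the proprietary or AI-based tools described above.

```python
import statistics


def flag_speeders(durations, fraction=0.33):
    """Flag response IDs completed faster than `fraction` of the median.

    `durations` maps a response ID to completion time in seconds.
    The 0.33 default is an assumed threshold, not an industry standard.
    """
    cutoff = statistics.median(durations.values()) * fraction
    return {rid for rid, secs in durations.items() if secs < cutoff}


def looks_like_gibberish(text, min_letters=3, min_vowel_ratio=0.2):
    """Very rough heuristic for English text: too short or too few vowels."""
    letters = [c for c in text.lower() if c.isalpha()]
    if len(letters) < min_letters:
        return True  # too short to be an informative open-ended answer
    vowels = sum(c in "aeiou" for c in letters)
    return vowels / len(letters) < min_vowel_ratio
```

A heuristic this simple will misjudge some legitimate answers, which is one reason a human review step after automated filtering remains valuable.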
Will our data be completely clean and 100% free of any responses you might not want to use? No, and no dataset ever is. Will we commit to our comprehensive approach? Will we stand behind the quality of our data to inform rapid decision making? Is it a priority for our company to continually improve and look for new ways to provide high quality data at speed? Yes, yes, and yes! With the right quality measures in place, agile research can supercharge product development, accelerate insights programs, and enable organizations to have continuous learning at their core. Feedback Loop is the platform your team can rely on for comprehensive data quality at agile research speed.