You’ve just finished a series of in-depth customer interviews. Your stakeholder asks, “So, what percentage of customers were successful?” Be careful how you answer. That number could be dangerously misleading.
It’s a common scenario for customer experience professionals. You conduct qualitative research—like usability tests, customer interviews, or co-creation workshops—to uncover deep insights into customer needs and behaviors. These methods are invaluable for understanding the why behind customer actions.
But then comes the inevitable pressure to quantify the findings. Stakeholders want numbers. They want to see metrics like success rates, task times, or satisfaction scores. You might be tempted to report that “80% of participants completed the task” or “the average satisfaction score was 6.5 out of 7.”
These statements, while factually correct about your small sample, are statistically deceptive when generalized to your entire customer base. To understand why, we need to look at a fundamental concept in measurement: true-score theory.
In any research, you’re trying to understand a characteristic of your entire customer population. This could be the true success rate for a specific task, the actual average satisfaction with a new feature, or the real Net Promoter Score (NPS) across your whole market. This population-wide value is called the true score.
Since it’s impossible to test every single customer, you use a sample to estimate this true score. The result you get from your sample (for instance, the success rate among your 10 study participants) is the observed score.
According to Classical Test Theory, the relationship between these two is simple(1):
Observed Score = True Score + Measurement Error
If the measurement error is small, your observed score is a reliable estimate of the true score. But if the error is large, your observed score is mostly noise and tells you very little about the reality of your broader customer base.
As a rule, the smaller your sample size, the larger your measurement error. This is because each participant brings their own unique context and variability into the study. One customer might be having a bad day, another might be exceptionally tech-savvy, and a third might be trying to please the moderator.
In a large study, these individual quirks tend to cancel each other out. But in a qualitative study with only 5-10 participants, a single outlier can dramatically skew the results. Your observed score is more likely to be a reflection of the specific individuals in your study than a true representation of your overall customer population(2).
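To see this cancelling-out effect concretely, here is a minimal simulation sketch in Python (using NumPy, and assuming a purely hypothetical true success rate of 70%). It draws thousands of imaginary studies at different sample sizes and shows how far the observed rate typically wanders from the true rate.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
true_rate = 0.70       # hypothetical population-wide ("true") success rate
n_studies = 10_000     # number of simulated studies per sample size

for sample_size in (10, 100, 1000):
    # Each simulated study observes `sample_size` customers and records
    # its own observed success rate.
    observed = rng.binomial(sample_size, true_rate, n_studies) / sample_size
    spread = observed.std()
    print(f"n = {sample_size:4d}: typical observed rates fall roughly between "
          f"{true_rate - 2 * spread:.0%} and {true_rate + 2 * spread:.0%}")
```

With 10 participants, the observed rate routinely lands anywhere from about 40% to nearly 100%, even though the true rate never changes; with 1,000 participants, it rarely strays more than a few points from 70%.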
Statistics provides a tool to quantify this uncertainty: the confidence interval. A confidence interval gives you a plausible range for the true score, given your observed data.
Let’s say you test 10 customers and 7 of them successfully complete a task. Your observed success rate is 70%. That might sound good, but the 95% confidence interval for this result is approximately 35% to 93%.
This means the true success rate for your entire customer population could be as low as 35% (a failure) or as high as 93% (a success). The range is so wide that the 70% figure is practically meaningless for making business decisions. Your measurement error is simply too large for the number to be trustworthy(3).
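If you (or an analyst on your team) want to reproduce that range, a minimal sketch using SciPy’s exact (Clopper-Pearson) binomial interval looks like this; it assumes SciPy 1.7 or later, and other common methods such as Wilson produce similarly wide ranges for a sample of 10.

```python
from scipy.stats import binomtest

# 7 of 10 study participants completed the task.
result = binomtest(k=7, n=10)

# Exact (Clopper-Pearson) 95% confidence interval for the true success rate.
ci = result.proportion_ci(confidence_level=0.95, method="exact")
print(f"Observed success rate: {result.statistic:.0%}")
print(f"95% confidence interval: {ci.low:.0%} to {ci.high:.0%}")  # roughly 35% to 93%
```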
Beyond sample size, the very nature of qualitative research makes its numbers unreliable. Qualitative studies are typically formative; their goal is to explore issues, generate ideas, and inform design improvements. The study protocol is often flexible by design. A good moderator might probe deeper on certain topics, ask unscripted follow-up questions, or even alter tasks between participants to explore emerging themes.
This flexibility is a strength for uncovering insights, but it introduces variability that contaminates quantitative metrics. One participant might succeed only after receiving a hint from the moderator, while another succeeds independently. Reporting that both were “successful” ignores the critical context that the qualitative method was designed to capture.
In contrast, summative research, like a large-scale quantitative study, uses a rigid, standardized protocol to ensure that all participants are treated identically, thereby minimizing measurement error and ensuring the internal validity of the results(4).
When stakeholders press for numbers from your qualitative research, it’s your job to frame the data correctly. Never report percentages or averages from small-sample studies without providing the statistical context.
Instead of this: “70% of users were able to complete the task.”
Say this: “In our study with 10 participants, 7 were able to complete the task. However, due to the small sample size, the true success rate for all customers could be anywhere from 35% to 93%. The key finding isn’t the number, but the specific issues we identified that prevented the other three participants from succeeding.”
Instead of this: “The new design’s ease-of-use rating was much higher than the old design’s (6.2 vs. 5.1).”
Say this: “While the new design scored higher on ease-of-use among our small sample, this difference was not statistically significant. This means we can’t be confident that the new design is actually better for our entire user base. The qualitative feedback, however, suggests that customers found the new navigation clearer, which is a promising directional insight.”
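For the rating comparison, the significance check itself is quick to run. The sketch below uses hypothetical 7-point ease-of-use ratings from five participants per design (the individual values are invented purely for illustration, with means close to the ones above) and an independent-samples t-test from SciPy.

```python
from scipy.stats import ttest_ind

# Hypothetical 7-point ease-of-use ratings, five participants per design.
# The individual values are invented for illustration only.
new_design = [7, 6, 7, 5, 6]  # mean = 6.2
old_design = [7, 4, 5, 3, 6]  # mean = 5.0

t_stat, p_value = ttest_ind(new_design, old_design)
print(f"p-value: {p_value:.2f}")  # well above 0.05: not statistically significant
```

With samples this small, even a full point of difference on a 7-point scale can easily be noise, which is exactly why the directional, qualitative reading is the safer takeaway.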
Qualitative research is a powerful tool for exploring the complexity of the customer experience. It provides the stories, the context, and the “why” that numbers alone can never reveal. But it is not designed to produce reliable quantitative metrics from small samples.
As a CX professional, your credibility depends on your ability to use research methods appropriately and communicate their findings accurately. Resist the temptation to pull impressive-sounding but statistically meaningless numbers from your qualitative work. Instead, focus on the rich, actionable insights that this type of research excels at delivering. Educate your stakeholders on the difference and guide them toward using the right research for the right questions.
(1) Cappelleri, J. C., Lundy, J. J., & Hays, R. D. (2014). Overview of classical test theory and item response theory for quantitative assessment of items in developing patient-reported outcome measures. Clinical Therapeutics, 36(5), 648-662.
https://pmc.ncbi.nlm.nih.gov/articles/PMC4096146/
(2) Faber, J., & Fonseca, L. M. (2014). How sample size influences research outcomes. Dental Press Journal of Orthodontics, 19(4), 27-29.
https://pmc.ncbi.nlm.nih.gov/articles/PMC4296634/
(3) Hazra, A. (2017). Using the confidence interval confidently. Journal of Thoracic Disease, 9(10), 4125-4130.
https://pmc.ncbi.nlm.nih.gov/articles/PMC5723800/
(4) Andrade, C. (2018). Internal, external, and ecological validity in research design, conduct, and evaluation. Indian Journal of Psychological Medicine, 40(5), 498-499.