Surveys are simple. After all, a survey is just a set of plain questions that respondents answer with a number. Because surveys look so simple, it seems easy to adjust or improve them. In this article we provide quantitative evidence of how one slight adjustment can turn the Net Promoter Score upside down.
Recap of the Net Promoter Score
Net Promoter Score, or NPS for short, measures customer satisfaction. It is a single-question survey yielding a single score between -100 and +100. That score lets you benchmark your product against almost anything else, which makes NPS a wonderful KPI (Key Performance Indicator), often billed as "the one number you need to grow".
The original NPS survey asks a single question, "How likely is it that you would recommend [product] to a friend or colleague?", answered on a scale from 0 to 10.
People who give a 9 or 10 are called "promoters" because they actively evangelize the product in question. Those who give 6 or below are called "detractors" because, well, they are probably scaring people away from the product. Those who give 7 or 8 are "neutrals": not that bad, but not good either.
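The classification rule above can be sketched in a few lines of Python (the function name is ours, for illustration, not part of any NPS standard):

```python
def classify_rating(rating: int) -> str:
    """Map a 0-10 NPS rating to its respondent category."""
    if not 0 <= rating <= 10:
        raise ValueError("NPS ratings must be between 0 and 10")
    if rating >= 9:
        return "promoter"    # 9-10: actively recommends the product
    if rating >= 7:
        return "neutral"     # 7-8: not bad, but not good either
    return "detractor"       # 0-6: likely to scare others away


print(classify_rating(10), classify_rating(8), classify_rating(6))
# promoter neutral detractor
```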
Introducing the bias: colors
Respondents see an uncolored scale from 0 to 10. But why not add colors to that answer scale, a researcher might ask with the best of intentions? Coloring would make it more transparent to respondents how their answers will be interpreted, and that should lead to more accurate answers.
Here is the problem: "more accurate answers" also means "different answers". Respondents will answer differently than they would have on the original, uncolored survey. Unfortunately, all your reference points were benchmarked against the original survey, so there is no longer anything to compare the product to. Did the product score 33 in the survey? Is that good or bad? There is no way to tell.
How big is the bias?
When we first encountered the practice of coloring the NPS survey, we suspected that it affects the final score, but we wanted to know exactly how much. So we ran an experiment to find out: we A/B tested the original and the colored version.
The research was deliberately lightweight, just enough to give us a feel for whether there is a serious bias here and for how to interpret the satisfaction data that had already been collected with the colored version of the survey.
For the experiment we collected satisfaction data on work calendar apps from a total of 276 people. Not that we were particularly interested in calendar apps, but it's an app everyone uses, so anyone could give us a score. In our sample, Microsoft Outlook turned out to be the most prevalent work calendar app, so we compared the satisfaction scores Outlook received on the uncolored versus the colored NPS survey.
We had 124 Outlook users in our sample. The median rating was 7 on the uncolored NPS survey and 7.5 on the colored one, 0.5 points higher (a statistically significant difference, by the way: U=1473.5, p=.041). While 0.5 points doesn't seem like a lot, it translates into a surprisingly large difference in Net Promoter Scores: on the scale from -100 to +100, Outlook scored -31 on the original survey and -4 on the colored one.
That is a 27-point difference. According to research by Temkin Group (2017), 27 points is as large as the gap between industry leaders and laggards. So coloring the survey can make you think you have a best-in-class product when in reality you have a worst-in-class one.
How can a 0.5-point difference in the median lead to such a large difference in NPS? The score is calculated as the percentage of promoters minus the percentage of detractors, and because the promoter and detractor cutoffs sit at fixed points on the scale, even a small shift in the rating distribution can push many respondents across a threshold. The coloring changed that difference considerably:
Original survey: 13% promoters - 44% detractors = -31 NPS
Colored survey: 29% promoters - 33% detractors = -4 NPS
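As a minimal sketch, the calculation looks like this in Python. The function is ours, and the sample lists are synthetic, constructed only to match the reported percentages, not the experiment's raw data:

```python
def nps(ratings):
    """Net Promoter Score: % promoters minus % detractors, rounded."""
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return round(100 * (promoters - detractors) / len(ratings))


# Synthetic sample of 100 ratings matching the original survey's
# reported breakdown: 13 promoters, 44 detractors, 43 neutrals.
original = [9] * 13 + [4] * 44 + [7] * 43
print(nps(original))  # -31

# Colored survey: 29 promoters, 33 detractors, 38 neutrals.
colored = [10] * 29 + [5] * 33 + [8] * 38
print(nps(colored))   # -4
```

Note that the neutrals drop out of the formula entirely, which is why shifting respondents between the detractor and neutral bands moves the score so sharply.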
As demonstrated, a seemingly slight modification of the Net Promoter survey can lead researchers to believe that a product has industry-leading user satisfaction when in reality it is among the industry laggards. We therefore suggest handling even slight modifications with caution.
In our experience, most survey change requests are small adjustments that people believe will improve the clarity of the survey. Such small potential gains are simply not worth risking the validity of the survey.
For those cases when there is a very good reason to deviate from the original, we suggest:
- making one modification at a time, and
- running the original and the modified survey simultaneously until the effect of the modification is understood.
This is still far faster than putting out a biased survey and realizing the bias all too late.
This article was co-authored by
Nora Miklos, UX Research Participant Coordinator at Google
Eszter Kard, UX Researcher at Emarsys