When making Contrasaurus we ran into some trouble with the score counter and health labels. They looked fine on a dark background, but some levels had clouds or other light-colored, varied backgrounds. To increase visibility I added a 1px text shadow, which creates a distinct edge even against those backgrounds. That way the text should be highly visible throughout the game.
It seemed to work well for my situation, so I moved on to mobile optimization and other tasks. That is, until this innocuous-seeming tweet from Zed Shaw crossed my stream:
text-shadow is the <blink> tag of 2011.
Zed was probably referring to a use more akin to this one, but if I’ve learned anything from the internet it’s that opinions are cheap, evidence is rare, and experience is invaluable. So I responded with my own cheap opinion:
@zedshaw I disagree. A tasteful 1px text shadow can enable text to be better visible on a wide variety of backgrounds, aiding usability.
@STRd6 Got proof? As in, actual usability studies with real people where they read pages and pages of shadowed text that I can replicate?
Whoa, proof? Evidence? I just thought we were throwing opinions around on the internet, but if I’ve learned anything from throwing opinions around on the internet it’s that you never turn down a debate with Zed Shaw.
On a serious note, this is actually a valuable opportunity. I have this hypothesis based on personal, non-scientific data points and I can put it to the test. I actually don’t have much experience testing these hypotheses in a scientifically rigorous way, which makes this opportunity even more valuable. So rather than sit around reading all the opinions and anecdotes about how to create the best usability study, I decided to dive in and iterate.
Null Hypothesis: For people on the internet, the legibility of text is unchanged with 1px text shadow.
Alternative Hypothesis: For people on the internet, a 1px text shadow increases the legibility of text.
To test this I used Amazon’s Mechanical Turk. I’d never used the service before, so this was a good learning experience on that front as well. Good statistics require a representative sample of the population in question. I can’t easily get a random sample of “people on the internet”, but I can use Mechanical Turk responders as a proxy.
To test the “1px text shadow increases legibility of text” part I asked one simple, direct question: “Choose the image with the most legible text”. I put up three different surveys. Each one had that single question and two pictures that differed only in whether or not they had a 1px text shadow. The three picture pairs were taken from screenshots of Contrasaurus at three different points in the game.
I offered $0.03 for each response and requested 100 assignments for each pair. All told it came to $10.50, not bad for a crash course in hypothesis testing via Mechanical Turk.
Now the results:
96/100 selected the text-shadow version as more legible.
55/100 selected the text-shadow version as more legible.
99/100 selected the text-shadow version as more legible.
So assuming the null hypothesis “for people on the internet, the legibility of text is unchanged with 1px text shadow”, each respondent would be equally likely to pick either image, and we would expect about 150 of the 300 responses to favor the shadowed one. An outcome as extreme as or more extreme than the observed 250/300 would then occur nearly zero times. We therefore reject the null hypothesis in favor of the alternative hypothesis that “for people on the internet, a 1px text shadow increases the legibility of text”.
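To put a number on "nearly zero", here is a minimal sketch of an exact one-sided binomial test. It assumes the null hypothesis translates to each respondent picking the shadowed image with probability 0.5, and it assumes the 300 responses are independent so the three pairs can be pooled. The function name `binomial_tail` and the labels "pair 1" through "pair 3" are mine, just for illustration.

```python
from math import comb


def binomial_tail(successes: int, trials: int, p: float = 0.5) -> float:
    """Exact P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# (responses favoring text shadow, total responses) for each picture pair.
# The pair labels are hypothetical; the counts are the survey results.
results = {"pair 1": (96, 100), "pair 2": (55, 100), "pair 3": (99, 100)}

for name, (chose_shadow, n) in results.items():
    print(f"{name}: {chose_shadow}/{n}, p-value = {binomial_tail(chose_shadow, n):.3g}")

# Pooling treats all 300 responses as independent coin flips under the null.
pooled_shadow = sum(s for s, _ in results.values())
pooled_n = sum(n for _, n in results.values())
print(f"pooled: {pooled_shadow}/{pooled_n}, p-value = {binomial_tail(pooled_shadow, pooled_n):.3g}")
```

Under these assumptions the pooled p-value is vanishingly small, while the second pair taken alone comes out at roughly 0.18 and would not clear a conventional 0.05 threshold. The strength of the pooled result comes almost entirely from the first and third pairs.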
A caveat is that this is only as reliable as our research methodology, which admittedly may have some flaws and biases. A few of the ones I can think of straight away:
- The selection of survey images may not be a representative sample of the conditions in which they appear in the entire game.
- The selection of respondents may not be a representative sample of the population in question.
- There may be a bias due to the order in which the options appeared (the no-text-shadow option was always first, as I wasn’t familiar enough with Mechanical Turk to randomize it).
- There may be a bias if some respondents responded to multiple surveys thereby making some trials non-independent.
- There may be a subconscious bias in how I constructed the experiment itself.
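One possible fix for the ordering bias is to counterbalance: show the shadowed image first to half of the respondents and second to the other half, in shuffled order. The sketch below only generates such an assignment. It does not talk to Mechanical Turk at all, and the function name `counterbalanced_orders` and the labels `"shadow"` and `"no_shadow"` are hypothetical.

```python
import random


def counterbalanced_orders(n_respondents: int, seed: int = 0) -> list[tuple[str, str]]:
    """Return one (first, second) image order per respondent.

    Half of the respondents see the no-shadow image first and the rest see
    the shadow image first; the sequence of orders is then shuffled.
    """
    half = n_respondents // 2
    orders = [("no_shadow", "shadow")] * half
    orders += [("shadow", "no_shadow")] * (n_respondents - half)
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    rng.shuffle(orders)
    return orders


orders = counterbalanced_orders(100)
shadow_first = orders.count(("shadow", "no_shadow"))
print(f"{shadow_first} of {len(orders)} respondents see the shadowed image first")
```

With the orders balanced like this, any preference for "whichever image comes first" should push both options equally instead of favoring one of them.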
Please respond with criticisms and possible solutions. I think there’s quite a bit of promise in the technique of using “real” statistical evidence to back up claims. Even best-effort statistical evidence is quite a bit above the standard personal opinion. If I’ve learned anything about statistically significant evidence it’s that it’s quite hard to get right in a bulletproof manner, but like everything else, practice and iteration, with feedback, can turn it into a valuable skill.