Monday, July 17, 2006
usability testing isn't dead, only summative testing is dead
I'm hearing a lot about the misuse of usability testing from some big names like Don Norman, who writes in the current interactions magazine that usability testing is no more than the minor activity of "catching bugs". While Norman maintains that usability testing is still necessary for clean-up purposes, he argues it shouldn't be used to "determine 'what users need'".
I have enormous respect for Norman, and love his recent contrarian views on User Centered Design, which contain many valuable insights. But on his point about user testing I think Norman is flat wrong, and out of touch with how usability testing has developed in recent years.
Norman, and a few other old-time professionals in the HCI world whom I've seen criticize user testing, reflect a dated understanding of what user testing is. They equate user testing with the bug-tallying process of summative testing, a test often done at the end of the design and development process that gives a report card on how the application works for users. Large groups of test subjects would work through uniform test protocols. In HCI, summative testing used to be the holy grail of scientific respectability for the field, giving statistically measurable data on what works and what doesn't.
As a practitioner, I don't know anyone relying on summative testing to any extent -- for the very reason Norman and others criticize it: it comes "too late." But there is plenty of room for usability testing to inform design -- just don't do it at the end, or try to make it a scientific experiment. There is enormous confusion in the usability community because we sometimes discuss testing without being explicit about whether we mean old-fashioned summative testing (largely a white elephant) or nimble, iterative formative testing. Both are usability testing, but formative testing is not simply about finding bugs and glitches. Formative testing can be a powerful tool for understanding user needs and preferences:
- Formative user testing gives users something concrete to react to. While pre-design user research can be valuable for identifying abstract user needs, concrete design alternatives provide the bridge to developing optimal solutions. You often can't know all the user requirements through pre-design research. It isn't always a matter of giving a design a pass/fail rating, but of exploring the effectiveness of alternatives that often involve trade-offs for the user (and perhaps the sponsor organization as well). Such formative testing is becoming increasingly common, but some people doing it seem reluctant to refer to it as usability testing (perhaps because usability testing sounds cumbersome, or because formative testing isn't as rigorous as "proper" usability testing is meant to be). Even fewer people refer to this testing as formative user testing - it is a quasi-informal activity that is never given a proper name or due status.
- Users aren't idiots, and often can successfully understand and use different design alternatives, though they might not necessarily like all the alternatives equally well. A small example from my work: do users initially want to see a list of billing items in chronological or reverse-chronological order? I can ask this question orally, but I get a much stronger indication of user preferences when I present alternative designs. Note that users could understand and use either one if they were compelled to, but it doesn't follow that they will bother to use a design simply because it is usable.
Formative testing has developed in the practitioner world in response to the inability of summative testing to cope with iterative design cycles. But there is no orthodoxy about how formative tests are done or evaluated. In many ways the lack of orthodoxy with formative testing has been a blessing, as it has enabled it to be responsive on projects, and grow creatively outside the straitjacket of scientific method. On the downside, because formative testing has developed on the margins of HCI orthodoxy, it hasn't received the recognition it deserves, and can be misunderstood by even big-name HCI gurus.
Many practicing UCD researchers and interaction designers consider statistical validity irrelevant to the value of testing. Testing is valuable because it offers insight, not because it offers data. User comments and stories about their behavior provide richer insights for design than bug-seeking data. Sometimes it is unclear how strong an insight is, or whether we have uncovered all we want to. In these cases it can be useful to find new methods to evaluate the robustness and completeness of the qualitative data arising from formative testing, and to work out how to use them in an agile, iterative setting. I was pleased to see the beginning of such a discussion of formative testing at the UPA conference last month. (For example, check out the boundary-pushing work of the team at Alias/Autodesk.) There is plenty to improve with current formative testing methods, but let's not throw the baby out with the bath water.


