Thursday, July 27, 2006
long tails need organization to happen
Like many ideas spawned by Wired magazine, the Long Tail is a vaguely libertarian notion that all anyone wants is unfettered access. Give people access, and the Tail will emerge spontaneously. The concept is argued on the basis of idealized statistical behavior and supposed transaction cost economies of data servers. From the perspective of user centered design, I find the Long Tail concept a bit naive.
Why are hits so powerful, despite the very real phenomenon that consumers have access to an ever broader range of content? The constraining factor has little to do with computers and economics; it has to do with human attention, both cognitive and psychic.
Cognitive attention is strained as more material becomes available. Computers have no trouble storing millions of records, but humans have trouble making sense of them. Browsing, scanning and searching become increasingly difficult as the number of records grows. I don't want to discount the impressive progress in information architecture over the past decade, but I feel the solutions developed are still primitive compared with the needs posed by millions of records. Consider the "subject" taxonomy in Amazon's book store: it is simply too broad to be helpful in view of the millions of records. No one has developed any universally meaningful way to describe music genres that reflects the narrow-casted development of styles and approaches.
What I am calling psychic attention is grounded in the many facets of social psychology. We are drawn to things other people are buying for numerous reasons. People feel comfortable buying products that are already accepted. It is "rational" in terms of expected effort expenditure to buy something others have already tried, and presumably found useful or enjoyable. People experience social validation, extend trust, and have a basis for social connection when going for popular options. Information management has addressed the social dimension through behavioral data mining, showing connections between the purchases of different items, and through recommender systems, where people suggest items of interest, rate items, and rate each other's ratings. These systems can reinforce the popularity of already strong sellers, working against the Long Tail.
There has been enormous progress in giving form to the mountains of records, but behavioral and recommender systems often externalize the contradictions of individuals (especially in the low-volume end of the Long Tail). Take someone's "my favorites" list: it may contain a list of seemingly random items, books and CDs on unrelated topics or styles. Or people mean vastly different things by common words -- as an experiment, type the word "liberal" in Amazon's Listmania. You will find recommendations for books that are far from your personal preferences (whatever they are), because people use the term in so many ways: as a positive term for either Left or Right wing politics, a derisive term for the same, as a theological orientation for various religions, etc. Sales behavior and recommendations are also not logically correlated, pointing to some gaps in behavioral classification. One Amazon reviewer noted that nearly everyone (several hundred reviewers) gave an antivirus software package the lowest possible rating, but it showed up as the most popular seller. A conversation with your next-door neighbor might explain such a contradiction, but the user interface doesn't.
To navigate through and evaluate the long tail, people must rely on logical organization or social organization (the opinion and behavior of others). If theorists who argue that humans relate to concepts in ways similar to how they relate to people are right, then information organizations need to be smaller. You can't know everyone in a large organization of people well, which is one reason organizations divide and splinter. The same may need to happen with the superstore websites. Narrowcast marketing presumes people have some intention behind their interest in a product, band, hobby or lifestyle. The superstores try to infer that intention by observing expressed opinions and behavior, but miss the organic aspect of collective intentions. Intentions are consciously formed, and microsites have much greater coherence in their offerings. Meaningful information management is not inherently self-organizing. When "everyone" (either the broader public or a data mining computer) tries to conceptualize and interpret the meaning of something that has resonance to a core group, the meaning gets lost.
Tuesday, July 18, 2006
New-generation dashboards are now presenting data more relevant to front-line employees, particularly their KPIs (key performance indicators). The seamless corporation created by enterprise software is allowing a multitude of data indicators to be collected and presented in ways tailored to the work of individual employees. Such dashboards promise to improve measurement and awareness of activity (enabling improvement) and support long-standing goals to de-layer decision making and give more responsibility to front-line staff. Dashboards have moved a big step toward relevance to employees, but few dashboards are truly user centered, because they don't address underlying user motivations.
Dashboards have received scant attention from interaction designers, and what attention has been given tends to view dashboards as just another UI, often likened to data-rich maps. Coping with data richness is certainly an aspect of dashboards, but it can potentially focus attention on the wrong end of the user experience. The question is not necessarily how to cram more information on a dashboard, so that users can successfully discriminate between different levels and layers of information. Rather, the question may well be how to make sure that the KPIs presented truly support the employee's performance. Ironically, visually rich cartographic dashboards may be distracting to employee performance, even if they present lots of data people think is relevant and even if they can be understood without difficulty. Unlike a map, where data often represents something as lifeless and impersonal as geological formations, dashboards represent data that is anything but impersonal: it reflects the incentives employees are given and how they are rated.
Dashboards are a good example of the importance of understanding user needs in context, moving beyond static understanding to explore a user's lifeworld. A recent article in the Financial Times discussed recent academic and investment research on the paradox of incentives. It notes: "It seems that incentives work well when the subject is given a repetitive, mindless task to perform, rather like the piece rates that applied in manufacturing plants. But when more thought is involved, incentives may dent performance. Our minds start to worry about the incentives, rather than the task at hand. We choke under pressure."
What the research suggests is that with complex knowledge work, where there are many factors to juggle mentally, the more we think about multiple KPIs displayed on a dashboard, the more we are distracted from completing the task at hand. Here, our cognitive make-up collides with the business imperative to measure and monitor everything. This conflict can be resolved in different ways. Perhaps employees are being overloaded with KPIs, and so they need fewer, and therefore a simpler dashboard. Perhaps they indeed need to measure and monitor a multitude of data factors, but they should not be rated on all these factors. We could have a sophisticated dashboard of enterprise data that are not KPIs for an individual employee.
Dashboards promise to act as a window on performance, but they can influence performance as well as reflect it. Ideally employees shouldn't be thinking too much about the dashboard. Dashboards are tools that should blend into the background to support an employee's work, not be in the foreground, screaming for attention.
Monday, July 17, 2006
usability testing isn't dead, only summative testing is dead
I have enormous respect for Norman, and love his recent contrarian views on User Centered Design, which contain many valuable insights. But on his point about user testing I think Norman is flat wrong, and out of touch with how usability testing has developed in recent years.
Norman, and a few other old-time professionals in the HCI world whom I've seen be critical of user testing, reflect a dated understanding of what user testing is. They equate user testing with the bug-tallying process of summative testing, a test often done at the end of the design and development process that gives a report card on how the application works for users. Large groups of test subjects would work through uniform test protocols. In HCI, summative testing used to be the holy grail of scientific respectability for the field, giving statistically measurable data on what works and what doesn't.
As a practitioner, I don't know anyone relying on summative testing to any extent -- for the very reason Norman and others criticize it: it comes "too late." But there is plenty of room for usability testing to inform design -- just don't do it at the end, or try to make it a scientific experiment. There is enormous confusion in the usability community because we sometimes discuss testing without being explicit whether it is old-fashioned summative testing (largely a white elephant), or nimble and iterative formative testing. Both are usability testing, but formative testing is not simply about finding bugs and glitches. Formative testing can be a powerful tool for understanding user needs and preferences:
- Formative user testing gives users something concrete to react to. While pre-design user research can be valuable to identify abstract user needs, concrete design alternatives provide the bridge to developing optimal solutions. You often can't know all the user requirements through pre-design research. It isn't always a matter of giving a design a pass/fail rating, but of exploring the effectiveness of alternatives that often involve trade-offs for the user (and perhaps the sponsor organization as well). Such formative testing is becoming increasingly common, but some people doing it seem reluctant to refer to it as usability testing (perhaps because usability testing sounds cumbersome or because formative testing isn't as rigorous as "proper" usability testing is meant to be). Even fewer people refer to this testing as formative user testing - it is a quasi-informal activity that is never given a proper name or due status.
- Users aren't idiots, and often can successfully understand and use different design alternatives, though they might not necessarily like all the alternatives equally well. A small example from my work: do users want initially to see a list of billing items in chronological or reverse-chronological order? I can ask this question orally, but get a much stronger indication of user preferences when I present alternative designs. Note that users could understand and use either one, if they were compelled to use it, but it doesn't follow they will bother to use it simply because it is usable.
Formative testing has developed in the practitioner world in response to the inability of summative testing to cope with iterative design cycles. But there is no orthodoxy about how formative tests are done or evaluated. In many ways the lack of orthodoxy with formative testing has been a blessing, as it has enabled it to be responsive on projects, and grow creatively outside the straitjacket of scientific method. On the downside, because formative testing has developed on the margins of HCI orthodoxy, it hasn't received the recognition it deserves, and can be misunderstood by even big-name HCI gurus.
Many practicing UCD researchers and interaction designers consider statistical validity irrelevant to the value of testing. Testing is valuable because it offers insight, not because it offers data. User comments and stories about their behavior provide richer insights useful to design than bug-seeking data. Sometimes it is unclear how strong an insight is, or whether we have uncovered all we want to. In these cases it can be useful to find new methods to evaluate the robustness and completeness of the qualitative data arising from formative testing, and to work out how to use them in an agile, iterative setting. I was pleased to see the beginning of such a discussion of formative testing at the UPA conference last month. (For example, check out the boundary-pushing work of the team at Alias/Autodesk.) There is plenty to improve with current formative testing methods, but let's not throw the baby out with the bath water.