Journal




Conversations with machines

by Kim Goodwin on August 4, 2008

Every time I get on the phone with some corporation or other, I find myself reflecting on why voice interfaces are so uniquely infuriating. Clearly, I’m not the only one who thinks so, or sites like dialahuman.com and gethuman.com wouldn’t exist. I suspect the problem lies not only in wretched usability, but also in the fact that voice interaction sets higher expectations for reasonable, human-like behavior. If humans interact with computers as if they were also human, as discussed by Byron Reeves and Clifford Nass in The Media Equation, this seems even more true for computers and other software-powered devices that accept voice input in addition to using voice output; after all, if it can understand what you’re saying, it must be able to think, right? In their very readable 2005 book, Wired for Speech, Nass and another colleague, Scott Brave, assert that this is indeed true. Hearing a human say, “I’m sorry, I didn’t understand that” three or four times in a row would be enough to inspire violent impulses in the most dedicated pacifist, and many people have similar reactions to voice interfaces. So is a more “human” interface necessarily better?

Even though we know we’re talking to a machine, we humans respond to perceived emotion even in recorded voices. For several days after we installed a new phone system in our offices, people continually commented on the doleful female voice that responded to deleted phone messages by saying “duuh-leted,” dragging out the first syllable and drooping at the end, kind of like a mopey teenager asked to take out the garbage. Discontented machines are especially noticeable, though excessive perkiness is irritating in some circumstances: “I’m sorry, you’ve been on hold for 20 minutes, so your session has expired.”

Then there’s the question of voice interface etiquette. I don’t need to feel like I’m talking to a 1970s Cylon that responds to my requests with a metallic and subservient “By your command,” but I want to smack my bank’s voice system for its presumption when it says something like, “If you’d like to speak to an agent, say ‘Agent, please.’” I believe in saying “please” to other humans, but I don’t politely ask the cat to move off the couch, and I’m certainly not going to extend the courtesy to a computer (though I think it ought to apologize to me when it can’t help). According to Nass and Brave, I’m not alone. Most people in their experiments didn’t respond well to synthesized voices using the first person, and even recordings of real voices didn’t get a warm reception when using the first person to deliver bad news. In fact, the use of the first person for bad news only increased listener perception of the system’s incompetence, perhaps because listeners were more likely to judge it by human standards.

Perhaps the most interesting point Nass and Brave demonstrate is how contrast of any kind draws attention to system shortcomings. This makes intuitive sense from everyday life; you might be content driving your five-year-old economy car until you ride in a colleague’s brand new sports car, or think Madonna sings well until you hear Ella Fitzgerald in her prime. In audible interfaces, the unfortunate contrasts underscore the ways in which the technology simply can’t replace a human.

Your voice system will be better received if you avoid these typical contrasts:


  • Inconsistency in personality and content. There’s a reporter on one of the Bay Area TV news shows who tends to report on the death toll from the latest global catastrophe with a smile on her face, which always makes me wonder what kind of strange things are going on in her head. Similarly, people are less likely to enjoy or trust their interactions with a system that cheerfully reports an inability to help, or that seems terse or unfriendly in the course of ordinary transactions.

  • Combining high quality output with low fidelity input. If a system talks in complete sentences using a recorded human voice but can’t parse a simple request or recognize common words, it comes across not only as a technologically limited system, but as a deliberately obtuse and infuriating person. Clear but obviously synthesized speech leads to lower expectations of “intelligence.”

  • Mixing recorded human voices with synthesized output. Dynamic content—such as email, news, and Web site content—is difficult or impossible to construct from pre-recorded bits of human voices, so synthesized output is sometimes necessary. Having a human voice speak part of the content while a synthesized voice speaks the remainder is distracting.


There’s a lot more to successful audible and speech interfaces than consistency and appropriate personality, but both are essential in leaving callers with a positive impression of your brand.

If you haven’t heard much from me in the Journal lately it’s because I’ve been immersed in writing the comprehensive book on Cooper’s methods, from planning research to writing specs, and there are still about 150 pages between today and my content completion deadline at the end of August. However, I thought I’d start sharing some snippets of thinking from the book, which won’t hit shelves until January. This is my first installment.

Filed under: Service design


Kim Goodwin

As VP Design, Kim has played a major role in developing our Goal-Directed methods and turning them into the Cooper U curriculum, and she continues to work with the leaders of each design discipline to evolve and improve our practice. Her design expertise

