Earlier this month, a coworker approached me about working on a voice UI project. At first I was a little bit confused because my passion is visual design and I couldn’t see exactly how that fit in with voice UI. Looking back, that was pretty ignorant.

While there isn’t a traditional user interface with voice UI, visual feedback is actually very important for a successful product. Most products out there right now use some variation of a simple LED light interface. Many products’ lights sit in a circular ring around the outside of the product, most light up on top, and one voice UI product even simulates a robot’s face speaking.

I’ve identified six key moments in voice UI where visual feedback is important: powering on, listening, thinking, speaking, adjusting volume, and being placed on mute. Each of these “visual feedback moments” differs from the others and varies across products.
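If you were prototyping this, one way to keep the six moments straight is to treat them as states, each with its own light pattern. Here is a minimal Python sketch of that idea. The names `Moment`, `LED_PATTERNS`, and `pattern_for`, and the pattern descriptions themselves, are placeholders I made up for illustration; they are not taken from any real product’s firmware.

```python
from enum import Enum


class Moment(Enum):
    """The six visual feedback moments listed above."""
    POWERING_ON = "powering on"
    LISTENING = "listening"
    THINKING = "thinking"
    SPEAKING = "speaking"
    ADJUSTING_VOLUME = "adjusting volume"
    MUTED = "muted"


# Hypothetical placeholder patterns, one per moment (assumptions, not real specs).
LED_PATTERNS = {
    Moment.POWERING_ON: "slow fade-in",
    Moment.LISTENING: "steady glow",
    Moment.THINKING: "gentle spin",
    Moment.SPEAKING: "pulse with speech",
    Moment.ADJUSTING_VOLUME: "level meter",
    Moment.MUTED: "solid red",
}


def pattern_for(moment: Moment) -> str:
    """Look up the light pattern for a given moment."""
    return LED_PATTERNS[moment]
```

Writing it as a table makes gaps obvious: if a moment has no entry, the product has a moment with no visual feedback.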

When the product is powering on:

Until these products can be plugged in and immediately start taking directions, they need some sort of loading visualization. Think of it like the startup screen when you turn your computer on. The product needs time to power up and connect to the internet before you can start using it. Google Home has a great loading sequence: it slowly lights up white until it’s ready, at which point it flashes a colored ring around and around until you give it its first command.

When the digital assistant is actively listening to you:

It’s important for the product to signal to the user that it’s listening. This way, you don’t finish giving it a long-winded command just to realize that it’s not even plugged in. Just like in a conversation with a person, the product should have some way of signaling that it’s actively listening. When a person is listening they might look you in the eyes, nod their head, or slightly change their expression at an appropriate time. With a product, however, it seems to be much simpler. The Echo or Dot (shown below) lights up blue, with a lighter blue pointing in the direction of the speaker.

However, my personal favorite listening behavior is found in Mycroft’s Mark 1 robotic face (featured below); when listening, Mycroft shows a squiggly line (a wave-like shape) that moves up and down and inspires delight.

When it’s thinking or loading a response:

These products typically pause for an extra second after you’ve asked a question but before they answer. There has to be visual feedback so that the user knows the product is working on an answer. When FABRIQ, a speaker with Alexa capabilities, is listening to the user, it blinks its lights all the way on and off very quickly, like a strobe light. This example (shown below) is actually a little jarring; it almost feels like it’s trying to tell the user that there is a problem.


When the digital assistant is speaking:

This might be the first instance you think of when considering the visual feedback from a voice UI. The interesting thing, however, is that it might be the least necessary of all six moments. When the digital assistant is speaking, you already have audio feedback, so visual feedback isn’t strictly needed. In the other moments you get no audio feedback, so you need to see that you are interacting with it.

Many voice UIs have the lights mimic an audio waveform: the visual feedback is brightest in the middle of each spoken word or consonant and fades between words.
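To make the waveform idea concrete, here is a minimal Python sketch, assuming audio samples normalized to the range -1.0 to 1.0 and an LED that accepts a brightness from 0 to 255. The function name, frame size, and brightness range are my own assumptions for illustration, not how any of these products actually work.

```python
def brightness_levels(samples, frame_size=4, max_level=255):
    """Map each frame's peak amplitude to an LED brightness level.

    samples: audio samples, assumed normalized to -1.0..1.0.
    Returns one brightness value (0..max_level) per frame.
    """
    levels = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        # Loud frames (mid-word) give a high peak; silence gives a peak near 0.
        peak = max(abs(s) for s in frame)
        levels.append(round(min(peak, 1.0) * max_level))
    return levels


# A quiet frame, a loud frame (mid-word), then silence (between words).
speech = [0.0, 0.2, 0.0, -0.2, 0.5, 1.0, -0.9, 0.4, 0.0, 0.0, 0.0, 0.0]
print(brightness_levels(speech))
```

The light is brightest during the loud middle frame and goes dark in the silent one, which is the fade-between-words effect described above.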

However, with Google Home, Amazon Echo, and some other Alexa-powered products (like GE’s LED lamp, shown below), the visual feedback does not follow the audio waveform. These products simply pulse their lights slowly up and down, regardless of what is actually being said.


When you are controlling the volume (with Google and Alexa, you can either ask the assistant to change the volume for you, or you can manually adjust it on the product itself):

When you want to change the volume on a voice UI, most products (like the Google Home shown below) use visual feedback rather than audio feedback to show you how loud or quiet the audio is. If you’re listening to music through one of these devices, visual feedback is better because an audio feedback tone would distort what you are listening to. Additionally, if the user changes the volume by voice command, it can only change one unit at a time. In that case, hearing a single tone wouldn’t tell you much; audio feedback would only make sense if you were changing the volume by more than one unit at a time.

When it’s muted (not all voice UI has this function):

When you mute a voice UI, visual feedback is necessary to prevent confusion. Without visual feedback for mute, the user may forget they have muted the device and start screaming at it, or begin to think that it doesn’t work. I do this far too often on my laptop: I dim my screen all the way (to the point where it’s almost black), then come back to it a few hours later and convince myself I’m going to have to get a new computer because mine won’t turn on. The strongest example of this is Amazon’s mute (featured below); it’s a bright red ring around the outside with a button in the middle.

In conclusion:

Voice UI needs some sort of visual assistance to aid the audio experience. These six key moments are very important for user interaction; however, they may be a bit limiting to the overall experience. As more and more of these products come out, we will begin to see more variations and limitations of the visual feedback that voice UI can offer.

Will talking tech solve more problems than it creates? In my personal opinion, the limitations of having no real interface will cause user engagement to wane. I think the future of voice UI doesn’t lie in giving it better visual feedback but in giving it an entire visual interface.