“Word Lens” is lame because you’re still dumb

Recently the internet buzzed with the introduction of Word Lens, an application for the iPhone which uses the camera to perform on-the-fly translations of signs and menus printed in a foreign language. The video demo is super compelling because the translation is so fast, and the interface so non-existent, that it is as if you can suddenly read Spanish.


Imagine the places you will go, and how much richer the experience will be when the previously opaque meaning of foreign signs is suddenly clear. You are no longer forced to wander the streets, wondering what kinds of shops you are passing. You can understand signs regarding public transportation, tourism and safety. You sit down at a restaurant and, with the help of Word Lens, you can read the menu. The waiter approaches, quickly utters something, and waits attentively for your response. You glance at your iPhone… nothing. You flash a pained smile back, mutely trying to communicate that you don’t understand. Word Lens is lame because it’s only half of the solution. You’re dumb because you can’t speak and really communicate.

Don’t get me wrong, Word Lens is a great step forward. It will help with some of the anxieties of travel, in particular in using and navigating complex transportation systems. These kinds of tasks don’t really require two-way communication. Simply reading and understanding your options is a major win.

Buying subway tickets in Tokyo
Trying to buy tickets for the Tokyo subway, I would have liked to have Word Lens.

But you don’t need to read to understand what a particular storefront offers; you just look at what’s on the shelves.

Hong Kong Street
Hong Kong street scene: no translation needed for understanding

You don’t really need translation help for safety-related issues. Those were solved a long time ago with a universal picture language.

Paris "walk" pictograph
Where safety is concerned, pictures have long sufficed: a Parisian “walk” pictograph

The hardest part of travel isn’t understanding; it is being understood: asking for directions, ordering food, asking for a receipt. It’s frustrating to struggle at expressing your needs.

Word Lens leaves you with a little more input, but a frustrating lack of output. Now you may understand, but you still can’t say a damn thing.

The speed and accuracy of the underlying technology is a breakthrough. The transparency and dead-simplicity of the interface are exactly how a visual hand-held translator should work. As many people have commented, Word Lens delivers on the promise of augmented reality. This technology shows great potential and will most certainly be adapted and built upon.

But until we get a voice, a way to communicate back, Word Lens is little more than an amazing party trick.

Typing into Google Translate lacks the elegance, speed and simplicity of the Word Lens interface, but it does get you to “speak up” for yourself. How could Word Lens improve upon this?


A couple of girls use Google Translate to order Indian food

10 Comments

J.C.
This reminds me of the people who said the iPad was useless because it didn't have an HDMI connection. I should say up front that I know the author of Word Lens, so I may be taking this review a bit personally (as I know how hard he has worked on it), but I think this review shows what is wrong with our profession. We are always tearing things down instead of helping to build things up.

Word Lens was created to solve a specific problem: helping you to read signs and menus without needing the internet (and incurring roaming charges). I think it performs wonderfully for that use case. To discount it because it only works for its intended purpose and doesn't solve all problems for travelers is silly. I think the trend toward small apps targeted at a particular use case is a good one. I am glad we are moving away from the behemoth one-app-does-everything approach. I am sure there are a lot of people working on voice translation apps, etc.

I should also mention that this is version 1.0 of a product. Sure there are problems, both with the translations and the interface (e.g. I have seen a lot of people get caught by a mode error with the lock icon), but it is damn good for a first version. The developer is actively testing with real users and making improvements. I don't know what more you can hope for in the real world.

Hopefully someday we will have apps that act like the universal translator from Star Trek. I know AT&T has been working on that since the early '90s, and I am sure Google is working on it as well. Until then, I will be using Word Lens to understand the menu and then point to the items I want. It isn't perfect, but it works...
stefan klocek
J.C., thanks for your thoughtful reply to my post. If menus and signs were the only need Word Lens aimed to solve, then it is solving it in an amazing way. Perhaps I didn't give enough props (I tried, but didn't go overboard) to the brilliance of the interface, which is so simple, so straightforward, so "obvious" that all future translation tools should use it as the benchmark for how translation should work. I love the interface. It makes me really happy to use it. It feels right.

My intention with my post was to say that a brilliant achievement in interface isn't enough. Yes, this is version 1.0, and I can't wait to see where they take it. I wanted to point out, perhaps the obvious, that deeper problems still need to be solved in the realm of translation. This is a phenomenal step in the right direction. My post failed to generate the conversation I hoped for about how best to solve the other problems (asking questions and understanding answers). Let me be clear: I am in agreement with you that "this is damn good for a first version."
Nick Myers
First, I'll say that Word Lens is an app that truly blew me away the first time I looked at it. It's a great idea, well executed, and I'm astounded by it. I do agree with Stefan's reaction: the app successfully helps a person translate text but doesn't yet support a conversation. I'm sure that's coming in future releases, but it's also a good reminder that great products come from helping users achieve their goals. Even J.C.'s comment about supporting use cases implies, to me, designing for tasks rather than designing for users' needs and goals. There's a subtle difference there, but one that makes a difference in delivering success.

Do I think it was released too early? Absolutely not. It has received so much buzz that the app got great promotion, which I'm sure will help business goals. And it already helps people. I also believe it's a great way for the product owners to learn more about how people are using the app and what future opportunities might look like from the perspective of real users.

As a side note, it's never our intention to rip people's work apart. It happens to us, we're sensitive to it, and we apologize if we didn't phrase our words appropriately. We're simply trying to have meaningful conversations and advance thinking about design.
Billy
I think language dictionaries are "just a party trick" too, then, because all you can do is translate words. Your comments sound very respectful, but the whole tone of the post is inappropriate. It's fine to explore future ways to improve upon the app, but for $10, you're asking for a lot with the above. The post implies that there's a better way to do what Word Lens is doing but offers no alternatives other than stating that there are other issues to be solved. It isn't hard to recognize that there are a lot of use cases where Word Lens doesn't solve the issue, but this article would be inspiring if it offered solutions for the ones it could.
Doug LeMoine
Alternate title: "WordLens is awesome because it makes us want more."
Ben McKenzie
I agree with the idea behind the original post, though the language - "Word Lens is lame" in the title! - makes it easy to read it as a dismissive critique. I guess the purpose of the post is unclear; perhaps it would have been better to frame it as "what else is possible?" rather than "this isn't good enough": to celebrate the achievement of seamless text translation, and ask whether the same could be achieved for speech, and whether it opens the door to other communication solutions. I find Word Lens astounding and potentially useful; I only say potentially since I live in Australia where hardly anyone speaks Spanish, so I won't have a use for it until other languages are available.

I do see your point, though: Word Lens doesn't let you read other languages, it just translates for you. That it only translates text isn't the big problem either - it's that the translation is only one way. The "obvious" next step is to translate in two directions, but is that really so obvious in the context of augmented reality? Such apps take your environment and translate it or embellish it; they provide additional information to you. By their nature, AR apps are input only; the main exceptions are those which use crowdsourcing for the augmenting data - things like Urbanspoon or Google Maps. They offer a way to contribute to that data, though note your contribution is mainly useful only to others, since you were able to collect or create that data yourself. Word Lens could easily evolve into something along these lines, adding annotations and links for common terms and phrases which require more than a simple word-to-word translation (in this context it could even work on English-to-English translation across different cultures). But it's a tool for understanding text in another language, not for communication, and I have to say I've not seen it sold as the latter.

It's worth mentioning, too, that we're ignoring the fact that you *can* use Word Lens two ways: if you can produce what you want to say in English in a form readable by the app, it can translate that into Spanish for you. As the app improves and is able to read handwriting - or at least hand-printed letters - with greater accuracy, you'll be able to write notes and instantly have them translated into something that someone else can read. Again, evolution could take it beyond this; it could add pronunciation, or maybe just read the text aloud (in the translation or original language) when you press a button. Those are interesting and possibly even likely next steps for Word Lens; I'd be interested in your take on how they might work into the design without ruining its invisibility.
J.C.
@Stefan: Thank you for your thoughtful response to my criticism.

@Nick: With regard to a user's goals/needs, I am saying that in a real-world product you have to draw a line around what you think is possible to deliver. There are a lot of companies out there that try to do too much in a single release, and as a result the whole product becomes unusable. It is much better to have a product that does what it does well instead of trying to be all things to all people. Remember that this was developed primarily by a single person. I also think that the fact that he spent extra time making sure the app can be used without the internet (and roaming charges) shows that he considered the user's goals and not just tasks.

My main question to you is: if instantaneous voice translation were currently possible, should it be added to something like Word Lens or should it be its own app? I am honestly not sure, but I would lean toward putting it in its own app. There are cases where you want to read something without making noise (e.g. a fancy restaurant), and when you are conversing with someone, you probably don't want to use up the battery by having the camera going. I would probably prefer having both apps in a "travel" folder (along with trip-planning, itinerary, maps, schedules, etc.) as long as there is a quick interface for switching between active applications. I guess the only case I can think of for combining them in the same app is if the UI required heavy integration. For example, speech translation may be difficult in noisy, crowded spaces (e.g. airports, tourist attractions, etc.) because the app doesn't know which people to translate and which to ignore. Using the headphones (with built-in mic) or speaking directly into the phone's mic might help, but getting a stranger (with whom you don't share a common language) to do either is a bit awkward.

Perhaps pointing an app like Word Lens at a person's face (instead of text) would tell it that that is the person you want it to translate. (We are mostly into sci-fi here, but I can think of a couple of ways it could help isolate the audio of the targeted person.) I would probably favor integration in that case. I am curious to hear what others think about having a lot of functionality in one app vs. splitting the functionality into a set of integrated apps (on a mobile platform).
Peter Duyan
@JC: You raise a good question about the ways in which a mobile platform requires us to think carefully about all-encompassing apps vs. targeted apps. Many successful mobile apps are surprisingly targeted and Word Lens may fall into this category - it will be interesting to see what direction they decide to take.
Stefan Klocek
@Ben: I like your suggestion of taking Word Lens further so that it can read handwriting, so at least in theory you could communicate in writing. Assuming that the OCR of handwriting worked out perfectly, my concern with this approach is that writing, then "lensing" the writing, would feel like heavy task management. The smoothness of the Word Lens experience is that it is instant; it doesn't feel like a task, it just happens. It seems a more seamless experience would be speaking into the device and having it recognize and translate what was spoken. Which brings us to J.C.'s comment...

@J.C.: The question of dedicated vs. encompassing apps is a great one. Lots of the apps I favor on my iPhone are dedicated and task-specific. The narrowness of their functionality helps deliver a focused, simple experience in which it's easy to be successful. That makes it tempting to settle for two apps, each dedicated and great at what it does. But the experience is already video-based, which means audio could be a seamless part of it. What if the camera continued to do text translation, so that whatever is picked up visually gets the current awesome visual replacement treatment, while audio works on a different task, focusing on what is picked up by the mic? It gets a bit sci-fi here because the processing speed isn't available on current hand-held devices, but assuming it is, and assuming enough isolation of voice (you point out great issues with targeting the right voice in loud public environments), how might it work? You could wear your headset and have the speech around you translated in real time. A little delay is inevitable, but it could work a bit like UN translators, who use earpieces to give near-simultaneous translations. Speaking presents another challenge, because your audience doesn't have your earphones in. I can think of a couple of different directions for this.

One is speaking in short bursts and hearing through your headphones the translated words, paced so they prompt you to repeat them out loud, leading you in your communication attempts. It would need to know your voice and not attempt to translate it back for you; it would seem halting and strange, but it might work. Another possibility is that the ubiquity of devices means all devices transmit their owner's native tongue, so each device negotiates to help its owner communicate and understand. My device tells the Spanish innkeeper's device that I speak English, and his device translates English into Spanish for him, while mine does the reverse. Each of our earbuds delivers translations, while each of our tongues speaks natively. The flow of our experience would match the seamlessness of the way Word Lens translates printed text today. It's really sci-fi, but I imagine eventually we'll get some form of mutual simultaneous voice translation.
Jenea
I have to go with the crowd on this one. How can you reconcile "amazing," "breakthrough," and "exactly how a visual hand-held translator should work" with "lame"? Word Lens fires a shot across the bow of all other translation interfaces and demands that we all think differently about the possibilities. Is it the final end-all and be-all? No. Did it completely blow my mind? A million times YES. After my first experience I wanted to send Word Lens flowers and go for long romantic walks together on an exotic beach somewhere (preferably where there would be signs in some language other than English). Sometimes game-changing and awesome really *is* good enough for now. @Word Lens: Love you man. Call me.
