Voice, meaning speech recognition and natural language understanding, is poised to transform our daily lives and disrupt every industry, not just mobile. Thanks to companies like Nuance Communications, whose technology is also the speech-recognition engine behind Siri (the virtual personal assistant on the Apple iPhone), we are rapidly moving toward a voice-enabled future. We catch up today with Vlad Sejnoha, Nuance’s Chief Technology Officer, to get an inside track on the topics he will address at Mobilize, the must-attend ‘mobile first’ conference organized by GigaOM (September 20-21, San Francisco). There are fewer than 40 tickets left, so register before the event sells out!
Voice at the center
Nuance has also secured turf in new areas of application. These include cars (BMW will be making Dragon Drive! Messaging available in some car models as a connected car service that lets people speak emails and text messages), smart TVs (Samsung’s 2012 premium Smart TV lineup is powered by Nuance, allowing consumers to use natural voice commands to change channels, search for content and connect via Skype) and customer care.
For customer care, Nuance released Nina, the first virtual assistant customer service app to incorporate both speech recognition and voice biometrics into a single integrated solution. Nina lets companies create, brand and support their own virtual assistant persona. (A smart move, since surveys show that people appreciate self-service with a personal touch.)
Connect the dots, and it’s clear that human speech is on its way to becoming the no-brainer interface to smartphones, TVs, cars, computers and household appliances, as well as to other smart devices and services.
As Vlad puts it, voice is becoming the “amazing shortcut,” bypassing vision and touch. “We are coming from the era of the visual interface…With voice, coupled with natural language understanding, we are at a point where we can deduce your intent: what you want done. You can suddenly start interacting with things you don’t necessarily see. You don’t have to find that app in order to get something done. In an increasing number of cases, you can simply state what you want.”
Highlights:
Cutting-edge & multimodal: Voice is the game-changer, but it’s all about striking a balance between voice and the other input methods, a balance that will ultimately lead to the fastest completion of tasks and the best user experiences. Vlad tells us that research into using a camera to read the lip movements of a person speaking into a device or at a TV is producing the building blocks for models that will shape our connected lives.
Voice baked in: In Vlad’s view, it is an “absolute given” that “some level of speech recognition, natural language understanding support will be now built in as part of the fabric or even the operating system.” In fact, many chip manufacturers are planning future generations of chips with co-processors dedicated to tasks like speech recognition and natural language understanding. Will voice be a feature of all services going forward? Let’s just say that this built-in hardware and platform support will let speech recognition and natural language understanding run in the background.
The last word: Vlad wraps up our interview with a few scenarios, based on real-world progress in the industry and at Nuance, that map out our near future. In his view, “We are quickly entering a really science fiction world where we have immense power at our fingertips or, maybe I should say, at our lips that just would have been inconceivable to people just a decade ago.”
What role will voice soon play in computing, mobile and our daily lives? For the full answer, listen to the podcast. A hint: “[As] users, we will have a portable virtual environment where, through a variety of devices and contexts in our daily lives, we will be able to continue interacting, receiving information and communicating in a very transparent and flexible way.” As a result, the intelligence we are building will follow us and remember our preferences. We will control (and increase) this contextual intelligence using voice.