Viewpoint: The next (small) step for in-vehicle voice technology

The National Highway Traffic Safety Administration (NHTSA) recently released guidelines for minimizing visual and manual distractions of drivers, putting new pressure on automobile manufacturers to promote the use of simple, voice-based systems to perform non-driving tasks.

This is good news.

However, even with the advanced state that voice (speech) recognition has reached, speech interfaces in vehicles continue to see low adoption among drivers, primarily because of perceived accuracy issues. A poor initial experience usually leads a driver to abandon the system quickly.

Overcoming this shortcoming in reliability, which NHTSA acknowledges in its recommendations, depends less on accelerating improvements in voice applications than on integrating voice (or speech) with other types of interfaces.

Using a multimodal approach — incorporating a mix of audio, voice, visual, manual, haptic (vibrations) and augmented reality — can immediately and dramatically enhance reliability and ease of use, particularly for first-time users.

Tap or say

A good first step for overcoming current voice system usability issues is transforming the conventional talk button functionality, normally located on the steering wheel, into an interactive task assistant through a simple “tap or say” prompt.

When a user presses the talk button, the prompt “tap or say your selection” makes it clear to the user that they can tap their selection instead of saying a command that may not be handled correctly by the recognizer. 

A common problem with today’s in-vehicle speech interfaces is that a driver often doesn’t know what to say in response to the talk button’s “please say a command” voice prompt.  

This prompt implies the car is listening for something to be spoken by the driver. If the driver doesn’t respond with a specific command, the recognizer often has issues understanding the user’s intent, and, therefore, returns errors or seemingly inaccurate results.

A related problem in this scenario: unexpected environmental sounds occur frequently while the car is in listening mode. The speech recognizer handles these sounds poorly, especially when the system’s vocabulary contains more than 1,000 recognizable items.

(For more on in-vehicle voice recognition, see Can voice recognition make telematics services safer?; Telematics and speech recognition: Finally ready for prime time?; and Telematics and voice recognition: Overcoming the tech challenges.)

Natural voice recognition no cure-all

Furthermore, the recent trend has been to extend the speech recognizer’s capability to enable the speaking of natural phrases, including fully enunciated requests. But not all requests.

For example, the car’s speech recognition system may be capable of dialing the name of a specific hotel but may fail to understand a request to make a room reservation. Such inconsistencies breed the bad user experiences that prompt many drivers never to use the system again.

Adding touch prompts to an existing speech system helps overcome such shortcomings. “Tap or say” commands provide flexibility for the driving situation at the moment and also provide a familiar and natural interface that helps novice users navigate and manage tasks reliably and quickly, without increasing the risk of distraction.

A combination of touch and voice is particularly vital when the task requires the driver to choose from a menu of options relating to destinations, songs or radio programs. The driver switches to speech input when the need arises to enter text.

Mitigating driver frustration

The “tap or say” approach also mitigates driver frustration, another potential source of distraction, often resulting when the driver encounters a system error such as:

  • Incorrect recognition (e.g., “traffic” recognized as “weather”)
  • No match (no guess at what was spoken)
  • Timeout (spoken input not heard)
  • Spurious (invalid command or sound recognized as a valid command)
  • Spoke too soon (user spoke before listening began)
  • Deletion (e.g., 66132 is recognized as 6632)
  • Insertion (e.g., 7726 is recognized as 77126)
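
The deletion and insertion examples above can be told apart mechanically when the intended digit string is known, for instance during testing. The following is a minimal sketch in Python; the function names are hypothetical and do not come from any in-vehicle system.

```python
def is_subsequence(short: str, long: str) -> bool:
    """Return True if `short` can be produced by deleting characters from `long`."""
    remaining = iter(long)
    # `ch in remaining` advances the iterator, so character order is preserved.
    return all(ch in remaining for ch in short)


def classify_digit_error(spoken: str, recognized: str) -> str:
    """Label a recognized digit string relative to what was actually spoken.

    Hypothetical helper for illustration only.
    """
    if spoken == recognized:
        return "correct"
    if len(recognized) < len(spoken) and is_subsequence(recognized, spoken):
        return "deletion"    # one or more spoken digits were dropped
    if len(recognized) > len(spoken) and is_subsequence(spoken, recognized):
        return "insertion"   # one or more extra digits were added
    return "substitution or mixed"


print(classify_digit_error("66132", "6632"))   # deletion
print(classify_digit_error("7726", "77126"))   # insertion
```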

The traditional approach of prompting or coaching the driver on what to say after a speech system error lengthens the time needed to complete the task. That increases the potential for distraction triggered by driver frustration and/or cognitive inattention, as well as the likelihood of system abandonment.

When a speech error occurs in a system that offers “tap or say” commands, the user is instructed to tap from a list of results, without the option to speak. No extra prompting and no extra dialogue steps are required, dramatically reducing the task completion time and the risk of distraction.
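
As a rough illustration of that flow, here is a minimal sketch in Python. Everything in it is an assumption made for the example: the `Prompt` type, the function names, the prompt wording after an error, and the `max_items` cap are hypothetical and are not taken from any production system.

```python
from dataclasses import dataclass
from enum import Enum, auto


class SpeechError(Enum):
    """The error types listed in this article."""
    INCORRECT_RECOGNITION = auto()
    NO_MATCH = auto()
    TIMEOUT = auto()
    SPURIOUS = auto()
    SPOKE_TOO_SOON = auto()
    DELETION = auto()
    INSERTION = auto()


@dataclass
class Prompt:
    text: str          # what the system says or displays
    choices: list      # items shown on the touch screen
    listening: bool    # whether the recognizer stays open for speech


def initial_prompt(choices):
    """Talk button pressed: offer both touch and speech."""
    return Prompt("Tap or say your selection", list(choices), listening=True)


def on_speech_error(error, candidates, max_items=5):
    """After any speech error, fall back to a tap-only list.

    No coaching prompt and no second listening turn are issued. The
    `error` value is accepted so a caller could log it; every error
    type is handled the same way in this sketch.
    """
    return Prompt("Tap your selection", list(candidates)[:max_items], listening=False)


first = initial_prompt(["Traffic", "Weather", "News"])
fallback = on_speech_error(SpeechError.NO_MATCH, first.choices)
print(fallback.text, fallback.choices, fallback.listening)
```

The point of the design is that the error path adds no dialogue turns: the same list the driver was already shown becomes the only input, so the task ends with a single tap.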

One small step for voice controls

Integrating the vehicle’s talk button with the vehicle’s touch screen is just one step in discovering the right combinations of interfaces for quickly and successfully performing the wide range of actions (selecting a task, managing a list, entering a text string, warning the user, interrupting or pausing a task, resuming a task, completing a task) required by a growing assortment of in-vehicle, non-driving tasks.

Identifying best practices in terms of designing automotive user interfaces requires:

  • Maximizing simplicity
  • Maximizing interruptibility
  • Minimizing the number of task steps
  • Minimizing the number of menu layers
  • Restricting typing (manual text entry)
  • Minimizing incoming messages
  • Minimizing verbosity
  • Removing the need for learning mode
  • Minimizing glance duration
  • Minimizing glance frequency
  • Minimizing task completion time

An increasing body of evidence is pointing to the growing importance of speech interfaces, but it is also demonstrating that avoiding voice menus and minimizing the amount of voice/speech should be added to the list of best practices.

Finding the right combination of interdependent interfaces appears to be where the cutting edge of human-machine interface (HMI) research is leading.

(For more on HMI design, see The smartphone as a model for telematics HMIs, part I; The smartphone as a model for telematics HMIs, part II; and Telematics, HMIs and perfecting the user experience.)

Tom Schalk, vice president of voice technology, Agero Connected Services.
