New technologies ensure you’re not left speechless


Puja Pankhania: What have been the major developments at Loquendo in the last year?

Paolo Coppo: This last year saw a further increase in the adoption of Loquendo Text-to-Speech throughout the PND, automotive and telematics industries. We are the choice for market leaders such as TomTom, NavNGo and now deCarta, and can be heard on devices by major manufacturers, such as Magellan, Clarion, HP and Nextar. Consequently, Loquendo TTS has become a reference technology in the telematics sector, renowned for the fluency and naturalness of its Speech Synthesis and for its wide choice of languages and voices.

The underlying Loquendo TTS core technology and tools made significant advances during 2008. At the phonetic level, further enhancements and improvements have been made, and in terms of the tools, we have now extended control and flexibility options available to speech application designers within our "Loquendo TTS Director" prompt authoring suite. For instance, the User-Driven Unit Selection Tool gives prompt designers greater control over Loquendo Speech Synthesis by allowing them to select the phonetic output themselves – to customise their messages in terms of pronunciation accuracy and intonation. With the Reading Style feature, one mouse click enables the simultaneous adjustment of a great many TTS parameters, making the creation of prompts faster and simpler. The Voice Flavour feature allows prompt designers to hear their messages with the correct sampling and compression rate for the intended target device or application peculiarities.

The most significant breakthrough, however, is the result of Loquendo's ongoing strategic investment in its ASR technology over the past two years: the release of an ASR solution optimised for PND and in-car platforms, with enhanced performance in noisy, in-vehicle environments. Speech recognition technology has now reached a level of maturity that makes it a serious and very attractive option for use on in-car and mobile devices. Indeed, speech recognition technology will be essential for improving safety on the road and for giving drivers effortless control over the full range of on-board devices by simply using their voice.

The company's investment in ASR has already yielded results in the form of several early adopters of Loquendo ASR for Automotive – including NavNGo, MLS Destinator, Intelligent Mechatronic Systems and X-ROAD, along with a high level of demand from the market.

Navigation is ‘international’ by its very nature, and Loquendo's clients and partners are able to take full advantage of our long-standing commitment to the continuous expansion of our portfolio of languages and voices – currently at 25 and 60 respectively, and constantly rising. This year we have added several new languages – including Russian – and many new voices – including two brand new American English voices.

Finally, Loquendo has also extended its support for mobile platforms – notably iPhone, Windows Mobile, Symbian and Linux – making Loquendo speech technologies available to developers of voice-enabled mobile applications irrespective of the device.

Navigation and location-based services on mobile devices form a market that is currently undergoing significant expansion and attracting considerable interest as more and more smartphones are GPS-enabled and offer some form of navigation. Many analysts speak of navigation applications becoming a standard commodity on mobile phones in the medium term, and Loquendo TTS and ASR technologies are ideally positioned to meet such market demand.

PP: What are the major challenges within the in-car navigation market?

PC: The current economic downturn will inevitably have consequences for every economic and market sector, and consumer electronics – where car navigation lies – is clearly no exception.

There is also an economic aspect to the ongoing PND vs. in-car debate: the slashing of prices that accompanied the huge surge in PND sales has influenced the other complementary market, and factory-fitted in-car devices are now dropping in price, making them no longer the relatively costly investment they once were, but positioning them nearer to PNDs. Since they are installed in the vehicle, they also have several advantages: they are more or less ‘unstealable’ and do not have to be removed every time you leave the car unattended; being fully integrated with in-car systems offers certain additional features, such as ‘dead reckoning’ for estimating the car's position even when out of satellite range; and the larger screen gives enhanced visibility. TomTom and Renault recently announced the availability in 2009 of an integrated navigation system below the all-important €500 threshold.

There has also been significant convergence of the two markets, such as the development of the ‘dockable’ navigator, where device and car manufacturers have teamed up to enable fully portable sat-navs to be fitted into a specially made slot in the dashboard and so connect with the car stereo and other car systems, giving travellers the best of both worlds. Which direction the market will take is by no means clear at this point, but one thing is certain: pressure on prices will inevitably affect every player in the value chain.

In both sectors, a fundamental question remains unanswered for the industry: which types of telematics services are drivers willing to pay for? Will they pay for on-board connectivity? A monthly fee for in-car broadband Internet access will surely not be acceptable to everyone, and most users will prefer free, advertising-supported content, as they already have on their mobile phones. It is therefore vital that service providers remain flexible and keep options open for consumers.

The industry must therefore conduct more market research in order to identify the types of content consumers actually need on their navigation devices. Limited studies so far have demonstrated that drivers consider infotainment content – such as mobile TV and other media – of secondary importance compared with more practical services such as real-time traffic, fuel prices, low-cost hotels, etc. This pattern is unlikely to change in the current economic climate.

PP: How does Loquendo's speech technology respond to this?

PC: By providing application designers with the highest quality and maximum flexibility across our entire product range, backed up with solid customer support.

Giving application designers flexibility means Loquendo Speech Recognition and Synthetic Speech must, on the one hand, continually present new possibilities for enhancing human-machine communication by voice, and, on the other hand, make it easier and faster for our customers to integrate voice technologies into their devices. Such new features must be innovative but also practical – the ultimate goal here is a natural interface that makes navigation devices easier and more intuitive for drivers to use while operating the vehicle safely.

Loquendo TTS and ASR are fully compliant with the TeleAtlas® and Navteq™ SAMPA phonetic alphabets, enabling the correct interpretation of maps on navigation devices and the correct reading of place names, POIs, geo-tagging data, etc.

This makes Loquendo TTS the perfect in-car map reader. It is able to read out directions clearly, fluently and expressively, and can pronounce street names as the map makers intended them to be pronounced – even when abroad, which is not an easy task given the many thousands of place and street names across the world. This is made possible by Loquendo's patented Mixed Language Capability, by which any Loquendo voice can correctly say any word in any language while maintaining its native accent; or, if you prefer, you can change both language and voice, benefiting from Loquendo's rapidly expanding portfolio of languages.

On the Speech Recognition side, this feature makes it possible to deal with the pronunciation variants often found in street names (e.g. Houston, Texas vs. Houston Street, New York), giving Loquendo ASR the ability to recognise destination addresses rapidly and accurately.

Loquendo TTS is ready and able to keep pace with the increasingly complex demands and expectations of synthetic speech as technology marches forwards. In-car connectivity will require TTS that is capable of reading out dynamic content of every kind of style – email, SMS, mixed language content such as song titles or artists' names, web pages. Loquendo TTS databases are continually updated to keep up with the changing use of language around the world.

Indeed, we offer an SMS lexicon which, when loaded, enables the correct reading of text messages and Internet slang, so that Loquendo TTS has no problem interpreting, for example: r u @ home 2nite wd b gr8 2 cu? (Are you at home tonight? It would be great to see you.)
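To illustrate the general idea of lexicon-based text normalisation, the following is a minimal sketch, not Loquendo's implementation. The lexicon entries, the `expand_sms` function and the tokenising rule are all assumptions made for this example; a real SMS lexicon would be far larger and context-sensitive.

```python
import re

# Hypothetical abbreviation lexicon for illustration only;
# it is not the content of Loquendo's SMS lexicon.
SMS_LEXICON = {
    "r": "are",
    "u": "you",
    "@": "at",
    "2nite": "tonight",
    "wd": "would",
    "b": "be",
    "gr8": "great",
    "2": "to",
    "cu": "see you",
}

# A token is a run of letters/digits, an "@" sign,
# or any other single non-space character (punctuation).
TOKEN = re.compile(r"[A-Za-z0-9]+|@|[^\sA-Za-z0-9@]")


def expand_sms(text):
    """Replace known SMS abbreviations with full words before synthesis."""
    words = []
    for token in TOKEN.findall(text):
        expanded = SMS_LEXICON.get(token.lower(), token)
        is_punctuation = not (token.isalnum() or token == "@")
        if words and is_punctuation:
            # Re-attach punctuation to the preceding word.
            words[-1] += expanded
        else:
            words.append(expanded)
    return " ".join(words)


print(expand_sms("r u @ home 2nite wd b gr8 2 cu?"))
```

A plain lookup like this cannot tell when "2" is meant as a number rather than "to", which is one reason a production lexicon needs contextual rules on top of the word list.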

Loquendo Speech Recognition technology is also keeping up with market demands. For instance, a robust hot keyword feature has recently been introduced, which enables fully hands-free control of in-car devices since the speech recognition doesn't need to be activated by a push-to-talk button – the driver simply pronounces the chosen hot keyword and the device is ready to take orders and navigate.
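The control flow behind a hot keyword can be sketched as a two-state gate: utterances are ignored until the keyword is heard, after which the next utterance is treated as a command. This is a simplified illustration that assumes an upstream recogniser already delivers text; the `HotKeywordGate` class and the keyword "navigator" are hypothetical and not part of any Loquendo API.

```python
class HotKeywordGate:
    """Pass an utterance on as a command only after the hot keyword is heard."""

    def __init__(self, hot_keyword):
        self.hot_keyword = hot_keyword.lower()
        self.listening = False  # True once the hot keyword has been heard

    def feed(self, utterance):
        """Return the command text, or None if the utterance is ignored."""
        text = utterance.strip().lower()
        if not self.listening:
            # Idle state: only the hot keyword has any effect.
            if text == self.hot_keyword:
                self.listening = True
            return None
        # Armed state: this utterance is the command; then go idle again.
        self.listening = False
        return text


gate = HotKeywordGate("navigator")
for heard in ["turn left here", "Navigator", "Take me home"]:
    print(repr(heard), "->", gate.feed(heard))
```

Returning to the idle state after each command keeps ordinary in-car conversation from being mistaken for instructions.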

Natural Language Understanding has also become an important market requirement, as it enables an ASR to perform quickly and accurately even in circumstances such as when a speaker is talking in a natural and informal way, using expressions such as "Well", "Let me think", "Um", "Er", etc. A good Speech Recognition engine must be able to differentiate between such common utterances and valid spoken commands and discard the superfluous. Loquendo ASR makes use of Garbage Rules, which can exclude any utterances not contained in the speech grammars to remove them from the recognition process. This allows the speaker to talk naturally and to interact with the device in a way that is comfortable – making the voice interface both intuitive and user-friendly.
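The effect of discarding out-of-grammar speech can be approximated at the text level as follows. This is only a sketch under stated assumptions: real garbage modelling works on the acoustic signal inside the recogniser, and the `COMMANDS` grammar and `recognise` function here are hypothetical examples, not Loquendo ASR's Garbage Rules.

```python
import re

# Hypothetical command grammar for illustration.
COMMANDS = {"navigate home", "call office", "play radio"}

# Every word that appears in at least one valid command.
VOCABULARY = {word for command in COMMANDS for word in command.split()}


def recognise(utterance):
    """Drop words outside the grammar, then match what remains to a command."""
    words = re.findall(r"[a-z']+", utterance.lower())
    kept = [word for word in words if word in VOCABULARY]
    candidate = " ".join(kept)
    return candidate if candidate in COMMANDS else None


print(recognise("Well, um, let me think... navigate, er, home"))
print(recognise("Um, er"))
```

Fillers such as "well", "um" and "er" never reach the matching step because they are not in the grammar's vocabulary, so the speaker can hesitate without breaking the command.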

Loquendo ASR supports large vocabulary speech, up to many tens of thousands of words, leveraging its highly efficient Neural Networks/Hidden Markov Models hybrid approach. Furthermore, Loquendo ASR is available for all major languages, and has been specifically trained for use with high levels of background noise – such as in noisy traffic, with the air-con at full blast and with the window open. The result is Speech Recognition software able to accurately and rapidly understand your every command in the language of your choice, including multilingual street names, thus giving you complete voice-control over all your in-vehicle devices and taking the stress out of your drive without taking your attention away from the road.

PP: How do you see the industry evolving over the next five years?

PC: The next five years will see the transformation of navigation devices from ‘never-get-lost products’ into genuine ‘Travel Assistants’, keeping drivers informed, warned and entertained for the entire journey. We will see convergence between media players, computer games, search engines, mobile phones, instant messengers and GPS into one unified and all-encompassing device. There will be an explosion of content, and drivers are going to need a simple way of managing it without being distracted from the safe operation of the vehicle, and of having it seamlessly available regardless of context.

Many sources expect the in-car environment to become increasingly like the home environment: broadband connections enabling full browsing and search capability, TV/DVD, P2P applications, social networking, more sophisticated LBS, hazard-alert services, etc. available to all passengers, not just the driver. An incredible amount of local environment data will be available via GPS, uploaded from the many thousands of connected devices to produce an increasingly elaborate picture of traffic flows.

Faced with such information overload, drivers will be looking for a way to reduce the cognitive burden, to interact with services and devices while driving safely. The voice is one means of human interaction that is open to all, and speech interaction is simple, intuitive, and leaves the eyes and hands free.

LBS offers an increasingly sophisticated and comprehensive picture of our surroundings, notifying drivers of low fuel prices, heavy traffic, localised weather conditions. Thanks to push and pull dynamic content, driver data and traffic data will be continually updated for and by an increasing number of device users right across the world. We will, in the future, be able to access information on hotels and restaurants with good/bad service, available parking spaces, traffic jams as and when they happen, early warnings of accidents up ahead, etc. The possibilities are virtually endless.

The safety aspects of telematics will also become more advanced, giving drivers a full picture of road conditions, bad weather, accident hotspots – all continually updated and increasingly accurate as more and more users share and update the available information.

In-car safety systems are also being developed to monitor the state of the engine and the performance of the vehicle, and to alert the driver to any problems as soon as, or even before, they arise, improving fuel consumption, wear and tear, and resulting in a safer drive. Such systems will be invaluable in the prevention of accidents.

Still on road safety, the European Commission's eCall project will prescribe the use of in-vehicle sensors to detect an accident and automatically make an emergency call (eCall) to the emergency services. eCall makes use of many leading-edge technologies, including Speech Synthesis, and will speed up the emergency response, mitigating injuries and reducing the level of fatalities.

The future of navigation will see devices become more widespread and with more information and services available. Drivers will be looking to simplify this complexity and to filter this wealth of information, and they will be able to do so by simply using their voice. Speech Recognition and Speech Synthesis give users full control over their on-board and mobile devices, allowing them to ask for their favourite song, search for the nearest ATM, read their email, browse the web or book a hotel, without ever having to take their eyes off the road.

Finally, legislation will also play a key role. Should map display be banned during driving, as is rumoured in some countries, voice input/output would become the only alternative for navigation.

Loquendo and deCarta will jointly present a case study – Exploring the growing role of speech technologies in navigation – at Telematics Update's Navigation & Location USA event, which takes place on December 2nd and 3rd in San Jose, California.
