Together with my colleagues and our development partners, I was responsible for bringing this voice control system into series production. I’m quite proud of what we’ve accomplished. Thanks to MBUX, this model literally becomes a smart car. And we showed exactly how smart it is at CES in Las Vegas at the beginning of January.
MBUX Voice Assistant
It understands you perfectly.
This article was originally published in the Daimler blog.
The introduction of our MBUX Voice Assistant in the new A-Class has caused quite a stir this year. For the first time, our customers can now control a number of functions, ranging from the navigation system to the radio and the air conditioning, very simply and intuitively through voice commands.
In this blog post I want to tell you how we developed MBUX.
Linguatronic was the first voice control system
Mercedes-Benz is known as the inventor of the automobile. And we were also the pioneers in the voice control of vehicles. More than 50 years ago, our colleagues in Ulm began to develop programs for speech processing. Many of us were inspired by films such as “2001: A Space Odyssey” and by K.I.T.T. in the “Knight Rider” TV series.
In 1996 we introduced Linguatronic in the S-Class; it was the first voice control system to be installed in a vehicle. Voice control makes particular sense in a car, because the driver’s hands should stay on the steering wheel and his or her eyes on the road.
Speech recognition technology has changed rapidly over the past 22 years, and voice-controlled assistants are now popular all over the world. But having assistants constantly available on smartphones and at home also raises expectations. I know this first-hand, just like everyone who uses Alexa or Siri and enjoys the benefits of voice control every day.
An assistant for the car
It’s understandable that these expectations extend to voice control in the car as well. At the same time, we want systems that have been developed for the automotive setting, work reliably across the extreme temperature range from -40 °C to 70 °C, and distract the driver as little as possible.
So to create a well-functioning voice control system for cars, we first had to solve a number of challenges. For example, every vehicle is full of noises that have to be filtered out when someone talks to the system: driving noise, wind, or the windshield wipers, but also music or conversations between passengers.
In addition, we want to offer our customers a high-performance voice control system that works well even on roads without an Internet connection, which is still the case in many areas. If you’ve ever tried to use Siri or Alexa at home without a Wi-Fi connection, you know that it doesn’t work.
That’s why we worked with Nuance Communications to develop a hybrid voice control system that recognizes and understands speech both on board, in the vehicle’s head unit, and on the server. It then tries to give the best possible answer, whether the car is driving between tall buildings, winding through the mountains, or parked in an underground garage.
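The article doesn’t say how the on-board and server results are combined, so the following is only a minimal Python sketch of one plausible arbitration rule, under stated assumptions: the `Hypothesis` type, the `arbitrate` function, and the “higher confidence wins, ties go to on-board” rule are all hypothetical. The point it illustrates is that the on-board result is always available, and the server result is only an optional improvement.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Hypothesis:
    text: str
    confidence: float  # 0.0 to 1.0, as reported by the recognizer (assumed)
    source: str        # "onboard" or "server"

def arbitrate(onboard: Hypothesis, server: Optional[Hypothesis]) -> Hypothesis:
    """Pick the recognition result to act on (hypothetical rule).

    server is None when there is no network connection (mountains,
    underground garage) or the server did not answer in time; the
    on-board result is then used on its own.
    """
    if server is None:
        return onboard
    # Both results available: take whichever recognizer is more confident.
    # Ties go to the on-board result, which needs no network round trip.
    return server if server.confidence > onboard.confidence else onboard

garage = arbitrate(Hypothesis("navigate home", 0.82, "onboard"), None)
print(garage.source)  # onboard
```

A real system would also need a deadline after which it stops waiting for the server; that timing logic is left out here to keep the sketch short.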
Of course our assistant functions even better with a network connection. For the optimal user experience, we use current data from the server to provide the user with information — for example, a list of the best local restaurants together with their Yelp ratings, the weather report for a certain location, or online music.
The main reason we were able to give MBUX so many functions was our close cooperation with many different departments and specialist units. Our collaboration with colleagues from the model series, such as the new A-Class and the GLE, was especially intense. In the process, the A-Class evolved into a vehicle that techies will be crazy about. Thank you!
MBUX knows what you want (to say)
A key criterion for the design of our speech dialogues is intuitive operation. Specifically, this means that we want our customers to be able to talk to MBUX in the same way they would speak with a human passenger. And we want our system to answer just as intelligently and understandably as a human passenger would.
For earlier voice assistants, experts made assumptions about what the users would say in order to activate certain functions, and they wrote corresponding grammars. The users then had to learn these commands, because the experts couldn’t cover all of the variants that a language offers. For users who were willing to learn, this worked very well — but users who didn’t stick to the prescribed commands were quickly frustrated.
We’ve broken out of this rigid framework by gathering data from as many users as possible and using it to train statistical models. Based on the words that have already been spoken, these models calculate which word is most likely to come next. As a result, we can also cover sentences that never occurred in the training data.
This keeps the time users need to learn how to talk to our voice assistant to a minimum. It is a huge step toward natural interaction, in which people can speak just as they do in daily life. However, it applies only to a user’s native language. For people who speak with a strong accent, we’re making progress, but we still have a way to go. And although we understand the problem (we’re from Baden-Württemberg, where we have some trouble with standard German ourselves), we’re reaching certain limits when it comes to recognizing dialects…
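To illustrate how a statistical model can handle a sentence nobody ever said during training, here is a minimal Python sketch of a bigram model, the simplest kind of “predict the next word from the previous one” model. This is an illustrative assumption, not the model MBUX uses: production systems train on far more data, use longer contexts or neural networks, and apply smoothing so that unseen word pairs don’t get probability zero. The three training sentences and all function names are hypothetical.

```python
from collections import Counter, defaultdict

def train_bigrams(sentences):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # <s> and </s> mark the start and end of a sentence.
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def next_word_probs(counts, prev):
    """Estimate P(next word | previous word) by relative frequency."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    return {word: n / total for word, n in followers.items()}

def most_likely_next(counts, prev):
    """Return the most probable next word, or None if prev was never seen."""
    probs = next_word_probs(counts, prev)
    return max(probs, key=probs.get) if probs else None

def sentence_prob(counts, sentence):
    """Score a whole sentence as the product of its bigram probabilities."""
    words = ["<s>"] + sentence.lower().split() + ["</s>"]
    p = 1.0
    for prev, nxt in zip(words, words[1:]):
        p *= next_word_probs(counts, prev).get(nxt, 0.0)
    return p

training = ["take me to the station", "navigate to the airport", "take me home"]
model = train_bigrams(training)

print(most_likely_next(model, "take"))                           # me
print(round(sentence_prob(model, "take me to the airport"), 3))  # 0.167
print(sentence_prob(model, "drive me to the moon"))              # 0.0
```

“take me to the airport” never appears in the training list, yet it gets a non-zero score because every word pair in it was seen in some other sentence. “drive me to the moon” scores zero here only because this toy model has no smoothing.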
From recognizing speech to understanding it
In the past, speech recognition played a major role. Today we’re focusing on speech comprehension instead. Speech recognition systems transform a speech signal into the corresponding series of words. Speech comprehension goes one step further and understands the meaning of the input sentence. For example, in order to get to Mercedesstraße in Stuttgart, the system has to correctly recognize the following parameters:
Action=Navigate, City=Stuttgart, Street=Mercedesstraße
MBUX can extract these parameters even from a sentence as simple as
“I want to go to Mercedesstraße in Stuttgart.”
If the system needs additional information, such as the house number, to execute the action, the driver can supply it later. But MBUX will start navigating even without a house number. Here, as in most cases, we rely on “one-shots,” that is, the direct execution of actions. We believe that rapid interaction is one of our most important criteria.
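The step from a spoken sentence to the parameters Action, City and Street, plus the one-shot behavior with an optional house number, can be sketched in a few lines of Python. This is a toy under explicit assumptions: a regular expression stands in for the trained comprehension model (which is not how MBUX is described as working), and `NavigationIntent`, `parse` and `one_shot` are hypothetical names. What it shows is the slot structure: City and Street are required, the house number is optional and does not block navigation.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class NavigationIntent:
    action: str
    city: str
    street: str
    house_number: Optional[str] = None  # optional slot, may be supplied later

# Toy pattern standing in for a trained comprehension model (hypothetical).
# \w matches Unicode letters in Python 3, so "ß" in street names is covered.
_NAV_PATTERN = re.compile(
    r"(?:i want to go|take me|navigate) to (?P<street>[\w-]+)"
    r"(?: (?P<number>\d+\w*))? in (?P<city>[\w-]+)",
    re.IGNORECASE,
)

def parse(utterance: str) -> Optional[NavigationIntent]:
    """Extract Action/City/Street (and house number, if given) from a sentence."""
    m = _NAV_PATTERN.search(utterance)
    if m is None:
        return None
    return NavigationIntent("Navigate", m["city"], m["street"], m["number"])

def one_shot(intent: NavigationIntent) -> str:
    """Act immediately on the required slots; a missing house number does not block."""
    street = intent.street
    if intent.house_number:
        street = f"{street} {intent.house_number}"
    return f"Starting navigation to {street}, {intent.city}"

print(one_shot(parse("I want to go to Mercedesstraße in Stuttgart.")))
# Starting navigation to Mercedesstraße, Stuttgart
```

A pattern like this only covers the phrasings it was written for, which is exactly the limitation of the older grammar-based assistants; a statistical model generalizes to phrasings nobody anticipated.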
What’s even trickier for voice assistants is an indirect request such as “I feel cold,” “My hands are cold” or “Do I need rubber boots on the island of Sylt?”. For these situations, we have configured MBUX to make assumptions. If the user says “I feel cold,” it turns up the temperature. For “My hands are cold,” it turns on the steering wheel heating. And to answer the question about rubber boots, MBUX checks the weather report.
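The three examples above amount to mapping an indirect statement onto an assumed action. As a minimal sketch, assuming a hand-written rule table (the action names and keyword rules are hypothetical, and a trained model would handle far more phrasings), it could look like this in Python:

```python
from typing import Optional

# Hypothetical rules mirroring the three examples in the text. Order matters:
# the more specific rule ("hands" + "cold") must be checked before plain "cold".
INDIRECT_RULES = [
    (("hands", "cold"), "turn_on_steering_wheel_heating"),
    (("cold",), "raise_cabin_temperature"),
    (("rubber boots",), "check_weather_report"),
]

def interpret(utterance: str) -> Optional[str]:
    """Map an indirect request to an assumed action, or None if no rule applies."""
    text = utterance.lower()
    for keywords, action in INDIRECT_RULES:
        if all(keyword in text for keyword in keywords):
            return action
    return None

for request in ["I feel cold", "My hands are cold",
                "Do I need rubber boots on the island of Sylt?"]:
    print(request, "->", interpret(request))
```

Checking the most specific rule first is what lets “My hands are cold” trigger the steering wheel heating instead of a general temperature change, even though it also contains the word “cold.”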
But listening and understanding are only one side of the coin. The voice output, or answer, must also be brief and easy to understand. After all, we don’t want our customers to have to puzzle over what the system has just said while they are driving. We’ve also added some variety to the answers so that users don’t always hear the same phrases. At some point that would irritate me too.
And what does the result look like?
Of course the MBUX Voice Assistant is fundamentally based on language. It’s all about talking and listening. But for us the graphic representation was also important. We wanted it to support the voice output without distracting the driver’s attention from the road. The system therefore uses a wave to show whether it’s ready to accept commands or is processing another command at the moment.
But of course it also has to support the user experience. So when it gives the weather report, the display shows the current conditions with attractive, easy-to-understand visuals, whether it’s lightning, rain or sunshine. And destination lists for the navigation system come with Yelp ratings and the locations on a map.
For both the voice control and the visual design, we’ve made sure that the assistant can be used intuitively and that all the animations run smoothly, thanks to ample computing power. We wanted the visual design to be beautiful, because exterior and interior design play a central role in Mercedes-Benz vehicles.
Why aren’t you answering?
In addition to intuitive use, we’ve focused on fast reaction times in the MBUX system, for both the graphics and the voice assistant. Speed is a key to success, especially in an era of seemingly ever-faster processes. We took human communication as our model: people perceive pauses of 200 milliseconds or longer as meaningful. Does such a pause mean that our conversation partner hasn’t understood us, or that he or she is annoyed?
So rapid interaction is really important. But because communication inside a vehicle is a secondary task alongside driving, reaction times can be somewhat longer than in a normal conversation. We set two seconds as the maximum pause, and after extensive optimization we achieved this goal for most use cases.
Although we engineers are not famous for our sense of humor, we hid a few playful winks in our voice assistant’s repertoire: what we call our “Easter eggs.” These are answers the system might give to customers’ more or less reasonable questions. Some of them are really amusing. Here are my favorites:
“Hey Mercedes, what do you think of BMW?” – “Looks pretty good. But only in my rearview mirror.”
or “Mercedes, what’s cooler than being cool?” – “Ice cold!”
or “Hey, Mercedes, you look amazing!” – “Oh, I’m blushing!”
Comments like these are not only amusing — they also make interaction with the assistant playful and get users enthusiastic about this feature.
To activate the voice assistant, you just have to say, “Hey Mercedes.” Frankly, I’m glad that our company has such a melodious brand name. In the early years of the company, Emil Jellinek established it in spite of Gottlieb Daimler’s opposition. Mercedes is a woman’s name that is long enough to be easily recognized. We’ve made “Hey Mercedes” our international keyword and worked out a number of local variants. For example, German speakers can say “Hallo Mercedes,” and Spanish speakers can begin with “Hola Mercedes.” The voice assistant is available today in 23 languages, and we’re working to add more.
Personally, I’d be delighted if we could drop the formal “Sie” in favor of the informal “du” in the German version. “Du” would be much more in tune with the times, in the new A-Class and elsewhere. Many of my colleagues are working hard to expand and improve our voice assistant.
New software versions for the head unit and the servers will be released roughly every six months, focusing on new functions, better speech comprehension, and bug fixes. MBUX will soon be available not only in the A-Class but also in other model series such as the GLE and the EQC (combined electric energy consumption: 21.3 – 20.2 kWh/100 km; combined CO2 emissions: 0 g/km)*.
We’re still learning
It’s great to see the enthusiasm the voice assistant can inspire. We received great feedback from the public after presenting the new A-Class in Amsterdam. After testing MBUX, journalists are confirming our claim that it is the leading voice control system in automobiles. But the strongest confirmation comes from our customers, who use the voice assistant 50 times a month on average. That goes beyond our wildest dreams!
We now want to use these millions of speech data points to learn very quickly how to make our product even more attractive, add new functions, and fix errors. We look forward to your feedback so that we can improve our voice assistant even further!