Cozmo, Anki’s latest robot toy, is a smash hit. Since its US release in October 2016, it has amassed over a thousand reviews on its Amazon product page, averaging four and a half stars. But robotic toys have been around for years, so what’s so special about Cozmo? Well – a lot. In several interviews with composers Gordy Haab, Brian White and Brian Trifon as well as sound designer Ben Gabaldon, we learned about their approach to creating the audio for Cozmo. Today, we’re going to introduce you to some of Cozmo’s sound design secrets – if you’d rather learn about its soundtrack, stay tuned for our next Cozmo article!
So what exactly is Cozmo? At first glance, Cozmo is a toy for children (and grown-up children) that explores on its own and reacts to its environment. It can play some neat mini-games against its owner, all of which revolve around the three “power cubes” it ships with. Its A.I. is stored on an external device, such as a phone or tablet, which connects to it via Wi-Fi. Cozmo expresses a wide range of emotions through its cute screen-face, its voice and a variety of moves. During play, Cozmo emits noises from its built-in speaker while its music soundtrack plays in the background on the mobile device. When Cozmo gets tired, it automatically returns to its charger.
Is it a toy? Is it a game? No, it’s Cozmo!
But what is Cozmo from a sound designer’s standpoint? Yes, robotic toys have been around for years, but Cozmo comes not only with its own soundtrack and tons of additional sound effects but also with a voice capable of expressing a wide variety of emotions. When he first approached the creation of Cozmo’s voice, Ben Gabaldon told us, he tried to tackle it the way he would a video game: “I did that with Cozmo, I walked in, I started a spreadsheet within the first two weeks, and I was like: ‘Here’s basically the assembly line that we’re going to need, and I’ll start creating the content and we’ll stick in the animations and it’ll just work.’” But he soon realized: “Making a robot connect to you emotionally is definitely not just making sound effects.”
“It seems like everybody else hears a robot, but I just hear me”
At that point, Anki had already been looking for a voice actor to provide the base material for Cozmo’s voice for a while. Surprisingly, Gabaldon ended up filling that role himself after joining the project as a senior sound designer: “We needed to get a demo out there for investors, so I needed to get something. So I just got a little RØDE microphone and just started saying really weird stuff into it.” Making the voice sound distinctly robotic yet organic and full of character was a big challenge, but in the end Gabaldon achieved just that by manipulating his voice with one specific signal chain.
The first problem with “just making sound effects”, he learned, was that the situations Cozmo would react to were far too specific, which resulted in equally specific animations. Cozmo just wouldn’t come to life with a repertoire limited to five “happies”, five “sads” and five “excitings”. So Gabaldon tried to set up a modular system that would work almost like a language, recording individual syllables and reassembling words from them. But even this approach turned out not to be precise enough to express Cozmo’s emotions. Gabaldon cited Cozmo’s reaction to receiving an upgrade as an example: “So he needs to be curious with what’s happening to him, he needs to be surprised that it happened to him, and he needs to be impressed. So what’s been kind of cool is that it started off really generic and now it’s specifically ‘Cozmo loses this game round 2’. He has a specific animation for it.” So Gabaldon started recording individual sounds for each animation. However, because the signal chain was so complex, it was extremely hard to predict which emotion a given input would convey once it had been processed. Gabaldon had to conduct a short “study”, recording 30 different syllables and listening to the processed results. Using his findings as a starting point, he built up a library of Kontakt instruments covering all the different emotions: “So everything Cozmo ever says is something that’s been recorded, edited and keymapped and saved as its own unique instrument for every possible mood or thing that he says. So if I need to call up something to fit a new animation, I’ve got 30 takes keymapped with pitch-control map on my controller.”
Like any software, Cozmo’s app can and will be updated in the future. Since all the content, including the A.I., animations and audio, is actually stored on the mobile device, Cozmo will keep evolving, learning new moves, games and words to express itself. For the 2016 Christmas update, Gabaldon said, “we introduced all kinds of new features. I have lots of new sounds, I’ve taken all of the stuff that I didn’t like, I remixed it – and it’s just an app update.” This is a huge advantage, as it lets Anki gather data and respond to how customers actually use Cozmo. Gabaldon plans to develop Cozmo’s “language” further: “I want him to sound like he’s really thinking in his own language and he has a lot of thought process going on behind the scenes. And continue to develop what sounds like a language to him and actually have some consistency behind it.” The updates will also make it possible to refine Cozmo’s emotional reactions. Although Cozmo already reacts differently to a wide variety of situations, its emotions mostly don’t carry over from one moment to the next – something Gabaldon is ready to tackle: “If you drop him off the table and you pick him up, he can be in a bad mood. And I want him to reflect that more clearly – and not just, you know, be back to being happy and picking up his cubes.”
As if all this work on Cozmo’s voice hadn’t been enough, the team at Anki decided it would be really cool if Cozmo could say its owner’s name. They included facial recognition software that lets Cozmo store several people’s faces and their names in its memory. However, until then, all the processing of Gabaldon’s voice had been done before the files were loaded onto Cozmo. To avoid pre-recording every name in existence, Cozmo’s creators had to come up with an alternative. The audio programmers implemented a basic text-to-speech function, and Gabaldon was tasked with re-creating his signal chain – but this time in real time, within Cozmo’s software: “Anything you type into the app, if it’s your name, if it’s sentences, if it’s whatever it is that gets typed into the app, it actually generates a WAV file that, if you heard the source material, is a deep man’s voice [when read by the text-to-speech software]. I have to real-time process that through Wwise so that it kind of fits the register of Cozmo.” So far, so good – but the more closely the processing imitated Cozmo’s voice, the less comprehensible the words became. Since Cozmo normally utters robot gibberish, the processing built for its original voice wasn’t suited to pronouncing actual human words. “So, yeah, I got it sounding pretty good where it really sounded like Cozmo’s voice, but what happened is: We put it in a room, and it’s supposed to be a really cool moment. Cozmo sees your face, he studies it, and he’ll say the word that you typed in, but if people don’t recognize their name, the whole moment is lost.” In the end, the processing had to be done a little differently after all, though the difference is hardly noticeable.
Servos and screen sounds
Apart from its voice, vivid eyes and animations, Cozmo expresses itself through additional sounds accompanying its actions, even including the sound of its servos: “If you have voice-over for every single thing he does, it totally gets old. So the other system I wanted to build is subtlety where he has, like I said, a whole suite of sounds for just his screen. And same with servos. So if he moves and he sees a block and he gets excited, he’s going to vocalize, but when he begins moving toward that block, I wanted servo-sounds to support that he’s in a good mood, he’s happily moving towards it.” To achieve this, Gabaldon had to work both with and against Cozmo’s “natural” sounds: “Cozmo is inherently kind of noisy, and that’s always been a challenge. So when he moves, there are real servo sounds.” Ultimately, what the user hears is a mixture of Cozmo’s real servos and supporting servo sounds from its speaker.
When asked about the future of toys like Cozmo, Gabaldon said that A.I. would probably become a big part of toys: “And what we’re trying to do at Anki is just to stay ahead of that curve and be the leader for the next version of each intelligent toy. Overdrive [Anki’s latest product before Cozmo] kind of instigated a lot of other companies wanting to do A.I. cars, and it’s just going to keep going.”
Whether or not you agree, the signs of huge public interest are there, given Cozmo’s success in its first couple of months. With Augmented Reality applications on the rise at the same time, the possibilities for future A.I.-driven toys seem almost limitless. And if other toy companies care about their sound design as much as Anki does, the future of toys may well be the future for many sound designers and composers as well.