Case Study: Esme & Roy by Xandra and Sesame Workshop

“Thank you guys for all of the work you do. It’s really, really valuable to know we can rely on someone for the piece of the puzzle that we don’t have in house. It’s great to have confidence in a partner who does it so well.”

-Danielle Frimer, Head of Conversation Design, Xandra


Xandra is one of the premier conversational design studios working on voice applications today. Specializing in narrative design, character development, and audio experiences, the Xandra team has played a major part in designing some of the most substantial immersive interactive voice experiences available, including the Clio Award-winning “Westworld: The Maze” and the Emmy-nominated “Esme & Roy”, both Alexa skills from HBO.

The Esme & Roy skill launched in October 2018, and was HBO's first skill designed for kids. It’s an adventure game aimed at kids ages three and over, and it offers a spectrum of adventures with Esme, a “monster-sitter,” and her friend Roy, a giant yellow monster. Kids can accompany Esme and Roy underwater, to the circus, the sky, the jungle, or even outer space!

The development team, which included Sesame Workshop, 360i, and Xandra, all working with HBO, was given the challenge of building an engaging, educational kid skill that encouraged imagination and instilled a sense of accomplishment.

Designing Skills for Kids

Kid skills pose their own particular set of challenges. To be successful, they must be precisely tailored to the appropriate age (what works well for a 3-year-old probably won’t work well for a 9-year-old), and they must maintain attention. Maintaining attention in young children typically requires interaction, but the ways children will interact, and the responses they will give, are much less predictable than with adults. With kids, you really need to expect the unexpected.

To create a delightful experience that maintained attention, Xandra's Head of Audio Experience, Kevin Dusablon, worked with the Sesame Workshop team to create a rich original soundscape for the skill. On top of this soundscape, the team incorporated repetition, paused and paced interaction, and expanded language/utterances. These design decisions let the skill handle fallbacks, errors, and misinterpretations elegantly, which is absolutely critical when designing for young children. For example, if the skill didn’t understand a request, the responses addressed this in a fun and humorous manner while placing responsibility for the misunderstanding on the skill and not the child:

Sorry, I didn’t catch that!  I accidentally turned on my helmet hairdryer. Say where we should explore in space!  The moon or the planet?

Sorry, I missed that!  I was distracted by Dumpling rolling around in circles.  Remember if you need me to say something again, you can always say “repeat.” Should we go to the circus or to the jungle?
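The pattern in these two responses (apologize, give an in-character excuse that puts the blame on the skill, then restate the choice) can be sketched in a few lines of Python. This is only an illustrative sketch, not the skill's actual code: the function name `build_reprompt` and the way the phrases are stored are assumptions, and the phrases themselves are taken from the examples above.

```python
import random

# Phrases taken from the example responses above.
APOLOGIES = ["Sorry, I didn't catch that!", "Sorry, I missed that!"]
EXCUSES = [
    "I accidentally turned on my helmet hairdryer.",
    "I was distracted by Dumpling rolling around in circles.",
]


def build_reprompt(question, rng=random):
    """Build a reprompt that blames the skill, not the child.

    `question` restates the choice so the child knows what to say next.
    `rng` is any object with a `choice` method (hypothetical parameter,
    added so the output can be made repeatable in tests).
    """
    return f"{rng.choice(APOLOGIES)} {rng.choice(EXCUSES)} {question}"


print(build_reprompt("Should we go to the circus or to the jungle?"))
```

Varying the apology and the excuse keeps repeated misunderstandings from sounding mechanical, while the restated question always ends the response so the options are the last thing the child hears.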

A well-designed kid skill must incorporate all the standard strategies for handling unexpected inputs, while also being able to “fail forward.” Kids will occasionally not understand the options they’re presented, and children often don’t pronounce words as precisely as adults, so misunderstandings are much more frequent. When issues like these arise, it’s important to fail gracefully while still advancing the story. So, if a response to a choice doesn’t make sense, it’s OK to pick one option and move on as if a sensible answer had been given. For example:

Skill: You got it!  Let’s go to the ocean! Let’s pretend we’re scuba divers and go diving in the ocean.  First, we need to put on our scuba suits.  I’m imagining that mine is purple!  I’m imagining that mine has a pocket for meatballs!  Okay, scuba diver.  What color is your scuba suit?

Tester response:   pink (Alexa heard: “hey munk”)

Skill fail forward: That’s an interesting sounding scuba suit. Now let’s put on our flippers to help us swim. And most important, our scuba mask to help us see and breathe. How does this thing...oh.  Roy, you put yours on backwards...there you go!  Great, now let’s go diving!  Wow, the water is so clear and calm… I can see lots of plants and underwater animals around me. Oh look, I see a fish!  Okay, scuba diver. What animal do you see underwater?

What a kid skill should not do is refuse to move on without an expected response or, even worse, kick the user out of the game and into core Alexa. This would be so frustrating for a kid!
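The fail-forward behavior in the scuba suit exchange above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the skill's real implementation: the color list, the function name `scuba_suit_reply`, and the acknowledgement used when a color is recognized are all hypothetical. Only the neutral acknowledgement and the next story line come from the transcript.

```python
# Hypothetical set of colors the skill recognizes.
KNOWN_COLORS = {"pink", "purple", "blue", "green", "red", "yellow", "orange"}

# Next story beat, taken from the transcript above.
NEXT_BEAT = "Now let's put on our flippers to help us swim."


def scuba_suit_reply(heard):
    """Return the skill's reply to whatever speech recognition heard.

    If a known color is found, acknowledge it by name. Otherwise fail
    forward: give a neutral acknowledgement and continue the story
    instead of reprompting or exiting.
    """
    words = heard.lower().split()
    color = next((w for w in words if w in KNOWN_COLORS), None)
    if color is not None:
        # Hypothetical line for the recognized case.
        acknowledgement = f"Ooh, {color}! What a great scuba suit."
    else:
        # Neutral line from the transcript; works for any input.
        acknowledgement = "That's an interesting sounding scuba suit."
    return f"{acknowledgement} {NEXT_BEAT}"


print(scuba_suit_reply("pink"))
print(scuba_suit_reply("hey munk"))
```

Both branches end on the same story beat, so a misheard answer such as “hey munk” never blocks progress.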

Testing Esme & Roy with Pulse Labs

“It was very helpful to get the misinterpretations and transcripts of the tests. This enabled us to enhance the interaction model generally, as well as to spot a couple of recognition bugs that we had previously missed (for example: "three two one blastoff"). [Pulse Labs] also helped us/Amazon identify an issue where single words were being treated as no input and dropped on the floor. Finally, by getting a complete list of the different ASR hypotheses for the invocation name, we were able to resolve that issue of the skill opening in no time flat with Amazon!”

-Danielle Frimer, Head of Conversation Design, Xandra

How Pulse Labs worked with Xandra

Pulse Labs worked closely with Xandra, conducting usability testing on the Esme & Roy skill before it launched. The Pulse Labs platform focused on three major areas: utterance and dialogue verification, functional testing, and exploratory engagement feedback. Testing revealed that the biggest issue was difficulty with the invocation name and consequent failure to launch the skill. There were numerous instances of users being misunderstood when trying to launch the skill, even after multiple attempts, and some testers even gave up. Some examples of these failed invocations include:

Alexa heard: open x-men the roy

Alexa heard: open disney and rory

Alexa heard: open is me a ride

Failure to launch due to a misunderstood invocation is definitely an issue that developers want to avoid. Launch should be as seamless as possible.  Can you imagine how frustrating this would be to a kid?  

Pulse Labs testers also checked the skill for bugs, errors, and misinterpretations. These tests surfaced a number of recognition bugs, as well as words/utterances that were being dropped by Alexa.

The Results

This focus on great skill design has paid off, and the Esme & Roy skill has been nominated for the Outstanding Interactive Media for a Daytime Program 2019 Emmy Award! Pulse Labs will be cheering for them when the winners are announced on May 5th, 2019.


To get the same level and quality of real user feedback for your own voice application that helped the Esme & Roy skill earn its Emmy nomination, create a Pulse Labs account today!