#437579 Disney Research Makes Robotic Gaze ...
While it’s not totally clear to what extent human-like robots are better than conventional robots for most applications, one area where I’m personally comfortable with them is entertainment. The folks over at Disney Research, who are all about entertainment, have been working on this sort of thing for a very long time, and some of their animatronic attractions are actually quite impressive.
The next step for Disney is to make its animatronic figures, which currently feature scripted behaviors, perform interactively with visitors. The challenge is that this is where you start to get into potential Uncanny Valley territory, a risk that comes with trying to create “the illusion of life,” which is what Disney (they explicitly say) is trying to do.
In a paper presented at IROS this month, a team from Disney Research, Caltech, University of Illinois at Urbana-Champaign, and Walt Disney Imagineering is trying to nail that illusion of life with a single, and perhaps most important, social cue: eye gaze.
Before you watch this video, keep in mind that you’re watching a specific character, as Disney describes:
What, exactly, does “lifelike” mean in the context of robotic gaze? The paper abstract describes the goal as “[seeking] to create an interaction which demonstrates the illusion of life.” I suppose you could think of it like a sort of old-fashioned Turing test focused on gaze: If the gaze of this robot cannot be distinguished from the gaze of a human, then victory, that’s lifelike. And critically, we’re talking about mutual gaze here—not just a robot gazing off into the distance, but you looking deep into the eyes of this robot and it looking right back at you just like a human would. Or, just like some humans would.
The approach that Disney is using is more animation-y than biology-y or psychology-y. In other words, they’re not trying to figure out what’s going on in our brains to make our eyes move the way that they do when we’re looking at other people and basing their control system on that, but instead, Disney just wants it to look right. This “visual appeal” approach is totally fine, and there’s been an enormous amount of human-robot interaction (HRI) research behind it already, albeit usually with less explicitly human-like platforms. And speaking of human-like platforms, the hardware is a “custom Walt Disney Imagineering Audio-Animatronics bust,” which has degrees of freedom (DoFs) that include neck, eyes, eyelids, and eyebrows.
To decide on gaze motions, the system first uses an RGB-D camera to identify a person to target with its attention. If more than one person is visible, the system calculates a curiosity score for each, currently simplified to be based on how much motion it sees. Based on whichever visible person has the highest curiosity score, the system chooses from a variety of high-level gaze behavior states, including:
Running underneath these higher-level behavior states are lower-level motion behaviors like breathing, small head movements, eye blinking, and saccades (the quick eye movements that occur when people, or robots, look between two different focal points). The term for this hierarchical behavioral state layering is a subsumption architecture, which goes all the way back to Rodney Brooks’ work on robots like Genghis in the 1980s and Cog and Kismet in the ’90s, and it provides a way for more complex behaviors to emerge from a set of simple, decentralized low-level behaviors.
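To make the two ideas above a bit more concrete, here is a minimal Python sketch of picking a target by curiosity score and then layering behaviors so that a higher-level gaze behavior overrides lower-level ones. This is not Disney's implementation: the `Person` fields, the layer functions, the actuator channel names, and all numeric constants are hypothetical, the curiosity score is reduced to a single motion number because the article says only that it is currently based on motion, and the simple priority-override merge is a simplified stand-in for a full subsumption architecture.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Person:
    person_id: int
    motion: float   # hypothetical motion magnitude reported by the tracker
    bearing: float  # hypothetical horizontal angle to the person, in radians


def curiosity_score(person: Person) -> float:
    # Assumption: the score is just the observed motion.
    return person.motion


def select_target(people: List[Person]) -> Optional[Person]:
    # Attend to the visible person with the highest curiosity score, if any.
    if not people:
        return None
    return max(people, key=curiosity_score)


# A command maps a (hypothetical) actuator channel to a setpoint.
Command = Dict[str, float]
Layer = Callable[[float, Optional[Person]], Command]


def breathing(t: float, target: Optional[Person]) -> Command:
    # Low-level layer: slow periodic neck motion that always runs.
    return {"neck_pitch": 0.02 * math.sin(2 * math.pi * 0.25 * t)}


def idle_look(t: float, target: Optional[Person]) -> Command:
    # Low-level layer: look straight ahead by default.
    return {"eye_yaw": 0.0, "neck_yaw": 0.0}


def gaze_at_target(t: float, target: Optional[Person]) -> Command:
    # Higher-level layer: only contributes when there is someone to look at.
    if target is None:
        return {}
    return {"eye_yaw": target.bearing, "neck_yaw": 0.5 * target.bearing}


# Ordered from lowest to highest priority.
LAYERS: List[Layer] = [breathing, idle_look, gaze_at_target]


def arbitrate(layers: List[Layer], t: float, target: Optional[Person]) -> Command:
    # Later (higher) layers overwrite the channels they claim; channels they
    # do not claim keep the output of the lower layers.
    command: Command = {}
    for layer in layers:
        command.update(layer(t, target))
    return command


if __name__ == "__main__":
    people = [Person(1, motion=0.2, bearing=-0.3), Person(2, motion=0.9, bearing=0.4)]
    target = select_target(people)
    print(target)
    print(arbitrate(LAYERS, 0.0, target))
```

When nobody is visible, `gaze_at_target` returns an empty command, so the idle and breathing layers keep driving the bust on their own. When a target appears, the gaze layer takes over the eye and neck yaw channels while breathing continues underneath.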
Brooks, an emeritus professor at MIT and, most recently, cofounder and CTO of Robust.ai, tweeted about the Disney project, saying: “People underestimate how long it takes to get from academic paper to real world robotics. 25 years on Disney is using my subsumption architecture for humanoid eye control, better and smoother now than our 1995 implementations on Cog and Kismet.”
From the paper:
The result, as the video shows, appears to be quite good, although it’s hard to tell how it would all come together if the robot had more of, you know, a face. But it seems like you don’t necessarily need to have a lifelike humanoid robot to take advantage of this architecture in an HRI context—any robot that wants to make a gaze-based connection with a human could benefit from doing it in a more human-like way.
“Realistic and Interactive Robot Gaze,” by Matthew K.X.J. Pan, Sungjoon Choi, James Kennedy, Kyna McIntosh, Daniel Campos Zamora, Gunter Niemeyer, Joohyung Kim, Alexis Wieland, and David Christensen from Disney Research, California Institute of Technology, University of Illinois at Urbana-Champaign, and Walt Disney Imagineering, was presented at IROS 2020. You can find the full paper, along with a 13-minute video presentation, on the IROS on-demand conference website.