Tag Archives: hearing

#437579 Disney Research Makes Robotic Gaze ...

While it’s not totally clear to what extent human-like robots are better than conventional robots for most applications, one area I’m personally comfortable with them is entertainment. The folks over at Disney Research, who are all about entertainment, have been working on this sort of thing for a very long time, and some of their animatronic attractions are actually quite impressive.

The next step for Disney is to make its animatronic figures, which currently feature scripted behaviors, to perform in an interactive manner with visitors. The challenge is that this is where you start to get into potential Uncanny Valley territory, which is what happens when you try to create “the illusion of life,” which is what Disney (they explicitly say) is trying to do.

In a paper presented at IROS this month, a team from Disney Research, Caltech, University of Illinois at Urbana-Champaign, and Walt Disney Imagineering is trying to nail that illusion of life with a single, and perhaps most important, social cue: eye gaze.

Before you watch this video, keep in mind that you’re watching a specific character, as Disney describes:

The robot character plays an elderly man reading a book, perhaps in a library or on a park bench. He has difficulty hearing and his eyesight is in decline. Even so, he is constantly distracted from reading by people passing by or coming up to greet him. Most times, he glances at people moving quickly in the distance, but as people encroach into his personal space, he will stare with disapproval for the interruption, or provide those that are familiar to him with friendly acknowledgment.

What, exactly, does “lifelike” mean in the context of robotic gaze? The paper abstract describes the goal as “[seeking] to create an interaction which demonstrates the illusion of life.” I suppose you could think of it like a sort of old-fashioned Turing test focused on gaze: If the gaze of this robot cannot be distinguished from the gaze of a human, then victory, that’s lifelike. And critically, we’re talking about mutual gaze here—not just a robot gazing off into the distance, but you looking deep into the eyes of this robot and it looking right back at you just like a human would. Or, just like some humans would.

The approach that Disney is using is more animation-y than biology-y or psychology-y. In other words, they’re not trying to figure out what’s going on in our brains to make our eyes move the way that they do when we’re looking at other people and basing their control system on that, but instead, Disney just wants it to look right. This “visual appeal” approach is totally fine, and there’s been an enormous amount of human-robot interaction (HRI) research behind it already, albeit usually with less explicitly human-like platforms. And speaking of human-like platforms, the hardware is a “custom Walt Disney Imagineering Audio-Animatronics bust,” which has DoFs that include neck, eyes, eyelids, and eyebrows.

In order to decide on gaze motions, the system first identifies a person to target with its attention using an RGB-D camera. If more than one person is visible, the system calculates a curiosity score for each, currently simplified to be based on how much motion it sees. Depending on which person that the robot can see has the highest curiosity score, the system will choose from a variety of high level gaze behavior states, including:

Read: The Read state can be considered the “default” state of the character. When not executing another state, the robot character will return to the Read state. Here, the character will appear to read a book located at torso level.

Glance: A transition to the Glance state from the Read or Engage states occurs when the attention engine indicates that there is a stimuli with a curiosity score […] above a certain threshold.

Engage: The Engage state occurs when the attention engine indicates that there is a stimuli […] to meet a threshold and can be triggered from both Read and Glance states. This state causes the robot to gaze at the person-of-interest with both the eyes and head.

Acknowledge: The Acknowledge state is triggered from either Engage or Glance states when the person-of-interest is deemed to be familiar to the robot.

Running underneath these higher level behavior states are lower level motion behaviors like breathing, small head movements, eye blinking, and saccades (the quick eye movements that occur when people, or robots, look between two different focal points). The term for this hierarchical behavioral state layering is a subsumption architecture, which goes all the way back to Rodney Brooks’ work on robots like Genghis in the 1980s and Cog and Kismet in the ’90s, and it provides a way for more complex behaviors to emerge from a set of simple, decentralized low-level behaviors.

“25 years on Disney is using my subsumption architecture for humanoid eye control, better and smoother now than our 1995 implementations on Cog and Kismet.”
—Rodney Brooks, MIT emeritus professor

Brooks, an emeritus professor at MIT and, most recently, cofounder and CTO of Robust.ai, tweeted about the Disney project, saying: “People underestimate how long it takes to get from academic paper to real world robotics. 25 years on Disney is using my subsumption architecture for humanoid eye control, better and smoother now than our 1995 implementations on Cog and Kismet.”

From the paper:

Although originally intended for control of mobile robots, we find that the subsumption architecture, as presented in [17], lends itself as a framework for organizing animatronic behaviors. This is due to the analogous use of subsumption in human behavior: human psychomotor behavior can be intuitively modeled as layered behaviors with incoming sensory inputs, where higher behavioral levels are able to subsume lower behaviors. At the lowest level, we have involuntary movements such as heartbeats, breathing and blinking. However, higher behavioral responses can take over and control lower level behaviors, e.g., fight-or-flight response can induce faster heart rate and breathing. As our robot character is modeled after human morphology, mimicking biological behaviors through the use of a bottom-up approach is straightforward.

The result, as the video shows, appears to be quite good, although it’s hard to tell how it would all come together if the robot had more of, you know, a face. But it seems like you don’t necessarily need to have a lifelike humanoid robot to take advantage of this architecture in an HRI context—any robot that wants to make a gaze-based connection with a human could benefit from doing it in a more human-like way.

“Realistic and Interactive Robot Gaze,” by Matthew K.X.J. Pan, Sungjoon Choi, James Kennedy, Kyna McIntosh, Daniel Campos Zamora, Gunter Niemeyer, Joohyung Kim, Alexis Wieland, and David Christensen from Disney Research, California Institute of Technology, University of Illinois at Urbana-Champaign, and Walt Disney Imagineering, was presented at IROS 2020. You can find the full paper, along with a 13-minute video presentation, on the IROS on-demand conference website.

< Back to IEEE Journal Watch Continue reading

Posted in Human Robots

#437491 3.2 Billion Images and 720,000 Hours of ...

Twitter over the weekend “tagged” as manipulated a video showing US Democratic presidential candidate Joe Biden supposedly forgetting which state he’s in while addressing a crowd.

Biden’s “hello Minnesota” greeting contrasted with prominent signage reading “Tampa, Florida” and “Text FL to 30330.”

The Associated Press’s fact check confirmed the signs were added digitally and the original footage was indeed from a Minnesota rally. But by the time the misleading video was removed it already had more than one million views, The Guardian reports.

A FALSE video claiming Biden forgot what state he was in was viewed more than 1 million times on Twitter in the past 24 hours

In the video, Biden says “Hello, Minnesota.”

The event did indeed happen in MN — signs on stage read MN

But false video edited signs to read Florida pic.twitter.com/LdHQVaky8v

— Donie O'Sullivan (@donie) November 1, 2020

If you use social media, the chances are you see (and forward) some of the more than 3.2 billion images and 720,000 hours of video shared daily. When faced with such a glut of content, how can we know what’s real and what’s not?

While one part of the solution is an increased use of content verification tools, it’s equally important we all boost our digital media literacy. Ultimately, one of the best lines of defense—and the only one you can control—is you.

Seeing Shouldn’t Always Be Believing
Misinformation (when you accidentally share false content) and disinformation (when you intentionally share it) in any medium can erode trust in civil institutions such as news organizations, coalitions and social movements. However, fake photos and videos are often the most potent.

For those with a vested political interest, creating, sharing and/or editing false images can distract, confuse and manipulate viewers to sow discord and uncertainty (especially in already polarized environments). Posters and platforms can also make money from the sharing of fake, sensationalist content.

Only 11-25 percent of journalists globally use social media content verification tools, according to the International Centre for Journalists.

Could You Spot a Doctored Image?
Consider this photo of Martin Luther King Jr.

Dr. Martin Luther King Jr. Giving the middle finger #DopeHistoricPics pic.twitter.com/5W38DRaLHr

— Dope Historic Pics (@dopehistoricpic) December 20, 2013

This altered image clones part of the background over King Jr’s finger, so it looks like he’s flipping off the camera. It has been shared as genuine on Twitter, Reddit, and white supremacist websites.

In the original 1964 photo, King flashed the “V for victory” sign after learning the US Senate had passed the civil rights bill.

“Those who love peace must learn to organize as effectively as those who love war.”
Dr. Martin Luther King Jr.

This photo was taken on June 19th, 1964, showing Dr King giving a peace sign after hearing that the civil rights bill had passed the senate. @snopes pic.twitter.com/LXHmwMYZS5

— Willie's Reserve (@WilliesReserve) January 21, 2019

Beyond adding or removing elements, there’s a whole category of photo manipulation in which images are fused together.

Earlier this year, a photo of an armed man was photoshopped by Fox News, which overlaid the man onto other scenes without disclosing the edits, the Seattle Times reported.

You mean this guy who’s been photoshopped into three separate photos released by Fox News? pic.twitter.com/fAXpIKu77a

— Zander Yates ザンダーイェーツ (@ZanderYates) June 13, 2020

Similarly, the image below was shared thousands of times on social media in January, during Australia’s Black Summer bushfires. The AFP’s fact check confirmed it is not authentic and is actually a combination of several separate photos.

Image is more powerful than screams of Greta. A silent girl is holding a koala. She looks straight at you from the waters of the ocean where they found a refuge. She is wearing a breathing mask. A wall of fire is behind them. I do not know the name of the photographer #Australia pic.twitter.com/CrTX3lltdh

— EVC Music (@EVCMusicUK) January 6, 2020

Fully and Partially Synthetic Content
Online, you’ll also find sophisticated “deepfake” videos showing (usually famous) people saying or doing things they never did. Less advanced versions can be created using apps such as Zao and Reface.

Or, if you don’t want to use your photo for a profile picture, you can default to one of several websites offering hundreds of thousands of AI-generated, photorealistic images of people.

These people don’t exist, they’re just images generated by artificial intelligence. Generated Photos, CC BY

Editing Pixel Values and the (not so) Simple Crop
Cropping can greatly alter the context of a photo, too.

We saw this in 2017, when a US government employee edited official pictures of Donald Trump’s inauguration to make the crowd appear bigger, according to The Guardian. The staffer cropped out the empty space “where the crowd ended” for a set of pictures for Trump.

Views of the crowds at the inaugurations of former US President Barack Obama in 2009 (left) and President Donald Trump in 2017 (right). AP

But what about edits that only alter pixel values such as color, saturation, or contrast?

One historical example illustrates the consequences of this. In 1994, Time magazine’s cover of OJ Simpson considerably “darkened” Simpson in his police mugshot. This added fuel to a case already plagued by racial tension, to which the magazine responded, “No racial implication was intended, by Time or by the artist.”

Tools for Debunking Digital Fakery
For those of us who don’t want to be duped by visual mis/disinformation, there are tools available—although each comes with its own limitations (something we discuss in our recent paper).

Invisible digital watermarking has been proposed as a solution. However, it isn’t widespread and requires buy-in from both content publishers and distributors.

Reverse image search (such as Google’s) is often free and can be helpful for identifying earlier, potentially more authentic copies of images online. That said, it’s not foolproof because it:

Relies on unedited copies of the media already being online.
Doesn’t search the entire web.
Doesn’t always allow filtering by publication time. Some reverse image search services such as TinEye support this function, but Google’s doesn’t.
Returns only exact matches or near-matches, so it’s not thorough. For instance, editing an image and then flipping its orientation can fool Google into thinking it’s an entirely different one.

Most Reliable Tools Are Sophisticated
Meanwhile, manual forensic detection methods for visual mis/disinformation focus mostly on edits visible to the naked eye, or rely on examining features that aren’t included in every image (such as shadows). They’re also time-consuming, expensive, and need specialized expertise.

Still, you can access work in this field by visiting sites such as Snopes.com—which has a growing repository of “fauxtography.”

Computer vision and machine learning also offer relatively advanced detection capabilities for images and videos. But they too require technical expertise to operate and understand.

Moreover, improving them involves using large volumes of “training data,” but the image repositories used for this usually don’t contain the real-world images seen in the news.

If you use an image verification tool such as the REVEAL project’s image verification assistant, you might need an expert to help interpret the results.

The good news, however, is that before turning to any of the above tools, there are some simple questions you can ask yourself to potentially figure out whether a photo or video on social media is fake. Think:

Was it originally made for social media?
How widely and for how long was it circulated?
What responses did it receive?
Who were the intended audiences?

Quite often, the logical conclusions drawn from the answers will be enough to weed out inauthentic visuals. You can access the full list of questions, put together by Manchester Metropolitan University experts, here.

This article is republished from The Conversation under a Creative Commons license. Read the original article.

Image Credit: Simon Steinberger from Pixabay Continue reading

Posted in Human Robots

#437471 How Giving Robots a Hybrid, Human-Like ...

Squeezing a lot of computing power into robots without using up too much space or energy is a constant battle for their designers. But a new approach that mimics the structure of the human brain could provide a workaround.

The capabilities of most of today’s mobile robots are fairly rudimentary, but giving them the smarts to do their jobs is still a serious challenge. Controlling a body in a dynamic environment takes a surprising amount of processing power, which requires both real estate for chips and considerable amounts of energy to power them.

As robots get more complex and capable, those demands are only going to increase. Today’s most powerful AI systems run in massive data centers across far more chips than can realistically fit inside a machine on the move. And the slow death of Moore’s Law suggests we can’t rely on conventional processors getting significantly more efficient or compact anytime soon.

That prompted a team from the University of Southern California to resurrect an idea from more than 40 years ago: mimicking the human brain’s division of labor between two complimentary structures. While the cerebrum is responsible for higher cognitive functions like vision, hearing, and thinking, the cerebellum integrates sensory data and governs movement, balance, and posture.

When the idea was first proposed the technology didn’t exist to make it a reality, but in a paper recently published in Science Robotics, the researchers describe a hybrid system that combines analog circuits that control motion and digital circuits that govern perception and decision-making in an inverted pendulum robot.

“Through this cooperation of the cerebrum and the cerebellum, the robot can conduct multiple tasks simultaneously with a much shorter latency and lower power consumption,” write the researchers.

The type of robot the researchers were experimenting with looks essentially like a pole balancing on a pair of wheels. They have a broad range of applications, from hoverboards to warehouse logistics—Boston Dynamics’ recently-unveiled Handle robot operates on the same principles. Keeping them stable is notoriously tough, but the new approach managed to significantly improve all digital control approaches by radically improving the speed and efficiency of computations.

Key to bringing the idea alive was the recent emergence of memristors—electrical components whose resistance relies on previous input, which allows them to combine computing and memory in one place in a way similar to how biological neurons operate.

The researchers used memristors to build an analog circuit that runs an algorithm responsible for integrating data from the robot’s accelerometer and gyroscope, which is crucial for detecting the angle and velocity of its body, and another that controls its motion. One key advantage of this setup is that the signals from the sensors are analog, so it does away with the need for extra circuitry to convert them into digital signals, saving both space and power.

More importantly, though, the analog system is an order of magnitude faster and more energy-efficient than a standard all-digital system, the authors report. This not only lets them slash the power requirements, but also lets them cut the processing loop from 3,000 microseconds to just 6. That significantly improves the robot’s stability, with it taking just one second to settle into a steady state compared to more than three seconds using the digital-only platform.

At the minute this is just a proof of concept. The robot the researchers have built is small and rudimentary, and the algorithms being run on the analog circuit are fairly basic. But the principle is a promising one, and there is currently a huge amount of R&D going into neuromorphic and memristor-based analog computing hardware.

As often turns out to be the case, it seems like we can’t go too far wrong by mimicking the best model of computation we have found so far: our own brains.

Image Credit: Photos Hobby / Unsplash Continue reading

Posted in Human Robots

#437276 Cars Will Soon Be Able to Sense and ...

Imagine you’re on your daily commute to work, driving along a crowded highway while trying to resist looking at your phone. You’re already a little stressed out because you didn’t sleep well, woke up late, and have an important meeting in a couple hours, but you just don’t feel like your best self.

Suddenly another car cuts you off, coming way too close to your front bumper as it changes lanes. Your already-simmering emotions leap into overdrive, and you lay on the horn and shout curses no one can hear.

Except someone—or, rather, something—can hear: your car. Hearing your angry words, aggressive tone, and raised voice, and seeing your furrowed brow, the onboard computer goes into “soothe” mode, as it’s been programmed to do when it detects that you’re angry. It plays relaxing music at just the right volume, releases a puff of light lavender-scented essential oil, and maybe even says some meditative quotes to calm you down.

What do you think—creepy? Helpful? Awesome? Weird? Would you actually calm down, or get even more angry that a car is telling you what to do?

Scenarios like this (maybe without the lavender oil part) may not be imaginary for much longer, especially if companies working to integrate emotion-reading artificial intelligence into new cars have their way. And it wouldn’t just be a matter of your car soothing you when you’re upset—depending what sort of regulations are enacted, the car’s sensors, camera, and microphone could collect all kinds of data about you and sell it to third parties.

Computers and Feelings
Just as AI systems can be trained to tell the difference between a picture of a dog and one of a cat, they can learn to differentiate between an angry tone of voice or facial expression and a happy one. In fact, there’s a whole branch of machine intelligence devoted to creating systems that can recognize and react to human emotions; it’s called affective computing.

Emotion-reading AIs learn what different emotions look and sound like from large sets of labeled data; “smile = happy,” “tears = sad,” “shouting = angry,” and so on. The most sophisticated systems can likely even pick up on the micro-expressions that flash across our faces before we consciously have a chance to control them, as detailed by Daniel Goleman in his groundbreaking book Emotional Intelligence.

Affective computing company Affectiva, a spinoff from MIT Media Lab, says its algorithms are trained on 5,313,751 face videos (videos of people’s faces as they do an activity, have a conversation, or react to stimuli) representing about 2 billion facial frames. Fascinatingly, Affectiva claims its software can even account for cultural differences in emotional expression (for example, it’s more normalized in Western cultures to be very emotionally expressive, whereas Asian cultures tend to favor stoicism and politeness), as well as gender differences.

But Why?
As reported in Motherboard, companies like Affectiva, Cerence, Xperi, and Eyeris have plans in the works to partner with automakers and install emotion-reading AI systems in new cars. Regulations passed last year in Europe and a bill just introduced this month in the US senate are helping make the idea of “driver monitoring” less weird, mainly by emphasizing the safety benefits of preemptive warning systems for tired or distracted drivers (remember that part in the beginning about sneaking glances at your phone? Yeah, that).

Drowsiness and distraction can’t really be called emotions, though—so why are they being lumped under an umbrella that has a lot of other implications, including what many may consider an eerily Big Brother-esque violation of privacy?

Our emotions, in fact, are among the most private things about us, since we are the only ones who know their true nature. We’ve developed the ability to hide and disguise our emotions, and this can be a useful skill at work, in relationships, and in scenarios that require negotiation or putting on a game face.

And I don’t know about you, but I’ve had more than one good cry in my car. It’s kind of the perfect place for it; private, secluded, soundproof.

Putting systems into cars that can recognize and collect data about our emotions under the guise of preventing accidents due to the state of mind of being distracted or the physical state of being sleepy, then, seems a bit like a bait and switch.

A Highway to Privacy Invasion?
European regulations will help keep driver data from being used for any purpose other than ensuring a safer ride. But the US is lagging behind on the privacy front, with car companies largely free from any enforceable laws that would keep them from using driver data as they please.

Affectiva lists the following as use cases for occupant monitoring in cars: personalizing content recommendations, providing alternate route recommendations, adapting environmental conditions like lighting and heating, and understanding user frustration with virtual assistants and designing those assistants to be emotion-aware so that they’re less frustrating.

Our phones already do the first two (though, granted, we’re not supposed to look at them while we drive—but most cars now let you use bluetooth to display your phone’s content on the dashboard), and the third is simply a matter of reaching a hand out to turn a dial or press a button. The last seems like a solution for a problem that wouldn’t exist without said… solution.

Despite how unnecessary and unsettling it may seem, though, emotion-reading AI isn’t going away, in cars or other products and services where it might provide value.

Besides automotive AI, Affectiva also makes software for clients in the advertising space. With consent, the built-in camera on users’ laptops records them while they watch ads, gauging their emotional response, what kind of marketing is most likely to engage them, and how likely they are to buy a given product. Emotion-recognition tech is also being used or considered for use in mental health applications, call centers, fraud monitoring, and education, among others.

In a 2015 TED talk, Affectiva co-founder Rana El-Kaliouby told her audience that we’re living in a world increasingly devoid of emotion, and her goal was to bring emotions back into our digital experiences. Soon they’ll be in our cars, too; whether the benefits will outweigh the costs remains to be seen.

Image Credit: Free-Photos from Pixabay Continue reading

Posted in Human Robots

#437157 A Human-Centric World of Work: Why It ...

Long before coronavirus appeared and shattered our pre-existing “normal,” the future of work was a widely discussed and debated topic. We’ve watched automation slowly but surely expand its capabilities and take over more jobs, and we’ve wondered what artificial intelligence will eventually be capable of.

The pandemic swiftly turned the working world on its head, putting millions of people out of a job and forcing millions more to work remotely. But essential questions remain largely unchanged: we still want to make sure we’re not replaced, we want to add value, and we want an equitable society where different types of work are valued fairly.

To address these issues—as well as how the pandemic has impacted them—this week Singularity University held a digital summit on the future of work. Forty-three speakers from multiple backgrounds, countries, and sectors of the economy shared their expertise on everything from work in developing markets to why we shouldn’t want to go back to the old normal.

Gary Bolles, SU’s chair for the Future of Work, kicked off the discussion with his thoughts on a future of work that’s human-centric, including why it matters and how to build it.

What Is Work?
“Work” seems like a straightforward concept to define, but since it’s constantly shifting shape over time, let’s make sure we’re on the same page. Bolles defined work, very basically, as human skills applied to problems.

“It doesn’t matter if it’s a dirty floor or a complex market entry strategy or a major challenge in the world,” he said. “We as humans create value by applying our skills to solve problems in the world.” You can think of the problems that need solving as the demand and human skills as the supply, and the two are in constant oscillation, including, every few decades or centuries, a massive shift.

We’re in the midst of one of those shifts right now (and we already were, long before the pandemic). Skills that have long been in demand are declining. The World Economic Forum’s 2018 Future of Jobs report listed things like manual dexterity, management of financial and material resources, and quality control and safety awareness as declining skills. Meanwhile, skills the next generation will need include analytical thinking and innovation, emotional intelligence, creativity, and systems analysis.

Along Came a Pandemic
With the outbreak of coronavirus and its spread around the world, the demand side of work shrunk; all the problems that needed solving gave way to the much bigger, more immediate problem of keeping people alive. But as a result, tens of millions of people around the world are out of work—and those are just the ones that are being counted, and they’re a fraction of the true total. There are additional millions in seasonal or gig jobs or who work in informal economies now without work, too.

“This is our opportunity to focus,” Bolles said. “How do we help people re-engage with work? And make it better work, a better economy, and a better set of design heuristics for a world that we all want?”

Bolles posed five key questions—some spurred by impact of the pandemic—on which future of work conversations should focus to make sure it’s a human-centric future.

1. What does an inclusive world of work look like? Rather than seeing our current systems of work as immutable, we need to actually understand those systems and how we want to change them.

2. How can we increase the value of human work? We know that robots and software are going to be fine in the future—but for humans to be fine, we need to design for that very intentionally.

3. How can entrepreneurship help create a better world of work? In many economies the new value that’s created often comes from younger companies; how do we nurture entrepreneurship?

4. What will the intersection of workplace and geography look like? A large percentage of the global workforce is now working from home; what could some of the outcomes of that be? How does gig work fit in?

5. How can we ensure a healthy evolution of work and life? The health and the protection of those at risk is why we shut down our economies, but we need to find a balance that allows people to work while keeping them safe.

Problem-Solving Doesn’t End
The end result these questions are driving towards, and our overarching goal, is maximizing human potential. “If we come up with ways we can continue to do that, we’ll have a much more beneficial future of work,” Bolles said. “We should all be talking about where we can have an impact.”

One small silver lining? We had plenty of problems to solve in the world before ever hearing about coronavirus, and now we have even more. Is the pace of automation accelerating due to the virus? Yes. Are companies finding more ways to automate their processes in order to keep people from getting sick? They are.

But we have a slew of new problems on our hands, and we’re not going to stop needing human skills to solve them (not to mention the new problems that will surely emerge as second- and third-order effects of the shutdowns). If Bolles’ definition of work holds up, we’ve got ours cut out for us.

In an article from April titled The Great Reset, Bolles outlined three phases of the unemployment slump (we’re currently still in the first phase) and what we should be doing to minimize the damage. “The evolution of work is not about what will happen 10 to 20 years from now,” he said. “It’s about what we could be doing differently today.”

Watch Bolles’ talk and those of dozens of other experts for more insights into building a human-centric future of work here.

Image Credit: www_slon_pics from Pixabay Continue reading

Posted in Human Robots