Tag Archives: figure

#437769 Q&A: Facebook’s CTO Is at War With ...

Photo: Patricia de Melo Moreira/AFP/Getty Images

Facebook chief technology officer Mike Schroepfer leads the company’s AI and integrity efforts.

Facebook’s challenge is huge. Billions of pieces of content—short and long posts, images, and combinations of the two—are uploaded to the site daily from around the world. And any tiny piece of that—any phrase, image, or video—could contain so-called bad content.

In its early days, Facebook relied on simple computer filters to identify potentially problematic posts by their words, such as those containing profanity. These automatically filtered posts, as well as posts flagged by users as offensive, went to humans for adjudication.

In 2015, Facebook started using artificial intelligence to cull images that contained nudity, illegal goods, and other prohibited content; those images identified as possibly problematic were sent to humans for further review.

By 2016, more offensive photos were reported by Facebook’s AI systems than by Facebook users (and that is still the case).

In 2018, Facebook CEO Mark Zuckerberg made a bold proclamation: He predicted that within five or ten years, Facebook’s AI would not only look for profanity, nudity, and other obvious violations of Facebook’s policies. The tools would also be able to spot bullying, hate speech, and other misuse of the platform, and put an immediate end to them.

Today, automated systems using algorithms developed with AI scan every piece of content between the time when a user completes a post and when it is visible to others on the site—just fractions of a second. In most cases, a violation of Facebook’s standards is clear, and the AI system automatically blocks the post. In other cases, the post goes to human reviewers for a final decision, a workforce that includes 15,000 content reviewers and another 20,000 employees focused on safety and security, operating out of more than 20 facilities around the world.

In the first quarter of this year, Facebook removed or took other action (like appending a warning label) on more than 9.6 million posts involving hate speech, 8.6 million involving child nudity or exploitation, almost 8 million posts involving the sale of drugs, 2.3 million posts involving bullying and harassment, and tens of millions of posts violating other Facebook rules.

Right now, Facebook has more than 1,000 engineers working on further developing and implementing what the company calls “integrity” tools. Using these systems to screen every post that goes up on Facebook, and doing so in milliseconds, is sucking up computing resources. Facebook chief technology officer Mike Schroepfer, who is heading up Facebook’s AI and integrity efforts, spoke with IEEE Spectrum about the team’s progress on building an AI system that detects bad content.

Since that discussion, Facebook’s policies around hate speech have come under increasing scrutiny, with particular attention on divisive posts by political figures. A group of major advertisers in June announced that they would stop advertising on the platform while reviewing the situation, and civil rights groups are putting pressure on others to follow suit until Facebook makes policy changes related to hate speech and groups that promote hate, misinformation, and conspiracies.

Facebook CEO Mark Zuckerberg responded with news that Facebook will widen the category of what it considers hateful content in ads. Now the company prohibits claims that people from a specific race, ethnicity, national origin, religious affiliation, caste, sexual orientation, gender identity, or immigration status are a threat to the physical safety, health, or survival of others. The policy change also aims to better protect immigrants, migrants, refugees, and asylum seekers from ads suggesting these groups are inferior or expressing contempt. Finally, Zuckerberg announced that the company will label some problematic posts by politicians and government officials as content that violates Facebook’s policies.

However, civil rights groups say that’s not enough. And an independent audit released in July also said that Facebook needs to go much further in addressing civil rights concerns and disinformation.

Schroepfer indicated that Facebook’s AI systems are designed to quickly adapt to changes in policy. “I don’t expect considerable technical changes are needed to adjust,” he told Spectrum.

This interview has been edited and condensed for clarity.

IEEE Spectrum: What are the stakes of content moderation? Is this an existential threat to Facebook? And is it critical that you deal well with the issue of election interference this year?

Schroepfer: It’s probably existential; it’s certainly massive. We are devoting a tremendous amount of our attention to it.

The idea that anyone could meddle in an election is deeply disturbing and offensive to all of us here, just as people and citizens of democracies. We don’t want to see that happen anywhere, and certainly not on our watch. So whether it’s important to the company or not, it’s important to us as people. And I feel a similar way on the content-moderation side.

There are not a lot of easy choices here. The only way to prevent people, with certainty, from posting bad things is to not let them post anything. We can take away all voice and just say, “Sorry, the Internet’s too dangerous. No one can use it.” That will certainly get rid of all hate speech online. But I don’t want to end up in that world. And there are variants of that world that various governments are trying to implement, where they get to decide what’s true or not, and you as a person don’t. I don’t want to get there either.

My hope is that we can build a set of tools that make it practical for us to do a good enough job, so that everyone is still excited about the idea that anyone can share what they want, and so that Facebook is a safe and reasonable place for people to operate in.

Spectrum: You joined Facebook in 2008, before AI was part of the company’s toolbox. When did that change? When did you begin to think that AI tools would be useful to Facebook?

Schroepfer: Ten years ago, AI wasn’t commercially practical; the technology just didn’t work very well. In 2012, there was one of those moments that a lot of people point to as the beginning of the current revolution in deep learning and AI. A computer-vision model—a neural network—was trained using what we call supervised training, and it turned out to be better than all the existing models.

Spectrum: How is that training done, and how did computer-vision models come to Facebook?

Image: Facebook

Just Broccoli? Facebook’s image analysis algorithms can tell the difference between marijuana [left] and tempura broccoli [right] better than some humans.

Schroepfer: Say I take a bunch of photos and I have people look at them. If they see a photo of a cat, they put a text label that says cat; if it’s one of a dog, the text label says dog. If you build a big enough data set and feed that to the neural net, it learns how to tell the difference between cats and dogs.

Prior to 2012, it didn’t work very well. And then in 2012, there was this moment where it seemed like, “Oh wow, this technique might work.” And a few years later we were deploying that form of technology to help us detect problematic imagery.

Spectrum: Do your AI systems work equally well on all types of prohibited content?

Schroepfer: Nudity was technically easiest. I don’t need to understand language or culture to understand that this is either a naked human or not. Violence is a much more nuanced problem, so it was harder technically to get it right. And with hate speech, not only do you have to understand the language, it may be very contextual, even tied to recent events. A week before the Christchurch shooting [New Zealand, 2019], saying “I wish you were in the mosque” probably doesn’t mean anything. A week after, that might be a terrible thing to say.

Spectrum: How much progress have you made on hate speech?

Schroepfer: AI, in the first quarter of 2020, proactively detected 88.8 percent of the hate-speech content we removed, up from 80.2 percent in the previous quarter. In the first quarter of 2020, we took action on 9.6 million pieces of content for violating our hate-speech policies.

Image: Facebook

Off Label: Sometimes image analysis isn’t enough to determine whether a picture posted violates the company’s policies. In considering these candy-colored vials of marijuana, for example, the algorithms can look at any accompanying text and, if necessary, comments on the post.

Spectrum: It sounds like you’ve expanded beyond tools that analyze images and are also using AI tools that analyze text.

Schroepfer: AI started off as very siloed. People worked on language, people worked on computer vision, people worked on video. We’ve put these things together—in production, not just as research—into multimodal classifiers.

[Schroepfer shows a photo of a pan of Rice Krispies treats, with text referring to it as a “potent batch”] This is a case in which you have an image, and then you have the text on the post. This looks like Rice Krispies. On its own, this image is fine. You put the text together with it in a bigger model; that can then understand what’s going on. That didn’t work five years ago.

Spectrum: Today, every post that goes up on Facebook is immediately checked by automated systems. Can you explain that process?

Image: Facebook

Bigger Picture: Identifying hate speech is often a matter of context. Either the text or the photo in this post isn’t hateful standing alone, but putting them together tells a different story.

Schroepfer: You upload an image and you write some text underneath it, and the systems look at both the image and the text to try to see which, if any, policies it violates. Those decisions are based on our Community Standards. It will also look at other signals on the posts, like the comments people make.

It happens relatively instantly, though there may be times things happen after the fact. Maybe you uploaded a post that had misinformation in it, and at the time you uploaded it, we didn’t know it was misinformation. The next day we fact-check something and scan again; we may find your post and take it down. As we learn new things, we’re going to go back through and look for violations of what we now know to be a problem. Or, as people comment on your post, we might update our understanding of it. If people are saying, “That’s terrible,” or “That’s mean,” or “That looks fake,” those comments may be an interesting signal.

Spectrum: How is Facebook applying its AI tools to the problem of election interference?

Schroepfer: I would split election interference into two categories. There are times when you’re going after the content, and there are times you’re going after the behavior or the authenticity of the person.

On content, if you’re sharing misinformation, saying, “It’s super Wednesday, not super Tuesday, come vote on Wednesday,” that’s a problem whether you’re an American sitting in California or a foreign actor.

Other times, people create a series of Facebook pages pretending they’re Americans, but they’re really a foreign entity. That is a problem on its own, even if all the content they’re sharing completely meets our Community Standards. The problem there is that you have a foreign government running an information operation.

There, you need different tools. What you’re trying to do is put pieces together, to say, “Wait a second. All of these pages—Martians for Justice, Moonlings for Justice, and Venusians for Justice”—are all run by an administrator with an IP address that’s outside the United States. So they’re all connected, even though they’re pretending to not be connected. That’s a very different problem than me sitting in my office in Menlo Park [Calif.] sharing misinformation.

I’m not going to go into lots of technical detail, because this is an area of adversarial nature. The fundamental problem you’re trying to solve is that there’s one entity coordinating the activity of a bunch of things that look like they’re not all one thing. So this is a series of Instagram accounts, or a series of Facebook pages, or a series of WhatsApp accounts, and they’re pretending to be totally different things. We’re looking for signals that these things are related in some way. And we’re looking through the graph [what Facebook calls its map of relationships between users] to understand the properties of this network.

Spectrum: What cutting-edge AI tools and methods have you been working on lately?

Schroepfer: Supervised learning, with humans setting up the instruction process for the AI systems, is amazingly effective. But it has a very obvious flaw: the speed at which you can develop these things is limited by how fast you can curate the data sets. If you’re dealing in a problem domain where things change rapidly, you have to rebuild a new data set and retrain the whole thing.

Self-supervision is inspired by the way people learn, by the way kids explore the world around them. To get computers to do it themselves, we take a bunch of raw data and build a way for the computer to construct its own tests. For language, you scan a bunch of Web pages, and the computer builds a test where it takes a sentence, eliminates one of the words, and figures out how to predict what word belongs there. And because it created the test, it actually knows the answer. I can use as much raw text as I can find and store because it’s processing everything itself and doesn’t require us to sit down and build the information set. In the last two years there has been a revolution in language understanding as a result of AI self-supervised learning.

Spectrum: What else are you excited about?

Schroepfer: What we’ve been working on over the last few years is multilingual understanding. Usually, when I’m trying to figure out, say, whether something is hate speech or not I have to go through the whole process of training the model in every language. I have to do that one time for every language. When you make a post, the first thing we have to figure out is what language your post is in. “Ah, that’s Spanish. So send it to the Spanish hate-speech model.”

We’ve started to build a multilingual model—one box where you can feed in text in 40 different languages and it determines whether it’s hate speech or not. This is way more effective and easier to deploy.

To geek out for a second, just the idea that you can build a model that understands a concept in multiple languages at once is crazy cool. And it not only works for hate speech, it works for a variety of things.

When we started working on this multilingual model years ago, it performed worse than every single individual model. Now, it not only works as well as the English model, but when you get to the languages where you don’t have enough data, it’s so much better. This rapid progress is very exciting.

Spectrum: How do you move new AI tools from your research labs into operational use?

Schroepfer: Engineers trying to make the next breakthrough will often say, “Cool, I’ve got a new thing and it achieved state-of-the-art results on machine translation.” And we say, “Great. How long does it take to run in production?” They say, “Well, it takes 10 seconds for every sentence to run on a CPU.” And we say, “It’ll eat our whole data center if we deploy that.” So we take that state-of-the-art model and we make it 10 or a hundred or a thousand times more efficient, maybe at the cost of a little bit of accuracy. So it’s not as good as the state-of-the-art version, but it’s something we can actually put into our data centers and run in production.

Spectrum: What’s the role of the humans in the loop? Is it true that Facebook currently employs 35,000 moderators?

Schroepfer: Yes. Right now our goal is not to reduce that. Our goal is to do a better job catching bad content. People often think that the end state will be a fully automated system. I don’t see that world coming anytime soon.

As automated systems get more sophisticated, they take more and more of the grunt work away, freeing up the humans to work on the really gnarly stuff where you have to spend an hour researching.

We also use AI to give our human moderators power tools. Say I spot this new meme that is telling everyone to vote on Wednesday rather than Tuesday. I have a tool in front of me that says, “Find variants of that throughout the system. Find every photo with the same text, find every video that mentions this thing and kill it in one shot.” Rather than, I found this one picture, but then a bunch of other people upload that misinformation in different forms.

Another important aspect of AI is that anything I can do to prevent a person from having to look at terrible things is time well spent. Whether it’s a person employed by us as a moderator or a user of our services, looking at these things is a terrible experience. If I can build systems that take the worst of the worst, the really graphic violence, and deal with that in an automated fashion, that’s worth a lot to me. Continue reading

Posted in Human Robots

#437709 iRobot Announces Major Software Update, ...

Since the release of the very first Roomba in 2002, iRobot’s long-term goal has been to deliver cleaner floors in a way that’s effortless and invisible. Which sounds pretty great, right? And arguably, iRobot has managed to do exactly this, with its most recent generation of robot vacuums that make their own maps and empty their own dustbins. For those of us who trust our robots, this is awesome, but iRobot has gradually been realizing that many Roomba users either don’t want this level of autonomy, or aren’t ready for it.

Today, iRobot is announcing a major new update to its app that represents a significant shift of its overall approach to home robot autonomy. Humans are being brought back into the loop through software that tries to learn when, where, and how you clean so that your Roomba can adapt itself to your life rather than the other way around.

To understand why this is such a shift for iRobot, let’s take a very brief look back at how the Roomba interface has evolved over the last couple of decades. The first generation of Roomba had three buttons on it that allowed (or required) the user to select whether the room being vacuumed was small or medium or large in size. iRobot ditched that system one generation later, replacing the room size buttons with one single “clean” button. Programmable scheduling meant that users no longer needed to push any buttons at all, and with Roombas able to find their way back to their docking stations, all you needed to do was empty the dustbin. And with the most recent few generations (the S and i series), the dustbin emptying is also done for you, reducing direct interaction with the robot to once a month or less.

Image: iRobot

iRobot CEO Colin Angle believes that working toward more intelligent human-robot collaboration is “the brave new frontier” of AI. “This whole journey has been earning the right to take this next step, because a robot can’t be responsive if it’s incompetent,” he says. “But thinking that autonomy was the destination was where I was just completely wrong.”

The point that the top-end Roombas are at now reflects a goal that iRobot has been working toward since 2002: With autonomy, scheduling, and the clean base to empty the bin, you can set up your Roomba to vacuum when you’re not home, giving you cleaner floors every single day without you even being aware that the Roomba is hard at work while you’re out. It’s not just hands-off, it’s brain-off. No noise, no fuss, just things being cleaner thanks to the efforts of a robot that does its best to be invisible to you. Personally, I’ve been completely sold on this idea for home robots, and iRobot CEO Colin Angle was as well.

“I probably told you that the perfect Roomba is the Roomba that you never see, you never touch, you just come home everyday and it’s done the right thing,” Angle told us. “But customers don’t want that—they want to be able to control what the robot does. We started to hear this a couple years ago, and it took a while before it sunk in, but it made sense.”

How? Angle compares it to having a human come into your house to clean, but you weren’t allowed to tell them where or when to do their job. Maybe after a while, you’ll build up the amount of trust necessary for that to work, but in the short term, it would likely be frustrating. And people get frustrated with their Roombas for this reason. “The desire to have more control over what the robot does kept coming up, and for me, it required a pretty big shift in my view of what intelligence we were trying to build. Autonomy is not intelligence. We need to do something more.”

That something more, Angle says, is a partnership as opposed to autonomy. It’s an acknowledgement that not everyone has the same level of trust in robots as the people who build them. It’s an understanding that people want to have a feeling of control over their homes, that they have set up the way that they want, and that they’ve been cleaning the way that they want, and a robot shouldn’t just come in and do its own thing.

This change in direction also represents a substantial shift in resources for iRobot, and the company has pivoted two-thirds of its engineering organization to focus on software-based collaborative intelligence rather than hardware.

“Until the robot proves that it knows enough about your home and about the way that you want your home cleaned,” Angle says, “you can’t move forward.” He adds that this is one of those things that seem obvious in retrospect, but even if they’d wanted to address the issue before, they didn’t have the technology to solve the problem. Now they do. “This whole journey has been earning the right to take this next step, because a robot can’t be responsive if it’s incompetent,” Angle says. “But thinking that autonomy was the destination was where I was just completely wrong.”

The previous iteration of the iRobot app (and Roombas themselves) are built around one big fat CLEAN button. The new approach instead tries to figure out in much more detail where the robot should clean, and when, using a mixture of autonomous technology and interaction with the user.

Where to Clean
Knowing where to clean depends on your Roomba having a detailed and accurate map of its environment. For several generations now, Roombas have been using visual mapping and localization (VSLAM) to build persistent maps of your home. These maps have been used to tell the Roomba to clean in specific rooms, but that’s about it. With the new update, Roombas with cameras will be able to recognize some objects and features in your home, including chairs, tables, couches, and even countertops. The robots will use these features to identify where messes tend to happen so that they can focus on those areas—like around the dining room table or along the front of the couch.

We should take a minute here to clarify how the Roomba is using its camera. The original (primary?) purpose of the camera was for VSLAM, where the robot would take photos of your home, downsample them into QR-code-like patterns of light and dark, and then use those (with the assistance of other sensors) to navigate. Now the camera is also being used to take pictures of other stuff around your house to make that map more useful.

Photo: iRobot

The robots will now try to fit into the kinds of cleaning routines that many people already have established. For example, the app may suggest an “after dinner” routine that cleans just around the kitchen and dining room table.

This is done through machine learning using a library of images of common household objects from a floor perspective that iRobot had to develop from scratch. Angle clarified for us that this is all done via a neural net that runs on the robot, and that “no recognizable images are ever stored on the robot or kept, and no images ever leave the robot.” Worst case, if all the data iRobot has about your home gets somehow stolen, the hacker would only know that (for example) your dining room has a table in it and the approximate size and location of that table, because the map iRobot has of your place only stores symbolic representations rather than images.

Another useful new feature is intended to help manage the “evil Roomba places” (as Angle puts it) that every home has that cause Roombas to get stuck. If the place is evil enough that Roomba has to call you for help because it gave up completely, Roomba will now remember, and suggest that either you make some changes or that it stops cleaning there, which seems reasonable.

When to Clean
It turns out that the primary cause of mission failure for Roombas is not that they get stuck or that they run out of battery—it’s user cancellation, usually because the robot is getting in the way or being noisy when you don’t want it to be. “If you kill a Roomba’s job because it annoys you,” points out Angle, “how is that robot being a good partner? I think it’s an epic fail.” Of course, it’s not the robot’s fault, because Roombas only clean when we tell them to, which Angle says is part of the problem. “People actually aren’t very good at making their own schedules—they tend to oversimplify, and not think through what their schedules are actually about, which leads to lots of [figurative] Roomba death.”

To help you figure out when the robot should actually be cleaning, the new app will look for patterns in when you ask the robot to clean, and then recommend a schedule based on those patterns. That might mean the robot cleans different areas at different times every day of the week. The app will also make scheduling recommendations that are event-based as well, integrated with other smart home devices. Would you prefer the Roomba to clean every time you leave the house? The app can integrate with your security system (or garage door, or any number of other things) and take care of that for you.

More generally, Roomba will now try to fit into the kinds of cleaning routines that many people already have established. For example, the app may suggest an “after dinner” routine that cleans just around the kitchen and dining room table. The app will also, to some extent, pay attention to the environment and season. It might suggest increasing your vacuuming frequency if pollen counts are especially high, or if it’s pet shedding season and you have a dog. Unfortunately, Roomba isn’t (yet?) capable of recognizing dogs on its own, so the app has to cheat a little bit by asking you some basic questions.

A Smarter App

Image: iRobot

The previous iteration of the iRobot app (and Roombas themselves) are built around one big fat CLEAN button. The new approach instead tries to figure out in much more detail where the robot should clean, and when, using a mixture of autonomous technology and interaction with the user.

The app update, which should be available starting today, is free. The scheduling and recommendations will work on every Roomba model, although for object recognition and anything related to mapping, you’ll need one of the more recent and fancier models with a camera. Future app updates will happen on a more aggressive schedule. Major app releases should happen every six months, with incremental updates happening even more frequently than that.

Angle also told us that overall, this change in direction also represents a substantial shift in resources for iRobot, and the company has pivoted two-thirds of its engineering organization to focus on software-based collaborative intelligence rather than hardware. “It’s not like we’re done doing hardware,” Angle assured us. “But we do think about hardware differently. We view our robots as platforms that have longer life cycles, and each platform will be able to support multiple generations of software. We’ve kind of decoupled robot intelligence from hardware, and that’s a change.”

Angle believes that working toward more intelligent collaboration between humans and robots is “the brave new frontier of artificial intelligence. I expect it to be the frontier for a reasonable amount of time to come,” he adds. “We have a lot of work to do to create the type of easy-to-use experience that consumer robots need.” Continue reading

Posted in Human Robots

#437635 Toyota Research Demonstrates ...

Over the last several years, Toyota has been putting more muscle into forward-looking robotics research than just about anyone. In addition to the Toyota Research Institute (TRI), there’s that massive 175-acre robot-powered city of the future that Toyota still plans to build next to Mount Fuji. Even Toyota itself acknowledges that it might be crazy, but that’s just how they roll—as TRI CEO Gill Pratt told me a while back, when Toyota decides to do something, they really do go all-in on it.

TRI has been focusing heavily on home robots, which is reflective of the long-term nature of what TRI is trying to do, because home robots are both the place where we’ll need robots the most at the same time as they’re the place where it’s going to be hardest to deploy them. The unpredictable nature of homes, and the fact that homes tend to have squishy fragile people in them, are robot-unfriendly characteristics, but as the population continues to age (an increasingly acute problem in Japan), homes offer an enormous amount of potential for helping us maintain our independence.

Today, Toyota is showing off some of the research that it’s been working on recently, in the form of a virtual reality presentation in lieu of an in-person press event. For journalists, TRI pre-loaded the recording onto a VR headset, which was FedEx’ed to my house. You can watch the entire 40-minute presentation in 360 video on YouTube (or in VR if you have a headset of your own), but if you don’t watch the whole thing, you should at least check out the full-on GLaDOS (with arms) that TRI thinks belongs in your home.

The presentation features an introduction from Gill Pratt, who looks entirely too comfortable embedded inside of one of TRI’s telepresence robots. The event also covers a lot of territory, but the highlight is almost certainly the new hardware that TRI demonstrates.

Soft bubble gripper

Photo: TRI

This is a “soft bubble gripper,” under development at TRI’s Cambridge, Mass., branch. These passively-compliant, air-filled grippers make it easier to grasp many different kinds of objects safely, but the nifty thing is that they’ve got cameras inside of them watching a pattern of dots on the interior of the soft membrane.

When the outside of the bubble makes contact with an object, the bubble deforms, and the deformation of the dot pattern on the inside can be tracked by the camera to determine both directions and magnitudes of forces. This is a concept that we’ve seen elsewhere before, but TRI’s implementation is a clever way of making an inherently safe end effector that can still perform all the sensing you need it to do for relatively complex manipulation tasks.

The bubble gripper was presented at ICRA this year, and you can read the technical paper here.

Ceiling-mounted home robot

Photo: TRI

I don’t know whether robots dangling from the ceiling was somehow sinister pre-Portal, but it sure as heck is for me having played through that game a couple of times, and it’s since been reinforced by AUTO from WALL-E.

The reason that we generally see robots mounted on the floor or on tables or on mobile bases is that we’re bipeds, not bats, and giving a robot access to a human-like workspace is easiest to do if you also give that robot a human-like position and orientation. And if you want to be able to reach stuff high up, you do what TRI did with their previous generation of kitchen manipulator, and just give it the ability to make itself super tall. But TRI is convinced it’s a good place to put our future home robots:

One innovative concept is a “gantry robot” that would descend from an overhead framework to perform tasks such as loading the dishwasher, wiping surfaces, and clearing clutter. By traveling on the ceiling, the robot avoids the problems of navigating household floor clutter and navigating cramped spaces. When not in use, the robot would tuck itself up out of the way. To further investigate this idea, the team has built a laboratory prototype robot that can do all the same tasks as a floor-based mobile robot but with the innovative overhead mobility system.

Another obvious problem with the gantry robot is that you have to install all kinds of stuff in your ceiling for this to work, which makes it very impractical (if not totally impossible) to introduce a system like this into a home that wasn’t built specifically for it. If, however, you do build a home with a robot like this in mind, the animation below from TRI shows how it could be extra useful. Suddenly, stairs are a non-issue. Payload is presumably also a non-issue, since loads can be transferred to the ceiling. Batteries become unnecessary, so the whole robot can be much lighter weight, which in turn makes it safer. Sensors get a fantastic view, and obstacle avoidance becomes trivial.

Robots as “time machines”

Photo: TRI

TRI’s presentation covered more than what we’ve highlighted here—our focus has been on the hardware prototypes, but TRI had more to talk about, including learning through demonstration, scaling learning through simulation, and how TRI has been working with users to figure out what research directions should be explored. It’s all available right now on YouTube, and it’s well worth 40 minutes of your time.

“What we’re really focused on is this principle idea of amplifying, rather than replacing, human beings”
—Gill Pratt, TRI

It’s only been five years since Toyota announced the $1 billion investment that established TRI, and it feels like the progress that’s been made since then has been substantial. It’s not often that vision, resources, and long-term commitment come together like this, and TRI’s emphasis on making life better for people is one of the things that helps to keep us optimistic about the future of robotics.

“What we’re really focused on is this principle idea of amplifying, rather than replacing, human beings,” Gill Pratt told us. “And what it means to amplify a person, particularly as they’re aging—what we’re really trying to do is build a time machine. This may sound fanciful, and of course we can’t build a real time machine, but maybe we can build robotic assistants to make our lives as we age seem as if we are actually using a time machine.” He explains that it doesn’t mean building robots for convenience or to do our jobs for us. “It means building technology that enables us to continue to live and to work and to relate to each other as if we were younger,” he says. “And that’s really what our main goal is.” Continue reading

Posted in Human Robots

#437579 Disney Research Makes Robotic Gaze ...

While it’s not totally clear to what extent human-like robots are better than conventional robots for most applications, one area I’m personally comfortable with them is entertainment. The folks over at Disney Research, who are all about entertainment, have been working on this sort of thing for a very long time, and some of their animatronic attractions are actually quite impressive.

The next step for Disney is to make its animatronic figures, which currently feature scripted behaviors, to perform in an interactive manner with visitors. The challenge is that this is where you start to get into potential Uncanny Valley territory, which is what happens when you try to create “the illusion of life,” which is what Disney (they explicitly say) is trying to do.

In a paper presented at IROS this month, a team from Disney Research, Caltech, University of Illinois at Urbana-Champaign, and Walt Disney Imagineering is trying to nail that illusion of life with a single, and perhaps most important, social cue: eye gaze.

Before you watch this video, keep in mind that you’re watching a specific character, as Disney describes:

The robot character plays an elderly man reading a book, perhaps in a library or on a park bench. He has difficulty hearing and his eyesight is in decline. Even so, he is constantly distracted from reading by people passing by or coming up to greet him. Most times, he glances at people moving quickly in the distance, but as people encroach into his personal space, he will stare with disapproval for the interruption, or provide those that are familiar to him with friendly acknowledgment.

What, exactly, does “lifelike” mean in the context of robotic gaze? The paper abstract describes the goal as “[seeking] to create an interaction which demonstrates the illusion of life.” I suppose you could think of it like a sort of old-fashioned Turing test focused on gaze: If the gaze of this robot cannot be distinguished from the gaze of a human, then victory, that’s lifelike. And critically, we’re talking about mutual gaze here—not just a robot gazing off into the distance, but you looking deep into the eyes of this robot and it looking right back at you just like a human would. Or, just like some humans would.

The approach that Disney is using is more animation-y than biology-y or psychology-y. In other words, they’re not trying to figure out what’s going on in our brains to make our eyes move the way that they do when we’re looking at other people and basing their control system on that, but instead, Disney just wants it to look right. This “visual appeal” approach is totally fine, and there’s been an enormous amount of human-robot interaction (HRI) research behind it already, albeit usually with less explicitly human-like platforms. And speaking of human-like platforms, the hardware is a “custom Walt Disney Imagineering Audio-Animatronics bust,” which has DoFs that include neck, eyes, eyelids, and eyebrows.

In order to decide on gaze motions, the system first identifies a person to target with its attention using an RGB-D camera. If more than one person is visible, the system calculates a curiosity score for each, currently simplified to be based on how much motion it sees. Depending on which person that the robot can see has the highest curiosity score, the system will choose from a variety of high level gaze behavior states, including:

Read: The Read state can be considered the “default” state of the character. When not executing another state, the robot character will return to the Read state. Here, the character will appear to read a book located at torso level.

Glance: A transition to the Glance state from the Read or Engage states occurs when the attention engine indicates that there is a stimuli with a curiosity score […] above a certain threshold.

Engage: The Engage state occurs when the attention engine indicates that there is a stimuli […] to meet a threshold and can be triggered from both Read and Glance states. This state causes the robot to gaze at the person-of-interest with both the eyes and head.

Acknowledge: The Acknowledge state is triggered from either Engage or Glance states when the person-of-interest is deemed to be familiar to the robot.

Running underneath these higher level behavior states are lower level motion behaviors like breathing, small head movements, eye blinking, and saccades (the quick eye movements that occur when people, or robots, look between two different focal points). The term for this hierarchical behavioral state layering is a subsumption architecture, which goes all the way back to Rodney Brooks’ work on robots like Genghis in the 1980s and Cog and Kismet in the ’90s, and it provides a way for more complex behaviors to emerge from a set of simple, decentralized low-level behaviors.

“25 years on Disney is using my subsumption architecture for humanoid eye control, better and smoother now than our 1995 implementations on Cog and Kismet.”
—Rodney Brooks, MIT emeritus professor

Brooks, an emeritus professor at MIT and, most recently, cofounder and CTO of Robust.ai, tweeted about the Disney project, saying: “People underestimate how long it takes to get from academic paper to real world robotics. 25 years on Disney is using my subsumption architecture for humanoid eye control, better and smoother now than our 1995 implementations on Cog and Kismet.”

From the paper:

Although originally intended for control of mobile robots, we find that the subsumption architecture, as presented in [17], lends itself as a framework for organizing animatronic behaviors. This is due to the analogous use of subsumption in human behavior: human psychomotor behavior can be intuitively modeled as layered behaviors with incoming sensory inputs, where higher behavioral levels are able to subsume lower behaviors. At the lowest level, we have involuntary movements such as heartbeats, breathing and blinking. However, higher behavioral responses can take over and control lower level behaviors, e.g., fight-or-flight response can induce faster heart rate and breathing. As our robot character is modeled after human morphology, mimicking biological behaviors through the use of a bottom-up approach is straightforward.

The result, as the video shows, appears to be quite good, although it’s hard to tell how it would all come together if the robot had more of, you know, a face. But it seems like you don’t necessarily need to have a lifelike humanoid robot to take advantage of this architecture in an HRI context—any robot that wants to make a gaze-based connection with a human could benefit from doing it in a more human-like way.

“Realistic and Interactive Robot Gaze,” by Matthew K.X.J. Pan, Sungjoon Choi, James Kennedy, Kyna McIntosh, Daniel Campos Zamora, Gunter Niemeyer, Joohyung Kim, Alexis Wieland, and David Christensen from Disney Research, California Institute of Technology, University of Illinois at Urbana-Champaign, and Walt Disney Imagineering, was presented at IROS 2020. You can find the full paper, along with a 13-minute video presentation, on the IROS on-demand conference website.

< Back to IEEE Journal Watch Continue reading

Posted in Human Robots

#437491 3.2 Billion Images and 720,000 Hours of ...

Twitter over the weekend “tagged” as manipulated a video showing US Democratic presidential candidate Joe Biden supposedly forgetting which state he’s in while addressing a crowd.

Biden’s “hello Minnesota” greeting contrasted with prominent signage reading “Tampa, Florida” and “Text FL to 30330.”

The Associated Press’s fact check confirmed the signs were added digitally and the original footage was indeed from a Minnesota rally. But by the time the misleading video was removed it already had more than one million views, The Guardian reports.

A FALSE video claiming Biden forgot what state he was in was viewed more than 1 million times on Twitter in the past 24 hours

In the video, Biden says “Hello, Minnesota.”

The event did indeed happen in MN — signs on stage read MN

But false video edited signs to read Florida pic.twitter.com/LdHQVaky8v

— Donie O'Sullivan (@donie) November 1, 2020

If you use social media, the chances are you see (and forward) some of the more than 3.2 billion images and 720,000 hours of video shared daily. When faced with such a glut of content, how can we know what’s real and what’s not?

While one part of the solution is an increased use of content verification tools, it’s equally important we all boost our digital media literacy. Ultimately, one of the best lines of defense—and the only one you can control—is you.

Seeing Shouldn’t Always Be Believing
Misinformation (when you accidentally share false content) and disinformation (when you intentionally share it) in any medium can erode trust in civil institutions such as news organizations, coalitions and social movements. However, fake photos and videos are often the most potent.

For those with a vested political interest, creating, sharing and/or editing false images can distract, confuse and manipulate viewers to sow discord and uncertainty (especially in already polarized environments). Posters and platforms can also make money from the sharing of fake, sensationalist content.

Only 11-25 percent of journalists globally use social media content verification tools, according to the International Centre for Journalists.

Could You Spot a Doctored Image?
Consider this photo of Martin Luther King Jr.

Dr. Martin Luther King Jr. Giving the middle finger #DopeHistoricPics pic.twitter.com/5W38DRaLHr

— Dope Historic Pics (@dopehistoricpic) December 20, 2013

This altered image clones part of the background over King Jr’s finger, so it looks like he’s flipping off the camera. It has been shared as genuine on Twitter, Reddit, and white supremacist websites.

In the original 1964 photo, King flashed the “V for victory” sign after learning the US Senate had passed the civil rights bill.

“Those who love peace must learn to organize as effectively as those who love war.”
Dr. Martin Luther King Jr.

This photo was taken on June 19th, 1964, showing Dr King giving a peace sign after hearing that the civil rights bill had passed the senate. @snopes pic.twitter.com/LXHmwMYZS5

— Willie's Reserve (@WilliesReserve) January 21, 2019

Beyond adding or removing elements, there’s a whole category of photo manipulation in which images are fused together.

Earlier this year, a photo of an armed man was photoshopped by Fox News, which overlaid the man onto other scenes without disclosing the edits, the Seattle Times reported.

You mean this guy who’s been photoshopped into three separate photos released by Fox News? pic.twitter.com/fAXpIKu77a

— Zander Yates ザンダーイェーツ (@ZanderYates) June 13, 2020

Similarly, the image below was shared thousands of times on social media in January, during Australia’s Black Summer bushfires. The AFP’s fact check confirmed it is not authentic and is actually a combination of several separate photos.

Image is more powerful than screams of Greta. A silent girl is holding a koala. She looks straight at you from the waters of the ocean where they found a refuge. She is wearing a breathing mask. A wall of fire is behind them. I do not know the name of the photographer #Australia pic.twitter.com/CrTX3lltdh

— EVC Music (@EVCMusicUK) January 6, 2020

Fully and Partially Synthetic Content
Online, you’ll also find sophisticated “deepfake” videos showing (usually famous) people saying or doing things they never did. Less advanced versions can be created using apps such as Zao and Reface.

Or, if you don’t want to use your photo for a profile picture, you can default to one of several websites offering hundreds of thousands of AI-generated, photorealistic images of people.

These people don’t exist, they’re just images generated by artificial intelligence. Generated Photos, CC BY

Editing Pixel Values and the (not so) Simple Crop
Cropping can greatly alter the context of a photo, too.

We saw this in 2017, when a US government employee edited official pictures of Donald Trump’s inauguration to make the crowd appear bigger, according to The Guardian. The staffer cropped out the empty space “where the crowd ended” for a set of pictures for Trump.

Views of the crowds at the inaugurations of former US President Barack Obama in 2009 (left) and President Donald Trump in 2017 (right). AP

But what about edits that only alter pixel values such as color, saturation, or contrast?

One historical example illustrates the consequences of this. In 1994, Time magazine’s cover of OJ Simpson considerably “darkened” Simpson in his police mugshot. This added fuel to a case already plagued by racial tension, to which the magazine responded, “No racial implication was intended, by Time or by the artist.”

Tools for Debunking Digital Fakery
For those of us who don’t want to be duped by visual mis/disinformation, there are tools available—although each comes with its own limitations (something we discuss in our recent paper).

Invisible digital watermarking has been proposed as a solution. However, it isn’t widespread and requires buy-in from both content publishers and distributors.

Reverse image search (such as Google’s) is often free and can be helpful for identifying earlier, potentially more authentic copies of images online. That said, it’s not foolproof because it:

Relies on unedited copies of the media already being online.
Doesn’t search the entire web.
Doesn’t always allow filtering by publication time. Some reverse image search services such as TinEye support this function, but Google’s doesn’t.
Returns only exact matches or near-matches, so it’s not thorough. For instance, editing an image and then flipping its orientation can fool Google into thinking it’s an entirely different one.

Most Reliable Tools Are Sophisticated
Meanwhile, manual forensic detection methods for visual mis/disinformation focus mostly on edits visible to the naked eye, or rely on examining features that aren’t included in every image (such as shadows). They’re also time-consuming, expensive, and need specialized expertise.

Still, you can access work in this field by visiting sites such as Snopes.com—which has a growing repository of “fauxtography.”

Computer vision and machine learning also offer relatively advanced detection capabilities for images and videos. But they too require technical expertise to operate and understand.

Moreover, improving them involves using large volumes of “training data,” but the image repositories used for this usually don’t contain the real-world images seen in the news.

If you use an image verification tool such as the REVEAL project’s image verification assistant, you might need an expert to help interpret the results.

The good news, however, is that before turning to any of the above tools, there are some simple questions you can ask yourself to potentially figure out whether a photo or video on social media is fake. Think:

Was it originally made for social media?
How widely and for how long was it circulated?
What responses did it receive?
Who were the intended audiences?

Quite often, the logical conclusions drawn from the answers will be enough to weed out inauthentic visuals. You can access the full list of questions, put together by Manchester Metropolitan University experts, here.

This article is republished from The Conversation under a Creative Commons license. Read the original article.

Image Credit: Simon Steinberger from Pixabay Continue reading

Posted in Human Robots