Tag Archives: voice

#439070 Are Digital Humans the Next Step in ...

Posted on April 6, 2021 by Android

In the fictional worlds of film and TV, artificial intelligence has been depicted as so advanced that it is indistinguishable from humans. But what if we’re actually getting closer to a world where AI is capable of thinking and feeling?

Tech company UneeQ is embarking on that journey with its “digital humans.” These avatars act as visual interfaces for customer service chatbots, virtual assistants, and other applications. UneeQ’s digital humans appear lifelike not only in terms of language and tone of voice, but also because of facial movements: raised eyebrows, a tilt of the head, a smile, even a wink. They transform a transaction into an interaction: creepy yet astonishing, human, but not quite.

What lies beneath UneeQ’s digital humans? Their 3D faces are modeled on actual human features. Speech recognition enables the avatar to understand what a person is saying, and natural language processing is used to craft a response. Before the avatar utters a word, specific emotions and facial expressions are encoded within the response.

UneeQ may be part of a larger trend towards humanizing computing. ObEN’s digital avatars serve as virtual identities for celebrities, influencers, gaming characters, and other entities in the media and entertainment industry. Meanwhile, Soul Machines is taking a more biological approach, with a “digital brain” that simulates aspects of the human brain to modulate the emotions “felt” and “expressed” by its “digital people.” Amelia is employing a similar methodology in building its “digital employees.” It emulates parts of the brain involved with memory to respond to queries and, with each interaction, learns to deliver more engaging and personalized experiences.

Shiwali Mohan, an AI systems scientist at the Palo Alto Research Center, is skeptical of these digital beings. “They’re humanlike in their looks and the way they sound, but that in itself is not being human,” she says. “Being human is also how you think, how you approach problems, and how you break them down; and that takes a lot of algorithmic design. Designing for human-level intelligence is a different endeavor than designing graphics that behave like humans. If you think about the problems we’re trying to design these avatars for, we might not need something that looks like a human—it may not even be the right solution path.”

And even if these avatars appear near-human, they still evoke an uncanny valley feeling. “If something looks like a human, we have high expectations of them, but they might behave differently in ways that humans just instinctively know how other humans react. These differences give rise to the uncanny valley feeling,” says Mohan.

Yet the demand is there, with Amelia seeing high adoption of its digital employees across the financial, health care, and retail sectors. “We find that banks and insurance companies, which are so risk-averse, are leading the adoption of such disruptive technologies because they understand that the risk of non-adoption is much greater than the risk of early adoption,” says Chetan Dube, Amelia’s CEO. “Unless they innovate their business models and make them much more efficient digitally, they might be left behind.” Dube adds that the COVID-19 pandemic has accelerated adoption of digital employees in health care and retail as well.

Amelia, Soul Machines, and UneeQ are taking their digital beings a step further, enabling organizations to create avatars themselves using low-code or no-code platforms: Digital Employee Builder for Amelia, Creator for UneeQ, and Digital DNA Studio for Soul Machines. Unreal Engine, a game engine developed by Epic Games, is doing the same with MetaHuman Creator, a tool that allows anyone to create photorealistic digital humans. “The biggest motivation for Digital Employee Builder is to democratize AI,” Dube says.

Mohan is cautious about this approach. “AI has problems with bias creeping in from data sets and into the way it speaks. The AI community is still trying to figure out how to measure and counter that bias,” she says. “[Companies] have to have an AI expert on board that can recommend the right things to build for.”

Despite being wary of the technology, Mohan supports the purpose behind these virtual beings and is optimistic about where they’re headed. “We do need these tools that support humans in different kinds of things. I think the vision is the pro, and I’m behind that vision,” she says. “As we develop more sophisticated AI technology, we would then have to implement novel ways of interacting with that technology. Hopefully, all of that is designed to support humans in their goals.”

Posted in Human Robots

#437769 Q&A: Facebook’s CTO Is at War With ...

Posted on November 29, 2020 by Android

Photo: Patricia de Melo Moreira/AFP/Getty Images

Facebook chief technology officer Mike Schroepfer leads the company’s AI and integrity efforts.

Facebook’s challenge is huge. Billions of pieces of content—short and long posts, images, and combinations of the two—are uploaded to the site daily from around the world. And any tiny piece of that—any phrase, image, or video—could contain so-called bad content.

In its early days, Facebook relied on simple computer filters to identify potentially problematic posts by their words, such as those containing profanity. These automatically filtered posts, as well as posts flagged by users as offensive, went to humans for adjudication.
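A word-based filter of the kind described above can be sketched in a few lines. This is only an illustration: the blocklist, the function name, and the review rule are assumptions, not details of Facebook's actual system.

```python
import re

# Hypothetical blocklist; the article does not describe the real word lists.
BLOCKED_WORDS = {"darn", "heck"}


def needs_human_review(post: str, flagged_by_user: bool = False) -> bool:
    """Return True if a post should be queued for a human reviewer.

    A post is queued when it contains a blocked word or when a user
    has flagged it as offensive.
    """
    # Lowercase the post and split it into simple word tokens.
    words = set(re.findall(r"[a-z']+", post.lower()))
    return flagged_by_user or bool(words & BLOCKED_WORDS)
```

Under these assumptions, `needs_human_review("Well, heck!")` queues the post, while an unflagged "Nice photo" passes through. A filter like this sees only individual words, not context or images.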

In 2015, Facebook started using artificial intelligence to cull images that contained nudity, illegal goods, and other prohibited content; those images identified as possibly problematic were sent to humans for further review.

By 2016, more offensive photos were reported by Facebook’s AI systems than by Facebook users (and that is still the case).

In 2018, Facebook CEO Mark Zuckerberg made a bold proclamation: He predicted that within five or ten years, Facebook’s AI would not only look for profanity, nudity, and other obvious violations of Facebook’s policies but would also be able to spot bullying, hate speech, and other misuse of the platform, and put an immediate end to them.

Today, automated systems using algorithms developed with AI scan every piece of content between the time when a user completes a post and when it is visible to others on the site, a window of just fractions of a second. In most cases, a violation of Facebook’s standards is clear, and the AI system automatically blocks the post. In other cases, the post goes to human reviewers for a final decision. That workforce includes 15,000 content reviewers and another 20,000 employees focused on safety and security, operating out of more than 20 facilities around the world.

In the first quarter of this year, Facebook removed or took other action (like appending a warning label) on more than 9.6 million posts involving hate speech, 8.6 million involving child nudity or exploitation, almost 8 million posts involving the sale of drugs, 2.3 million posts involving bullying and harassment, and tens of millions of posts violating other Facebook rules.

Right now, Facebook has more than 1,000 engineers working on further developing and implementing what the company calls “integrity” tools. Using these systems to screen every post that goes up on Facebook, and doing so in milliseconds, is sucking up computing resources. Facebook chief technology officer Mike Schroepfer, who is heading up Facebook’s AI and integrity efforts, spoke with IEEE Spectrum about the team’s progress on building an AI system that detects bad content.

Since that discussion, Facebook’s policies around hate speech have come under increasing scrutiny, with particular attention on divisive posts by political figures. A group of major advertisers in June announced that they would stop advertising on the platform while reviewing the situation, and civil rights groups are putting pressure on others to follow suit until Facebook makes policy changes related to hate speech and groups that promote hate, misinformation, and conspiracies.

Facebook CEO Mark Zuckerberg responded with news that Facebook will widen the category of what it considers hateful content in ads. Now the company prohibits claims that people from a specific race, ethnicity, national origin, religious affiliation, caste, sexual orientation, gender identity, or immigration status are a threat to the physical safety, health, or survival of others. The policy change also aims to better protect immigrants, migrants, refugees, and asylum seekers from ads suggesting these groups are inferior or expressing contempt. Finally, Zuckerberg announced that the company will label some problematic posts by politicians and government officials as content that violates Facebook’s policies.

However, civil rights groups say that’s not enough. And an independent audit released in July also said that Facebook needs to go much further in addressing civil rights concerns and disinformation.

Schroepfer indicated that Facebook’s AI systems are designed to quickly adapt to changes in policy. “I don’t expect considerable technical changes are needed to adjust,” he told Spectrum.

This interview has been edited and condensed for clarity.

IEEE Spectrum: What are the stakes of content moderation? Is this an existential threat to Facebook? And is it critical that you deal well with the issue of election interference this year?

Schroepfer: It’s probably existential; it’s certainly massive. We are devoting a tremendous amount of our attention to it.

The idea that anyone could meddle in an election is deeply disturbing and offensive to all of us here, just as people and citizens of democracies. We don’t want to see that happen anywhere, and certainly not on our watch. So whether it’s important to the company or not, it’s important to us as people. And I feel a similar way on the content-moderation side.

There are not a lot of easy choices here. The only way to prevent people, with certainty, from posting bad things is to not let them post anything. We can take away all voice and just say, “Sorry, the Internet’s too dangerous. No one can use it.” That will certainly get rid of all hate speech online. But I don’t want to end up in that world. And there are variants of that world that various governments are trying to implement, where they get to decide what’s true or not, and you as a person don’t. I don’t want to get there either.

My hope is that we can build a set of tools that make it practical for us to do a good enough job, so that everyone is still excited about the idea that anyone can share what they want, and so that Facebook is a safe and reasonable place for people to operate in.

Spectrum: You joined Facebook in 2008, before AI was part of the company’s toolbox. When did that change? When did you begin to think that AI tools would be useful to Facebook?

Schroepfer: Ten years ago, AI wasn’t commercially practical; the technology just didn’t work very well. In 2012, there was one of those moments that a lot of people point to as the beginning of the current revolution in deep learning and AI. A computer-vision model—a neural network—was trained using what we call supervised training, and it turned out to be better than all the existing models.

Spectrum: How is that training done, and how did computer-vision models come to Facebook?

Image: Facebook

Just Broccoli? Facebook’s image analysis algorithms can tell the difference between marijuana [left] and tempura broccoli [right] better than some humans.

Schroepfer: Say I take a bunch of photos and I have people look at them. If they see a photo of a cat, they put a text label that says cat; if it’s one of a dog, the text label says dog. If you build a big enough data set and feed that to the neural net, it learns how to tell the difference between cats and dogs.

Prior to 2012, it didn’t work very well. And then in 2012, there was this moment where it seemed like, “Oh wow, this technique might work.” And a few years later we were deploying that form of technology to help us detect problematic imagery.

Spectrum: Do your AI systems work equally well on all types of prohibited content?

Schroepfer: Nudity was technically easiest. I don’t need to understand language or culture to understand that this is either a naked human or not. Violence is a much more nuanced problem, so it was harder technically to get it right. And with hate speech, not only do you have to understand the language, it may be very contextual, even tied to recent events. A week before the Christchurch shooting [New Zealand, 2019], saying “I wish you were in the mosque” probably doesn’t mean anything. A week after, that might be a terrible thing to say.

Spectrum: How much progress have you made on hate speech?

Schroepfer: AI, in the first quarter of 2020, proactively detected 88.8 percent of the hate-speech content we removed, up from 80.2 percent in the previous quarter. In the first quarter of 2020, we took action on 9.6 million pieces of content for violating our hate-speech policies.

Image: Facebook

Off Label: Sometimes image analysis isn’t enough to determine whether a picture posted violates the company’s policies. In considering these candy-colored vials of marijuana, for example, the algorithms can look at any accompanying text and, if necessary, comments on the post.

Spectrum: It sounds like you’ve expanded beyond tools that analyze images and are also using AI tools that analyze text.

Schroepfer: AI started off as very siloed. People worked on language, people worked on computer vision, people worked on video. We’ve put these things together—in production, not just as research—into multimodal classifiers.

[Schroepfer shows a photo of a pan of Rice Krispies treats, with text referring to it as a “potent batch”] This is a case in which you have an image, and then you have the text on the post. This looks like Rice Krispies. On its own, this image is fine. You put the text together with it in a bigger model; that can then understand what’s going on. That didn’t work five years ago.
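The Rice Krispies example can be illustrated with a toy scoring function. The sketch below is not Facebook's model. The two binary features, the weights, and the threshold are all invented, to show how a joint model with an image-text interaction term can flag a post that separate image and text models would each pass.

```python
def unimodal_flags(image_feat: float, text_feat: float, threshold: float = 0.5) -> bool:
    """Score image and text separately; flag if either score alone is high.

    image_feat: 1.0 if the image looks like homemade baked goods (assumed feature).
    text_feat:  1.0 if the text mentions potency (assumed feature).
    """
    image_score = 0.3 * image_feat  # invented weight
    text_score = 0.4 * text_feat    # invented weight
    return image_score > threshold or text_score > threshold


def multimodal_flag(image_feat: float, text_feat: float, threshold: float = 0.5) -> bool:
    """Score both signals in one model, including an interaction term."""
    joint_score = (
        0.1 * image_feat
        + 0.1 * text_feat
        + 0.6 * image_feat * text_feat  # only fires when both signals are present
    )
    return joint_score > threshold
```

With both features set to 1.0, the separate scores (0.3 and 0.4) stay under the threshold, while the joint score (0.8) crosses it. A production classifier would learn such interactions from data rather than use hand-set weights.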

Spectrum: Today, every post that goes up on Facebook is immediately checked by automated systems. Can you explain that process?

Image: Facebook

Bigger Picture: Identifying hate speech is often a matter of context. Neither the text nor the photo in this post is hateful standing alone, but putting them together tells a different story.

Schroepfer: You upload an image and you write some text underneath it, and the systems look at both the image and the text to try to see which, if any, policies it violates. Those decisions are based on our Community Standards. It will also look at other signals on the posts, like the comments people make.

It happens relatively instantly, though there may be times things happen after the fact. Maybe you uploaded a post that had misinformation in it, and at the time you uploaded it, we didn’t know it was misinformation. The next day we fact-check something and scan again; we may find your post and take it down. As we learn new things, we’re going to go back through and look for violations of what we now know to be a problem. Or, as people comment on your post, we might update our understanding of it. If people are saying, “That’s terrible,” or “That’s mean,” or “That looks fake,” those comments may be an interesting signal.

Spectrum: How is Facebook applying its AI tools to the problem of election interference?

Schroepfer: I would split election interference into two categories. There are times when you’re going after the content, and there are times you’re going after the behavior or the authenticity of the person.

On content, if you’re sharing misinformation, saying, “It’s super Wednesday, not super Tuesday, come vote on Wednesday,” that’s a problem whether you’re an American sitting in California or a foreign actor.

Other times, people create a series of Facebook pages pretending they’re Americans, but they’re really a foreign entity. That is a problem on its own, even if all the content they’re sharing completely meets our Community Standards. The problem there is that you have a foreign government running an information operation.

There, you need different tools. What you’re trying to do is put pieces together, to say, “Wait a second. All of these pages (Martians for Justice, Moonlings for Justice, and Venusians for Justice) are all run by an administrator with an IP address that’s outside the United States.” So they’re all connected, even though they’re pretending to not be connected. That’s a very different problem than me sitting in my office in Menlo Park [Calif.] sharing misinformation.

I’m not going to go into lots of technical detail, because this is an area of adversarial nature. The fundamental problem you’re trying to solve is that there’s one entity coordinating the activity of a bunch of things that look like they’re not all one thing. So this is a series of Instagram accounts, or a series of Facebook pages, or a series of WhatsApp accounts, and they’re pretending to be totally different things. We’re looking for signals that these things are related in some way. And we’re looking through the graph [what Facebook calls its map of relationships between users] to understand the properties of this network.
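Schroepfer declines to give technical detail, so the following is only a generic sketch of the idea of finding accounts that are "related in some way." It groups pages that share any signal, directly or through a chain of other pages, using a union-find structure. The signal strings, the IP addresses (taken from documentation-only ranges), and the function name are hypothetical.

```python
from collections import defaultdict


def connected_clusters(page_signals):
    """Group pages that share at least one signal, directly or transitively.

    page_signals maps a page name to a set of signal strings
    (for example an admin IP address or a device identifier).
    Returns a list of sets, one per cluster of two or more pages.
    """
    parent = {page: page for page in page_signals}

    def find(page):
        # Follow parent links to the cluster root, compressing the path.
        while parent[page] != page:
            parent[page] = parent[parent[page]]
            page = parent[page]
        return page

    first_page_with = {}
    for page, signals in page_signals.items():
        for signal in signals:
            if signal in first_page_with:
                # Shared signal: merge this page's cluster with the earlier one.
                parent[find(page)] = find(first_page_with[signal])
            else:
                first_page_with[signal] = page

    clusters = defaultdict(set)
    for page in page_signals:
        clusters[find(page)].add(page)
    return [members for members in clusters.values() if len(members) > 1]


# Hypothetical data: three pages linked by a shared IP and a shared device.
pages = {
    "Martians for Justice": {"ip:203.0.113.7"},
    "Moonlings for Justice": {"ip:203.0.113.7", "device:abc"},
    "Venusians for Justice": {"device:abc"},
    "Local Bake Sale": {"ip:198.51.100.2"},
}
clusters = connected_clusters(pages)
```

Here the three "for Justice" pages end up in one cluster even though the first and third share no signal directly, while the unrelated page is left out.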

Spectrum: What cutting-edge AI tools and methods have you been working on lately?

Schroepfer: Supervised learning, with humans setting up the instruction process for the AI systems, is amazingly effective. But it has a very obvious flaw: the speed at which you can develop these things is limited by how fast you can curate the data sets. If you’re dealing in a problem domain where things change rapidly, you have to rebuild a new data set and retrain the whole thing.

Self-supervision is inspired by the way people learn, by the way kids explore the world around them. To get computers to do it themselves, we take a bunch of raw data and build a way for the computer to construct its own tests. For language, you scan a bunch of Web pages, and the computer builds a test where it takes a sentence, eliminates one of the words, and figures out how to predict what word belongs there. And because it created the test, it actually knows the answer. I can use as much raw text as I can find and store because it’s processing everything itself and doesn’t require us to sit down and build the information set. In the last two years there has been a revolution in language understanding as a result of AI self-supervised learning.
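The word-prediction test Schroepfer describes can be sketched as follows. This shows only how a training example is constructed from raw text, not how a model is trained on it. The `[MASK]` token and the function name are assumptions made for illustration.

```python
import random

MASK = "[MASK]"  # placeholder token; the name is an assumption


def make_masked_example(sentence: str, rng: random.Random):
    """Build one self-supervised test from a raw sentence.

    Hides one randomly chosen word and returns the masked sentence
    together with the hidden word, which serves as the answer.
    No human labeling is involved.
    """
    words = sentence.split()
    position = rng.randrange(len(words))
    answer = words[position]
    masked = words[:position] + [MASK] + words[position + 1:]
    return " ".join(masked), answer


rng = random.Random(0)
masked, answer = make_masked_example("the cat sat on the mat", rng)
```

Because the program removed the word itself, it already holds the correct answer, so any amount of raw text can be turned into training examples this way.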

Spectrum: What else are you excited about?

Schroepfer: What we’ve been working on over the last few years is multilingual understanding. Usually, when I’m trying to figure out, say, whether something is hate speech or not I have to go through the whole process of training the model in every language. I have to do that one time for every language. When you make a post, the first thing we have to figure out is what language your post is in. “Ah, that’s Spanish. So send it to the Spanish hate-speech model.”

We’ve started to build a multilingual model—one box where you can feed in text in 40 different languages and it determines whether it’s hate speech or not. This is way more effective and easier to deploy.
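The difference between the two architectures can be sketched in code. The models below are trivial stand-ins, and every name is hypothetical. The point is only the shape of the pipeline: per-language routing needs a language detector plus one trained model per language, while the multilingual approach is a single call.

```python
def classify_with_routing(text, detect_language, models_by_language):
    """Detect the language first, then send the text to that language's model."""
    language = detect_language(text)
    if language not in models_by_language:
        # A language without its own trained model cannot be handled at all.
        raise KeyError(f"no model trained for language {language!r}")
    return models_by_language[language](text)


def classify_with_single_model(text, multilingual_model):
    """One model accepts text in any supported language."""
    return multilingual_model(text)


# Stand-in components for illustration only.
def toy_detector(text):
    return "es" if "hola" in text.lower() else "en"


toy_models = {
    "en": lambda text: "hateful" in text,
    "es": lambda text: "odioso" in text,
}


def toy_multilingual(text):
    return any(word in text for word in ("hateful", "odioso"))
```

In the routed version, each new language means another entry in `toy_models`, trained and deployed separately. The single-model version has no such table to maintain.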

To geek out for a second, just the idea that you can build a model that understands a concept in multiple languages at once is crazy cool. And it not only works for hate speech, it works for a variety of things.

When we started working on this multilingual model years ago, it performed worse than every single individual model. Now, it not only works as well as the English model, but when you get to the languages where you don’t have enough data, it’s so much better. This rapid progress is very exciting.

Spectrum: How do you move new AI tools from your research labs into operational use?

Schroepfer: Engineers trying to make the next breakthrough will often say, “Cool, I’ve got a new thing and it achieved state-of-the-art results on machine translation.” And we say, “Great. How long does it take to run in production?” They say, “Well, it takes 10 seconds for every sentence to run on a CPU.” And we say, “It’ll eat our whole data center if we deploy that.” So we take that state-of-the-art model and we make it 10 or a hundred or a thousand times more efficient, maybe at the cost of a little bit of accuracy. So it’s not as good as the state-of-the-art version, but it’s something we can actually put into our data centers and run in production.

Spectrum: What’s the role of the humans in the loop? Is it true that Facebook currently employs 35,000 moderators?

Schroepfer: Yes. Right now our goal is not to reduce that. Our goal is to do a better job catching bad content. People often think that the end state will be a fully automated system. I don’t see that world coming anytime soon.

As automated systems get more sophisticated, they take more and more of the grunt work away, freeing up the humans to work on the really gnarly stuff where you have to spend an hour researching.

We also use AI to give our human moderators power tools. Say I spot this new meme that is telling everyone to vote on Wednesday rather than Tuesday. I have a tool in front of me that says, “Find variants of that throughout the system. Find every photo with the same text, find every video that mentions this thing and kill it in one shot.” Rather than, I found this one picture, but then a bunch of other people upload that misinformation in different forms.
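A very reduced version of "find variants of that throughout the system" can be sketched for text alone. The approach here, normalizing the text and comparing hashes, is an assumption chosen for simplicity. It catches only trivially reformatted copies, whereas the tool Schroepfer describes also covers photos and videos, which would need far more capable matching.

```python
import hashlib
import re


def fingerprint(text: str) -> str:
    """Hash a normalized form of the text so trivial edits do not matter."""
    # Lowercase, drop punctuation, and collapse runs of whitespace.
    normalized = re.sub(r"[^a-z0-9 ]", "", text.lower())
    normalized = " ".join(normalized.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_variants(known_bad_text: str, posts: dict) -> list:
    """Return the ids of posts whose normalized text matches the known-bad text."""
    target = fingerprint(known_bad_text)
    return [post_id for post_id, text in posts.items() if fingerprint(text) == target]


# Hypothetical posts keyed by id.
posts = {
    1: "Vote on WEDNESDAY!!!",
    2: "vote on   wednesday",
    3: "Vote on Tuesday",
}
matches = find_variants("Vote on Wednesday", posts)
```

Posts 1 and 2 match despite the changes in case, punctuation, and spacing. Post 3 does not.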

Another important aspect of AI is that anything I can do to prevent a person from having to look at terrible things is time well spent. Whether it’s a person employed by us as a moderator or a user of our services, looking at these things is a terrible experience. If I can build systems that take the worst of the worst, the really graphic violence, and deal with that in an automated fashion, that’s worth a lot to me.

Posted in Human Robots

#437550 McDonald’s Is Making a Plant-Based ...

Posted on November 18, 2020 by Android

Fast-food chains have been doing what they can in recent years to health-ify their menus. For better or worse, burgers, fries, fried chicken, roast beef sandwiches, and the like will never go out of style—this is America, after all—but consumers are increasingly gravitating towards healthier options.

One of those options is plant-based foods, and not just salads and veggie burgers, but “meat” made from plants. Burger King was one of the first big fast-food chains to jump on the plant-based meat bandwagon, introducing its Impossible Whopper in restaurants across the country last year after a successful pilot program. Dunkin’ (formerly Dunkin’ Donuts) uses plant-based patties in its Beyond Sausage breakfast sandwiches.

But there’s one big player in the fast food market that’s been oddly missing from the plant-based trend, until now. McDonald’s announced last week that it will debut a sandwich called the McPlant in key US markets next year. Burger King worked with Impossible Foods on its plant-based Whopper, and Dunkin’ uses Beyond Meat’s sausage; McDonald’s also worked with Los Angeles-based Beyond Meat, which makes chicken-, beef-, and pork-like products from plants.

According to Bloomberg, though, McDonald’s decided to forgo a partnership with Beyond Meat in favor of creating its own plant-based products. Imitation chicken nuggets and plant-based breakfast sandwiches are in its plans as well.

McDonald’s has bounced back impressively from its March low (when the coronavirus lockdowns first happened in the US). Last month the company’s stock reached a 52-week high of $231 per share (as compared to its low in March of $124 per share).

To keep those numbers high and make it as easy as possible for customers to get their hands on plant-based burgers and all the traditional menu items too, the fast food chain is investing in tech and integrating more digital offerings into its restaurants.

McDonald’s has acquired a couple of artificial intelligence companies in the last year and a half. One is Dynamic Yield, an Israeli company that uses AI to personalize customers’ experiences. McDonald’s is using Dynamic Yield’s tech on its smart menu boards, for example by customizing the items displayed on the drive-thru menu based on the weather and the time of day, and by recommending additional items based on what a customer asks for first (e.g., “You know what would go great with that coffee? Some pancakes!”).

The fast food giant also bought Apprente, a startup that uses AI in voice-based ordering platforms. McDonald’s is using the tech to help automate its drive-throughs.

In addition to these investments, the company plans to launch a digital hub called MyMcDonald’s that will include a loyalty program, start doing deliveries of its food through its mobile app, and test different ways of streamlining the food order and pickup process—with many of the new ideas geared towards pandemic times, like express pickup lanes for people who placed digital orders and restaurants with drive-throughs for delivery and pickup orders only.

Plant-based meat patties appear to be just one small piece of McDonald’s modernization plans. Those of us who were wondering what they were waiting for should have known—one of the most-recognized fast food chains in the world wasn’t about to let itself get phased out. It seems it will only be a matter of time until you can pull out your phone, make a few selections, and have a burger made from plants—with a side of fries made from more plants—show up at your door a little while later. Drive-throughs, shouting your order into a fuzzy speaker with a confused teen on the other end, and burgers made from beef? So 2019.

Image Credit: McDonald’s

Posted in Human Robots

#437446 Can the voice of healthcare robots ...

Posted on October 12, 2020 by Android

Robots are gradually making their way into hospitals and other clinical facilities, providing basic assistance to doctors and patients. To facilitate their widespread use in health care settings, however, robotics researchers need to ensure that users feel at ease with robots and accept the help they can offer. This could potentially be achieved by developing robots that communicate in empathetic and compassionate ways.

Posted in Human Robots

#437373 Microsoft’s New Deepfake Detector Puts ...

Posted on September 5, 2020 by Android

The upcoming US presidential election seems set to be something of a mess—to put it lightly. Covid-19 will likely deter millions from voting in person, and mail-in voting isn’t shaping up to be much more promising. This all comes at a time when political tensions are running higher than they have in decades, issues that shouldn’t be political (like mask-wearing) have become highly politicized, and Americans are dramatically divided along party lines.

So the last thing we need right now is yet another wrench in the spokes of democracy, in the form of disinformation; we all saw how that played out in 2016, and it wasn’t pretty. For the record, disinformation purposely misleads people, while misinformation is simply inaccurate, but without malicious intent. While there’s not a ton tech can do to make people feel safe at crowded polling stations or up the Postal Service’s budget, tech can help with disinformation, and Microsoft is trying to do so.

On Tuesday the company released two new tools designed to combat disinformation, described in a blog post by VP of Customer Security and Trust Tom Burt and Chief Scientific Officer Eric Horvitz.

The first is Microsoft Video Authenticator, which is made to detect deepfakes. In case you’re not familiar with this wicked byproduct of AI progress, “deepfakes” refers to audio or visual files made using artificial intelligence that can manipulate people’s voices or likenesses to make it look like they said things they didn’t. Editing a video to string together words and form a sentence someone didn’t say doesn’t count as a deepfake; though there’s manipulation involved, you don’t need a neural network and you’re not generating any original content or footage.

The Authenticator analyzes videos or images and tells users the percentage chance that they’ve been artificially manipulated. For videos, the tool can even analyze individual frames in real time.

Deepfake videos are made by feeding hundreds of hours of video of someone into a neural network, “teaching” the network the minutiae of the person’s voice, pronunciation, mannerisms, gestures, etc. It’s like when you do an imitation of your annoying coworker from accounting, complete with mimicking the way he makes every sentence sound like a question and his eyes widen when he talks about complex spreadsheets. You’ve spent hours—no, months—in his presence and have his personality quirks down pat. An AI algorithm that produces deepfakes needs to learn those same quirks, and more, about whoever the creator’s target is.

Given enough real information and examples, the algorithm can then generate its own fake footage, with deepfake creators using computer graphics and manually tweaking the output to make it as realistic as possible.

The scariest part? To make a deepfake, you don’t need a fancy computer or even a ton of knowledge about software. There are open-source programs people can access for free online, and as far as finding video footage of famous people—well, we’ve got YouTube to thank for how easy that is.

Microsoft’s Video Authenticator can detect the blending boundary of a deepfake and subtle fading or greyscale elements that the human eye may not be able to see.

In the blog post, Burt and Horvitz point out that as time goes by, deepfakes are only going to get better and become harder to detect; after all, they’re generated by neural networks that are continuously learning from and improving themselves.

Microsoft’s counter-tactic is to come in from the opposite angle, that is, being able to confirm beyond doubt that a video, image, or piece of news is real (I mean, can McDonald’s fries cure baldness? Did a seal slap a kayaker in the face with an octopus? Never has it been so imperative that the world know the truth).

A tool built into Microsoft Azure, the company’s cloud computing service, lets content producers add digital hashes and certificates to their content, and a reader (which can be used as a browser extension) checks the certificates and matches the hashes to indicate the content is authentic.
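The hash-and-certificate idea can be sketched with Python's standard library. This is not Microsoft's implementation. In particular, the keyed HMAC below is a stand-in for a certificate-based signature, which a real system would create with a private key and verify with a public one. The function names and the manifest layout are assumptions.

```python
import hashlib
import hmac


def publish(content: bytes, publisher_key: bytes) -> dict:
    """Producer side: attach a content hash and a signature over that hash."""
    digest = hashlib.sha256(content).hexdigest()
    # HMAC stands in for a certificate-backed signature (assumption).
    signature = hmac.new(publisher_key, digest.encode("ascii"), hashlib.sha256).hexdigest()
    return {"sha256": digest, "signature": signature}


def verify(content: bytes, manifest: dict, publisher_key: bytes) -> bool:
    """Reader side: recompute the hash and check it and the signature."""
    digest = hashlib.sha256(content).hexdigest()
    expected = hmac.new(publisher_key, digest.encode("ascii"), hashlib.sha256).hexdigest()
    hash_matches = hmac.compare_digest(digest, manifest["sha256"])
    signature_matches = hmac.compare_digest(expected, manifest["signature"])
    return hash_matches and signature_matches


key = b"demo-key"  # placeholder secret for the example
manifest = publish(b"original video bytes", key)
```

Calling `verify(b"original video bytes", manifest, key)` succeeds, while any change to the bytes changes the hash and makes verification fail. That is the property that lets a reader confirm content has not been altered since it was published.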

Finally, Microsoft also launched an interactive “Spot the Deepfake” quiz it developed in collaboration with the University of Washington’s Center for an Informed Public, deepfake detection company Sensity, and USA Today. The quiz is intended to help people “learn about synthetic media, develop critical media literacy skills, and gain awareness of the impact of synthetic media on democracy.”

The impact Microsoft’s new tools will have remains to be seen—but hey, we’re glad they’re trying. And they’re not alone; Facebook, Twitter, and YouTube have all taken steps to ban and remove deepfakes from their sites. The AI Foundation’s Reality Defender uses synthetic media detection algorithms to identify fake content. There’s even a coalition of big tech companies teaming up to try to fight election interference.

One thing is for sure: between a global pandemic, widespread protests and riots, mass unemployment, a hobbled economy, and the disinformation that’s remained rife through it all, we’re going to need all the help we can get to make it through not just the election, but the rest of the conga-line-of-catastrophes year that is 2020.

Image Credit: Darius Bashar on Unsplash

Posted in Human Robots
