Tag Archives: Visual
A new technique using artificial intelligence to manipulate video content gives new meaning to the expression “talking head.”
An international team of researchers showcased the latest advancement in synthesizing facial expressions—including mouth, eyes, eyebrows, and even head position—in video at this month’s 2018 SIGGRAPH, a conference on innovations in computer graphics, animation, virtual reality, and other forms of digital wizardry.
The project is called Deep Video Portraits. It relies on a type of AI called generative adversarial networks (GANs) to modify a “target” actor based on the facial and head movement of a “source” actor. As the name implies, GANs pit two opposing neural networks against one another to create a realistic talking head, right down to the sneer or raised eyebrow.
In this case, the adversaries are actually working together: One neural network generates content, while the other rejects or approves each effort. The back-and-forth interplay between the two eventually produces a realistic result that can easily fool the human eye, including reproducing a static scene behind the head as it bobs back and forth.
The researchers say the technique can be used by the film industry for a variety of purposes, from editing facial expressions of actors for matching dubbed voices to repositioning an actor’s head in post-production. AI can not only produce highly realistic results, but much quicker ones compared to the manual processes used today, according to the researchers. You can read the full paper of their work here.
“Deep Video Portraits shows how such a visual effect could be created with less effort in the future,” said Christian Richardt, from the University of Bath’s motion capture research center CAMERA, in a press release. “With our approach, even the positioning of an actor’s head and their facial expression could be easily edited to change camera angles or subtly change the framing of a scene to tell the story better.”
AI Tech Different Than So-Called “Deepfakes”
The work is far from the first to employ AI to manipulate video and audio. At last year’s SIGGRAPH conference, researchers from the University of Washington showcased their work using algorithms that inserted audio recordings from a person in one instance into a separate video of the same person in a different context.
In this case, they “faked” a video using a speech from former President Barack Obama addressing a mass shooting incident during his presidency. The AI-doctored video injects the audio into an unrelated video of the president while also blending the facial and mouth movements, creating a pretty credible job of lip synching.
A previous paper by many of the same scientists on the Deep Video Portraits project detailed how they were first able to manipulate a video in real time of a talking head (in this case, actor and former California governor Arnold Schwarzenegger). The Face2Face system pulled off this bit of digital trickery using a depth-sensing camera that tracked the facial expressions of an Asian female source actor.
A less sophisticated method of swapping faces using a machine learning software dubbed FakeApp emerged earlier this year. Predictably, the tech—requiring numerous photos of the source actor in order to train the neural network—was used for more juvenile pursuits, such as injecting a person’s face onto a porn star.
The application gave rise to the term “deepfakes,” which is now used somewhat ubiquitously to describe all such instances of AI-manipulated video—much to the chagrin of some of the researchers involved in more legitimate uses.
Fighting AI-Created Video Forgeries
However, the researchers are keenly aware that their work—intended for benign uses such as in the film industry or even to correct gaze and head positions for more natural interactions through video teleconferencing—could be used for nefarious purposes. Fake news is the most obvious concern.
“With ever-improving video editing technology, we must also start being more critical about the video content we consume every day, especially if there is no proof of origin,” said Michael Zollhöfer, a visiting assistant professor at Stanford University and member of the Deep Video Portraits team, in the press release.
Toward that end, the research team is training the same adversarial neural networks to spot video forgeries. They also strongly recommend that developers clearly watermark videos that are edited through AI or otherwise, and denote clearly what part and element of the scene was modified.
To catch less ethical users, the US Department of Defense, through the Defense Advanced Research Projects Agency (DARPA), is supporting a program called Media Forensics. This latest DARPA challenge enlists researchers to develop technologies to automatically assess the integrity of an image or video, as part of an end-to-end media forensics platform.
The DARPA official in charge of the program, Matthew Turek, did tell MIT Technology Review that so far the program has “discovered subtle cues in current GAN-manipulated images and videos that allow us to detect the presence of alterations.” In one reported example, researchers have targeted eyes, which rarely blink in the case of “deepfakes” like those created by FakeApp, because the AI is trained on still pictures. That method would seem to be less effective to spot the sort of forgeries created by Deep Video Portraits, which appears to flawlessly match the entire facial and head movements between the source and target actors.
“We believe that the field of digital forensics should and will receive a lot more attention in the future to develop approaches that can automatically prove the authenticity of a video clip,” Zollhöfer said. “This will lead to ever-better approaches that can spot such modifications even if we humans might not be able to spot them with our own eyes.
Image Credit: Tancha / Shutterstock.com Continue reading
“Rashmi” claims to be India’s first humanoid robot. It’s still in development, but will be a life-like lip-syncing, multi-lingual smart machine that uses linguistic interpretation (LI), artificial intelligence (AI), visual data and facial recognition systems. Related Posts 4 Ways Scientists Hope Nanobots … Continue reading
Robotics has come a long way in the past few years. Robots can now fetch items from specific spots in massive warehouses, swim through the ocean to study marine life, and lift 200 times their own weight. They can even perform synchronized dance routines.
But the really big question is—can robots put together an Ikea chair?
A team of engineers from Nanyang Technological University in Singapore decided to find out, detailing their work in a paper published last week in the journal Science Robotics. The team took industrial robot arms and equipped them with parallel grippers, force-detecting sensors, and 3D cameras, and wrote software enabling the souped-up bots to tackle chair assembly. The robots’ starting point was a set of chair parts randomly scattered within reach.
As impressive as the above-mentioned robotic capabilities are, it’s worth noting that they’re mostly limited to a single skill. Putting together furniture, on the other hand, requires using and precisely coordinating multiple skills, including force control, visual localization, hand-eye coordination, and the patience to read each step of the manual without rushing through it and messing everything up.
Indeed, Ikea furniture, while meant to be simple and user-friendly, has left even the best of us scratching our heads and holding a spare oddly-shaped piece of wood as we stare at the desk or bed frame we just put together—or, for the less even-tempered among us, throwing said piece of wood across the room.
It’s a good thing robots don’t have tempers, because it took a few tries for the bots to get the chair assembly right.
Practice makes perfect, though (or in this case, rewriting code makes perfect), and these bots didn’t give up so easily. They had to hone three different skills: identifying which part was which among the scattered, differently-shaped pieces of wood, coordinating their movements to put those pieces in the right place, and knowing how much force to use in various steps of the process (i.e., more force is needed to connect two pieces than to pick up one piece).
A few tries later, the bots were able to assemble the chair from start to finish in about nine minutes.
On the whole, nicely done. But before we applaud the robots’ success too loudly, it’s important to note that they didn’t autonomously assemble the chair. Rather, each step of the process was planned and coded by engineers, down to the millimeter.
However, the team believes this closely-guided chair assembly was just a first step, and they see a not-so-distant future where combining artificial intelligence with advanced robotic capabilities could produce smart bots that would learn to assemble furniture and do other complex tasks on their own.
Future applications mentioned in the paper include electronics and aircraft manufacturing, logistics, and other high-mix, low-volume sectors.
Image Credit: Francisco Suárez-Ruiz and Quang-Cuong Pham/Nanyang Technological University Continue reading
Ever wished you could be in two places at the same time? The XPRIZE Foundation wants to make that a reality with a $10 million competition to build robot avatars that can be controlled from at least 100 kilometers away.
The competition was announced by XPRIZE founder Peter Diamandis at the SXSW conference in Austin last week, with an ambitious timeline of awarding the grand prize by October 2021. Teams have until October 31st to sign up, and they need to submit detailed plans to a panel of judges by the end of next January.
The prize, sponsored by Japanese airline ANA, has given contestants little guidance on how they expect them to solve the challenge other than saying their solutions need to let users see, hear, feel, and interact with the robot’s environment as well as the people in it.
XPRIZE has also not revealed details of what kind of tasks the robots will be expected to complete, though they’ve said tasks will range from “simple” to “complex,” and it should be possible for an untrained operator to use them.
That’s a hugely ambitious goal that’s likely to require teams to combine multiple emerging technologies, from humanoid robotics to virtual reality high-bandwidth communications and high-resolution haptics.
If any of the teams succeed, the technology could have myriad applications, from letting emergency responders enter areas too hazardous for humans to helping people care for relatives who live far away or even just allowing tourists to visit other parts of the world without the jet lag.
“Our ability to physically experience another geographic location, or to provide on-the-ground assistance where needed, is limited by cost and the simple availability of time,” Diamandis said in a statement.
“The ANA Avatar XPRIZE can enable creation of an audacious alternative that could bypass these limitations, allowing us to more rapidly and efficiently distribute skill and hands-on expertise to distant geographic locations where they are needed, bridging the gap between distance, time, and cultures,” he added.
Interestingly, the technology may help bypass an enduring hand break on the widespread use of robotics: autonomy. By having a human in the loop, you don’t need nearly as much artificial intelligence analyzing sensory input and making decisions.
Robotics software is doing a lot more than just high-level planning and strategizing, though. While a human moves their limbs instinctively without consciously thinking about which muscles to activate, controlling and coordinating a robot’s components requires sophisticated algorithms.
The DARPA Robotics Challenge demonstrated just how hard it was to get human-shaped robots to do tasks humans would find simple, such as opening doors, climbing steps, and even just walking. These robots were supposedly semi-autonomous, but on many tasks they were essentially tele-operated, and the results suggested autonomy isn’t the only problem.
There’s also the issue of powering these devices. You may have noticed that in a lot of the slick web videos of humanoid robots doing cool things, the machine is attached to the roof by a large cable. That’s because they suck up huge amounts of power.
Possibly the most advanced humanoid robot—Boston Dynamics’ Atlas—has a battery, but it can only run for about an hour. That might be fine for some applications, but you don’t want it running out of juice halfway through rescuing someone from a mine shaft.
When it comes to the link between the robot and its human user, some of the technology is probably not that much of a stretch. Virtual reality headsets can create immersive audio-visual environments, and a number of companies are working on advanced haptic suits that will let people “feel” virtual environments.
Motion tracking technology may be more complicated. While even consumer-grade devices can track peoples’ movements with high accuracy, you will probably need to don something more like an exoskeleton that can both pick up motion and provide mechanical resistance, so that when the robot bumps into an immovable object, the user stops dead too.
How hard all of this will be is also dependent on how the competition ultimately defines subjective terms like “feel” and “interact.” Will the user need to be able to feel a gentle breeze on the robot’s cheek or be able to paint a watercolor? Or will simply having the ability to distinguish a hard object from a soft one or shake someone’s hand be enough?
Whatever the fidelity they decide on, the approach will require huge amounts of sensory and control data to be transmitted over large distances, most likely wirelessly, in a way that’s fast and reliable enough that there’s no lag or interruptions. Fortunately 5G is launching this year, with a speed of 10 gigabits per second and very low latency, so this problem should be solved by 2021.
And it’s worth remembering there have already been some tentative attempts at building robotic avatars. Telepresence robots have solved the seeing, hearing, and some of the interacting problems, and MIT has already used virtual reality to control robots to carry out complex manipulation tasks.
South Korean company Hankook Mirae Technology has also unveiled a 13-foot-tall robotic suit straight out of a sci-fi movie that appears to have made some headway with the motion tracking problem, albeit with a human inside the robot. Toyota’s T-HR3 does the same, but with the human controlling the robot from a “Master Maneuvering System” that marries motion tracking with VR.
Combining all of these capabilities into a single machine will certainly prove challenging. But if one of the teams pulls it off, you may be able to tick off trips to the Seven Wonders of the World without ever leaving your house.
Image Credit: ANA Avatar XPRIZE Continue reading