Tag Archives: film
If a recent project using Google’s DeepMind were a recipe, you would take a pair of AI systems, images of animals, and a whole lot of computing power. Mix it all together, and you’d get a series of imagined animals dreamed up by one of the AIs. A look through the research paper about the project—or this open Google Folder of images it produced—will likely lead you to agree that the results are a mix of impressive and downright eerie.
But the eerie factor doesn’t mean the project shouldn’t be considered a success and a step forward for future uses of AI.
From GAN To BigGAN
The team behind the project consists of Andrew Brock, a PhD student at the Edinburgh Center for Robotics and a DeepMind intern, and DeepMind researchers Jeff Donahue and Karen Simonyan.
They used a so-called Generative Adversarial Network (GAN) to generate the images. In a GAN, two AI systems are pitted against each other in a game-like manner. One AI produces images of an object or creature. The human equivalent would be drawing pictures of, for example, a dog, without necessarily knowing exactly what a dog looks like. Those images are then shown to the second AI, which has already been fed images of dogs. The second AI then tells the first one how far off its efforts were. The first one uses this information to improve its images. The two go back and forth in an iterative process, and the goal is for the first AI to become so good at creating images of dogs that the second can’t tell the difference between its creations and actual pictures of dogs.
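To make the back-and-forth a little more concrete, here is a minimal sketch of how each side of a GAN is commonly scored. It is an illustration only, not the BigGAN team's code: the function names and the example scores are assumptions, and the scores stand in for the second AI's estimate (between 0 and 1) that an image is real.

```python
import math

EPS = 1e-12  # avoids log(0); an implementation detail of this sketch


def discriminator_loss(real_scores, fake_scores):
    """Cross-entropy for the judging AI: low when real images score
    near 1 and generated images score near 0."""
    real_term = sum(-math.log(s + EPS) for s in real_scores) / len(real_scores)
    fake_term = sum(-math.log(1.0 - s + EPS) for s in fake_scores) / len(fake_scores)
    return real_term + fake_term


def generator_loss(fake_scores):
    """Loss for the image-making AI: low when its fakes score near 1,
    meaning the judge was fooled."""
    return sum(-math.log(s + EPS) for s in fake_scores) / len(fake_scores)


# Early on (hypothetical scores): the judge spots the fakes easily.
early = discriminator_loss([0.9, 0.95], [0.1, 0.05])

# At the goal described above: the judge can only guess, so every score is 0.5.
balanced = discriminator_loss([0.5, 0.5], [0.5, 0.5])

print(f"judge loss early: {early:.3f}, at balance: {balanced:.3f}")
```

When the judge is reduced to guessing, its loss settles at 2 × ln 2, roughly 1.386, which is one way of recognizing that the first AI's images have become hard to tell from real ones.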
The team was able to draw on Google’s vast vaults of computational power to create images of a quality and life-like nature that were beyond almost anything seen before. In part, this was achieved by feeding the GAN with more images than is usually the case. According to IFLScience, the standard is to feed about 64 images per subject into the GAN. In this case, the research team fed about 2,000 images per subject into the system, leading to it being nicknamed BigGAN.
Their results showed that feeding the system with more images and using masses of raw computer power markedly increased the GAN’s precision and ability to create life-like renditions of the subjects it was trained to reproduce.
“The main thing these models need is not algorithmic improvements, but computational ones. […] When you increase model capacity and you increase the number of images you show at every step, you get this twofold combined effect,” Andrew Brock told Fast Company.
The Power Drain
The team used 512 of Google’s AI-focused Tensor Processing Units (TPUs) to generate 512-pixel images. Each experiment took between 24 and 48 hours to run.
That kind of computing power needs a lot of electricity. As Jer Thorp, artist and Innovator-In-Residence at the Library of Congress, put it on Twitter, tongue in cheek: “The good news is that AI can now give you a more believable image of a plate of spaghetti. The bad news is that it used roughly enough energy to power Cleveland for the afternoon.”
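For a sense of scale, the figures above can be turned into a rough count of TPU-hours per experiment. This is back-of-the-envelope arithmetic only, and it assumes (the article does not say so) that all 512 units ran for the entire experiment.

```python
TPU_COUNT = 512       # units used, as reported above
RUN_HOURS = (24, 48)  # shortest and longest experiment, in hours


def tpu_hours(units, hours):
    """Total unit-hours, assuming every unit runs for the full duration."""
    return units * hours


low, high = (tpu_hours(TPU_COUNT, h) for h in RUN_HOURS)
print(f"{low:,} to {high:,} TPU-hours per experiment")
```

That works out to somewhere between 12,288 and 24,576 TPU-hours for a single experiment.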
Thorp added that a back-of-the-envelope calculation showed that the computations to produce the images would require about 27,000 square feet of solar panels to have adequate power.
BigGAN’s images have been hailed by researchers, with Oriol Vinyals, research scientist at DeepMind, rhetorically asking if these were the ‘Best GAN samples yet?’
However, they are still not perfect. The number of legs on a given creature is one example of where the BigGAN seemed to struggle. The system was good at recognizing that something like a spider has a lot of legs, but seemed unable to settle on how many ‘a lot’ was supposed to be. The same applied to dogs, especially if the images were supposed to show said dogs in motion.
Those eerie images contrast with other renditions that are so lifelike that a human mind has a hard time identifying them as fake. Spaniels with lolling tongues, ocean scenery, and butterflies were all rendered with what looks like perfection. The same goes for an image of a hamburger that was good enough to make me stop writing because I suddenly needed lunch.
The Future Use Cases
GANs were first introduced in 2014, and given their relative youth, researchers and companies are still busy trying out possible use cases.
One possible use is image correction, that is, making pixelated images clearer. Not only could this help your future holiday snaps, but it could also be applied in industries such as space exploration. A team from the University of Michigan and the Max Planck Institute has developed a method for GANs to create images from text descriptions. At Berkeley, a research group has used a GAN to create an interface that lets users change the shape, size, and design of objects, including a handbag.
For anyone who has seen a film like Wag the Dog or read 1984, the possibilities are also starkly alarming. GANs could, in other words, make fake news look more real than ever before.
For now, it seems that while not all GANs require the computational and electrical power of BigGAN, there is still some way to go before these potential use cases become reality. However, if there’s one lesson from Moore’s Law and exponential technology, it is that today’s technical roadblock quickly becomes tomorrow’s minor issue as technology progresses.
Image Credit: Ondrej Prosicky/Shutterstock
A new technique using artificial intelligence to manipulate video content gives new meaning to the expression “talking head.”
An international team of researchers showcased the latest advancement in synthesizing facial expressions—including mouth, eyes, eyebrows, and even head position—in video at this month’s 2018 SIGGRAPH, a conference on innovations in computer graphics, animation, virtual reality, and other forms of digital wizardry.
The project is called Deep Video Portraits. It relies on a type of AI called generative adversarial networks (GANs) to modify a “target” actor based on the facial and head movement of a “source” actor. As the name implies, GANs pit two opposing neural networks against one another to create a realistic talking head, right down to the sneer or raised eyebrow.
In this case, the adversaries are actually working together: One neural network generates content, while the other rejects or approves each effort. The back-and-forth interplay between the two eventually produces a realistic result that can easily fool the human eye, including reproducing a static scene behind the head as it bobs back and forth.
The researchers say the technique can be used by the film industry for a variety of purposes, from editing actors’ facial expressions to match dubbed voices to repositioning an actor’s head in post-production. AI can produce results that are not only highly realistic but also much quicker to obtain than with the manual processes used today, according to the researchers. You can read the full paper on their work here.
“Deep Video Portraits shows how such a visual effect could be created with less effort in the future,” said Christian Richardt, from the University of Bath’s motion capture research center CAMERA, in a press release. “With our approach, even the positioning of an actor’s head and their facial expression could be easily edited to change camera angles or subtly change the framing of a scene to tell the story better.”
AI Tech Different Than So-Called “Deepfakes”
The work is far from the first to employ AI to manipulate video and audio. At last year’s SIGGRAPH conference, researchers from the University of Washington showcased their work using algorithms that inserted audio recordings from a person in one instance into a separate video of the same person in a different context.
In this case, they “faked” a video using a speech from former President Barack Obama addressing a mass shooting incident during his presidency. The AI-doctored video injects the audio into an unrelated video of the president while also blending the facial and mouth movements, doing a pretty credible job of lip-syncing.
A previous paper by many of the same scientists on the Deep Video Portraits project detailed how they were first able to manipulate a video of a talking head in real time (in this case, actor and former California governor Arnold Schwarzenegger). The Face2Face system pulled off this bit of digital trickery using a depth-sensing camera that tracked the facial expressions of an Asian female source actor.
A less sophisticated method of swapping faces, using machine learning software dubbed FakeApp, emerged earlier this year. Predictably, the tech (which requires numerous photos of the source actor in order to train the neural network) was used for more juvenile pursuits, such as injecting a person’s face onto a porn star.
The application gave rise to the term “deepfakes,” which is now used somewhat ubiquitously to describe all such instances of AI-manipulated video—much to the chagrin of some of the researchers involved in more legitimate uses.
Fighting AI-Created Video Forgeries
However, the researchers are keenly aware that their work—intended for benign uses such as in the film industry or even to correct gaze and head positions for more natural interactions through video teleconferencing—could be used for nefarious purposes. Fake news is the most obvious concern.
“With ever-improving video editing technology, we must also start being more critical about the video content we consume every day, especially if there is no proof of origin,” said Michael Zollhöfer, a visiting assistant professor at Stanford University and member of the Deep Video Portraits team, in the press release.
Toward that end, the research team is training the same adversarial neural networks to spot video forgeries. They also strongly recommend that developers clearly watermark videos that are edited through AI or otherwise, and denote clearly what part and element of the scene was modified.
To catch less ethical users, the US Department of Defense, through the Defense Advanced Research Projects Agency (DARPA), is supporting a program called Media Forensics. This latest DARPA challenge enlists researchers to develop technologies to automatically assess the integrity of an image or video, as part of an end-to-end media forensics platform.
The DARPA official in charge of the program, Matthew Turek, told MIT Technology Review that so far the program has “discovered subtle cues in current GAN-manipulated images and videos that allow us to detect the presence of alterations.” In one reported example, researchers have targeted eyes, which rarely blink in the case of “deepfakes” like those created by FakeApp, because the AI is trained on still pictures. That method would seem to be less effective at spotting the sort of forgeries created by Deep Video Portraits, which appears to flawlessly match the entire facial and head movements between the source and target actors.
“We believe that the field of digital forensics should and will receive a lot more attention in the future to develop approaches that can automatically prove the authenticity of a video clip,” Zollhöfer said. “This will lead to ever-better approaches that can spot such modifications even if we humans might not be able to spot them with our own eyes.”
Image Credit: Tancha / Shutterstock.com
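As a rough illustration of the blink cue mentioned above, the sketch below counts blinks in a clip and converts the count to a rate per minute. It is not the method used in the DARPA program: the per-frame "eye openness" signal (0 for shut, 1 for wide open) is assumed to come from some hypothetical face tracker, and the 0.2 threshold and 30 frames per second are placeholder values.

```python
def count_blinks(eye_openness, closed_below=0.2):
    """Count open-to-closed transitions in a per-frame eye-openness signal."""
    blinks = 0
    was_closed = False
    for value in eye_openness:
        is_closed = value < closed_below
        if is_closed and not was_closed:
            blinks += 1  # a new blink starts on this frame
        was_closed = is_closed
    return blinks


def blinks_per_minute(eye_openness, fps=30.0, closed_below=0.2):
    """Blink rate for the clip; returns 0.0 for an empty signal."""
    minutes = len(eye_openness) / fps / 60.0
    if minutes == 0:
        return 0.0
    return count_blinks(eye_openness, closed_below) / minutes


# Hypothetical one-second clip (30 frames) containing two blinks.
clip = [1.0] * 10 + [0.1] * 3 + [1.0] * 10 + [0.05] * 2 + [1.0] * 5
# Hypothetical clip in which the eyes never close.
staring = [1.0] * 100

print(count_blinks(clip), count_blinks(staring))
```

A clip whose blink rate is far below what people normally manage would be a candidate for closer inspection, although, as noted, forgeries that copy blinks from a source actor would slip past a check like this.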
It’s the end of a long day in your apartment in the early 2040s. You decide your work is done for the day, stand up from your desk, and yawn. “Time for a film!” you say. The house responds to your cues. The desk splits into hundreds of tiny pieces, which flow behind you and take on shape again as a couch. The computer screen you were working on flows up the wall and expands into a flat projection screen. You relax into the couch and, after a few seconds, a remote control surfaces from one of its arms.
In a few seconds flat, you’ve gone from a neatly-equipped office to a home cinema…all within the same four walls. Who needs more than one room?
This is the dream of those who work on “programmable matter.”
In his recent book about AI, Max Tegmark makes a distinction between three different levels of computational sophistication for organisms. Life 1.0 is single-celled organisms like bacteria; here, hardware is indistinguishable from software. The behavior of a bacterium is encoded in its DNA; it cannot learn new things.
Life 2.0 is where humans live on the spectrum. We are more or less stuck with our hardware, but we can change our software by choosing to learn different things, say, Spanish instead of Italian. Much like managing space on your smartphone, your brain’s hardware will allow you to download only a certain number of packages, but, at least theoretically, you can learn new behaviors without changing your underlying genetic code.
Life 3.0 marks a step-change from this: creatures that can change both their hardware and software in something like a feedback loop. This is what Tegmark views as a true artificial intelligence—one that can learn to change its own base code, leading to an explosion in intelligence. Perhaps, with CRISPR and other gene-editing techniques, we could be using our “software” to doctor our “hardware” before too long.
Programmable matter extends this analogy to the things in our world: what if your sofa could “learn” how to become a writing desk? What if, instead of a Swiss Army knife with dozens of tool attachments, you just had a single tool that “knew” how to become any other tool you could require, on command? In the crowded cities of the future, could houses be replaced by single, OmniRoom apartments? It would save space, and perhaps resources too.
Such are the dreams, anyway.
But when engineering and manufacturing individual gadgets is such a complex process, you can imagine that making stuff that can turn into many different items can be extremely complicated. Professor Skylar Tibbits at MIT referred to it as 4D printing in a TED Talk, and the website for his research group, the Self-Assembly Lab, excitedly claims, “We have also identified the key ingredients for self-assembly as a simple set of responsive building blocks, energy and interactions that can be designed within nearly every material and machining process available. Self-assembly promises to enable breakthroughs across many disciplines, from biology to material science, software, robotics, manufacturing, transportation, infrastructure, construction, the arts, and even space exploration.”
Naturally, their projects are still in the early stages, but the Self-Assembly Lab and others are genuinely exploring just the kind of science fiction applications we mooted.
For example, there’s the cell-phone self-assembly project, which brings to mind eerie, 24/7 factories where mobile phones assemble themselves from 3D printed kits without human or robotic intervention. Okay, so the phones they’re making are hardly going to fly off the shelves as fashion items, but if all you want is something that works, it could cut manufacturing costs substantially and automate even more of the process.
One of the major hurdles to overcome in making programmable matter a reality is choosing the right fundamental building blocks. There’s a very important balance to strike. To create fine details, you need building blocks that aren’t too big, so as to keep your rearranged matter from being too lumpy. Blocks that are too large might be useless for certain applications, for example, if you wanted to make tools for fine manipulation. With big pieces, it might also be difficult to simulate a range of textures. On the other hand, if the pieces are too small, different problems can arise.
Imagine a setup where each piece is a small robot. You have to contain the robot’s power source and its brain, or at least some kind of signal-generator and signal-processor, all in the same compact unit. Perhaps you can imagine that one might be able to simulate a range of textures and strengths by changing the strength of the “bond” between individual units—your desk might need to be a little bit more firm than your bed, which might be nicer with a little more give.
Early steps toward creating this kind of matter have been taken by those who are developing modular robots. There are plenty of different groups working on this, including at MIT, in Lausanne, and at the University of Brussels.
In the Brussels system, one individual robot acts as a centralized decision-maker, referred to as the brain unit, but additional robots can autonomously join the brain unit as and when needed to change the shape and structure of the overall system. Although the system is only ten units at present, it’s a proof-of-concept that control can be orchestrated over a modular system of robots; perhaps in the future, smaller versions of the same thing could be the components of Stuff 3.0.
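The idea of one brain unit directing whichever robots are currently attached can be sketched in a few lines. This is a toy model under stated assumptions, not the researchers' software: the class name, the unit names, and the "form_line" command are all hypothetical.

```python
class ModularSystem:
    """Toy model: one brain unit plus any robots that have joined it."""

    def __init__(self, brain_id):
        self.brain_id = brain_id
        self.members = [brain_id]

    def join(self, unit_id):
        """A robot attaches itself to the system."""
        if unit_id not in self.members:
            self.members.append(unit_id)

    def leave(self, unit_id):
        """A robot detaches; the brain unit itself always stays."""
        if unit_id != self.brain_id and unit_id in self.members:
            self.members.remove(unit_id)

    def command(self, action):
        """The brain decides once; every attached unit gets the same order."""
        return {unit: action for unit in self.members}


system = ModularSystem("brain")
system.join("unit-1")
system.join("unit-2")
orders = system.command("form_line")  # hypothetical action name
print(orders)
```

The point of the sketch is only that decision-making lives in one place while the body of the system can grow or shrink around it.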
You can imagine that with machine learning algorithms, such swarms of robots might be able to negotiate obstacles and respond to a changing environment more easily than an individual robot (those of you with techno-fear may read “respond to a changing environment” and imagine a robot seamlessly rearranging itself to allow a bullet to pass straight through without harm).
Speaking of robotics, the form of an ideal robot has been a subject of much debate. In fact, one of the major recent robotics competitions—DARPA’s Robotics Challenge—was won by a robot that could adapt, beating Boston Dynamics’ infamous ATLAS humanoid with the simple addition of a wheel that allowed it to drive as well as walk.
Rather than building robots into a humanoid shape (only sometimes useful), allowing them to evolve and discover the ideal form for performing whatever you’ve tasked them to do could prove far more useful. This is particularly true in disaster response, where expensive robots can still be more valuable than humans, but conditions can be very unpredictable and adaptability is key.
Further afield, many futurists imagine “foglets” as the tiny nanobots that will be capable of constructing anything from raw materials, somewhat like the “Santa Claus machine.” But you don’t necessarily need anything quite so indistinguishable from magic to be useful. Programmable matter that can respond and adapt to its surroundings could be used in all kinds of industrial applications. How about a pipe that can strengthen or weaken at will, or divert its direction on command?
We’re some way off from being able to order our beds to turn into bicycles. As with many tech ideas, it may turn out that the traditional low-tech solution is far more practical and cost-effective, even as we can imagine alternatives. But as the march to put a chip in every conceivable object goes on, it seems certain that inanimate objects are about to get a lot more animated.
Image Credit: PeterVrabel / Shutterstock.com
“The Six Ds are a chain reaction of technological progression, a road map of rapid development that always leads to enormous upheaval and opportunity.”
–Peter Diamandis and Steven Kotler, Bold
We live in incredible times. News travels the globe in an instant. Music, movies, games, communication, and knowledge are ever-available on always-connected devices. From biotechnology to artificial intelligence, powerful technologies that were once only available to huge organizations and governments are becoming more accessible and affordable thanks to digitization.
The potential for entrepreneurs to disrupt industries and corporate behemoths to unexpectedly go extinct has never been greater.
One hundred or fifty or even twenty years ago, disruption meant coming up with a product or service people needed but didn’t have yet, then finding a way to produce it with higher quality and lower costs than your competitors. This entailed hiring hundreds or thousands of employees, having a large physical space to put them in, and waiting years or even decades for hard work to pay off and products to come to fruition.
But thanks to digital technologies developing at exponential rates of change, the landscape of 21st-century business has taken on a dramatically different look and feel.
The structure of organizations is changing. Instead of thousands of employees and large physical plants, modern start-ups are small organizations focused on information technologies. They dematerialize what was once physical and create new products and revenue streams in months, sometimes weeks.
It no longer takes a huge corporation to have a huge impact.
Technology is disrupting traditional industrial processes, and they’re never going back. This disruption is filled with opportunity for forward-thinking entrepreneurs.
The secret to positively impacting the lives of millions of people is understanding and internalizing the growth cycle of digital technologies. This growth cycle takes place in six key steps, which Peter Diamandis calls the Six Ds of Exponentials: digitization, deception, disruption, demonetization, dematerialization, and democratization.
According to Diamandis, cofounder and chairman of Singularity University and founder and executive chairman of XPRIZE, when something is digitized it begins to behave like an information technology.
Newly digitized products develop at an exponential pace instead of a linear one, fooling onlookers at first before going on to disrupt companies and whole industries. Before you know it, something that was once expensive and physical is an app that costs a buck.
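A quick worked example shows why exponential growth fools onlookers at first. The numbers here are invented for illustration: one quantity starts at 1 and gains 1 per step, while the other starts at a tiny 0.01 and doubles each step.

```python
def linear_growth(step, start=1.0, increment=1.0):
    """Value after `step` steps of adding a fixed increment."""
    return start + increment * step


def exponential_growth(step, start=0.01, factor=2.0):
    """Value after `step` steps of multiplying by a fixed factor."""
    return start * factor ** step


def first_overtake(max_steps=100):
    """First step at which the doubling quantity exceeds the linear one."""
    for step in range(max_steps + 1):
        if exponential_growth(step) > linear_growth(step):
            return step
    return None


step = first_overtake()
print(f"doubling overtakes linear growth at step {step}")
print(f"at step 20: linear={linear_growth(20):.0f}, doubling={exponential_growth(20):.2f}")
```

For the first ten steps the doubling quantity looks negligible next to the linear one. It pulls ahead at step 11, and by step 20 it is hundreds of times larger.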
Newspapers and CDs are two obvious recent examples. The entertainment and media industries are still dealing with the aftermath of digitization as they attempt to transform and update old practices tailored to a bygone era. But it won’t end with digital media. As more of the economy is digitized—from medicine to manufacturing—industries will hop on an exponential curve and be similarly disrupted.
Diamandis’s 6 Ds are critical to understanding and planning for this disruption.
The 6 Ds of Exponential Organizations are Digitized, Deceptive, Disruptive, Demonetized, Dematerialized, and Democratized.
Diamandis uses the contrasting fates of Kodak and Instagram to illustrate the power of the six Ds and exponential thinking.
Kodak invented the digital camera in 1975, but didn’t invest heavily in the new technology, instead sticking with what had always worked: traditional cameras and film. In 1996, Kodak had a $28 billion market capitalization with 95,000 employees.
But the company didn’t pay enough attention to how digitization of their core business was changing it; people were no longer taking pictures in the same way and for the same reasons as before.
After a downward spiral, Kodak went bankrupt in 2012. That same year, Facebook acquired Instagram, a digital photo sharing app, which at the time was a startup with 13 employees. The acquisition’s price tag? $1 billion. And Instagram had been founded only 18 months earlier.
The most ironic piece of this story is that Kodak invented the digital camera; they took the first step toward overhauling the photography industry and ushering it into the modern age, but they were unwilling to disrupt their existing business by taking a risk in what was then uncharted territory. So others did it instead.
The same can happen with any technology that’s just getting off the ground. It’s easy to stop pursuing it in the early part of the exponential curve, when development appears to be moving slowly. But failing to follow through only gives someone else the chance to do it instead.
The Six Ds are a road map showing what can happen when an exponential technology is born. Not every phase is easy, but the results give even small teams the power to change the world in a faster and more impactful way than traditional business ever could.
Image Credit: Mohammed Tareq / Shutterstock