Tag Archives: dog

#437614 Video Friday: Poimo Is a Portable ...

Video Friday is your weekly selection of awesome robotics videos, collected by your Automaton bloggers. We’ll also be posting a weekly calendar of upcoming robotics events for the next few months; here's what we have so far (send us your events!):

IROS 2020 – October 25-29, 2020 – [Online]
ROS World 2020 – November 12, 2020 – [Online]
CYBATHLON 2020 – November 13-14, 2020 – [Online]
ICSR 2020 – November 14-16, 2020 – Golden, Colo., USA
Let us know if you have suggestions for next week, and enjoy today's videos.

Engineers at the University of California San Diego have built a squid-like robot that can swim untethered, propelling itself by generating jets of water. The robot carries its own power source inside its body. It can also carry a sensor, such as a camera, for underwater exploration.

[ UCSD ]

Thanks Ioana!

Shark Robotics, a French and European leader in unmanned ground vehicles, is announcing today a disinfection add-on for Boston Dynamics' Spot robot, designed to fight the COVID-19 pandemic. The Spot robot with Shark’s purpose-built disinfection payload can decontaminate up to 2,000 m² in 15 minutes, in any space that needs to be sanitized – such as hospitals, metro stations, offices, warehouses or facilities.

[ Shark Robotics ]

Here’s an update on the Poimo portable inflatable mobility project we wrote about a little while ago; while not strictly robotics, it seems like it holds some promise for rapidly developing different soft structures that robotics might find useful.

[ University of Tokyo ]

Thanks Ryuma!

Pretty cool that you can do useful force feedback teleop while video chatting through a “regular broadband Internet connection.” Although, what “regular” means to you is a bit subjective, right?

[ HEBI Robotics ]

Thanks Dave!

While NASA's Mars rover Perseverance travels through space toward the Red Planet, its nearly identical rover twin is hard at work on Earth. The vehicle system test bed (VSTB) rover named OPTIMISM is a full-scale engineering version of the Mars-bound rover. It is used to test hardware and software before the commands are sent up to the Perseverance rover.

[ NASA ]

Jacquard takes ordinary, familiar objects and enhances them with new digital abilities and experiences, while remaining true to their original purpose — like being your favorite jacket, backpack or a pair of shoes that you love to wear.

Our ambition is simple: to make life easier. By staying connected to your digital world, your things can do so much more. Skip a song by brushing your sleeve. Take a picture by tapping on a shoulder strap. Get reminded about the phone you left behind with a blink of light or a haptic buzz on your cuff.

[ Google ATAP ]

Should you attend the IROS 2020 workshop on “Planetary Exploration Robots: Challenges and Opportunities”? Of course you should!

[ Workshop ]

Kuka makes a lot of these videos where I can’t help but think that if they put as much effort into programming the robot as they did into producing the video, the result would be much more impressive.

[ Kuka ]

The Colorado School of Mines is one of the first customers to buy a Spot robot from Boston Dynamics to help with robotics research. Watch as scientists take Spot into the school's mine for the first time.

[ HCR ] via [ CNET ]

A very interesting soft(ish) actuator from Ayato Kanada at Kyushu University's Control Engineering Lab.

This is a flexible ultrasonic motor (FUSM), a novel soft actuator that generates linear motion. The motor consists of a single metal cube stator with a hole and an elastic, elongated coil spring inserted into the hole. When voltages are applied to piezoelectric plates on the stator, the coil spring moves back and forth as a linear slider. Because the FUSM works on the friction-drive principle, the most important parameter for optimizing its output is the preload between the stator and slider. The coil spring has a slightly larger diameter than the stator hole and generates the preload by expanding in the radial direction. The coil spring acts not only as a flexible slider but also as a resistive position sensor: changes in the resistance between the stator and the end of the coil spring are converted to a voltage and used for position detection.

[ Control Engineering Lab ]

Thanks Ayato!
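The resistive position sensing in the FUSM description can be illustrated with a few lines of Ohm's-law arithmetic. This is a minimal sketch, not the lab's implementation: it assumes the spring's resistance grows linearly with the length of spring between the stator contact and the spring end, and that a constant current is driven through it so the measured voltage tracks resistance. The function names and all calibration constants are hypothetical.

```python
# Hypothetical calibration constants (not taken from the FUSM work).
CURRENT_A = 0.010      # assumed constant excitation current through the spring, amps
CONTACT_OHMS = 0.5     # assumed stator-to-spring contact resistance, ohms
OHMS_PER_MM = 0.02     # assumed spring resistance per millimeter of spring length

def spring_length_mm(voltage_v: float) -> float:
    """Convert a measured voltage into spring length between the stator contact and the spring end."""
    resistance = voltage_v / CURRENT_A              # Ohm's law: R = V / I
    return (resistance - CONTACT_OHMS) / OHMS_PER_MM

def voltage_for_length(length_mm: float) -> float:
    """Forward model of the same assumptions: V = I * (R_contact + r * L)."""
    return CURRENT_A * (CONTACT_OHMS + OHMS_PER_MM * length_mm)

# Round-trip check: simulate the voltage for a 40 mm length, then recover the length.
print(round(spring_length_mm(voltage_for_length(40.0)), 6))  # 40.0
```

A linear model like this is the simplest possible calibration; a real sensor would likely need a measured calibration curve instead.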

We show how to use the limbs of a quadruped robot to identify fine-grained soil representative of Martian regolith.

[ Paper ] via [ ANYmal Research ]

PR2 is serving breakfast and cleaning up afterwards. It’s slow, but all you have to do is eat and leave.

That poor PR2 is a little more naked than it's probably comfortable with.

[ EASE ]

NVIDIA researchers present a hierarchical framework that combines model-based control and reinforcement learning (RL) to synthesize robust controllers for a quadruped robot (the Unitree Laikago).

[ NVIDIA ]

What's interesting about this assembly task is that the robot is using its arm only for positioning, and doing the actual assembly with just fingers.

[ RC2L ]

In this electronics assembly application, Kawasaki's cobot duAro2 uses a tool changing station to tackle a multitude of tasks and assemble different CPU models.

Okay but can it apply thermal paste to a CPU in the right way? Personally, I find that impossible.

[ Kawasaki ]

You only need to watch this video long enough to appreciate the concept of putting a robot on a robot.

[ Impress ]

In this lecture, we’ll hear from the man behind one of the biggest robotics companies in the world, Boston Dynamics, whose robotic dog, Spot, has been used to encourage social distancing in Singapore and is now getting ready for FDA approval to be able to measure patients’ vital signs in hospitals.

[ Alan Turing Institute ]

Greg Kahn from UC Berkeley wrote in to share his recent dissertation talk on “Mobile Robot Learning.”

In order to create mobile robots that can autonomously navigate real-world environments, we need generalizable perception and control systems that can reason about the outcomes of navigational decisions. Learning-based methods, in which the robot learns to navigate by observing the outcomes of navigational decisions in the real world, offer considerable promise for obtaining these intelligent navigation systems. However, many challenges impede mobile robots from autonomously learning to act in the real world, in particular (1) sample efficiency: how can the robot learn from a limited amount of data? (2) supervision: how do we tell the robot what to do? and (3) safety: how do we ensure the robot and environment are not damaged or destroyed during learning? In this talk, I will present deep reinforcement learning methods for addressing these real-world mobile robot learning challenges and show results that enable ground and aerial robots to navigate in complex indoor and outdoor environments.

[ UC Berkeley ]

Thanks Greg!

Leila Takayama from UC Santa Cruz (and previously Google X and Willow Garage) gives a talk entitled “Toward a more human-centered future of robotics.”

Robots are no longer only in outer space, in factory cages, or in our imaginations. We interact with robotic agents when withdrawing cash from bank ATMs, driving cars with adaptive cruise control, and tuning our smart home thermostats. In the moment of those interactions with robotic agents, we behave in ways that do not necessarily align with the rational belief that robots are just plain machines. Through a combination of controlled experiments and field studies, we use theories and concepts from the social sciences to explore ways that human and robotic agents come together, including how people interact with personal robots and how people interact through telepresence robots. Together, we will explore topics and raise questions about the psychology of human-robot interaction and how we could invent a future of a more human-centered robotics that we actually want to live in.

[ Leila Takayama ]

Roboticist and stand-up comedian Naomi Fitter from Oregon State University gives a talk on “Everything I Know about Telepresence.”

Telepresence robots hold promise to connect people by providing videoconferencing and navigation abilities in far-away environments. At the same time, the impacts of current commercial telepresence robots are not well understood, and circumstances of robot use, including internet connection stability, odd personalizations, and the interpersonal relationship between a robot operator and people co-located with the robot, can overshadow the benefit of the robot itself. And although the idea of telepresence robots has been around for over two decades, the nonverbal expressive abilities available through telepresence robots are limited, and suitable operator user interfaces for the robot (for example, controls that allow the operator to hold a conversation and move the robot simultaneously) remain elusive. So where should we be using telepresence robots? Are there any pitfalls to watch out for? What do we know about potential robot expressivity and user interfaces? This talk will cover my attempts to address these questions and ways in which the robotics research community can build on this work.

[ Talking Robotics ]

Posted in Human Robots

#437583 Video Friday: Attack of the Hexapod ...

Video Friday is your weekly selection of awesome robotics videos, collected by your Automaton bloggers. We’ll also be posting a weekly calendar of upcoming robotics events for the next few months; here’s what we have so far (send us your events!):

IROS 2020 – October 25-29, 2020 – [Online]
ROS World 2020 – November 12, 2020 – [Online]
CYBATHLON 2020 – November 13-14, 2020 – [Online]
ICSR 2020 – November 14-16, 2020 – Golden, Colo., USA
Let us know if you have suggestions for next week, and enjoy today’s videos.

Happy Halloween from HEBI Robotics!

Thanks Hardik!

[ HEBI Robotics ]

Happy Halloween from Berkshire Grey!

[ Berkshire Grey ]

These are some preliminary results of our lab’s new work on using reinforcement learning to train neural networks to imitate common bipedal gait behaviors, without using any motion capture data or reference trajectories. Our method is described in an upcoming submission to ICRA 2021. Work by Jonah Siekmann and Yesh Godse.

[ OSU DRL ]

The northern goshawk is a fast, powerful raptor that flies effortlessly through forests. This bird was the design inspiration for the next-generation drone developed by scientists at EPFL’s Laboratory of Intelligent Systems, led by Dario Floreano. They carefully studied the shape of the bird’s wings and tail and its flight behavior, and used that information to develop a drone with similar characteristics.

The engineers had already designed a bird-inspired drone with a morphing wing back in 2016. In a step forward, their new model can adjust the shape of its wing and tail thanks to its artificial feathers. Flying this new type of drone isn’t easy, due to the large number of possible wing and tail configurations. To take full advantage of the drone’s flight capabilities, Floreano’s team plans to incorporate artificial intelligence into the drone’s flight system so that it can fly semi-automatically. The team’s research has been published in Science Robotics.

[ EPFL ]

Oopsie.

[ Roborace ]

We’ve covered MIT’s Roboats in the past, but now they’re big enough to keep a couple of people afloat.

Self-driving boats have been able to transport small items for years, but carrying human passengers has seemed out of reach because of the vessels’ small size. Roboat II is the “half-scale” boat in the growing body of work, and joins the previously developed quarter-scale Roboat, which is 1 meter long. The third installment, which is under construction in Amsterdam and is considered to be “full scale,” is 4 meters long and aims to carry anywhere from four to six passengers.

[ MIT ]

With a training technique commonly used to teach dogs to sit and stay, Johns Hopkins University computer scientists showed a robot how to teach itself several new tricks, including stacking blocks. With the method, the robot, named Spot, was able to learn in days what typically takes a month.

[ JHU ]

Exyn, a pioneer in autonomous aerial robot systems for complex, GPS-denied industrial environments, today announced the first dog, Kody, to successfully fly a drone at Number 9 Coal Mine, in Lansford, PA. Selected to carry out this mission was the new autonomous aerial robot, the ExynAero.

Yes, this is obviously a publicity stunt, and Kody is only flying the drone in the sense that he’s pushing the launch button and then taking a nap. But that’s also the point: drone autonomy doesn’t get much fuller than this, despite the challenging environment.

[ Exyn ]

In this video, object instance segmentation and shape completion are combined with classical regrasp planning to perform pick-and-place of novel objects. It is demonstrated with a UR5 arm, a Robotiq 85 parallel-jaw gripper, and a Structure depth sensor on three rearrangement tasks: bin packing (minimizing the height of the packing), placing bottles onto coasters, and arranging blocks from tallest to shortest (according to the longest edge). The system also accounts for uncertainty in the segmentation/completion by avoiding grasping or placing on parts of the object where perceptual uncertainty is predicted to be high.

[ Paper ] via [ Northeastern ]

Thanks Marcus!

U can’t touch this!

[ University of Tokyo ]

We introduce a way to enable more natural interaction between humans and robots through Mixed Reality, by using a shared coordinate system. Azure Spatial Anchors, which already supports colocalizing multiple HoloLens and smartphone devices in the same space, has now been extended to support robots equipped with cameras. This allows humans and robots sharing the same space to interact naturally: humans can see the plan and intention of the robot, while the robot can interpret commands given from the person’s perspective. We hope that this can be a building block in the future of humans and robots being collaborators and coworkers.

[ Microsoft ]
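The shared-coordinate-system idea boils down to chaining coordinate transforms through a common anchor: if both the headset and the robot know where the same anchor sits in their own frames, a point expressed by one can be re-expressed for the other. The sketch below is a planar (2D) simplification under that assumption; it does not use the Azure Spatial Anchors API, real systems work with full 6-DoF poses, and all function names are hypothetical.

```python
import math
from typing import Tuple

Pose = Tuple[float, float, float]   # (x, y, heading in radians) of a frame within a parent frame
Point = Tuple[float, float]

def to_world(pose: Pose, p: Point) -> Point:
    """Express a point given in the pose's local frame in the parent frame (rotate, then translate)."""
    x, y, th = pose
    c, s = math.cos(th), math.sin(th)
    return (x + c * p[0] - s * p[1], y + s * p[0] + c * p[1])

def to_local(pose: Pose, p: Point) -> Point:
    """Inverse of to_world: express a parent-frame point in the pose's local frame."""
    x, y, th = pose
    dx, dy = p[0] - x, p[1] - y
    c, s = math.cos(th), math.sin(th)
    return (c * dx + s * dy, -s * dx + c * dy)

def headset_to_robot(anchor_in_headset: Pose, anchor_in_robot: Pose, p: Point) -> Point:
    """Re-express a headset-frame point in the robot frame via the anchor both devices localized."""
    in_anchor = to_local(anchor_in_headset, p)
    return to_world(anchor_in_robot, in_anchor)

# Hypothetical example: a person points at a spot 1 m past the anchor (as seen by the headset).
goal = headset_to_robot((2.0, 0.0, 0.0), (0.0, 1.0, math.pi / 2), (3.0, 0.0))
print(tuple(round(v, 6) for v in goal))  # (0.0, 2.0)
```

The design choice here is that neither device needs to know the other's pose directly; each only needs its own estimate of the shared anchor.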

Some very high jumps from the skinniest quadruped ever.

[ ODRI ]

In this video, we present recent efforts to make our humanoid robot LOLA ready for multi-contact locomotion, i.e., using additional hand-environment contacts for extra stabilization during walking.

[ TUM ]

Classic bike moves from Dr. Guero.

[ Dr. Guero ]

For a robotics company, iRobot is OLD.

[ iRobot ]

The Canadian Space Agency presents Juno, a preliminary version of a rover that could one day be sent to the Moon or Mars. Juno can navigate autonomously or be operated remotely. The Lunar Exploration Analogue Deployment (LEAD) consisted of replicating scenarios of a lunar sample return mission.

[ CSA ]

How exactly does the Waymo Driver handle a cat cutting across its driving path? Jonathan N., a Product Manager on our Perception team, breaks it all down for us.

Now do kangaroos.

[ Waymo ]

Jibo is hard at work at MIT playing games with kids.

Children’s creativity plummets as they enter elementary school. Social interactions with peers and playful environments have been shown to foster creativity in children. Digital pedagogical tools often lack the creativity benefits of co-located social interaction with peers. In this work, we leverage a social embodied robot as a playful peer and design Escape!Bot, a game involving child-robot co-play, in which the robot is a social agent that scaffolds creativity during gameplay.

[ Paper ]

It’s nice when convenience stores are convenient even for the folks who have to do the restocking.

Who’s moving the crates around, though?

[ Telexistence ]

Hi, fans! Join ROS World 2020, opening November 12th, and see the footage of ROBOTIS’ ROS platform robots 🙂

[ ROS World 2020 ]

ML/RL methods are often viewed as a magical black box, and while that’s not true, learned policies are nonetheless a valuable tool that can work in conjunction with the underlying physics of the robot. In this video, Agility CTO Jonathan Hurst – wearing his professor hat at Oregon State University – presents some recent student work on using learned policies as a control method for highly dynamic legged robots.

[ Agility Robotics ]

Here’s an ICRA Legged Robots workshop talk from Marco Hutter at ETH Zürich, on Autonomy for ANYmal.

Recent advances in legged robots and their locomotion skills have led to systems that are skilled and mature enough for real-world deployment. In particular, quadrupedal robots have reached a level of mobility to navigate complex environments, which enables them to take over inspection or surveillance jobs in places like offshore industrial plants, in underground areas, or on construction sites. In this talk, I will present our research work with the quadruped ANYmal and explain some of the underlying technologies for locomotion control, environment perception, and mission autonomy. I will show how these robots can learn and plan complex maneuvers, how they can navigate through unknown environments, and how they are able to conduct surveillance, inspection, or exploration scenarios.

[ RSL ]

#437466 How Future AI Could Recognize a Kangaroo ...

AI is continuously taking on new challenges, from detecting deepfakes (which, incidentally, are also made using AI) to winning at poker to giving synthetic biology experiments a boost. These impressive feats result partly from the huge datasets the systems are trained on. That training is costly and time-consuming, and it yields AIs that can really only do one thing well.

For example, to train an AI to differentiate between a picture of a dog and one of a cat, it’s fed thousands—if not millions—of labeled images of dogs and cats. A child, on the other hand, can see a dog or cat just once or twice and remember which is which. How can we make AIs learn more like children do?

A team at the University of Waterloo in Ontario has an answer: change the way AIs are trained.

Here’s the thing about the datasets normally used to train AI: besides being huge, they’re highly specific. A picture of a dog can only be a picture of a dog, right? But what about a really small dog with a long-ish tail? That sort of dog, while still being a dog, looks more like a cat than, say, a fully grown Golden Retriever.

It’s this concept that the Waterloo team’s methodology is based on. They described their work in a paper posted last month to arXiv, a preprint server for papers that have not yet been peer reviewed. Teaching an AI system to identify a new class of objects using just one example is known as “one-shot learning.” The team takes it a step further, focusing on “less than one”-shot learning, or LO-shot learning for short.

LO-shot learning consists of a system learning to classify various categories based on a number of examples that’s smaller than the number of categories. That’s not the most straightforward concept to wrap your head around, so let’s go back to the dogs and cats example. Say you want to teach an AI to identify dogs, cats, and kangaroos. How could that possibly be done without several clear examples of each animal?

The key, the Waterloo team says, is in what they call soft labels. Unlike hard labels, which label a data point as belonging to one specific class, soft labels tease out the relationship or degree of similarity between that data point and multiple classes. In the case of an AI trained on only dogs and cats, a third class of objects, say, kangaroos, might be described as 60 percent like a dog and 40 percent like a cat (I know—kangaroos probably aren’t the best animal to have thrown in as a third category).

“Soft labels can be used to represent training sets using fewer prototypes than there are classes, achieving large increases in sample efficiency over regular (hard-label) prototypes,” the paper says. Translation? Tell an AI a kangaroo is some fraction cat and some fraction dog—both of which it’s seen and knows well—and it’ll be able to identify a kangaroo without ever having seen one.

If the soft labels are nuanced enough, you could theoretically teach an AI to identify a large number of categories based on a much smaller number of training examples.

The paper’s authors use a simple machine learning algorithm called k-nearest neighbors (kNN) to explore this idea more in depth. The algorithm operates under the assumption that similar things are most likely to exist near each other; if you go to a dog park, there will be lots of dogs but no cats or kangaroos. Go to the Australian grasslands and there’ll be kangaroos but no cats or dogs. And so on.

To train a kNN algorithm to differentiate between categories, you choose specific features to represent each category (e.g., for animals you could use weight or size as features). With one feature on the x-axis and the other on the y-axis, each training example becomes a point on a graph, and examples that are similar to each other cluster near each other. Boundary lines between the clusters divide the categories, and the algorithm classifies a new data point by looking at its nearest neighbors, which in effect tells it which side of the line the point falls on.
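The nearest-neighbor vote described above fits in a few lines of Python. This is a minimal sketch of ordinary (hard-label) kNN, not the Waterloo team's code; the weight and height values are made up for illustration, and a real system would normally rescale features so neither one dominates the distance.

```python
import math
from collections import Counter

# Hypothetical training data: (weight in kg, height in cm) -> hard label.
TRAINING = [
    ((30.0, 60.0), "dog"), ((25.0, 55.0), "dog"), ((8.0, 35.0), "dog"),
    ((4.0, 25.0), "cat"), ((5.0, 28.0), "cat"), ((3.5, 23.0), "cat"),
]

def knn_classify(query, training=TRAINING, k=3):
    """Label a query point by majority vote among its k nearest training points."""
    nearest = sorted(training, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_classify((28.0, 58.0)))  # dog
print(knn_classify((4.5, 26.0)))   # cat
```

Note that every example here carries exactly one label, so this version can never predict a class it has no examples of.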

The Waterloo team kept it simple and worked with colored regions on a 2D plot. Using the colors and their locations on the plots, the team created synthetic data sets and accompanying soft labels. One of the simpler plots is pictured below, along with soft labels in the form of pie charts.

Image Credit: Ilia Sucholutsky & Matthias Schonlau
When the team had the algorithm plot the boundary lines of the different colors based on these soft labels, it was able to split the plot up into more colors than the number of data points it was given in the soft labels.
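A toy version of this result shows how two soft-labeled points can define three classes. The sketch below is not the authors' algorithm: it assumes a simple inverse-distance-weighted blend of the prototypes' soft labels followed by an argmax, and the prototype positions and label values are invented (echoing the dog/cat/kangaroo example) purely for illustration.

```python
import math

CLASSES = ("dog", "kangaroo", "cat")

# Two hypothetical prototypes, each with a soft label over three classes.
PROTOTYPES = [
    ((0.0, 0.0), (0.6, 0.4, 0.0)),   # mostly "dog", partly "kangaroo"
    ((1.0, 0.0), (0.0, 0.4, 0.6)),   # mostly "cat", partly "kangaroo"
]

def soft_label_predict(query, prototypes=PROTOTYPES):
    """Blend prototype soft labels with inverse-distance weights, then pick the top class."""
    combined = [0.0] * len(CLASSES)
    total_weight = 0.0
    for point, label in prototypes:
        d = math.dist(query, point)
        if d == 0.0:                          # query sits exactly on a prototype
            combined, total_weight = list(label), 1.0
            break
        w = 1.0 / d
        total_weight += w
        combined = [c + w * p for c, p in zip(combined, label)]
    scores = [c / total_weight for c in combined]
    return CLASSES[scores.index(max(scores))]

print([soft_label_predict((x, 0.0)) for x in (0.1, 0.5, 0.9)])  # ['dog', 'kangaroo', 'cat']
```

With these particular numbers, the segment between the two prototypes splits into three regions at x = 1/3 and x = 2/3: near either prototype its dominant class wins, while in the middle the shared "kangaroo" share outweighs the diluted dog and cat shares.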

While the results are encouraging, the team acknowledges that they’re just the first step, and there’s much more exploration of this concept yet to be done. The kNN algorithm is one of the least complex models out there; what might happen when LO-shot learning is applied to a far more complex algorithm? Also, to apply it, you still need to distill a larger dataset down into soft labels.

One idea the team is already working on is having other algorithms generate the soft labels for the algorithm that’s going to be trained using LO-shot; manually designing soft labels won’t always be as easy as splitting up some pie charts into different colors.

LO-shot’s potential for reducing the amount of training data needed to yield working AI systems is promising. Besides reducing the cost and the time required to train new models, the method could also make AI more accessible to industries, companies, or individuals who don’t have access to large datasets—an important step for democratization of AI.

Image Credit: pen_ash from Pixabay

#437276 Cars Will Soon Be Able to Sense and ...

Imagine you’re on your daily commute to work, driving along a crowded highway while trying to resist looking at your phone. You’re already a little stressed out because you didn’t sleep well, woke up late, and have an important meeting in a couple of hours, and you just don’t feel like your best self.

Suddenly another car cuts you off, coming way too close to your front bumper as it changes lanes. Your already-simmering emotions leap into overdrive, and you lay on the horn and shout curses no one can hear.

Except someone—or, rather, something—can hear: your car. Hearing your angry words, aggressive tone, and raised voice, and seeing your furrowed brow, the onboard computer goes into “soothe” mode, as it’s been programmed to do when it detects that you’re angry. It plays relaxing music at just the right volume, releases a puff of light lavender-scented essential oil, and maybe even says some meditative quotes to calm you down.

What do you think—creepy? Helpful? Awesome? Weird? Would you actually calm down, or get even more angry that a car is telling you what to do?

Scenarios like this (maybe without the lavender oil part) may not be imaginary for much longer, especially if companies working to integrate emotion-reading artificial intelligence into new cars have their way. And it wouldn’t just be a matter of your car soothing you when you’re upset; depending on what sort of regulations are enacted, the car’s sensors, camera, and microphone could collect all kinds of data about you and sell it to third parties.

Computers and Feelings
Just as AI systems can be trained to tell the difference between a picture of a dog and one of a cat, they can learn to differentiate between an angry tone of voice or facial expression and a happy one. In fact, there’s a whole branch of machine intelligence devoted to creating systems that can recognize and react to human emotions; it’s called affective computing.

Emotion-reading AIs learn what different emotions look and sound like from large sets of labeled data; “smile = happy,” “tears = sad,” “shouting = angry,” and so on. The most sophisticated systems can likely even pick up on the micro-expressions that flash across our faces before we consciously have a chance to control them, as detailed by Daniel Goleman in his groundbreaking book Emotional Intelligence.

Affective computing company Affectiva, a spinoff from MIT Media Lab, says its algorithms are trained on 5,313,751 face videos (videos of people’s faces as they do an activity, have a conversation, or react to stimuli) representing about 2 billion facial frames. Fascinatingly, Affectiva claims its software can even account for cultural differences in emotional expression (for example, it’s more normalized in Western cultures to be very emotionally expressive, whereas Asian cultures tend to favor stoicism and politeness), as well as gender differences.

But Why?
As reported in Motherboard, companies like Affectiva, Cerence, Xperi, and Eyeris have plans in the works to partner with automakers and install emotion-reading AI systems in new cars. Regulations passed last year in Europe and a bill just introduced this month in the US Senate are helping make the idea of “driver monitoring” less weird, mainly by emphasizing the safety benefits of preemptive warning systems for tired or distracted drivers (remember that part in the beginning about sneaking glances at your phone? Yeah, that).

Drowsiness and distraction can’t really be called emotions, though—so why are they being lumped under an umbrella that has a lot of other implications, including what many may consider an eerily Big Brother-esque violation of privacy?

Our emotions, in fact, are among the most private things about us, since we are the only ones who know their true nature. We’ve developed the ability to hide and disguise our emotions, and this can be a useful skill at work, in relationships, and in scenarios that require negotiation or putting on a game face.

And I don’t know about you, but I’ve had more than one good cry in my car. It’s kind of the perfect place for it; private, secluded, soundproof.

Putting systems into cars that can recognize and collect data about our emotions under the guise of preventing accidents caused by distraction or drowsiness, then, seems a bit like a bait and switch.

A Highway to Privacy Invasion?
European regulations will help keep driver data from being used for any purpose other than ensuring a safer ride. But the US is lagging behind on the privacy front, with car companies largely free from any enforceable laws that would keep them from using driver data as they please.

Affectiva lists the following as use cases for occupant monitoring in cars: personalizing content recommendations, providing alternate route recommendations, adapting environmental conditions like lighting and heating, and understanding user frustration with virtual assistants and designing those assistants to be emotion-aware so that they’re less frustrating.

Our phones already do the first two (granted, we’re not supposed to look at them while we drive, but most cars now let you use Bluetooth to display your phone’s content on the dashboard), and the third is simply a matter of reaching a hand out to turn a dial or press a button. The last seems like a solution for a problem that wouldn’t exist without said… solution.

Despite how unnecessary and unsettling it may seem, though, emotion-reading AI isn’t going away, in cars or other products and services where it might provide value.

Besides automotive AI, Affectiva also makes software for clients in the advertising space. With consent, the built-in camera on users’ laptops records them while they watch ads, gauging their emotional response, what kind of marketing is most likely to engage them, and how likely they are to buy a given product. Emotion-recognition tech is also being used or considered for use in mental health applications, call centers, fraud monitoring, and education, among others.

In a 2015 TED talk, Affectiva co-founder Rana El-Kaliouby told her audience that we’re living in a world increasingly devoid of emotion, and her goal was to bring emotions back into our digital experiences. Soon they’ll be in our cars, too; whether the benefits will outweigh the costs remains to be seen.

Image Credit: Free-Photos from Pixabay

#437269 DeepMind’s Newest AI Programs Itself ...

When Deep Blue defeated world chess champion Garry Kasparov in 1997, it may have seemed that artificial intelligence had finally arrived. A computer had just taken down one of the top chess players of all time. But it wasn’t to be.

Though Deep Blue was meticulously programmed top-to-bottom to play chess, the approach was too labor-intensive, too dependent on clear rules and bounded possibilities to succeed at more complex games, let alone in the real world. The next revolution would take a decade and a half, when vastly more computing power and data revived machine learning, an old idea in artificial intelligence just waiting for the world to catch up.

Today, machine learning dominates, mostly by way of a family of algorithms called deep learning, while symbolic AI, the dominant approach in Deep Blue’s day, has faded into the background.

Key to deep learning’s success is the fact that the algorithms basically write themselves. Given some high-level programming and a dataset, they learn from experience. No engineer anticipates every possibility in code. The algorithms just figure it out.

Now, Alphabet’s DeepMind is taking this automation further by developing deep learning algorithms that can handle programming tasks which have been, to date, the sole domain of the world’s top computer scientists (and take them years to write).

In a paper recently published on the pre-print server arXiv, a database for research papers that haven’t been peer reviewed yet, the DeepMind team described a new deep reinforcement learning algorithm that was able to discover its own value function—a critical programming rule in deep reinforcement learning—from scratch.

Surprisingly, the algorithm was also effective beyond the simple environments it trained in, going on to play Atari games (a different, more complicated task) at a level that was, at times, competitive with human-designed algorithms, and achieving superhuman levels of play in 14 games.

DeepMind says the approach could accelerate the development of reinforcement learning algorithms and even lead to a shift in focus, where instead of spending years writing the algorithms themselves, researchers work to perfect the environments in which they train.

Pavlov’s Digital Dog
First, a little background.

The three main deep learning approaches are supervised, unsupervised, and reinforcement learning.

The first two consume huge amounts of data (like images or articles), look for patterns in the data, and use those patterns to inform actions (like identifying an image of a cat). To us, this is a pretty alien way to learn about the world. Not only would it be mind-numbingly dull to review millions of cat images, it’d take us years or more to do what these programs do in hours or days. And of course, we can learn what a cat looks like from just a few examples. So why bother?

While supervised and unsupervised deep learning emphasize the machine in machine learning, reinforcement learning is a bit more biological. In fact, it closely resembles the way we learn. Confronted with several possible actions, we predict which will be most rewarding based on experience, weighing the pleasure of eating a chocolate chip cookie against avoiding a cavity and a trip to the dentist.

In deep reinforcement learning, algorithms go through a similar process as they act. In the Atari game Breakout, for instance, a player guides a paddle to bounce a ball at a ceiling of bricks, trying to break as many as possible. When playing Breakout, should an algorithm move the paddle left or right? To decide, it estimates which direction will lead to the most total points, or rewards, it can earn over the rest of the game. That estimate is the value function.
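To make the idea concrete, here is a minimal sketch of a value estimate under a hypothetical setup: the agent has a handful of sampled reward sequences (rollouts) for each paddle move, averages the discounted sum of future rewards for each move, and picks the best one. The function names, the discount factor `gamma`, and the reward numbers are illustrative assumptions, not DeepMind's code; a real deep reinforcement learning agent learns this estimate with a neural network rather than by listing rollouts.

```python
from statistics import mean

def discounted_return(rewards, gamma=0.9):
    """Total future reward, with each later step weighted down by gamma."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total

def estimate_action_values(rollouts, gamma=0.9):
    """Hypothetical value estimate: mean discounted return of the rollouts seen per action."""
    return {action: mean(discounted_return(seq, gamma) for seq in seqs)
            for action, seqs in rollouts.items()}

def choose_action(values):
    """Greedy choice: the action with the highest estimated value."""
    return max(values, key=values.get)

# Made-up rollouts: each inner list is the points earned on successive steps after the move.
rollouts = {
    "left": [[0, 1], [0, 0]],
    "right": [[1, 1], [1, 0]],
}
values = estimate_action_values(rollouts)
print(values)                 # left is about 0.45, right is about 1.45
print(choose_action(values))  # right
```

Discounting with `gamma` below 1 is one common design choice: points the agent can collect soon count for more than points that are many moves away and less certain.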

Move by move, game by game, an algorithm combines experience with its value function to learn which actions bring greater rewards, improving its play until it eventually becomes an uncanny Breakout player.
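This loop can be sketched with tabular Q-learning, a classic hand-designed update rule, on a toy problem instead of Breakout. The simplifications are assumptions made for illustration: there is no neural network, the environment is a hypothetical five-square corridor with a reward only at the right end, and the parameter values are arbitrary. After each move, the agent nudges its value estimate toward the reward it just received plus its discounted estimate for the square it landed on.

```python
import random

def q_learning_corridor(n_states=5, episodes=200, alpha=0.5, gamma=0.9,
                        epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor.

    The agent starts in square 0. Reaching the last square ends the game
    with a reward of +1; every other move earns 0.
    """
    rng = random.Random(seed)
    actions = (-1, +1)                       # step left or step right
    goal = n_states - 1
    q = {(s, a): 0.0 for s in range(n_states) for a in actions}

    for _ in range(episodes):                # game by game
        s = 0
        while s != goal:                     # move by move
            if rng.random() < epsilon:       # occasionally explore at random
                a = rng.choice(actions)
            else:                            # otherwise exploit current estimates, breaking ties randomly
                best = max(q[(s, b)] for b in actions)
                a = rng.choice([b for b in actions if q[(s, b)] == best])
            s_next = min(max(s + a, 0), goal)    # walls at both ends
            r = 1.0 if s_next == goal else 0.0
            # Target: reward now plus the discounted value of the best next action.
            future = 0.0 if s_next == goal else max(q[(s_next, b)] for b in actions)
            q[(s, a)] += alpha * (r + gamma * future - q[(s, a)])
            s = s_next
    return q

q = q_learning_corridor()
policy = ["right" if q[(s, 1)] > q[(s, -1)] else "left" for s in range(4)]
print(policy)  # ['right', 'right', 'right', 'right']
```

Here the update rule itself (the line that adjusts `q`) was written by a person. That hand-written rule is exactly the part that LPG tries to discover on its own.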

Learning to Learn (Very Meta)
So, a key to deep reinforcement learning is developing a good value function. And that’s difficult. According to the DeepMind team, it takes years of manual research to write the rules guiding algorithmic actions—which is why automating the process is so alluring. Their new Learned Policy Gradient (LPG) algorithm makes solid progress in that direction.

LPG trained in a number of toy environments. Most of these were “gridworlds”—literally two-dimensional grids with objects in some squares. The AI moves square to square and earns points or punishments as it encounters objects. The grids vary in size, and the distribution of objects is either set or random. The training environments offer opportunities to learn fundamental lessons for reinforcement learning algorithms.
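A gridworld of this kind takes only a few lines to write. The sketch below is a hypothetical minimal version, not one of DeepMind's actual training environments: the class name, the four moves, the plus-or-minus-one object rewards, and the rule that an object disappears once touched are all assumptions chosen for illustration. Objects can sit at fixed squares or be scattered at random from a seed, mirroring the "set or random" layouts described above.

```python
import random

MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

class GridWorld:
    """Hypothetical toy gridworld: the agent moves square to square and
    collects a reward (+) or punishment (-) from any object it lands on."""

    def __init__(self, width, height, objects=None, n_random=0, seed=None):
        self.width, self.height = width, height
        rng = random.Random(seed)
        self.agent = (0, 0)                          # start in the top-left corner
        self.objects = dict(objects or {})           # fixed layout: (x, y) -> reward
        free = [(x, y) for x in range(width) for y in range(height)
                if (x, y) != self.agent and (x, y) not in self.objects]
        for pos in rng.sample(free, n_random):       # optional random layout
            self.objects[pos] = rng.choice([1.0, -1.0])

    def step(self, move):
        """Apply one move; the walls keep the agent on the grid."""
        dx, dy = MOVES[move]
        x = min(max(self.agent[0] + dx, 0), self.width - 1)
        y = min(max(self.agent[1] + dy, 0), self.height - 1)
        self.agent = (x, y)
        reward = self.objects.pop(self.agent, 0.0)   # objects vanish once touched
        done = not any(r > 0 for r in self.objects.values())  # no rewards left
        return self.agent, reward, done

world = GridWorld(3, 3, objects={(1, 0): 1.0, (2, 0): -1.0})
print(world.step("up"))     # ((0, 0), 0.0, False): blocked by the wall
print(world.step("right"))  # ((1, 0), 1.0, True): last positive object collected
```

Varying `width`, `height`, and the object layout is a cheap way to produce many slightly different training worlds from one small class.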

In LPG’s case, however, there was no hand-written value function to guide that learning.

Instead, LPG has what DeepMind calls a “meta-learner.” You might think of this as an algorithm within an algorithm that, by interacting with its environment, discovers both “what to predict,” thereby forming its version of a value function, and “how to learn from it,” applying its newly discovered value function to each decision it makes in the future.

Prior work in the area has had some success, but according to DeepMind, LPG is the first algorithm to discover reinforcement learning rules from scratch and to generalize beyond its training environments. The latter was particularly surprising because Atari games are so different from the simple worlds LPG trained in; it had never seen anything like an Atari game.

Time to Hand Over the Reins? Not Just Yet
LPG still lags behind advanced human-designed algorithms, the researchers said. But it outperformed a human-designed benchmark in the training environments and even in some Atari games, which suggests it isn’t strictly worse, just more specialized to some environments than others.

This is where there’s room for improvement and more research.

The more environments LPG saw, the better it generalized. Intriguingly, the researchers speculate that with enough well-designed training environments, the approach might yield a general-purpose reinforcement learning algorithm.

At the least, though, they say further automation of algorithm discovery—that is, algorithms learning to learn—will accelerate the field. In the near term, it can help researchers more quickly develop hand-designed algorithms. Further out, as self-discovered algorithms like LPG improve, engineers may shift from manually developing the algorithms themselves to building the environments where they learn.

Deep learning long ago left Deep Blue in the dust at games. Perhaps algorithms learning to learn will be a winning strategy in the real world too.

Image credit: Mike Szczepanski / Unsplash
