Tag Archives: feedback

#439095 DARPA Prepares for the Subterranean ...

The DARPA Subterranean Challenge Final Event is scheduled to take place at the Louisville Mega Cavern in Louisville, Kentucky, from September 21 to 23. We’ve followed SubT teams as they’ve explored their way through abandoned mines, unfinished nuclear reactors, and a variety of caves, and now everything comes together in one final course where the winner of the Systems Track will take home the $2 million first prize.

It’s a fitting reward for teams that have been solving some of the hardest problems in robotics, but winning isn’t going to be easy, and we’ll talk with SubT Program Manager Tim Chung about what we have to look forward to.

Since we haven’t talked about SubT in a little while (what with the unfortunate covid-related cancellation of the Systems Track Cave Circuit), here’s a quick refresher of where we are: the teams have made it through the Tunnel Circuit, the Urban Circuit, and a virtual version of the Cave Circuit, and some of them have been testing in caves of their own. The Final Event will include all of these environments, and the teams of robots will have 60 minutes to autonomously map the course, locating artifacts to score points. Since I’m not sure where on Earth there’s an underground location that combines tunnels and caves with urban structures, DARPA is going to have to get creative, and the location in which they’ve chosen to do that is Louisville, Kentucky.

The Louisville Mega Cavern is a former limestone mine, most of which is under the Louisville Zoo. It’s not all that deep, mostly less than 30 meters under the surface, but it’s enormous: with 370,000 square meters of rooms and passages, the cavern currently hosts (among other things) a business park, a zipline course, and mountain bike trails, because why not. While DARPA is keeping pretty quiet on the details, I’m guessing that they’ll be taking over a chunk of the cavern and filling it with features representing as many of the environmental challenges as they can.

To learn more about how the SubT Final Event is going to go, we spoke with SubT Program Manager Tim Chung. But first, we talked about Tim’s perspective on the success of the Urban Circuit, and how teams have been managing without an in-person Cave Circuit.

IEEE Spectrum: How did the SubT Urban Circuit go?

Tim Chung: On a couple fronts, Urban Circuit was really exciting. We were in this unfinished nuclear power plant—I’d be surprised if any of the competitors had prior experience in such a facility, or anything like it. I think that was illuminating both from an experiential point of view for the competitors, but also from a technology point of view, too.

One thing that I thought was really interesting was that we, DARPA, didn't need to make the venue more challenging. The real world is really that hard. There are places that were just really heinous for these robots to have to navigate through in order to look in every nook and cranny for artifacts. There were corners and doorways and small corridors and all these kind of things that really forced the teams to have to work hard, and the feedback was, why did DARPA have to make it so hard? But we didn’t, and in fact there were places that for the safety of the robots and personnel, we had to ensure the robots couldn’t go.

It sounds like some teams thought this course was on the more difficult side—do you think you tuned it to just the right amount of DARPA-hard?

Our calibration worked quite well. We were able to tease out and help refine and better understand what technologies are both useful and critical and also those technologies that might not necessarily get you the leap ahead capability. So as an example, the Urban Circuit really emphasized verticality, where you have to be able to sense, understand, and maneuver in three dimensions. Being able to capitalize on their robot technologies to address that verticality really stratified the teams, and showed how critical those capabilities are.

We saw teams that brought a lot of those capabilities do very well, and teams that brought baseline capabilities do what they could on the single floor that they were able to operate on. And so I think we got the Goldilocks solution for Urban Circuit that combined both difficulty and ambition.

Photos: Evan Ackerman/IEEE Spectrum

Two SubT Teams embedded networking equipment in balls that they could throw onto the course.

One of the things that I found interesting was that two teams independently came up with throwable network nodes. What was DARPA’s reaction to this? Is any solution a good solution, or was it more like the teams were trying to game the system?

You mean, do we want teams to game the rules in any way so as to get a competitive advantage? I don't think that's what the teams were doing. I think they were operating not only within the bounds of the rules, which permitted such a thing as throwable sensors where you could stand at the line and see how far you could chuck these things—not only was that acceptable by the rules, but anticipated. Behind the scenes, we tried to do exactly what these teams are doing and think through different approaches, so we explicitly didn't forbid such things in our rules because we thought it's important to have as wide an aperture as possible.

With these comms nodes specifically, I think they’re pretty clever. They were in some cases hacked together with a variety of different sports paraphernalia to see what would provide the best cushioning. You know, a lot of that happens in the field, and what it captured was that sometimes you just need to be up at two in the morning and thinking about things in a slightly different way, and that's when some nuggets of innovation can arise, and we see this all the time with operators in the field as well. They might only have duct tape or Styrofoam or whatever the case may be and that's when they come up with different ways to solve these problems. I think from DARPA’s perspective, and certainly from my perspective, wherever innovation can strike, we want to try to encourage and inspire those opportunities. I thought it was great, and it’s all part of the challenge.

Is there anything you can tell us about what your original plan had been for the Cave Circuit?

I can say that we’ve had the opportunity to go through a number of these caves scattered all throughout the country, and engage with caving communities: cavers clubs, speleologists that conduct research, and then of course the cave rescue community. The single biggest takeaway is that every cave, and there are tens of thousands of them in the US alone, every cave has its own personality, and a lot of that personality is quite hidden from humans, because we can’t explore or access all of the cave. This led us to a number of different caves that were intriguing from a DARPA perspective but also inspirational for our Cave Circuit Virtual Competition.

How do you feel like the tuning was for the Virtual Cave Circuit?

The Virtual Competition, as you well know, was exciting in the sense that we could basically combine eight worlds into one competition, whereas the systems track competition really didn’t give us that opportunity. Even if we had been able to hold the Cave Circuit Systems Competition in person, it would have been at one site, and it would have been challenging to represent the level of diversity that we could with the Virtual Competition. So I think from that perspective, it’s clearly an advantage in terms of calibration: diversity gets you the ability to aggregate results to capture those that excel across all worlds as well as those that do well in one world or some worlds and not the others. I think the calibration was great in the sense that we were able to see the gamut of performance. Those that did well, did quite well, and those that have room to grow showed where those opportunities are for them as well.

We had to find ways to capture that diversity and that representativeness, and I think one of the fun ways we did that was with the different cave world tiles that we were able to combine in a variety of different ways. We also made use of a real world data set that we were able to take from a laser scan. Across the board, we had a really great chance to illustrate why virtual testing and simulation still plays such a dominant role in robotics technology development, and why I think it will continue to play an increasing role for developing these types of autonomy solutions.

Photo: Team CSIRO Data 61

How can systems track teams learn from their testing in whatever cave is local to them and effectively apply that to whatever cave environment is part of the final considering what the diversity of caves is?

I think that hits the nail on the head for what we as technologists are trying to discover—what are the transferable generalizable insights and how does that inform our technology development? As roboticists we want to optimize our systems to perform well at the tasks that they were designed to do, and oftentimes that means specialization because we get increased performance at the expense of being a generalist robot. I think in the case of SubT, we want to have our cake and eat it too—we want robots that perform well and reliably, but we want them to do so not just in one environment, which is how we tend to think about robot performance, but we want them to operate well in many environments, many of which have yet to be faced.

And I think that's kind of the nuance here, that we want robot systems to be generalists for the sake of being able to handle the unknown, namely the real world, but still achieve a high level of performance. Perhaps they do that through their combined use of different technologies or advances in autonomy or perception approaches or novel mechanisms or mobility, but somehow they're still able, at least in aggregate, to achieve high performance.

We know these teams eagerly await any type of clue that DARPA can provide about the SubT environments. From the environment previews for Tunnel, Urban, and even Cave, the teams were pivoting around and thinking a little bit differently. The takeaway, however, was that they didn't go to a clean sheet design; their systems were flexible enough that they could incorporate some of those specialist trends while still maintaining the notion of a generalist framework.

Looking ahead to the SubT Final, what can you tell us about the Louisville Mega Cavern?

As always, I’ll keep you in suspense until we get you there, but I can say that from the beginning of the SubT Challenge we had always envisioned teams of robots that are able to address not only the uncertainty of what's right in front of them, but also the uncertainty of what comes next. So I think the teams will be advantaged by thinking through subdomain awareness, or domain awareness if you want to generalize it, whether that means tuning multi-purpose robots, or deploying different robots, or employing your team of robots differently. Knowing which subdomain you are in is likely to be helpful, because then you can take advantage of those unique lessons learned through all those previous experiences then capitalize on that.

As far as specifics, I think the Mega Cavern offers many of the features important to what it means to be underground, while giving DARPA a pretty blank canvas to realize our vision of the SubT Challenge.

The SubT Final will be different from the earlier circuits in that there’s just one 60-minute run, rather than two. This is going to make things a lot more stressful for teams who have experienced bad robot days—why do it this way?

The preliminary round has two 30-minute runs, and those two runs are very similar to how we have done it during the circuits, of a single run per configuration per course. Teams will have the opportunity to show that their systems can face the obstacles in the final course, and it's the sum of those scores much like we did during the circuits, to help mitigate some of the concerns that you mentioned of having one robot somehow ruin their chances at a prize.

The prize round does give DARPA as well as the community a chance to focus on the top six teams from the preliminary round, and allows us to understand how they came to be at the top of the pack while emphasizing their technological contributions. The prize round will be one and done, but all of these teams we anticipate will be putting their best robot forward and will show the world why they deserve to win the SubT Challenge.

We’ve always thought that when called upon these robots need to operate in really challenging environments, and in the context of real world operations, there is no second chance. I don't think it's actually that much of a departure from our interests and insistence on bringing reliable technologies to the field, and those teams that might have something break here and there, that's all part of the challenge, of being resilient. Many teams struggled with robots that were debilitated on the course, and they still found ways to succeed and overcome that in the field, so maybe the rules emphasize that desire for showing up and working on game day which is consistent, I think, with how we've always envisioned it. This isn’t to say that these systems have to work perfectly, they just have to work in a way such that the team is resilient enough to tackle anything that they face.

It’s not too late for teams to enter for both the Virtual Track and the Systems Track to compete in the SubT Final, right?

Yes, that's absolutely right. Qualifications are still open, we are eager to welcome new teams to join in along with our existing competitors. I think any dark horse competitors coming into the Finals may be able to bring something that we haven't seen before, and that would be really exciting. I think it'll really make for an incredibly vibrant and illuminating final event.

The final event qualification deadline for the Systems Competition is April 21, and the qualification deadline for the Virtual Competition is June 29. More details here.

Posted in Human Robots

#439081 Classify This Robot-Woven Sneaker With ...

For athletes trying to run fast, the right shoe can be essential to achieving peak performance. For athletes trying to run as fast as humanly possible, a runner’s shoe can also become a work of individually customized engineering.

This is why Adidas has married 3D printing with robotic automation in a mass-market footwear project it calls Futurecraft.Strung, expected to be available for purchase as soon as later this year. Using a customized, 3D-printed sole, a Futurecraft.Strung manufacturing robot can place some 2,000 threads from up to 10 different sneaker yarns in one upper section of the shoe.

Skylar Tibbits, founder and co-director of the Self-Assembly Lab and associate professor in MIT's Department of Architecture, says that because of its small scale, footwear has been an area of focus for 3D printing and additive manufacturing, which involves adding material bit by bit.

“There are really interesting complex geometry problems,” he says. “It’s pretty well suited.”

Photo: Adidas

Beginning with a 3D-printed sole, Adidas robots weave together some 2,000 threads from up to 10 different sneaker yarns to make one Futurecraft.Strung shoe, expected on the marketplace later this year or sometime in 2022.

Adidas began working on the Futurecraft.Strung project in 2016. Then two years later, Adidas Futurecraft, the company’s innovation incubator, began collaborating with digital design studio Kram/Weisshaar. In less than a year the team built the software and hardware for the upper part of the shoe, called Strung uppers.

“Most 3D printing in the footwear space has been focused on the midsole or outsole, like the bottom of the shoe,” Tibbits explains. But now, he says, Adidas is bringing robotics and a threaded design to the upper part of the shoe. The company bases its Futurecraft.Strung design on high-resolution scans of how runners’ feet move as they travel.

This more flexible design can benefit athletes in multiple sports, according to an Adidas blog post. It will be able to use motion capture of an athlete’s foot and feedback from the athlete to tailor the design to that athlete’s gait. Adidas customizes the weaving of the shoe’s “fabric” (really more like an elaborate woven string figure, a cat’s cradle to fit the foot) to achieve a close and comfortable fit, the company says.

What they call their “4D sole” consists of a design combining 3D printing with materials that can change their shape and properties over time. In fact, Tibbits coined the term 4D printing to describe this process in 2013. The company takes customized data from the Adidas Athlete Intelligent Engine to make the shoe, according to Kram/Weisshaar’s website.

Photo: Adidas

Closeup of the weaving process behind a Futurecraft.Strung shoe

“With Strung for the first time, we can program single threads in any direction, where each thread has a different property or strength,” Fionn Corcoran-Tadd, an innovation designer at Adidas’ Futurecraft lab, said in a company video. Each thread serves a purpose, the video noted. “This is like customized string art for your feet,” Tibbits says.

Although the robotics technology the company uses has been around for many years, what Adidas’s robotic weavers can achieve with thread is a matter of elaborate geometry. “It’s more just like a really elegant way to build up material combining robotics and the fibers and yarns into these intricate and complex patterns,” he says.

Robots can of course create patterns with more precision than someone winding them by hand, and they can rapidly and reliably change the yarn and color of the fabric pattern. Adidas says it can make a single upper in 45 minutes and a pair of sneakers in 1 hour and 30 minutes. It plans to reduce this time to minutes in the months ahead, the company said.

An Adidas spokesperson says sneakers incorporating the Futurecraft.Strung uppers design are a prototype, but the company plans to bring a Strung shoe to market in late 2021 or 2022. However, Adidas Futurecraft sneakers are currently available with a 3D-printed midsole.

Adidas plans to continue gathering data from athletes to customize the uppers of sneakers. “We’re building up a library of knowledge and it will get more interesting as we aggregate data of testing and from different athletes and sports,” the Adidas Futurecraft team writes in a blog post. “The more we understand about how data can become design code, the more we can take that and apply it to new Strung textiles. It’s a continuous evolution.”

#438982 Quantum Computing and Reinforcement ...

Deep reinforcement learning is having a superstar moment.

Powering smarter robots. Simulating human neural networks. Trouncing physicians at medical diagnoses and crushing humanity’s best gamers at Go and Atari. While far from achieving the flexible, quick thinking that comes naturally to humans, this powerful machine learning idea seems unstoppable as a harbinger of better thinking machines.

Except there’s a massive roadblock: these algorithms take forever to run. Because the concept behind them is based on trial and error, a reinforcement learning AI “agent” only learns after being rewarded for its correct decisions. For complex problems, the time it takes an AI agent to try and fail to learn a solution can quickly become untenable.

But what if you could try multiple solutions at once?

This week, an international collaboration led by Dr. Philip Walther at the University of Vienna took the “classic” concept of reinforcement learning and gave it a quantum spin. They designed a hybrid AI that relies on both quantum and run-of-the-mill classic computing, and showed that—thanks to quantum quirkiness—it could simultaneously screen a handful of different ways to solve a problem.

The result is a reinforcement learning AI that learned over 60 percent faster than its non-quantum-enabled peers. This is one of the first tests that shows adding quantum computing can speed up the actual learning process of an AI agent, the authors explained.

Although only challenged with a “toy problem” in the study, the hybrid AI, once scaled, could impact real-world problems such as building an efficient quantum internet. The setup “could readily be integrated within future large-scale quantum communication networks,” the authors wrote.

The Bottleneck
Learning from trial and error comes intuitively to our brains.

Say you’re trying to navigate a new convoluted campground without a map. The goal is to get from the communal bathroom back to your campsite. Dead ends and confusing loops abound. We tackle the problem by deciding to turn either left or right at every branch in the road. One will get us closer to the goal; the other leads to a half hour of walking in circles. Eventually, our brain chemistry rewards correct decisions, so we gradually learn the correct route. (If you’re wondering…yeah, true story.)

Reinforcement learning AI agents operate in a similar trial-and-error way. As a problem becomes more complex, the number of trials, and the time each one takes, skyrockets.
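To make the campground example concrete, here is a minimal sketch of a trial-and-error learner. It is not taken from the study: the function name `learn_route`, the left/right encoding of the route, and the simple "add one to the preference" update are all assumptions chosen for illustration. The agent is rewarded only when it reaches the campsite, and a single wrong turn ends the trial with nothing learned.

```python
import random


def learn_route(correct_route, seed=0, target_successes=20, max_trials=10_000):
    """Learn a left/right route by trial and error, rewarded only on arrival.

    Hypothetical toy setup: `correct_route` is a string such as "LRRL".
    Returns (trials_used, learned_route).
    """
    rng = random.Random(seed)
    # One preference weight per direction at every junction; all start equal.
    prefs = [{"L": 1.0, "R": 1.0} for _ in correct_route]
    successes = 0
    trials = 0
    while successes < target_successes and trials < max_trials:
        trials += 1
        taken = []
        for i, right_answer in enumerate(correct_route):
            turn = rng.choices(["L", "R"], weights=[prefs[i]["L"], prefs[i]["R"]])[0]
            taken.append(turn)
            if turn != right_answer:
                break  # wrong turn: the trial ends with no reward
        else:
            # Reached the campsite: reinforce every turn taken on this trial.
            for i, turn in enumerate(taken):
                prefs[i][turn] += 1.0
            successes += 1
    # The learned route is the currently preferred turn at each junction.
    learned = "".join(max(p, key=p.get) for p in prefs)
    return trials, learned


trials, route = learn_route("LRRL")
print(trials, route)
```

Before the first reward arrives the agent is guessing blindly, so a route with n junctions succeeds with probability 1 in 2^n per trial. That is the sense in which the number of trials grows quickly as the problem gets harder.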

“Even in a moderately realistic environment, it may simply take too long to rationally respond to a given situation,” explained study author Dr. Hans Briegel at the Universität Innsbruck in Austria, who previously led efforts to speed up AI decision-making using quantum mechanics. If there’s pressure that allows “only a certain time for a response, an agent may then be unable to cope with the situation and to learn at all,” he wrote.

Many attempts have tried speeding up reinforcement learning. Giving the AI agent a short-term “memory.” Tapping into neuromorphic computing, which better resembles the brain. In 2014, Briegel and colleagues showed that a “quantum brain” of sorts can help propel an AI agent’s decision-making process after learning. But speeding up the learning process itself has eluded our best attempts.

The Hybrid AI
The new study went straight for that previously untenable jugular.

The team’s key insight was to tap into the best of both worlds—quantum and classical computing. Rather than building an entire reinforcement learning system using quantum mechanics, they turned to a hybrid approach that could prove to be more practical. Here, the AI agent uses quantum weirdness as it’s trying out new approaches—the “trial” in trial and error. The system then passes the baton to a classical computer to give the AI its reward—or not—based on its performance.

At the heart of the quantum “trial” process is a quirk called superposition. Stay with me. Our computers store information in bits, each of which can represent only one of two states, 0 or 1. Quantum mechanics is far weirder, in that photons (particles of light) can simultaneously be both 0 and 1, with a slightly different probability of “leaning towards” one or the other.

This noncommittal oddity is part of what makes quantum computing so powerful. Take our reinforcement learning example of navigating a new campsite. In our classic world, we, and our AI, need to decide between turning left or right at an intersection. In a quantum setup, however, the AI can (in a sense) turn left and right at the same time. So when searching for the correct path back to home base, the quantum system has a leg up in that it can simultaneously explore multiple routes, making it far faster than conventional, consecutive trial and error.

“As a consequence, an agent that can explore its environment in superposition will learn significantly faster than its classical counterpart,” said Briegel.
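For readers who want to see what "leaning towards" 0 or 1 means numerically, the sketch below uses the standard textbook rule that the probability of each measurement outcome is the squared magnitude of its amplitude. It is a purely classical illustration, not the team's photonic setup, and it does not reproduce any quantum speed-up; the function names are assumptions made for this example.

```python
import math
import random


def measurement_probabilities(amp0, amp1):
    """Return (P(0), P(1)) for a two-state superposition with the given amplitudes."""
    total = abs(amp0) ** 2 + abs(amp1) ** 2  # normalize in case the inputs are not
    return abs(amp0) ** 2 / total, abs(amp1) ** 2 / total


def measure(amp0, amp1, rng):
    """Simulate one measurement: the superposition collapses to 0 or 1."""
    p0, _ = measurement_probabilities(amp0, amp1)
    return 0 if rng.random() < p0 else 1


# An equal superposition leans neither way.
print(measurement_probabilities(1 / math.sqrt(2), 1 / math.sqrt(2)))

# A state that leans towards 0, measured many times.
rng = random.Random(0)
samples = [measure(math.sqrt(0.6), math.sqrt(0.4), rng) for _ in range(1000)]
print(samples.count(0), samples.count(1))
```

Each simulated measurement yields a single definite answer; the amplitudes only shape how often each answer appears.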

It’s not all theory. To test out their idea, the team turned to a programmable chip called a nanophotonic processor. Think of it as a CPU-like computer chip, but it processes particles of light—photons—rather than electricity. These light-powered chips have been a long time in the making. Back in 2017, for example, a team from MIT built a fully optical neural network into an optical chip to bolster deep learning.

The chips aren’t all that exotic. Nanophotonic processors act kind of like our eyeglasses, which can carry out complex calculations that transform light that passes through them. In the glasses case, they let people see better. For a light-based computer chip, it allows computation. Rather than using electrical cables, the chips use “wave guides” to shuttle photons and perform calculations based on their interactions.

The “error” or “reward” part of the new hardware comes from a classical computer. The nanophotonic processor is coupled to a traditional computer, where the latter provides the quantum circuit with feedback—that is, whether to reward a solution or not. This setup, the team explains, allows them to more objectively judge any speed-ups in learning in real time.

In this way, a hybrid reinforcement learning agent alternates between quantum and classical computing, trying out ideas in wibbly-wobbly “multiverse” land while obtaining feedback in grounded, classic physics “normality.”

A Quantum Boost
In simulations using 10,000 AI agents and actual experimental data from 165 trials, the hybrid approach, when challenged with a more complex problem, showed a clear leg up.

The key word is “complex.” The team found that if an AI agent has a high chance of figuring out the solution anyway—as for a simple problem—then classical computing works pretty well. The quantum advantage blossoms when the task becomes more complex or difficult, allowing quantum mechanics to fully flex its superposition muscles. For these problems, the hybrid AI was 63 percent faster at learning a solution compared to traditional reinforcement learning, decreasing its learning effort from 270 guesses to 100.
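The 63 percent figure is consistent with the two guess counts quoted above if it is read as the relative reduction in learning effort. That reading is an assumption on our part, and the short check below only redoes the arithmetic.

```python
def relative_reduction(before, after):
    """Fraction by which `after` is smaller than `before`."""
    return (before - after) / before


reduction = relative_reduction(270, 100)
print(f"{reduction:.1%}")  # prints 63.0%
```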

Now that scientists have shown a quantum boost for reinforcement learning speeds, the race for next-generation computing is even more lit. Photonics hardware required for long-range light-based communications is rapidly shrinking while its signal quality improves. The partial-quantum setup could “aid specifically in problems where frequent search is needed, for example, network routing problems,” which are prevalent in keeping the internet running smoothly, the authors wrote. With a quantum boost, reinforcement learning may be able to tackle far more complex problems, those in the real world, than currently possible.

“We are just at the beginning of understanding the possibilities of quantum artificial intelligence,” said lead author Walther.

Image Credit: Oleg Gamulinskiy from Pixabay

#438801 This AI Thrashes the Hardest Atari Games ...

Learning from rewards seems like the simplest thing. I make coffee, I sip coffee, I’m happy. My brain registers “brewing coffee” as an action that leads to a reward.

That’s the guiding insight behind deep reinforcement learning, a family of algorithms that famously smashed most of Atari’s gaming catalog and triumphed over humans in strategy games like Go. Here, an AI “agent” explores the game, trying out different actions and registering ones that let it win.

Except it’s not that simple. “Brewing coffee” isn’t one action; it’s a series of actions spanning several minutes, where you’re only rewarded at the very end. By just tasting the final product, how do you learn to fine-tune grind coarseness, water to coffee ratio, brewing temperature, and a gazillion other factors that result in the reward—tasty, perk-me-up coffee?

That’s the problem with “sparse rewards,” which are ironically very abundant in our messy, complex world. We don’t immediately get feedback from our actions—no video-game-style dings or points for just grinding coffee beans—yet somehow we’re able to learn and perform an entire sequence of arm and hand movements while half-asleep.

This week, researchers from UberAI and OpenAI teamed up to bestow this talent on AI.

The trick is to encourage AI agents to “return” to a previous step, one that’s promising for a winning solution. The agent then keeps a record of that state, reloads it, and branches out again to intentionally explore other solutions that may have been left behind on the first go-around. Video gamers are likely familiar with this idea: live, die, reload a saved point, try something else, repeat for a perfect run-through.

The new family of algorithms, appropriately dubbed “Go-Explore,” smashed notoriously difficult Atari games like Montezuma’s Revenge that were previously unsolvable by its AI predecessors, while trouncing human performance along the way.

It’s not just games and digital fun. In a computer simulation of a robotic arm, the team found that installing Go-Explore as its “brain” allowed it to solve a challenging series of actions when given very sparse rewards. Because the overarching idea is so simple, the authors say, it can be adapted and expanded to other real-world problems, such as drug design or language learning.

Growing Pains
How do you reward an algorithm?

Rewards are very hard to craft, the authors say. Take the problem of asking a robot to go to a fridge. A sparse reward will only give the robot “happy points” if it reaches its destination, which is similar to asking a baby, with no concept of space and danger, to crawl through a potential minefield of toys and other obstacles towards a fridge.

“In practice, reinforcement learning works very well, if you have very rich feedback, if you can tell, ‘hey, this move is good, that move is bad, this move is good, that move is bad,’” said study author Joost Huizinga. However, in situations that offer very little feedback, “rewards can intentionally lead to a dead end. Randomly exploring the space just doesn’t cut it.”

The other extreme is providing denser rewards. In the same robot-to-fridge example, you could frequently reward the bot as it goes along its journey, essentially helping “map out” the exact recipe to success. But that’s troubling as well. Over-holding an AI’s hand could result in an extremely rigid robot that ignores new additions to its path—a pet, for example—leading to dangerous situations. It’s a deceptive AI solution that seems effective in a simple environment, but crashes in the real world.
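The two extremes can be written down as reward functions. The sketch below assumes a hypothetical grid world in which the robot and the fridge are (x, y) cells; the function names and the distance-based shaping are illustrative choices, not anything described by the authors.

```python
def manhattan(a, b):
    """Grid distance between two (x, y) cells."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def sparse_reward(position, fridge):
    """Reward only on arrival; every other step returns nothing to learn from."""
    return 1.0 if position == fridge else 0.0


def dense_reward(previous, position, fridge):
    """Reward every step that moves closer to the fridge, penalize steps away."""
    return float(manhattan(previous, fridge) - manhattan(position, fridge))


fridge = (3, 2)
print(sparse_reward((1, 0), fridge))         # 0.0: no feedback until the goal
print(dense_reward((0, 0), (1, 0), fridge))  # 1.0: one step closer
```

The dense version pushes the robot along the shortest path at every step, which is exactly why it can be brittle: a detour around a new obstacle is scored as a mistake.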

What we need are AI agents that can tackle both problems, the team said.

Intelligent Exploration
The key is to return to the past.

For AI, motivation usually comes from “exploring new or unusual situations,” said Huizinga. It’s efficient, but comes with significant downsides. For one, the AI agent could prematurely stop going back to promising areas because it thinks it had already found a good solution. For another, it could simply forget a previous decision point because of the mechanics of how it probes the next step in a problem.

For a complex task, the end result is an AI that randomly stumbles around towards a solution while ignoring potentially better ones.

“Detaching from a place that was previously visited after collecting a reward doesn’t work in difficult games, because you might leave out important clues,” Huizinga explained.

Go-Explore solves these problems with a simple principle: first return, then explore. In essence, the algorithm saves different approaches it previously tried and loads promising save points (ones more likely to lead to victory) to explore further.
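A toy version of "first return, then explore" might look like the sketch below. It is a simplified illustration under stated assumptions, not the authors' implementation: the `Corridor` environment, the use of the agent's position as the save-point key, and the weighting that favors rarely chosen save points are all choices made for this example. The only reward sits at the far end of the corridor.

```python
import math
import random


class Corridor:
    """Hypothetical environment with save and restore, standing in for a game."""

    def __init__(self, length):
        self.length = length
        self.pos = 0

    def save(self):
        return self.pos

    def restore(self, state):
        self.pos = state

    def step(self, action):
        # Move left (-1) or right (+1); reward only at the last cell.
        self.pos = min(max(self.pos + action, 0), self.length - 1)
        return 1.0 if self.pos == self.length - 1 else 0.0


def go_explore(env, explore_steps=5, max_iterations=5000, seed=0):
    """Return a list of actions that reaches the reward, or None if none is found."""
    rng = random.Random(seed)
    # Archive of save points: cell -> saved state, actions to get there, times chosen.
    archive = {env.pos: {"state": env.save(), "path": [], "visits": 0}}
    for _ in range(max_iterations):
        cells = list(archive)
        weights = [1.0 / math.sqrt(archive[c]["visits"] + 1) for c in cells]
        entry = archive[rng.choices(cells, weights=weights)[0]]
        entry["visits"] += 1
        env.restore(entry["state"])  # first return...
        path = list(entry["path"])
        for _ in range(explore_steps):  # ...then explore
            action = rng.choice([-1, 1])
            reward = env.step(action)
            path.append(action)
            if reward > 0:
                return path
            known = archive.get(env.pos)
            if known is None:
                archive[env.pos] = {"state": env.save(), "path": list(path), "visits": 0}
            elif len(path) < len(known["path"]):
                known["state"], known["path"] = env.save(), list(path)
    return None


solution = go_explore(Corridor(10))
print(len(solution))
```

Because every save point stores the actions that led to it, the returned list can be replayed from the start to reproduce the winning run.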

Digging a bit deeper, the AI stores screen caps from a game. It then analyzes saved points and groups images that look alike as a potential promising “save point” to return to. Rinse and repeat. The AI tries to maximize its final score in the game, and updates its save points when it achieves a new record score. Because Atari doesn’t usually allow people to revisit any random point, the team used an emulator, which is a kind of software that mimics the Atari system but with custom abilities such as saving and reloading at any time.
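One plausible way to group screen captures that "look alike" is to shrink each frame and coarsen its brightness values, so that near-identical frames collapse to the same key. The sketch below is an assumption-laden illustration: the function name `cell_key`, the tile size, and the number of brightness levels are invented for this example and are not taken from the article.

```python
def cell_key(frame, block=2, levels=4, max_value=255):
    """Map a grayscale frame (a list of rows of ints) to a coarse, hashable key.

    Each block x block tile is averaged, then quantized to `levels` brightness bins.
    """
    rows, cols = len(frame), len(frame[0])
    key = []
    for r in range(0, rows, block):
        row_key = []
        for c in range(0, cols, block):
            tile = [
                frame[i][j]
                for i in range(r, min(r + block, rows))
                for j in range(c, min(c + block, cols))
            ]
            mean = sum(tile) / len(tile)
            row_key.append(int(mean * levels / (max_value + 1)))
        key.append(tuple(row_key))
    return tuple(key)


frame_a = [[0, 0, 255, 255], [0, 0, 255, 255]]
frame_b = [[3, 1, 250, 252], [0, 2, 254, 251]]  # same scene with slight noise
frame_c = [[255, 255, 0, 0], [255, 255, 0, 0]]  # a different scene

print(cell_key(frame_a) == cell_key(frame_b))  # True
print(cell_key(frame_a) == cell_key(frame_c))  # False
```

Keys like these can serve as dictionary keys in an archive of save points, so that only one representative state is kept per group of similar frames.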

The trick worked like magic. When pitted against 55 Atari games in the OpenAI Gym, a suite now commonly used to benchmark reinforcement learning algorithms, Go-Explore knocked out state-of-the-art AI competitors over 85 percent of the time.

It also crushed games previously unbeatable by AI. Montezuma’s Revenge, for example, requires you to move Pedro, the blocky protagonist, through a labyrinth of underground temples while evading obstacles such as traps and enemies and gathering jewels. One bad jump could derail the path to the next level. It’s a perfect example of sparse rewards: you need a series of good actions to get to the reward—advancing onward.

Go-Explore didn’t just beat all levels of the game, a first for AI. It also scored higher than any previous record for reinforcement learning algorithms at lower levels while toppling the human world record.

Outside a gaming environment, Go-Explore was also able to boost the performance of a simulated robot arm. While it’s easy for humans to follow high-level guidance like “put the cup on this shelf in a cupboard,” robots often need explicit training—from grasping the cup to recognizing a cupboard, moving towards it while avoiding obstacles, and learning motions to not smash the cup when putting it down.

Here, similar to the real world, the digital robot arm was only rewarded when it placed the cup onto the correct shelf, out of four possible shelves. When pitted against another algorithm, Go-Explore quickly figured out the movements needed to place the cup, while its competitor struggled with even reliably picking the cup up.

Combining Forces
By itself, the “first return, then explore” idea behind Go-Explore is already powerful. The team thinks it can do even better.

One idea is to change the mechanics of save points. Rather than reloading saved states through the emulator, it’s possible to train a neural network to do the same, without needing to relaunch a saved state. It’s a potential way to make the AI even smarter, the team said, because it can “learn” to overcome one obstacle once, instead of solving the same problem again and again. The downside? It’s much more computationally intensive.

Another idea is to combine Go-Explore with an alternative form of learning, called “imitation learning.” Here, an AI observes human behavior and mimics it through a series of actions. Combined with Go-Explore, said study author Adrien Ecoffet, this could make more robust robots capable of handling all the complexity and messiness in the real world.

To the team, the implications go far beyond Go-Explore. The concept of “first return, then explore” seems to be especially powerful, suggesting “it may be a fundamental feature of learning in general.” The team said, “Harnessing these insights…may be essential…to create generally intelligent agents.”

Image Credit: Adrien Ecoffet, Joost Huizinga, Joel Lehman, Kenneth O. Stanley, and Jeff Clune

Posted in Human Robots

#438755 Soft Legged Robot Uses Pneumatic ...

Soft robots are inherently safe, highly resilient, and potentially very cheap, making them promising for a wide array of applications. But development on them has been a bit slow relative to other areas of robotics, at least partially because soft robots can’t directly benefit from the massive increase in computing power and sensor and actuator availability that we’ve seen over the last few decades. Instead, roboticists have had to get creative to find ways of achieving the functionality of conventional robotics components using soft materials and compatible power sources.

In the current issue of Science Robotics, researchers from UC San Diego demonstrate a soft walking robot with four legs that moves with a turtle-like gait controlled by a pneumatic circuit system made from tubes and valves. This air-powered nervous system can actuate multiple degrees of freedom in sequence from a single source of pressurized air, offering a huge reduction in complexity and bringing a very basic form of decision making onto the robot itself.

Generally, when people talk about soft robots, the robots are only mostly soft. There are some components that are very difficult to make soft, including pressure sources and the necessary electronics to direct that pressure between different soft actuators in a way that can be used for propulsion. What’s really cool about this robot is that researchers have managed to take a pressure source (either a single tether or an onboard CO2 cartridge) and direct it to four different legs, each with three different air chambers, using an oscillating three-valve circuit made entirely of soft materials.

Photo: UCSD

The pneumatic circuit that powers and controls the soft quadruped.

The inspiration for this can be found in biology—natural organisms, including quadrupeds, use nervous system components called central pattern generators (CPGs) to prompt repetitive motions with limbs that are used for walking, flying, and swimming. This is obviously more complicated in some organisms than in others, and is typically mediated by sensory feedback, but the underlying structure of a CPG is basically just a repeating circuit that drives muscles in sequence to produce a stable, continuous gait. In this case, we’ve got pneumatic muscles being driven in opposing pairs, resulting in a diagonal couplet gait, where diagonally opposed limbs rotate forwards and backwards at the same time.

Diagram: Science Robotics

(J) Pneumatic logic circuit for rhythmic leg motion. A constant positive pressure source (P+) applied to three inverter components causes a high-pressure state to propagate around the circuit, with a delay at each inverter. While the input to one inverter is high, the attached actuator (i.e., A1, A2, or A3) is inflated. This sequence of high-pressure states causes each pair of legs of the robot to rotate in a direction determined by the pneumatic connections. (K) By reversing the sequence of activation of the pneumatic oscillator circuit, the attached actuators inflate in a new sequence (A1, A3, and A2), causing (L) the legs of the robot to rotate in reverse. (M) Schematic bottom view of the robot with the directions of leg motions indicated for forward walking.

Diagram: Science Robotics

Each of the valves acts as an inverter by switching the normally closed half (top) to open and the normally open half (bottom) to closed.

The circuit itself is made up of three bistable pneumatic valves connected by tubing. The tubing acts as a delay by resisting the gas moving through it, and that resistance can be adjusted by altering the tube’s length and inner diameter. Within the circuit, the movement of the pressurized gas acts as both a source of energy and as a signal, since wherever the pressure is in the circuit is where the legs are moving. The simplest circuit uses only three valves, and can keep the robot walking in one single direction, but more valves can add more complex leg control options. For example, the researchers were able to use seven valves to tune the phase offset of the gait, and even just one additional valve (albeit of a slightly more complex design) could enable reversal of the system, causing the robot to walk backwards in response to input from a soft sensor. And with another complex valve, a manual (tethered) controller could be used for omnidirectional movement.
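The three-valve oscillator can be pictured as a ring of three inverters, each flipping its input after a delay. The Python sketch below is a purely logical abstraction of that idea, not a model of the actual pneumatics. It assumes every tube adds the same one-tick delay, treats a high node as an inflated actuator, and uses a hypothetical direction parameter to stand in for reversing the circuit.

```python
def simulate_ring(direction=1, steps=12, start=(True, False, False)):
    """Step a ring of three inverters; each tick is one tubing delay (assumed equal).

    direction=+1 or -1 chooses which neighbor each inverter reads from,
    standing in for reversing the activation sequence of the circuit.
    """
    state = tuple(start)
    history = [state]
    for _ in range(steps):
        # Every inverter outputs the opposite of its neighbor's previous state.
        state = tuple(not state[(i + direction) % 3] for i in range(3))
        history.append(state)
    return history


def inflation_order(history):
    """List actuators in the order their node switches from low to high."""
    names = ("A1", "A2", "A3")
    order = []
    for prev, cur in zip(history, history[1:]):
        for i in range(3):
            if cur[i] and not prev[i]:
                order.append(names[i])
    return order


forward = inflation_order(simulate_ring(direction=1))
reverse = inflation_order(simulate_ring(direction=-1))
# forward cycles A1 -> A2 -> A3; reverse cycles A1 -> A3 -> A2
```

In this abstraction the pattern repeats every six ticks, and flipping the direction swaps the cycle from A1, A2, A3 to A1, A3, A2, the same reversal the figure caption describes. A longer delay per tick would correspond to longer or narrower tubing, which this sketch does not model.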

This work has some similarities to the rover that JPL is developing to explore Venus—that rover isn’t a soft robot, of course, but it operates under similar constraints in that it can’t rely on conventional electronic systems for autonomous navigation or control. It turns out that there are plenty of clever ways to use mechanical (or in this case, pneumatic) intelligence to make robots with relatively complex autonomous behaviors, meaning that in the future, soft (or soft-ish) robots could find valuable roles in situations where using a non-compliant system is not a good option.

For more on why we should be so excited about soft robots and just how soft a soft robot needs to be, we spoke with Michael Tolley, who runs the Bioinspired Robotics and Design Lab at UCSD, and Dylan Drotman, the paper’s first author.

IEEE Spectrum: What can soft robots do for us that more rigid robotic designs can’t?

Michael Tolley: At the very highest level, one of the fundamental assumptions of robotics is that you have rigid bodies connected at joints, and all your motion happens at these joints. That's a really nice approach because it makes the math easy, frankly, and it simplifies control. But when you look around us in nature, even though animals do have bones and joints, the way we interact with the world is much more complicated than that simple story. I’m interested in where we can take advantage of material properties in robotics. If you look at robots that have to operate in very unknown environments, I think you can build in some of the intelligence for how to deal with those environments into the body of the robot itself. And that’s the category this work really falls under—it's about navigating the world.

Dylan Drotman: Walking through confined spaces is a good example. With a rigid legged robot, you would have to completely change the way that the legs move to walk through a confined space, while if you have flexible legs, like the robot in our paper, you can use relatively simple control strategies to squeeze through an area you wouldn’t be able to get through with a rigid system.

How smart can a soft robot get?

Drotman: Right now we have a sensor on the front that's connected through a fluidic transmission to a bistable valve that causes the robot to reverse. We could add other sensors around the robot to allow it to change direction whenever it runs into an obstacle to effectively make an electronics-free version of a Roomba.

Tolley: Stepping back a little bit from that, one could make an argument that we’re using basic memory elements to generate very basic signals. There’s nothing in principle that would stop someone from making a pneumatic computer—it’s just very complicated to make something that complex. I think you could build on this and do more intelligent decision making, but using this specific design and the components we’re using, it’s likely to be things that are more direct responses to the environment.

How well would robots like these scale down?

Drotman: At the moment we’re manufacturing these components by hand, so the idea would be to make something more like a printed circuit board instead, and to look at how the channel sizes and the valve design would affect the actuation properties. We’ll also be coming up with new circuits, and different designs for the circuits themselves.

Tolley: Down to centimeter or millimeter scale, I don’t think you’d have fundamental fluid flow problems. I think you’re going to be limited more by system design constraints. You’ll have to be able to locomote while carrying around your pressure source, and possibly some other components that are also still rigid. When you start to talk about really small scales, though, it's not as clear to me that you really need an intrinsically soft robot. If you think about insects, their structural geometry can make them behave like they’re soft, but they’re not intrinsically soft.

Should we be thinking about soft robots and compliant robots in the same way, or are they fundamentally different?

Tolley: There’s certainly a connection between the two. You could have a compliant robot that behaves in a very similar way to an intrinsically soft robot, or a robot made of intrinsically soft materials. At that point, it comes down to design and manufacturing and practical limitations on what you can make. I think when you get down to small scales, the two sort of get connected.

There was some interesting work several years ago on using explosions to power soft robots. Is that still a thing?

Tolley: One of the opportunities with soft robots is that with material compliance, you have the potential to store energy. I think there’s exciting potential there for rapid motion with a soft body. Combustion is one way of doing that with power coming from a chemical source all at once, but you could also use a relatively weak muscle that over time stores up energy in a soft body and then releases it.

Is it realistic to expect complete softness from soft robots, or will they likely always have rigid components because they have to store or generate and move pressurized gas somehow?

Tolley: If you look in nature, you do have soft pumps like the heart, but although it’s soft, it’s still relatively stiff. Like, if you grab a heart, it’s not totally squishy. I haven’t done it, but I’d imagine. If you have a container that you’re pressurizing, it has to be stiff enough to not just blow up like a balloon. Certainly pneumatics or hydraulics are not the only way to go for soft actuators; there has been some really nice work on smart muscles and smart materials like hydraulic electrostatic (HASEL) actuators. They seem promising, but all of these actuators have challenges. We’ve chosen to stick with pressurized pneumatics in the near term; longer term, I think you’ll start to see more of these smart material actuators become more practical.

Personally, I don’t have any problem with soft robots having some rigid components. Most animals on land have some rigid components, but they can still take advantage of being soft, so it’s probably going to be a combination. But I do also like the vision of making an entirely soft, squishy thing.

Posted in Human Robots