Tag Archives: bot

#439831 Tesla’s Tesla Bot

Here’s Elon Musk hyping “Tesla Bot”, said to be a general purpose, bi-pedal, humanoid robot that will perform “unsafe, repetitive or boring” tasks.

Posted in Human Robots

#439646 Elon Musk Has No Idea What He’s Doing ...

Yesterday, at the end of
Tesla's AI Day, Elon Musk introduced a concept for “Tesla Bot,” a 125 lb, 5'8″ tall electromechanically actuated autonomous bipedal “general purpose” humanoid robot. By “concept,” I mean that Musk showed some illustrations and talked about his vision for the robot, which struck me as, let's say, somewhat naïve. Based on the content of a six-minute long presentation, it seems as though Musk believes that someone (Tesla, suddenly?) should just go make an autonomous humanoid robot already—like, the technology exists, so why not do it?

To be fair, Musk did go out and do more or less much exactly that for electric cars and reusable rockets. But humanoid robots are much different, and much more complicated. With rockets, well, we already had rockets. And with electric cars, we already had cars, batteries, sensors, and the
DARPA competitions to build on. I don't say this to minimize what Musk has done with SpaceX and Tesla, but rather to emphasize that humanoid robotics is a very different challenge.

Unlike rockets or cars, humanoid robots aren't an existing technology that needs an ambitious vision plus a team of clever people plus sustained financial investment. With humanoid robotics, there are many more problems to solve, the problems are harder, and we're much farther away from practical solutions. Lots of very smart people have been actively working on these things for decades, and there's still a laundry list of fundamental breakthroughs in hardware and especially software that are probably necessary to make Musk's vision happen.

Are these fundamental breakthroughs impossible for Tesla? Not impossible, no. But from listening to what Elon Musk said today, I don't think he has any idea what getting humanoid robots to do useful stuff actually involves. Let's talk about why.

Watch the presentation if you haven't yet, and then let's go through what Musk talks about.

Okay, here we go!
“Our cars are semi-sentient robots on wheels.”

I don't know what that even means. Semi-sentient? Sure, whatever, a cockroach is semi-sentient I guess, although the implicit suggestion that these robots are therefore somehow part of the way towards actual sentience is ridiculous. Besides, autonomous cars live in a highly constrained action space within a semi-constrained environment, and Tesla cars in particular have plenty of well-known issues with their autonomy.

“With the full self-driving computer, essentially the inference engine on the car (which we'll keep evolving, obviously) and Dojo, and all the neural nets recognizing the world, understanding how to navigate through the world, it kind of makes sense to put that onto a humanoid form.”
Yes, because that's totally how it works. Look, the neural networks in a Tesla (the car) are trained to recognize the world from a car's perspective. They look for things that cars need to understand, and they have absolutely no idea about anything else, which can cause all kinds of problems for them. Same with navigation: autonomous cars navigate through a world that consists of roads and road-related stuff. You can't just “put that” onto a humanoid robot and have any sort of expectation that it'll be useful, unless all you want it to do is walk down the middle of the street and obey traffic lights. Also, the suggestion here seems to be that “AI for general purpose robotics” can be solved by just throwing enough computing power at it, which as far as I'm aware is not even remotely how that works, especially with physical robots.

“[Tesla] is also quite good at sensors and batteries and actuators. So, we think we'll probably have a prototype sometime next year.”
It's plausible that by spending enough money, Tesla could construct a humanoid robot with batteries, actuators, and computers in a similar design to what Musk has described. Can Tesla do it by sometime next year like Musk says they can? Sure, why not. But the hard part is not building a robot, it's getting that robot to do useful stuff, and I think Musk is way out of his depth here. People without a lot of experience in robotics often seem to think that once you've built the robot, you've solved most of the problem, so they focus on mechanical things like actuators and what it'll look like and how much it can lift and whatever. But that's backwards, and the harder problems come after you've got a robot that's mechanically functional.

What the heck does “human-level hands” mean?

“It's intended to navigate through a world built for humans…”
This is one of the few good reasons to make a humanoid robot, and I'm not even sure that by itself, it's a good enough reason to do so. But in any case, the word “intended” is doing a lot of heavy lifting here. The implications of a world built for humans includes an almost infinite variety of different environments, full of all kinds of robot-unfriendly things, not to mention the safety aspects of an inherently unstable 125 lb robot.

I feel like I have a pretty good handle on the current state of the art in humanoid robotics, and if you visit this site regularly, you probably do too. Companies like Boston Dynamics and Agility Robotics have been working on robots that can navigate through human environments for literally decades, and it's still a super hard problem. I don't know why Musk thinks that he can suddenly do better.

For anyone wondering why I Tweeted “Elon Musk has no idea what getting humanoid robots to do useful stuff actually… https://t.co/5uei4LIpyF
— Evan Ackerman (@BotJunkie)
1629446537.0

The “human-level hands” that you see annotated in Musk's presentation above are a good example of why I think Musk doesn't really grasp how much work this robot is going to be. What does “human-level hands” even mean? If we're talking about five-fingered hands with human-equivalent sensing and dexterity, those do exist (sort of), although they're generally fragile and expensive. It would take an enormous engineering effort to make hands like that into something practical just from a hardware perspective, which is why nobody has bothered—most robots use much simpler, much more robust two or three finger grippers instead. Could Tesla solve this problem? I have no doubt that they could, given enough time and money. But they've also got every other part of the robot to deal with. And even if you can make the hardware robust enough to be useful, you've still got to come up with all of the software to make it work. Again, we're talking about huge problems within huge problems at a scale that it seems like Musk hasn't considered.

“…And eliminate dangerous, repetitive, and boring tasks.”

Great. This is what robots should be doing. But as Musk himself knows, it's easy to say that robots will eliminate dangerous, repetitive, and boring tasks, and much more difficult to actually get them to do it—not because the robots aren't capable, but because humans are far more capable. We set a very high bar for performance and versatility in ways that aren't always obvious, and even when they are obvious, robots may not be able to replicate them effectively.

[Musk makes jokes about robots going rogue.]

Uh, okay.

“Things I think that are hard about having a really useful humanoid robot are, can it navigate through the world without being explicitly trained, without explicit line-by-line instructions? Can you talk to it and say, 'please pick up that bolt and attach it to the car with that wrench?' 'Please go to the store and get me the following groceries?' That kind of thing.”
Robots can already navigate through the world without “explicit line-by-line instructions” when they have a pretty good idea of what “the world” consists of. If the world is “roads” or “my apartment” or “this specific shopping mall,” that's probably a 95%+ solved problem, keeping in mind that the last 5% gets ridiculously complicated. But if you start talking about “my apartment plus any nearby grocery store along with everything between my apartment and that grocery store,” that's a whole lot of not necessarily well structured or predictable space.

And part of that challenge is just physically moving through those spaces. Are there stairs? Heavy doors? Crosswalks? Lots of people? These are complicated enough environments for those small wheeled sidewalk delivery robots with humans in the loop, never mind a (hypothetical) fully autonomous bipedal humanoid that is also carrying objects. And going into a crowded grocery store and picking things up off of shelves and putting them into a basket or a cart that then has to be pushed safely? These are cutting edge unsolved robotics problems, and we've barely seen this kind of thing happen with industrial arms on wheeled bases, even in a research context. Heck, even “pick up that bolt” is not an easy thing for a robot to do right now, if it wasn't specifically designed for that task.

“This I think will be quite profound, because what is the economy—at the foundation, it is labor. So, what happens when there is no shortage of labor? This is why I think long term there will need to be universal basic income. But not right now, because this robot doesn't work.”

Economics is well beyond my area of expertise, but as Musk says, until the robot works, this is all moot.

“AI for General Purpose Robotics.” Sure.

It's possible, even likely, that Tesla will build some sort of Tesla Bot by sometime next year, as Musk says. I think that it won't look all that much like the concept images in this presentation. I think that it'll be able to stand up, and perhaps walk. Maybe withstand a shove or two and do some basic object recognition and grasping. And I think after that, progress will be slow. I don't think Tesla will catch up with Boston Dynamics or Agility Robotics. Maybe they'll end up with the equivalent of Asimo, with a PR tool that can do impressive demos but is ultimately not all that useful.

Part of what bothers me so much about all this is how Musk's vision for the Tesla Bot implies that he's going to just casually leapfrog all of the roboticists who have been working towards useful humanoids for decades. Musk assumes that he will be able to wander into humanoid robot development and do what nobody else has yet been able to do: build a useful general purpose humanoid. I doubt Musk intended it this way, but I feel like he's backhandedly suggesting that the challenges with humanoids aren't actually that hard, and that if other people were cleverer, or worked harder, or threw more money at the problem, then we would have had general purpose humanoids already.
I think he's wrong. But if Tesla ends up investing time and money into solving some really hard robotics problems, perhaps they'll have some success that will help move the entire field forward. And I'd call that a win. Continue reading

Posted in Human Robots

#439233 Robotics group announces an ...

Agility Robotics, a branch of Oregon State University, has just revealed a new bipedal robot called Cassie. Unlike the many four-legged and four-wheeled robots currently in existence, Cassie will walk much more like a human. This kind of movement allows for far easier travel across diverse types of terrain while delivering packages or even contributing to disaster relief efforts. Continue reading

Posted in Human Robots

#438801 This AI Thrashes the Hardest Atari Games ...

Learning from rewards seems like the simplest thing. I make coffee, I sip coffee, I’m happy. My brain registers “brewing coffee” as an action that leads to a reward.

That’s the guiding insight behind deep reinforcement learning, a family of algorithms that famously smashed most of Atari’s gaming catalog and triumphed over humans in strategy games like Go. Here, an AI “agent” explores the game, trying out different actions and registering ones that let it win.

Except it’s not that simple. “Brewing coffee” isn’t one action; it’s a series of actions spanning several minutes, where you’re only rewarded at the very end. By just tasting the final product, how do you learn to fine-tune grind coarseness, water to coffee ratio, brewing temperature, and a gazillion other factors that result in the reward—tasty, perk-me-up coffee?

That’s the problem with “sparse rewards,” which are ironically very abundant in our messy, complex world. We don’t immediately get feedback from our actions—no video-game-style dings or points for just grinding coffee beans—yet somehow we’re able to learn and perform an entire sequence of arm and hand movements while half-asleep.

This week, researchers from UberAI and OpenAI teamed up to bestow this talent on AI.

The trick is to encourage AI agents to “return” to a previous step, one that’s promising for a winning solution. The agent then keeps a record of that state, reloads it, and branches out again to intentionally explore other solutions that may have been left behind on the first go-around. Video gamers are likely familiar with this idea: live, die, reload a saved point, try something else, repeat for a perfect run-through.

The new family of algorithms, appropriately dubbed “Go-Explore,” smashed notoriously difficult Atari games like Montezuma’s Revenge that were previously unsolvable by its AI predecessors, while trouncing human performance along the way.

It’s not just games and digital fun. In a computer simulation of a robotic arm, the team found that installing Go-Explore as its “brain” allowed it to solve a challenging series of actions when given very sparse rewards. Because the overarching idea is so simple, the authors say, it can be adapted and expanded to other real-world problems, such as drug design or language learning.

Growing Pains
How do you reward an algorithm?

Rewards are very hard to craft, the authors say. Take the problem of asking a robot to go to a fridge. A sparse reward will only give the robot “happy points” if it reaches its destination, which is similar to asking a baby, with no concept of space and danger, to crawl through a potential minefield of toys and other obstacles towards a fridge.

“In practice, reinforcement learning works very well, if you have very rich feedback, if you can tell, ‘hey, this move is good, that move is bad, this move is good, that move is bad,’” said study author Joost Huinzinga. However, in situations that offer very little feedback, “rewards can intentionally lead to a dead end. Randomly exploring the space just doesn’t cut it.”

The other extreme is providing denser rewards. In the same robot-to-fridge example, you could frequently reward the bot as it goes along its journey, essentially helping “map out” the exact recipe to success. But that’s troubling as well. Over-holding an AI’s hand could result in an extremely rigid robot that ignores new additions to its path—a pet, for example—leading to dangerous situations. It’s a deceptive AI solution that seems effective in a simple environment, but crashes in the real world.

What we need are AI agents that can tackle both problems, the team said.

Intelligent Exploration
The key is to return to the past.

For AI, motivation usually comes from “exploring new or unusual situations,” said Huizinga. It’s efficient, but comes with significant downsides. For one, the AI agent could prematurely stop going back to promising areas because it thinks it had already found a good solution. For another, it could simply forget a previous decision point because of the mechanics of how it probes the next step in a problem.

For a complex task, the end result is an AI that randomly stumbles around towards a solution while ignoring potentially better ones.

“Detaching from a place that was previously visited after collecting a reward doesn’t work in difficult games, because you might leave out important clues,” Huinzinga explained.

Go-Explore solves these problems with a simple principle: first return, then explore. In essence, the algorithm saves different approaches it previously tried and loads promising save points—once more likely to lead to victory—to explore further.

Digging a bit deeper, the AI stores screen caps from a game. It then analyzes saved points and groups images that look alike as a potential promising “save point” to return to. Rinse and repeat. The AI tries to maximize its final score in the game, and updates its save points when it achieves a new record score. Because Atari doesn’t usually allow people to revisit any random point, the team used an emulator, which is a kind of software that mimics the Atari system but with custom abilities such as saving and reloading at any time.

The trick worked like magic. When pitted against 55 Atari games in the OpenAI gym, now commonly used to benchmark reinforcement learning algorithms, Go-Explore knocked out state-of-the-art AI competitors over 85 percent of the time.

It also crushed games previously unbeatable by AI. Montezuma’s Revenge, for example, requires you to move Pedro, the blocky protagonist, through a labyrinth of underground temples while evading obstacles such as traps and enemies and gathering jewels. One bad jump could derail the path to the next level. It’s a perfect example of sparse rewards: you need a series of good actions to get to the reward—advancing onward.

Go-Explore didn’t just beat all levels of the game, a first for AI. It also scored higher than any previous record for reinforcement learning algorithms at lower levels while toppling the human world record.

Outside a gaming environment, Go-Explore was also able to boost the performance of a simulated robot arm. While it’s easy for humans to follow high-level guidance like “put the cup on this shelf in a cupboard,” robots often need explicit training—from grasping the cup to recognizing a cupboard, moving towards it while avoiding obstacles, and learning motions to not smash the cup when putting it down.

Here, similar to the real world, the digital robot arm was only rewarded when it placed the cup onto the correct shelf, out of four possible shelves. When pitted against another algorithm, Go-Explore quickly figured out the movements needed to place the cup, while its competitor struggled with even reliably picking the cup up.

Combining Forces
By itself, the “first return, then explore” idea behind Go-Explore is already powerful. The team thinks it can do even better.

One idea is to change the mechanics of save points. Rather than reloading saved states through the emulator, it’s possible to train a neural network to do the same, without needing to relaunch a saved state. It’s a potential way to make the AI even smarter, the team said, because it can “learn” to overcome one obstacle once, instead of solving the same problem again and again. The downside? It’s much more computationally intensive.

Another idea is to combine Go-Explore with an alternative form of learning, called “imitation learning.” Here, an AI observes human behavior and mimics it through a series of actions. Combined with Go-Explore, said study author Adrien Ecoffet, this could make more robust robots capable of handling all the complexity and messiness in the real world.

To the team, the implications go far beyond Go-Explore. The concept of “first return, then explore” seems to be especially powerful, suggesting “it may be a fundamental feature of learning in general.” The team said, “Harnessing these insights…may be essential…to create generally intelligent agents.”

Image Credit: Adrien Ecoffet, Joost Huizinga, Joel Lehman, Kenneth O. Stanley, and Jeff Clune Continue reading

Posted in Human Robots

#438798 This AI Thrashes the Hardest Atari Games ...

Learning from rewards seems like the simplest thing. I make coffee, I sip coffee, I’m happy. My brain registers “brewing coffee” as an action that leads to a reward.

That’s the guiding insight behind deep reinforcement learning, a family of algorithms that famously smashed most of Atari’s gaming catalog and triumphed over humans in strategy games like Go. Here, an AI “agent” explores the game, trying out different actions and registering ones that let it win.

Except it’s not that simple. “Brewing coffee” isn’t one action; it’s a series of actions spanning several minutes, where you’re only rewarded at the very end. By just tasting the final product, how do you learn to fine-tune grind coarseness, water to coffee ratio, brewing temperature, and a gazillion other factors that result in the reward—tasty, perk-me-up coffee?

That’s the problem with “sparse rewards,” which are ironically very abundant in our messy, complex world. We don’t immediately get feedback from our actions—no video-game-style dings or points for just grinding coffee beans—yet somehow we’re able to learn and perform an entire sequence of arm and hand movements while half-asleep.

This week, researchers from UberAI and OpenAI teamed up to bestow this talent on AI.

The trick is to encourage AI agents to “return” to a previous step, one that’s promising for a winning solution. The agent then keeps a record of that state, reloads it, and branches out again to intentionally explore other solutions that may have been left behind on the first go-around. Video gamers are likely familiar with this idea: live, die, reload a saved point, try something else, repeat for a perfect run-through.

The new family of algorithms, appropriately dubbed “Go-Explore,” smashed notoriously difficult Atari games like Montezuma’s Revenge that were previously unsolvable by its AI predecessors, while trouncing human performance along the way.

It’s not just games and digital fun. In a computer simulation of a robotic arm, the team found that installing Go-Explore as its “brain” allowed it to solve a challenging series of actions when given very sparse rewards. Because the overarching idea is so simple, the authors say, it can be adapted and expanded to other real-world problems, such as drug design or language learning.

Growing Pains
How do you reward an algorithm?

Rewards are very hard to craft, the authors say. Take the problem of asking a robot to go to a fridge. A sparse reward will only give the robot “happy points” if it reaches its destination, which is similar to asking a baby, with no concept of space and danger, to crawl through a potential minefield of toys and other obstacles towards a fridge.

“In practice, reinforcement learning works very well, if you have very rich feedback, if you can tell, ‘hey, this move is good, that move is bad, this move is good, that move is bad,’” said study author Joost Huinzinga. However, in situations that offer very little feedback, “rewards can intentionally lead to a dead end. Randomly exploring the space just doesn’t cut it.”

The other extreme is providing denser rewards. In the same robot-to-fridge example, you could frequently reward the bot as it goes along its journey, essentially helping “map out” the exact recipe to success. But that’s troubling as well. Over-holding an AI’s hand could result in an extremely rigid robot that ignores new additions to its path—a pet, for example—leading to dangerous situations. It’s a deceptive AI solution that seems effective in a simple environment, but crashes in the real world.

What we need are AI agents that can tackle both problems, the team said.

Intelligent Exploration
The key is to return to the past.

For AI, motivation usually comes from “exploring new or unusual situations,” said Huizinga. It’s efficient, but comes with significant downsides. For one, the AI agent could prematurely stop going back to promising areas because it thinks it had already found a good solution. For another, it could simply forget a previous decision point because of the mechanics of how it probes the next step in a problem.

For a complex task, the end result is an AI that randomly stumbles around towards a solution while ignoring potentially better ones.

“Detaching from a place that was previously visited after collecting a reward doesn’t work in difficult games, because you might leave out important clues,” Huinzinga explained.

Go-Explore solves these problems with a simple principle: first return, then explore. In essence, the algorithm saves different approaches it previously tried and loads promising save points—once more likely to lead to victory—to explore further.

Digging a bit deeper, the AI stores screen caps from a game. It then analyzes saved points and groups images that look alike as a potential promising “save point” to return to. Rinse and repeat. The AI tries to maximize its final score in the game, and updates its save points when it achieves a new record score. Because Atari doesn’t usually allow people to revisit any random point, the team used an emulator, which is a kind of software that mimics the Atari system but with custom abilities such as saving and reloading at any time.

The trick worked like magic. When pitted against 55 Atari games in the OpenAI gym, now commonly used to benchmark reinforcement learning algorithms, Go-Explore knocked out state-of-the-art AI competitors over 85 percent of the time.

It also crushed games previously unbeatable by AI. Montezuma’s Revenge, for example, requires you to move Pedro, the blocky protagonist, through a labyrinth of underground temples while evading obstacles such as traps and enemies and gathering jewels. One bad jump could derail the path to the next level. It’s a perfect example of sparse rewards: you need a series of good actions to get to the reward—advancing onward.

Go-Explore didn’t just beat all levels of the game, a first for AI. It also scored higher than any previous record for reinforcement learning algorithms at lower levels while toppling the human world record.

Outside a gaming environment, Go-Explore was also able to boost the performance of a simulated robot arm. While it’s easy for humans to follow high-level guidance like “put the cup on this shelf in a cupboard,” robots often need explicit training—from grasping the cup to recognizing a cupboard, moving towards it while avoiding obstacles, and learning motions to not smash the cup when putting it down.

Here, similar to the real world, the digital robot arm was only rewarded when it placed the cup onto the correct shelf, out of four possible shelves. When pitted against another algorithm, Go-Explore quickly figured out the movements needed to place the cup, while its competitor struggled with even reliably picking the cup up.

Combining Forces
By itself, the “first return, then explore” idea behind Go-Explore is already powerful. The team thinks it can do even better.

One idea is to change the mechanics of save points. Rather than reloading saved states through the emulator, it’s possible to train a neural network to do the same, without needing to relaunch a saved state. It’s a potential way to make the AI even smarter, the team said, because it can “learn” to overcome one obstacle once, instead of solving the same problem again and again. The downside? It’s much more computationally intensive.

Another idea is to combine Go-Explore with an alternative form of learning, called “imitation learning.” Here, an AI observes human behavior and mimics it through a series of actions. Combined with Go-Explore, said study author Adrien Ecoffet, this could make more robust robots capable of handling all the complexity and messiness in the real world.

To the team, the implications go far beyond Go-Explore. The concept of “first return, then explore” seems to be especially powerful, suggesting “it may be a fundamental feature of learning in general.” The team said, “Harnessing these insights…may be essential…to create generally intelligent agents.”

Image Credit: Adrien Ecoffet, Joost Huizinga, Joel Lehman, Kenneth O. Stanley, and Jeff Clune Continue reading

Posted in Human Robots