Tag Archives: error
#435707 AI Agents Startle Researchers With ...
After 25 million games, the AI agents playing hide-and-seek with each other had mastered four basic game strategies. The researchers expected that part.
After a total of 380 million games, the AI players developed strategies that the researchers didn’t know were possible in the game environment—which the researchers had themselves created. That was the part that surprised the team at OpenAI, a research company based in San Francisco.
The AI players learned everything via a machine learning technique known as reinforcement learning. In this learning method, AI agents start out by taking random actions. Sometimes those random actions produce desired results, which earn them rewards. Via trial-and-error on a massive scale, they can learn sophisticated strategies.
In the context of games, this process can be abetted by having the AI play against another version of itself, ensuring that the opponents will be evenly matched. It also locks the AI into a process of one-upmanship, where any new strategy that emerges forces the opponent to search for a countermeasure. Over time, this “self-play” amounted to what the researchers call an “auto-curriculum.”
According to OpenAI researcher Igor Mordatch, this experiment shows that self-play “is enough for the agents to learn surprising behaviors on their own—it’s like children playing with each other.”
Reinforcement is a hot field of AI research right now. OpenAI’s researchers used the technique when they trained a team of bots to play the video game Dota 2, which squashed a world-champion human team last April. The Alphabet subsidiary DeepMind has used it to triumph in the ancient board game Go and the video game StarCraft.
Aniruddha Kembhavi, a researcher at the Allen Institute for Artificial Intelligence (AI2) in Seattle, says games such as hide-and-seek offer a good way for AI agents to learn “foundational skills.” He worked on a team that taught their AllenAI to play Pictionary with humans, viewing the gameplay as a way for the AI to work on common sense reasoning and communication. “We are, however, quite far away from being able to translate these preliminary findings in highly simplified environments into the real world,” says Kembhavi.
Illustration: OpenAI
AI agents construct a fort during a hide-and-seek game developed by OpenAI.
In OpenAI’s game of hide-and-seek, both the hiders and the seekers received a reward only if they won the game, leaving the AI players to develop their own strategies. Within a simple 3D environment containing walls, blocks, and ramps, the players first learned to run around and chase each other (strategy 1). The hiders next learned to move the blocks around to build forts (2), and then the seekers learned to move the ramps (3), enabling them to jump inside the forts. Then the hiders learned to move all the ramps into their forts before the seekers could use them (4).
The two strategies that surprised the researchers came next. First the seekers learned that they could jump onto a box and “surf” it over to a fort (5), allowing them to jump in—a maneuver that the researchers hadn’t realized was physically possible in the game environment. So as a final countermeasure, the hiders learned to lock all the boxes into place (6) so they weren’t available for use as surfboards.
Illustration: OpenAI
An AI agent uses a nearby box to surf its way into a competitor’s fort.
In this circumstance, having AI agents behave in an unexpected way wasn’t a problem: They found different paths to their rewards, but didn’t cause any trouble. However, you can imagine situations in which the outcome would be rather serious. Robots acting in the real world could do real damage. And then there’s Nick Bostrom’s famous example of a paper clip factory run by an AI, whose goal is to make as many paper clips as possible. As Bostrom told IEEE Spectrum back in 2014, the AI might realize that “human bodies consist of atoms, and those atoms could be used to make some very nice paper clips.”
Bowen Baker, another member of the OpenAI research team, notes that it’s hard to predict all the ways an AI agent will act inside an environment—even a simple one. “Building these environments is hard,” he says. “The agents will come up with these unexpected behaviors, which will be a safety problem down the road when you put them in more complex environments.”
AI researcher Katja Hofmann at Microsoft Research Cambridge, in England, has seen a lot of gameplay by AI agents: She started a competition that uses Minecraft as the playing field. She says the emergent behavior seen in this game, and in prior experiments by other researchers, shows that games can be a useful for studies of safe and responsible AI.
“I find demonstrations like this, in games and game-like settings, a great way to explore the capabilities and limitations of existing approaches in a safe environment,” says Hofmann. “Results like these will help us develop a better understanding on how to validate and debug reinforcement learning systems–a crucial step on the path towards real-world applications.”
Baker says there’s also a hopeful takeaway from the surprises in the hide-and-seek experiment. “If you put these agents into a rich enough environment they will find strategies that we never knew were possible,” he says. “Maybe they can solve problems that we can’t imagine solutions to.” Continue reading
#435605 All of the Winners in the DARPA ...
The first competitive event in the DARPA Subterranean Challenge concluded last week—hopefully you were able to follow along on the livestream, on Twitter, or with some of the articles that we’ve posted about the event. We’ll have plenty more to say about how things went for the SubT teams, but while they take a bit of a (well earned) rest, we can take a look at the winning teams as well as who won DARPA’s special superlative awards for the competition.
First Place: Team Explorer (25/40 artifacts found)
With their rugged, reliable robots featuring giant wheels and the ability to drop communications nodes, Team Explorer was in the lead from day 1, scoring in double digits on every single run.
Second Place: Team CoSTAR (11/40 artifacts found)
Team CoSTAR had one of the more diverse lineups of robots, and they switched up which robots they decided to send into the mine as they learned more about the course.
Third Place: Team CTU-CRAS (10/40 artifacts found)
While many teams came to SubT with DARPA funding, Team CTU-CRAS was self-funded, making them eligible for a special $200,000 Tunnel Circuit prize.
DARPA also awarded a bunch of “superlative awards” after SubT:
Most Accurate Artifact: Team Explorer
To score a point, teams had to submit the location of an artifact that was correct to within 5 meters of the artifact itself. However, DARPA was tracking the artifact locations with much higher precision—for example, the “zero” point on the backpack artifact was the center of the label on the front, which DARPA tracked to the millimeter. Team Explorer managed to return the location of a backpack with an error of just 0.18 meter, which is kind of amazing.
Down to the Wire: Team CSIRO Data61
With just an hour to find as many artifacts as possible, teams had to find the right balance between sending robots off to explore and bringing them back into communication range to download artifact locations. Team CSIRO Data61 cut their last point pretty close, sliding their final point in with a mere 22 seconds to spare.
Most Distinctive Robots: Team Robotika
Team Robotika had some of the quirkiest and most recognizable robots, which DARPA recognized with the “Most Distinctive” award. Robotika told us that part of the reason for that distinctiveness was practical—having a robot that was effectively in two parts meant that they could disassemble it so that it would fit in the baggage compartment of an airplane, very important for a team based in the Czech Republic.
Most Robots Per Person: Team Coordinated Robotics
Kevin Knoedler, who won NASA’s Space Robotics Challenge entirely by himself, brought his own personal swarm of drones to SubT. With a ratio of seven robots to one human, Kevin was almost certainly the hardest working single human at the challenge.
Fan Favorite: Team NCTU
Photo: Evan Ackerman/IEEE Spectrum
The Fan Favorite award went to the team that was most popular on Twitter (with the #SubTChallenge hashtag), and it may or may not be the case that I personally tweeted enough about Team NCTU’s blimp to win them this award. It’s also true that whenever we asked anyone on other teams what their favorite robot was (besides their own, of course), the blimp was overwhelmingly popular. So either way, the award is well deserved.
DARPA shared this little behind-the-scenes clip of the blimp in action (sort of), showing what happened to the poor thing when the mine ventilation system was turned on between runs and DARPA staff had to chase it down and rescue it:
The thing to keep in mind about the results of the Tunnel Circuit is that unlike past DARPA robotics challenges (like the DRC), they don’t necessarily indicate how things are going to go for the Urban or Cave circuits because of how different things are going to be. Explorer did a great job with a team of rugged wheeled vehicles, which turned out to be ideal for navigating through mines, but they’re likely going to need to change things up substantially for the rest of the challenges, where the terrain will be much more complex.
DARPA hasn’t provided any details on the location of the Urban Circuit yet; all we know is that it’ll be sometime in February 2020. This gives teams just six months to take all the lessons that they learned from the Tunnel Circuit and update their hardware, software, and strategies. What were those lessons, and what do teams plan to do differently next year? Check back next week, and we’ll tell you.
[ DARPA SubT ] Continue reading