Tag Archives: Handle

#435681 Video Friday: This NASA Robot Uses ...

Video Friday is your weekly selection of awesome robotics videos, collected by your Automaton bloggers. We’ll also be posting a weekly calendar of upcoming robotics events for the next few months; here’s what we have so far (send us your events!):

ICRES 2019 – July 29-30, 2019 – London, U.K.
DARPA SubT Tunnel Circuit – August 15-22, 2019 – Pittsburgh, Pa., USA
IEEE Africon 2019 – September 25-27, 2019 – Accra, Ghana
ISRR 2019 – October 6-10, 2019 – Hanoi, Vietnam
Let us know if you have suggestions for next week, and enjoy today’s videos.

Robots can land on the Moon and drive on Mars, but what about the places they can’t reach? Designed by engineers as NASA’s Jet Propulsion Laboratory in Pasadena, California, a four-limbed robot named LEMUR (Limbed Excursion Mechanical Utility Robot) can scale rock walls, gripping with hundreds of tiny fishhooks in each of its 16 fingers and using artificial intelligence to find its way around obstacles. In its last field test in Death Valley, California, in early 2019, LEMUR chose a route up a cliff, scanning the rock for ancient fossils from the sea that once filled the area.

The LEMUR project has since concluded, but it helped lead to a new generation of walking, climbing and crawling robots. In future missions to Mars or icy moons, robots with AI and climbing technology derived from LEMUR could discover similar signs of life. Those robots are being developed now, honing technology that may one day be part of future missions to distant worlds.

[ NASA ]

This video demonstrates the autonomous footstep planning developed by IHMC. Robots in this video are the Atlas humanoid robot (DRC version) and the NASA Valkyrie. The operator specifies a goal location in the world, which is modeled as planar regions using the robot’s perception sensors. The planner then automatically computes the necessary steps to reach the goal using a Weighted A* algorithm. The algorithm does not reject footholds that have a certain amount of support, but instead modifies them after the plan is found to try and increase that support area.

Currently, narrow terrain has a success rate of about 50%, rough terrain is about 90%, whereas flat ground is near 100%. We plan on increasing planner speed and the ability to plan through mazes and to unseen goals by including a body-path planner as the first step. Control, Perception, and Planning algorithms by IHMC Robotics.

[ IHMC ]

I’ve never really been able to get into watching people play poker, but throw an AI from CMU and Facebook into a game of no-limit Texas hold’em with five humans, and I’m there.

[ Facebook ]

In this video, Cassie Blue is navigating autonomously. Right now, her world is very small, the Wavefield at the University of Michigan, where she is told to turn left at intersections. You’re right, that is not a lot of independence, but it’s a first step away from a human and an RC controller!

Using a RealSense RGBD Camera, an IMU, and our version of an InEKF with contact factors, Cassie Blue is building a 3D semantic map in real time that identifies sidewalks, grass, poles, bicycles, and buildings. From the semantic map, occupancy and cost maps are built with the sidewalk identified as walk-able area and everything else considered as an obstacle. A planner then sets a goal to stay approximately 50 cm to the right of the sidewalk’s left edge and plans a path around obstacles and corners using D*. The path is translated into way-points that are achieved via Cassie Blue’s gait controller.

[ University of Michigan ]

Thanks Jesse!

Dave from HEBI Robotics wrote in to share some new actuators that are designed to get all kinds of dirty: “The R-Series takes HEBI’s X-Series to the next level, providing a sealed robotics solution for rugged, industrial applications and laying the groundwork for industrial users to address challenges that are not well met by traditional robotics. To prove it, we shot some video right in the Allegheny River here in Pittsburgh. Not a bad way to spend an afternoon :-)”

The R-Series Actuator is a full-featured robotic component as opposed to a simple servo motor. The output rotates continuously, requires no calibration or homing on boot-up, and contains a thru-bore for easy daisy-chaining of wiring. Modular in nature, R-Series Actuators can be used in everything from wheeled robots to collaborative robotic arms. They are sealed to IP67 and designed with a lightweight form factor for challenging field applications, and they’re packed with sensors that enable simultaneous control of position, velocity, and torque.

[ HEBI Robotics ]

Thanks Dave!

If your robot hands out karate chops on purpose, that’s great. If it hands out karate chops accidentally, maybe you should fix that.

COVR is short for “being safe around collaborative and versatile robots in shared spaces”. Our mission is to significantly reduce the complexity in safety certifying cobots. Increasing safety for collaborative robots enables new innovative applications, thus increasing production and job creation for companies utilizing the technology. Whether you’re an established company seeking to deploy cobots or an innovative startup with a prototype of a cobot related product, COVR will help you analyze, test and validate the safety for that application.

[ COVR ]

Thanks Anna!

EPFL startup Flybotix has developed a novel drone with just two propellers and an advanced stabilization system that allow it to fly for twice as long as conventional models. That fact, together with its small size, makes it perfect for inspecting hard-to-reach parts of industrial facilities such as ducts.

[ Flybotix ]

SpaceBok is a quadruped robot designed and built by a Swiss student team from ETH Zurich and ZHAW Zurich, currently being tested using Automation and Robotics Laboratories (ARL) facilities at our technical centre in the Netherlands. The robot is being used to investigate the potential of ‘dynamic walking’ and jumping to get around in low gravity environments.

SpaceBok could potentially go up to 2 m high in lunar gravity, although such a height poses new challenges. Once it comes off the ground the legged robot needs to stabilise itself to come down again safely – like a mini-spacecraft. So, like a spacecraft. SpaceBok uses a reaction wheel to control its orientation.

[ ESA ]

A new video from GITAI showing progress on their immersive telepresence robot for space.

[ GITAI ]

Tech United’s HERO robot (a Toyota HSR) competed in the RoboCup@Home competition, and it had a couple of garbage-related hiccups.

[ Tech United ]

Even small drones are getting better at autonomous obstacle avoidance in cluttered environments at useful speeds, as this work from the HKUST Aerial Robotics Group shows.

[ HKUST ]

DelFly Nimbles now come in swarms.

[ DelFly Nimble ]

This is a very short video, but it’s a fairly impressive look at a Baxter robot collaboratively helping someone put a shirt on, a useful task for folks with disabilities.

[ Shibata Lab ]

ANYmal can inspect the concrete in sewers for deterioration by sliding its feet along the ground.

[ ETH Zurich ]

HUG is a haptic user interface for teleoperating advanced robotic systems as the humanoid robot Justin or the assistive robotic system EDAN. With its lightweight robot arms, HUG can measure human movements and simultaneously display forces from the distant environment. In addition to such teleoperation applications, HUG serves as a research platform for virtual assembly simulations, rehabilitation, and training.

[ DLR ]

This video about “image understanding” from CMU in 1979 (!) is amazing, and even though it’s long, you won’t regret watching until 3:30. Or maybe you will.

[ ARGOS (pdf) ]

Will Burrard-Lucas’ BeetleCam turned 10 this month, and in this video, he recounts the history of his little robotic camera.

[ BeetleCam ]

In this week’s episode of Robots in Depth, Per speaks with Gabriel Skantze from Furhat Robotics.

Gabriel Skantze is co-founder and Chief Scientist at Furhat Robotics and Professor in speech technology at KTH with a specialization in conversational systems. He has a background in research into how humans use spoken communication to interact.

In this interview, Gabriel talks about how the social robot revolution makes it necessary to communicate with humans in a human ways through speech and facial expressions. This is necessary as we expand the number of people that interact with robots as well as the types of interaction. Gabriel gives us more insight into the many challenges of implementing spoken communication for co-bots, where robots and humans work closely together. They need to communicate about the world, the objects in it and how to handle them. We also get to hear how having an embodied system using the Furhat robot head helps the interaction between humans and the system.

[ Robots in Depth ] Continue reading

Posted in Human Robots

#435632 DARPA Subterranean Challenge: Tunnel ...

The Tunnel Circuit of the DARPA Subterranean Challenge starts later this week at the NIOSH research mine just outside of Pittsburgh, Pennsylvania. From 15-22 August, 11 teams will send robots into a mine that they've never seen before, with the goal of making maps and locating items. All DARPA SubT events involve tunnels of one sort or another, but in this case, the “Tunnel Circuit” refers to mines as opposed to urban underground areas or natural caves. This month’s challenge is the first of three discrete events leading up to a huge final event in August of 2021.

While the Tunnel Circuit competition will be closed to the public, and media are only allowed access for a single day (which we'll be at, of course), DARPA has provided a substantial amount of information about what teams will be able to expect. We also have details from the SubT Integration Exercise, called STIX, which was a completely closed event that took place back in April. STIX was aimed at giving some teams (and DARPA) a chance to practice in a real tunnel environment.

For more general background on SubT, here are some articles to get you all caught up:

SubT: The Next DARPA Challenge for Robotics

Q&A with DARPA Program Manager Tim Chung

Meet The First Nine Teams

It makes sense to take a closer look at what happened at April's STIX exercise, because it is (probably) very similar to what teams will experience in the upcoming Tunnel Circuit. STIX took place at Edgar Experimental Mine in Colorado, and while no two mines are the same (and many are very, very different), there are enough similarities for STIX to have been a valuable experience for teams. Here's an overview video of the exercise from DARPA:

DARPA has also put together a much more detailed walkthrough of the STIX mine exercise, which gives you a sense of just how vast, complicated, and (frankly) challenging for robots the mine environment is:

So, that's the kind of thing that teams had to deal with back in April. Since the event was an exercise, rather than a competition, DARPA didn't really keep score, and wouldn't comment on the performance of individual teams. We've been trolling YouTube for STIX footage, though, to get a sense of how things went, and we found a few interesting videos.

Here's a nice overview from Team CERBERUS, which used drones plus an ANYmal quadruped:

Team CTU-CRAS also used drones, along with a tracked robot:

Team Robotika was brave enough to post video of a “fatal failure” experienced by its wheeled robot; the poor little bot gets rescued at about 7:00 in case you get worried:

So that was STIX. But what about the Tunnel Circuit competition this week? Here's a course preview video from DARPA:

It sort of looks like the NIOSH mine might be a bit less dusty than the Edgar mine was, but it could also be wetter and muddier. It’s hard to tell, because we’re just getting a few snapshots of what’s probably an enormous area with kilometers of tunnels that the robots will have to explore. But DARPA has promised “constrained passages, sharp turns, large drops/climbs, inclines, steps, ladders, and mud, sand, and/or water.” Combine that with the serious challenge to communications imposed by the mine itself, and robots will have to be both physically capable, and almost entirely autonomous. Which is, of course, exactly what DARPA is looking to test with this challenge.

Lastly, we had a chance to catch up with Tim Chung, Program Manager for the Subterranean Challenge at DARPA, and ask him a few brief questions about STIX and what we have to look forward to this week.

IEEE Spectrum: How did STIX go?

Tim Chung: It was a lot of fun! I think it gave a lot of the teams a great opportunity to really get a taste of what these types of real world environments look like, and also what DARPA has in store for them in the SubT Challenge. STIX I saw as an experiment—a learning experience for all the teams involved (as well as the DARPA team) so that we can continue our calibration.

What do you think teams took away from STIX, and what do you think DARPA took away from STIX?

I think the thing that teams took away was that, when DARPA hosts a challenge, we have very audacious visions for what the art of the possible is. And that's what we want—in my mind, the purpose of a DARPA Grand Challenge is to provide that inspiration of, ‘Holy cow, someone thinks we can do this!’ So I do think the teams walked away with a better understanding of what DARPA's vision is for the capabilities we're seeking in the SubT Challenge, and hopefully walked away with a better understanding of the technical, physical, even maybe mental challenges of doing this in the wild— which will all roll back into how they think about the problem, and how they develop their systems.

This was a collaborative exercise, so the DARPA field team was out there interacting with the other engineers, figuring out what their strengths and weaknesses and needs might be, and even understanding how to handle the robots themselves. That will help [strengthen] connections between these university teams and DARPA going forward. Across the board, I think that collaborative spirit is something we really wish to encourage, and something that the DARPA folks were able to take away.

What do we have to look forward to during the Tunnel Circuit?

The vision here is that the Tunnel Circuit is representative of one of the three subterranean subdomains, along with urban and cave. Characteristics of all of these three subdomains will be mashed together in an epic final course, so that teams will have to face hints of tunnel once again in that final event.

Without giving too much away, the NIOSH mine will be similar to the Edgar mine in that it's a human-made environment that supports mining operations and research. But of course, every site is different, and these differences, I think, will provide good opportunities for the teams to shine.

Again, we'll be visiting the NIOSH mine in Pennsylvania during the Tunnel Circuit and will post as much as we can from there. But if you’re an actual participant in the Subterranean Challenge, please tweet me @BotJunkie so that I can follow and help share live updates.

[ DARPA Subterranean Challenge ] Continue reading

Posted in Human Robots

#435621 ANYbotics Introduces Sleek New ANYmal C ...

Quadrupedal robots are making significant advances lately, and just in the past few months we’ve seen Boston Dynamics’ Spot hauling a truck, IIT’s HyQReal pulling a plane, MIT’s MiniCheetah doing backflips, Unitree Robotics’ Laikago towing a van, and Ghost Robotics’ Vision 60 exploring a mine. Robot makers are betting that their four-legged machines will prove useful in a variety of applications in construction, security, delivery, and even at home.

ANYbotics has been working on such applications for years, testing out their ANYmal robot in places where humans typically don’t want to go (like offshore platforms) as well as places where humans really don’t want to go (like sewers), and they have a better idea than most companies what can make quadruped robots successful.

This week, ANYbotics is announcing a completely new quadruped platform, ANYmal C, a major upgrade from the really quite research-y ANYmal B. The new quadruped has been optimized for ruggedness and reliability in industrial environments, with a streamlined body painted a color that lets you know it means business.

ANYmal C’s physical specs are pretty impressive for a production quadruped. It can move at 1 meter per second, manage 20-degree slopes and 45-degree stairs, cross 25-centimeter gaps, and squeeze through passages just 60 centimeters wide. It’s packed with cameras and 3D sensors, including a lidar for 3D mapping and simultaneous localization and mapping (SLAM). All these sensors (along with the vast volume of gait research that’s been done with ANYmal) make this one of the most reliably autonomous quadrupeds out there, with real-time motion planning and obstacle avoidance.

Image: ANYbotics

ANYmal can autonomously attach itself to a cone-shaped docking station to recharge.

ANYmal C is also one of the ruggedest legged robots in existence. The 50-kilogram robot is IP67 rated, meaning that it’s completely impervious to dust and can withstand being submerged in a meter of water for an hour. If it’s submerged for longer than that, you’re absolutely doing something wrong. The robot will run for over 2 hours on battery power, and if that’s not enough endurance, don’t worry, because ANYmal can autonomously impale itself on a weird cone-shaped docking station to recharge.

Photo: ANYbotics

ANYmal C’s sensor payload includes cameras and a lidar for 3D mapping and SLAM.

As far as what ANYmal C is designed to actually do, it’s mostly remote inspection tasks where you need to move around through a relatively complex environment, but where for whatever reason you’d be better off not sending a human. ANYmal C has a sensor payload that gives it lots of visual options, like thermal imaging, and with the ability to handle a 10-kilogram payload, the robot can be adapted to many different environments.

Over the next few months, we’re hoping to see more examples of ANYmal C being deployed to do useful stuff in real-world environments, but for now, we do have a bit more detail from ANYbotics CTO Christian Gehring.

IEEE Spectrum: Can you tell us about the development process for ANYmal C?

Christian Gehring: We tested the previous generation of ANYmal (B) in a broad range of environments over the last few years and gained a lot of insights. Based on our learnings, it became clear that we would have to re-design the robot to meet the requirements of industrial customers in terms of safety, quality, reliability, and lifetime. There were different prototype stages both for the new drives and for single robot assemblies. Apart from electrical tests, we thoroughly tested the thermal control and ingress protection of various subsystems like the depth cameras and actuators.

What can ANYmal C do that the previous version of ANYmal can’t?

ANYmal C was redesigned with a focus on performance increase regarding actuation (new drives), computational power (new hexacore Intel i7 PCs), locomotion and navigation skills, and autonomy (new depth cameras). The new robot additionally features a docking system for autonomous recharging and an inspection payload as an option. The design of ANYmal C is far more integrated than its predecessor, which increases both performance and reliability.

How much of ANYmal C’s development and design was driven by your experience with commercial or industry customers?

Tests (such as the offshore installation with TenneT) and discussions with industry customers were important to get the necessary design input in terms of performance, safety, quality, reliability, and lifetime. Most customers ask for very similar inspection tasks that can be performed with our standard inspection payload and the required software packages. Some are looking for a robot that can also solve some simple manipulation tasks like pushing a button. Overall, most use cases customers have in mind are realistic and achievable, but some are really tough for the robot, like climbing 50° stairs in hot environments of 50°C.

Can you describe how much autonomy you expect ANYmal C to have in industrial or commercial operations?

ANYmal C is primarily developed to perform autonomous routine inspections in industrial environments. This autonomy especially adds value for operations that are difficult to access, as human operation is extremely costly. The robot can naturally also be operated via a remote control and we are working on long-distance remote operation as well.

Do you expect that researchers will be interested in ANYmal C? What research applications could it be useful for?

ANYmal C has been designed to also address the needs of the research community. The robot comes with two powerful hexacore Intel i7 computers and can additionally be equipped with an NVIDIA Jetson Xavier graphics card for learning-based applications. Payload interfaces enable users to easily install and test new sensors. By joining our established ANYmal Research community, researchers get access to simulation tools and software APIs, which boosts their research in various areas like control, machine learning, and navigation.

[ ANYmal C ] Continue reading

Posted in Human Robots

#435616 Video Friday: AlienGo Quadruped Robot ...

Video Friday is your weekly selection of awesome robotics videos, collected by your Automaton bloggers. We’ll also be posting a weekly calendar of upcoming robotics events for the next few months; here’s what we have so far (send us your events!):

CLAWAR 2019 – August 26-28, 2019 – Kuala Lumpur, Malaysia
IEEE Africon 2019 – September 25-27, 2019 – Accra, Ghana
ISRR 2019 – October 6-10, 2019 – Hanoi, Vietnam
Ro-Man 2019 – October 14-18, 2019 – New Delhi, India
Humanoids 2019 – October 15-17, 2019 – Toronto, Canada
ARSO 2019 – October 31-1, 2019 – Beijing, China
ROSCon 2019 – October 31-1, 2019 – Macau
IROS 2019 – November 4-8, 2019 – Macau
Let us know if you have suggestions for next week, and enjoy today’s videos.

I know you’ve all been closely following our DARPA Subterranean Challenge coverage here and on Twitter, but here are short recap videos of each day just in case you missed something.

[ DARPA SubT ]

After Laikago, Unitree Robotics is now introducing AlienGo, which is looking mighty spry:

We’ve seen MIT’s Mini Cheetah doing backflips earlier this year, but apparently AlienGo is now the largest and heaviest quadruped to perform the maneuver.

[ Unitree ]

The majority of soft robots today rely on external power and control, keeping them tethered to off-board systems or rigged with hard components. Now, researchers from the Harvard John A. Paulson School of Engineering and Applied Sciences (SEAS) and Caltech have developed soft robotic systems, inspired by origami, that can move and change shape in response to external stimuli, paving the way for fully untethered soft robots.

The Rollbot begins as a flat sheet, about 8 centimeters long and 4 centimeters wide. When placed on a hot surface, about 200°C, one set of hinges folds and the robot curls into a pentagonal wheel.

Another set of hinges is embedded on each of the five sides of the wheel. A hinge folds when in contact with the hot surface, propelling the wheel to turn to the next side, where the next hinge folds. As they roll off the hot surface, the hinges unfold and are ready for the next cycle.

[ Harvard SEAS ]

A new research effort at Caltech aims to help people walk again by combining exoskeletons with spinal stimulation. This initiative, dubbed RoAM (Robotic Assisted Mobility), combines the research of two Caltech roboticists: Aaron Ames, who creates the algorithms that enable walking by bipedal robots and translates these to govern the motion of exoskeletons and prostheses; and Joel Burdick, whose transcutaneous spinal implants have already helped paraplegics in clinical trials to recover some leg function and, crucially, torso control.

[ Caltech ]

Once ExoMars lands, it’s going to have to get itself off of the descent stage and onto the surface, which could be tricky. But practice makes perfect, or as near as you can get on Earth.

That wheel walking technique is pretty cool, and it looks like ExoMars will be able to handle terrain that would scare NASA’s Mars rovers away.

[ ExoMars ]

I am honestly not sure whether this would make the game of golf more or less fun to watch:

[ Nissan ]

Finally, a really exciting use case for Misty!

It can pick up those balls too, right?

[ Misty ]

You know you’re an actual robot if this video doesn’t make you crave Peeps.

[ Soft Robotics ]

COMANOID investigates the deployment of robotic solutions in well-identified Airbus airliner assembly operations that are tedious for human workers and for which access is impossible for wheeled or rail-ported robotic platforms. This video presents a demonstration of autonomous placement of a part inside the aircraft fuselage. The task is performed by TORO, the torque-controlled humanoid robot developed at DLR.

[ COMANOID ]

It’s a little hard to see in this video, but this is a cable-suspended robot arm that has little tiny robot arms that it waves around to help damp down vibrations.

[ CoGiRo ]

This week in Robots in Depth, Per speaks with author Cristina Andersson.

In 2013 she organized events in Finland during European robotics week and found that many people was very interested but that there was also a big lack of knowledge.

She also talks about introducing robotics in society in a way that makes it easy for everyone to understand the benefits as this will make the process much easier. When people see the clear benefits in one field or situation they will be much more interested in bringing robotics in to their private or professional lives.

[ Robots in Depth ] Continue reading

Posted in Human Robots

#435614 3 Easy Ways to Evaluate AI Claims

When every other tech startup claims to use artificial intelligence, it can be tough to figure out if an AI service or product works as advertised. In the midst of the AI “gold rush,” how can you separate the nuggets from the fool’s gold?

There’s no shortage of cautionary tales involving overhyped AI claims. And applying AI technologies to health care, education, and law enforcement mean that getting it wrong can have real consequences for society—not just for investors who bet on the wrong unicorn.

So IEEE Spectrum asked experts to share their tips for how to identify AI hype in press releases, news articles, research papers, and IPO filings.

“It can be tricky, because I think the people who are out there selling the AI hype—selling this AI snake oil—are getting more sophisticated over time,” says Tim Hwang, director of the Harvard-MIT Ethics and Governance of AI Initiative.

The term “AI” is perhaps most frequently used to describe machine learning algorithms (and deep learning algorithms, which require even less human guidance) that analyze huge amounts of data and make predictions based on patterns that humans might miss. These popular forms of AI are mostly suited to specialized tasks, such as automatically recognizing certain objects within photos. For that reason, they are sometimes described as “weak” or “narrow” AI.

Some researchers and thought leaders like to talk about the idea of “artificial general intelligence” or “strong AI” that has human-level capacity and flexibility to handle many diverse intellectual tasks. But for now, this type of AI remains firmly in the realm of science fiction and is far from being realized in the real world.

“AI has no well-defined meaning and many so-called AI companies are simply trying to take advantage of the buzz around that term,” says Arvind Narayanan, a computer scientist at Princeton University. “Companies have even been caught claiming to use AI when, in fact, the task is done by human workers.”

Here are three ways to recognize AI hype.

Look for Buzzwords
One red flag is what Hwang calls the “hype salad.” This means stringing together the term “AI” with many other tech buzzwords such as “blockchain” or “Internet of Things.” That doesn’t automatically disqualify the technology, but spotting a high volume of buzzwords in a post, pitch, or presentation should raise questions about what exactly the company or individual has developed.

Other experts agree that strings of buzzwords can be a red flag. That’s especially true if the buzzwords are never really explained in technical detail, and are simply tossed around as vague, poorly-defined terms, says Marzyeh Ghassemi, a computer scientist and biomedical engineer at the University of Toronto in Canada.

“I think that if it looks like a Google search—picture ‘interpretable blockchain AI deep learning medicine’—it's probably not high-quality work,” Ghassemi says.

Hwang also suggests mentally replacing all mentions of “AI” in an article with the term “magical fairy dust.” It’s a way of seeing whether an individual or organization is treating the technology like magic. If so—that’s another good reason to ask more questions about what exactly the AI technology involves.

And even the visual imagery used to illustrate AI claims can indicate that an individual or organization is overselling the technology.

“I think that a lot of the people who work on machine learning on a day-to-day basis are pretty humble about the technology, because they’re largely confronted with how frequently it just breaks and doesn't work,” Hwang says. “And so I think that if you see a company or someone representing AI as a Terminator head, or a big glowing HAL eye or something like that, I think it’s also worth asking some questions.”

Interrogate the Data

It can be hard to evaluate AI claims without any relevant expertise, says Ghassemi at the University of Toronto. Even experts need to know the technical details of the AI algorithm in question and have some access to the training data that shaped the AI model’s predictions. Still, savvy readers with some basic knowledge of applied statistics can search for red flags.

To start, readers can look for possible bias in training data based on small sample sizes or a skewed population that fails to reflect the broader population, Ghassemi says. After all, an AI model trained only on health data from white men would not necessarily achieve similar results for other populations of patients.

“For me, a red flag is not demonstrating deep knowledge of how your labels are defined.”
—Marzyeh Ghassemi, University of Toronto

How machine learning and deep learning models perform also depends on how well humans labeled the sample datasets use to train these programs. This task can be straightforward when labeling photos of cats versus dogs, but gets more complicated when assigning disease diagnoses to certain patient cases.

Medical experts frequently disagree with each other on diagnoses—which is why many patients seek a second opinion. Not surprisingly, this ambiguity can also affect the diagnostic labels that experts assign in training datasets. “For me, a red flag is not demonstrating deep knowledge of how your labels are defined,” Ghassemi says.

Such training data can also reflect the cultural stereotypes and biases of the humans who labeled the data, says Narayanan at Princeton University. Like Ghassemi, he recommends taking a hard look at exactly what the AI has learned: “A good way to start critically evaluating AI claims is by asking questions about the training data.”

Another red flag is presenting an AI system’s performance through a single accuracy figure without much explanation, Narayanan says. Claiming that an AI model achieves “99 percent” accuracy doesn’t mean much without knowing the baseline for comparison—such as whether other systems have already achieved 99 percent accuracy—or how well that accuracy holds up in situations beyond the training dataset.

Narayanan also emphasized the need to ask questions about an AI model’s false positive rate—the rate of making wrong predictions about the presence of a given condition. Even if the false positive rate of a hypothetical AI service is just one percent, that could have major consequences if that service ends up screening millions of people for cancer.

Readers can also consider whether using AI in a given situation offers any meaningful improvement compared to traditional statistical methods, says Clayton Aldern, a data scientist and journalist who serves as managing director for Caldern LLC. He gave the hypothetical example of a “super-duper-fancy deep learning model” that achieves a prediction accuracy of 89 percent, compared to a “little polynomial regression model” that achieves 86 percent on the same dataset.

“We're talking about a three-percentage-point increase on something that you learned about in Algebra 1,” Aldern says. “So is it worth the hype?”

Don’t Ignore the Drawbacks

The hype surrounding AI isn’t just about the technical merits of services and products driven by machine learning. Overblown claims about the beneficial impacts of AI technology—or vague promises to address ethical issues related to deploying it—should also raise red flags.

“If a company promises to use its tech ethically, it is important to question if its business model aligns with that promise,” Narayanan says. “Even if employees have noble intentions, it is unrealistic to expect the company as a whole to resist financial imperatives.”

One example might be a company with a business model that depends on leveraging customers’ personal data. Such companies “tend to make empty promises when it comes to privacy,” Narayanan says. And, if companies hire workers to produce training data, it’s also worth asking whether the companies treat those workers ethically.

The transparency—or lack thereof—about any AI claim can also be telling. A company or research group can minimize concerns by publishing technical claims in peer-reviewed journals or allowing credible third parties to evaluate their AI without giving away big intellectual property secrets, Narayanan says. Excessive secrecy is a big red flag.

With these strategies, you don’t need to be a computer engineer or data scientist to start thinking critically about AI claims. And, Narayanan says, the world needs many people from different backgrounds for societies to fully consider the real-world implications of AI.

Editor’s Note: The original version of this story misspelled Clayton Aldern’s last name as Alderton. Continue reading

Posted in Human Robots