Tag Archives: there
#435614 3 Easy Ways to Evaluate AI Claims
When every other tech startup claims to use artificial intelligence, it can be tough to figure out if an AI service or product works as advertised. In the midst of the AI “gold rush,” how can you separate the nuggets from the fool’s gold?
There’s no shortage of cautionary tales involving overhyped AI claims. And applying AI technologies to health care, education, and law enforcement mean that getting it wrong can have real consequences for society—not just for investors who bet on the wrong unicorn.
So IEEE Spectrum asked experts to share their tips for how to identify AI hype in press releases, news articles, research papers, and IPO filings.
“It can be tricky, because I think the people who are out there selling the AI hype—selling this AI snake oil—are getting more sophisticated over time,” says Tim Hwang, director of the Harvard-MIT Ethics and Governance of AI Initiative.
The term “AI” is perhaps most frequently used to describe machine learning algorithms (and deep learning algorithms, which require even less human guidance) that analyze huge amounts of data and make predictions based on patterns that humans might miss. These popular forms of AI are mostly suited to specialized tasks, such as automatically recognizing certain objects within photos. For that reason, they are sometimes described as “weak” or “narrow” AI.
Some researchers and thought leaders like to talk about the idea of “artificial general intelligence” or “strong AI” that has human-level capacity and flexibility to handle many diverse intellectual tasks. But for now, this type of AI remains firmly in the realm of science fiction and is far from being realized in the real world.
“AI has no well-defined meaning and many so-called AI companies are simply trying to take advantage of the buzz around that term,” says Arvind Narayanan, a computer scientist at Princeton University. “Companies have even been caught claiming to use AI when, in fact, the task is done by human workers.”
Here are three ways to recognize AI hype.
Look for Buzzwords
One red flag is what Hwang calls the “hype salad.” This means stringing together the term “AI” with many other tech buzzwords such as “blockchain” or “Internet of Things.” That doesn’t automatically disqualify the technology, but spotting a high volume of buzzwords in a post, pitch, or presentation should raise questions about what exactly the company or individual has developed.
Other experts agree that strings of buzzwords can be a red flag. That’s especially true if the buzzwords are never really explained in technical detail, and are simply tossed around as vague, poorly-defined terms, says Marzyeh Ghassemi, a computer scientist and biomedical engineer at the University of Toronto in Canada.
“I think that if it looks like a Google search—picture ‘interpretable blockchain AI deep learning medicine’—it's probably not high-quality work,” Ghassemi says.
Hwang also suggests mentally replacing all mentions of “AI” in an article with the term “magical fairy dust.” It’s a way of seeing whether an individual or organization is treating the technology like magic. If so—that’s another good reason to ask more questions about what exactly the AI technology involves.
And even the visual imagery used to illustrate AI claims can indicate that an individual or organization is overselling the technology.
“I think that a lot of the people who work on machine learning on a day-to-day basis are pretty humble about the technology, because they’re largely confronted with how frequently it just breaks and doesn't work,” Hwang says. “And so I think that if you see a company or someone representing AI as a Terminator head, or a big glowing HAL eye or something like that, I think it’s also worth asking some questions.”
Interrogate the Data
It can be hard to evaluate AI claims without any relevant expertise, says Ghassemi at the University of Toronto. Even experts need to know the technical details of the AI algorithm in question and have some access to the training data that shaped the AI model’s predictions. Still, savvy readers with some basic knowledge of applied statistics can search for red flags.
To start, readers can look for possible bias in training data based on small sample sizes or a skewed population that fails to reflect the broader population, Ghassemi says. After all, an AI model trained only on health data from white men would not necessarily achieve similar results for other populations of patients.
“For me, a red flag is not demonstrating deep knowledge of how your labels are defined.”
—Marzyeh Ghassemi, University of Toronto
How machine learning and deep learning models perform also depends on how well humans labeled the sample datasets use to train these programs. This task can be straightforward when labeling photos of cats versus dogs, but gets more complicated when assigning disease diagnoses to certain patient cases.
Medical experts frequently disagree with each other on diagnoses—which is why many patients seek a second opinion. Not surprisingly, this ambiguity can also affect the diagnostic labels that experts assign in training datasets. “For me, a red flag is not demonstrating deep knowledge of how your labels are defined,” Ghassemi says.
Such training data can also reflect the cultural stereotypes and biases of the humans who labeled the data, says Narayanan at Princeton University. Like Ghassemi, he recommends taking a hard look at exactly what the AI has learned: “A good way to start critically evaluating AI claims is by asking questions about the training data.”
Another red flag is presenting an AI system’s performance through a single accuracy figure without much explanation, Narayanan says. Claiming that an AI model achieves “99 percent” accuracy doesn’t mean much without knowing the baseline for comparison—such as whether other systems have already achieved 99 percent accuracy—or how well that accuracy holds up in situations beyond the training dataset.
Narayanan also emphasized the need to ask questions about an AI model’s false positive rate—the rate of making wrong predictions about the presence of a given condition. Even if the false positive rate of a hypothetical AI service is just one percent, that could have major consequences if that service ends up screening millions of people for cancer.
Readers can also consider whether using AI in a given situation offers any meaningful improvement compared to traditional statistical methods, says Clayton Aldern, a data scientist and journalist who serves as managing director for Caldern LLC. He gave the hypothetical example of a “super-duper-fancy deep learning model” that achieves a prediction accuracy of 89 percent, compared to a “little polynomial regression model” that achieves 86 percent on the same dataset.
“We're talking about a three-percentage-point increase on something that you learned about in Algebra 1,” Aldern says. “So is it worth the hype?”
Don’t Ignore the Drawbacks
The hype surrounding AI isn’t just about the technical merits of services and products driven by machine learning. Overblown claims about the beneficial impacts of AI technology—or vague promises to address ethical issues related to deploying it—should also raise red flags.
“If a company promises to use its tech ethically, it is important to question if its business model aligns with that promise,” Narayanan says. “Even if employees have noble intentions, it is unrealistic to expect the company as a whole to resist financial imperatives.”
One example might be a company with a business model that depends on leveraging customers’ personal data. Such companies “tend to make empty promises when it comes to privacy,” Narayanan says. And, if companies hire workers to produce training data, it’s also worth asking whether the companies treat those workers ethically.
The transparency—or lack thereof—about any AI claim can also be telling. A company or research group can minimize concerns by publishing technical claims in peer-reviewed journals or allowing credible third parties to evaluate their AI without giving away big intellectual property secrets, Narayanan says. Excessive secrecy is a big red flag.
With these strategies, you don’t need to be a computer engineer or data scientist to start thinking critically about AI claims. And, Narayanan says, the world needs many people from different backgrounds for societies to fully consider the real-world implications of AI.
Editor’s Note: The original version of this story misspelled Clayton Aldern’s last name as Alderton. Continue reading
#435601 New Double 3 Robot Makes Telepresence ...
Today, Double Robotics is announcing Double 3, the latest major upgrade to its line of consumer(ish) telepresence robots. We had a (mostly) fantastic time testing out Double 2 back in 2016. One of the things that we found out back then was that it takes a lot of practice to remotely drive the robot around. Double 3 solves this problem by leveraging the substantial advances in 3D sensing and computing that have taken place over the past few years, giving their new robot a level of intelligence that promises to make telepresence more accessible for everyone.
Double 2’s iPad has been replaced by “a fully integrated solution”—which is a fancy way of saying a dedicated 9.7-inch touchscreen and a whole bunch of other stuff. That other stuff includes an NVIDIA Jetson TX2 AI computing module, a beamforming six-microphone array, an 8-watt speaker, a pair of 13-megapixel cameras (wide angle and zoom) on a tilting mount, five ultrasonic rangefinders, and most excitingly, a pair of Intel RealSense D430 depth sensors.
It’s those new depth sensors that really make Double 3 special. The D430 modules each uses a pair of stereo cameras with a pattern projector to generate 1280 x 720 depth data with a range of between 0.2 and 10 meters away. The Double 3 robot uses all of this high quality depth data to locate obstacles, but at this point, it still doesn’t drive completely autonomously. Instead, it presents the remote operator with a slick, augmented reality view of drivable areas in the form of a grid of dots. You just click where you want the robot to go, and it will skillfully take itself there while avoiding obstacles (including dynamic obstacles) and related mishaps along the way.
This effectively offloads the most stressful part of telepresence—not running into stuff—from the remote user to the robot itself, which is the way it should be. That makes it that much easier to encourage people to utilize telepresence for the first time. The way the system is implemented through augmented reality is particularly impressive, I think. It looks like it’s intuitive enough for an inexperienced user without being restrictive, and is a clever way of mitigating even significant amounts of lag.
Otherwise, Double 3’s mobility system is exactly the same as the one featured on Double 2. In fact, that you can stick a Double 3 head on a Double 2 body and it instantly becomes a Double 3. Double Robotics is thoughtfully offering this to current Double 2 owners as a significantly more affordable upgrade option than buying a whole new robot.
For more details on all of Double 3's new features, we spoke with the co-founders of Double Robotics, Marc DeVidts and David Cann.
IEEE Spectrum: Why use this augmented reality system instead of just letting the user click on a regular camera image? Why make things more visually complicated, especially for new users?
Marc DeVidts and David Cann: One of the things that we realized about nine months ago when we got this whole thing working was that without the mixed reality for driving, it was really too magical of an experience for the customer. Even us—we had a hard time understanding whether the robot could really see obstacles and understand where the floor is and that kind of thing. So, we said “What would be the best way of communicating this information to the user?” And the right way to do it ended up drawing the graphics directly onto the scene. It’s really awesome—we have a full, real time 3D scene with the depth information drawn on top of it. We’re starting with some relatively simple graphics, and we’ll be adding more graphics in the future to help the user understand what the robot is seeing.
How robust is the vision system when it comes to obstacle detection and avoidance? Does it work with featureless surfaces, IR absorbent surfaces, in low light, in direct sunlight, etc?
We’ve looked at all of those cases, and one of the reasons that we’re going with the RealSense is the projector that helps us to see blank walls. We also found that having two sensors—one facing the floor and one facing forward—gives us a great coverage area. Having ultrasonic sensors in there as well helps us to detect anything that we can't see with the cameras. They're sort of a last safety measure, especially useful for detecting glass.
It seems like there’s a lot more that you could do with this sensing and mapping capability. What else are you working on?
We're starting with this semi-autonomous driving variant, and we're doing a private beta of full mapping. So, we’re going to do full SLAM of your environment that will be mapped by multiple robots at the same time while you're driving, and then you'll be able to zoom out to a map and click anywhere and it will drive there. That's where we're going with it, but we want to take baby steps to get there. It's the obvious next step, I think, and there are a lot more possibilities there.
Do you expect developers to be excited for this new mapping capability?
We're using a very powerful computer in the robot, a NVIDIA Jetson TX2 running Ubuntu. There's room to grow. It’s actually really exciting to be able to see, in real time, the 3D pose of the robot along with all of the depth data that gets transformed in real time into one view that gives you a full map. Having all of that data and just putting those pieces together and getting everything to work has been a huge feat in of itself.
We have an extensive API for developers to do custom implementations, either for telepresence or other kinds of robotics research. Our system isn't running ROS, but we're going to be adding ROS adapters for all of our hardware components.
Telepresence robots depend heavily on wireless connectivity, which is usually not something that telepresence robotics companies like Double have direct control over. Have you found that connectivity has been getting significantly better since you first introduced Double?
When we started in 2013, we had a lot of customers that didn’t have WiFi in their hallways, just in the conference rooms. We very rarely hear about customers having WiFi connectivity issues these days. The bigger issue we see is when people are calling into the robot from home, where they don't have proper traffic management on their home network. The robot doesn't need a ton of bandwidth, but it does need consistent, low latency bandwidth. And so, if someone else in the house is watching Netflix or something like that, it’s going to saturate your connection. But for the most part, it’s gotten a lot better over the last few years, and it’s no longer a big problem for us.
Do you think 5G will make a significant difference to telepresence robots?
We’ll see. We like the low latency possibilities and the better bandwidth, but it's all going to be a matter of what kind of reception you get. LTE can be great, if you have good reception; it’s all about where the tower is. I’m pretty sure that WiFi is going to be the primary thing for at least the next few years.
DeVidts also mentioned that an unfortunate side effect of the new depth sensors is that hanging a t-shirt on your Double to give it some personality will likely render it partially blind, so that's just something to keep in mind. To make up for this, you can switch around the colorful trim surrounding the screen, which is nowhere near as fun.
When the Double 3 is ready for shipping in late September, US $2,000 will get you the new head with all the sensors and stuff, which seamlessly integrates with your Double 2 base. Buying Double 3 straight up (with the included charging dock) will run you $4,ooo. This is by no means an inexpensive robot, and my impression is that it’s not really designed for individual consumers. But for commercial, corporate, healthcare, or education applications, $4k for a robot as capable as the Double 3 is really quite a good deal—especially considering the kinds of use cases for which it’s ideal.
[ Double Robotics ] Continue reading