Tag Archives: ethics
#435614 3 Easy Ways to Evaluate AI Claims
When every other tech startup claims to use artificial intelligence, it can be tough to figure out if an AI service or product works as advertised. In the midst of the AI “gold rush,” how can you separate the nuggets from the fool’s gold?
There’s no shortage of cautionary tales involving overhyped AI claims. And applying AI technologies to health care, education, and law enforcement mean that getting it wrong can have real consequences for society—not just for investors who bet on the wrong unicorn.
So IEEE Spectrum asked experts to share their tips for how to identify AI hype in press releases, news articles, research papers, and IPO filings.
“It can be tricky, because I think the people who are out there selling the AI hype—selling this AI snake oil—are getting more sophisticated over time,” says Tim Hwang, director of the Harvard-MIT Ethics and Governance of AI Initiative.
The term “AI” is perhaps most frequently used to describe machine learning algorithms (and deep learning algorithms, which require even less human guidance) that analyze huge amounts of data and make predictions based on patterns that humans might miss. These popular forms of AI are mostly suited to specialized tasks, such as automatically recognizing certain objects within photos. For that reason, they are sometimes described as “weak” or “narrow” AI.
Some researchers and thought leaders like to talk about the idea of “artificial general intelligence” or “strong AI” that has human-level capacity and flexibility to handle many diverse intellectual tasks. But for now, this type of AI remains firmly in the realm of science fiction and is far from being realized in the real world.
“AI has no well-defined meaning and many so-called AI companies are simply trying to take advantage of the buzz around that term,” says Arvind Narayanan, a computer scientist at Princeton University. “Companies have even been caught claiming to use AI when, in fact, the task is done by human workers.”
Here are three ways to recognize AI hype.
Look for Buzzwords
One red flag is what Hwang calls the “hype salad.” This means stringing together the term “AI” with many other tech buzzwords such as “blockchain” or “Internet of Things.” That doesn’t automatically disqualify the technology, but spotting a high volume of buzzwords in a post, pitch, or presentation should raise questions about what exactly the company or individual has developed.
Other experts agree that strings of buzzwords can be a red flag. That’s especially true if the buzzwords are never really explained in technical detail, and are simply tossed around as vague, poorly-defined terms, says Marzyeh Ghassemi, a computer scientist and biomedical engineer at the University of Toronto in Canada.
“I think that if it looks like a Google search—picture ‘interpretable blockchain AI deep learning medicine’—it's probably not high-quality work,” Ghassemi says.
Hwang also suggests mentally replacing all mentions of “AI” in an article with the term “magical fairy dust.” It’s a way of seeing whether an individual or organization is treating the technology like magic. If so—that’s another good reason to ask more questions about what exactly the AI technology involves.
And even the visual imagery used to illustrate AI claims can indicate that an individual or organization is overselling the technology.
“I think that a lot of the people who work on machine learning on a day-to-day basis are pretty humble about the technology, because they’re largely confronted with how frequently it just breaks and doesn't work,” Hwang says. “And so I think that if you see a company or someone representing AI as a Terminator head, or a big glowing HAL eye or something like that, I think it’s also worth asking some questions.”
Interrogate the Data
It can be hard to evaluate AI claims without any relevant expertise, says Ghassemi at the University of Toronto. Even experts need to know the technical details of the AI algorithm in question and have some access to the training data that shaped the AI model’s predictions. Still, savvy readers with some basic knowledge of applied statistics can search for red flags.
To start, readers can look for possible bias in training data based on small sample sizes or a skewed population that fails to reflect the broader population, Ghassemi says. After all, an AI model trained only on health data from white men would not necessarily achieve similar results for other populations of patients.
“For me, a red flag is not demonstrating deep knowledge of how your labels are defined.”
—Marzyeh Ghassemi, University of Toronto
How machine learning and deep learning models perform also depends on how well humans labeled the sample datasets use to train these programs. This task can be straightforward when labeling photos of cats versus dogs, but gets more complicated when assigning disease diagnoses to certain patient cases.
Medical experts frequently disagree with each other on diagnoses—which is why many patients seek a second opinion. Not surprisingly, this ambiguity can also affect the diagnostic labels that experts assign in training datasets. “For me, a red flag is not demonstrating deep knowledge of how your labels are defined,” Ghassemi says.
Such training data can also reflect the cultural stereotypes and biases of the humans who labeled the data, says Narayanan at Princeton University. Like Ghassemi, he recommends taking a hard look at exactly what the AI has learned: “A good way to start critically evaluating AI claims is by asking questions about the training data.”
Another red flag is presenting an AI system’s performance through a single accuracy figure without much explanation, Narayanan says. Claiming that an AI model achieves “99 percent” accuracy doesn’t mean much without knowing the baseline for comparison—such as whether other systems have already achieved 99 percent accuracy—or how well that accuracy holds up in situations beyond the training dataset.
Narayanan also emphasized the need to ask questions about an AI model’s false positive rate—the rate of making wrong predictions about the presence of a given condition. Even if the false positive rate of a hypothetical AI service is just one percent, that could have major consequences if that service ends up screening millions of people for cancer.
Readers can also consider whether using AI in a given situation offers any meaningful improvement compared to traditional statistical methods, says Clayton Aldern, a data scientist and journalist who serves as managing director for Caldern LLC. He gave the hypothetical example of a “super-duper-fancy deep learning model” that achieves a prediction accuracy of 89 percent, compared to a “little polynomial regression model” that achieves 86 percent on the same dataset.
“We're talking about a three-percentage-point increase on something that you learned about in Algebra 1,” Aldern says. “So is it worth the hype?”
Don’t Ignore the Drawbacks
The hype surrounding AI isn’t just about the technical merits of services and products driven by machine learning. Overblown claims about the beneficial impacts of AI technology—or vague promises to address ethical issues related to deploying it—should also raise red flags.
“If a company promises to use its tech ethically, it is important to question if its business model aligns with that promise,” Narayanan says. “Even if employees have noble intentions, it is unrealistic to expect the company as a whole to resist financial imperatives.”
One example might be a company with a business model that depends on leveraging customers’ personal data. Such companies “tend to make empty promises when it comes to privacy,” Narayanan says. And, if companies hire workers to produce training data, it’s also worth asking whether the companies treat those workers ethically.
The transparency—or lack thereof—about any AI claim can also be telling. A company or research group can minimize concerns by publishing technical claims in peer-reviewed journals or allowing credible third parties to evaluate their AI without giving away big intellectual property secrets, Narayanan says. Excessive secrecy is a big red flag.
With these strategies, you don’t need to be a computer engineer or data scientist to start thinking critically about AI claims. And, Narayanan says, the world needs many people from different backgrounds for societies to fully consider the real-world implications of AI.
Editor’s Note: The original version of this story misspelled Clayton Aldern’s last name as Alderton. Continue reading
#434759 To Be Ethical, AI Must Become ...
As over-hyped as artificial intelligence is—everyone’s talking about it, few fully understand it, it might leave us all unemployed but also solve all the world’s problems—its list of accomplishments is growing. AI can now write realistic-sounding text, give a debating champ a run for his money, diagnose illnesses, and generate fake human faces—among much more.
After training these systems on massive datasets, their creators essentially just let them do their thing to arrive at certain conclusions or outcomes. The problem is that more often than not, even the creators don’t know exactly why they’ve arrived at those conclusions or outcomes. There’s no easy way to trace a machine learning system’s rationale, so to speak. The further we let AI go down this opaque path, the more likely we are to end up somewhere we don’t want to be—and may not be able to come back from.
In a panel at the South by Southwest interactive festival last week titled “Ethics and AI: How to plan for the unpredictable,” experts in the field shared their thoughts on building more transparent, explainable, and accountable AI systems.
Not New, but Different
Ryan Welsh, founder and director of explainable AI startup Kyndi, pointed out that having knowledge-based systems perform advanced tasks isn’t new; he cited logistical, scheduling, and tax software as examples. What’s new is the learning component, our inability to trace how that learning occurs, and the ethical implications that could result.
“Now we have these systems that are learning from data, and we’re trying to understand why they’re arriving at certain outcomes,” Welsh said. “We’ve never actually had this broad society discussion about ethics in those scenarios.”
Rather than continuing to build AIs with opaque inner workings, engineers must start focusing on explainability, which Welsh broke down into three subcategories. Transparency and interpretability come first, and refer to being able to find the units of high influence in a machine learning network, as well as the weights of those units and how they map to specific data and outputs.
Then there’s provenance: knowing where something comes from. In an ideal scenario, for example, Open AI’s new text generator would be able to generate citations in its text that reference academic (and human-created) papers or studies.
Explainability itself is the highest and final bar and refers to a system’s ability to explain itself in natural language to the average user by being able to say, “I generated this output because x, y, z.”
“Humans are unique in our ability and our desire to ask why,” said Josh Marcuse, executive director of the Defense Innovation Board, which advises Department of Defense senior leaders on innovation. “The reason we want explanations from people is so we can understand their belief system and see if we agree with it and want to continue to work with them.”
Similarly, we need to have the ability to interrogate AIs.
Two Types of Thinking
Welsh explained that one big barrier standing in the way of explainability is the tension between the deep learning community and the symbolic AI community, which see themselves as two different paradigms and historically haven’t collaborated much.
Symbolic or classical AI focuses on concepts and rules, while deep learning is centered around perceptions. In human thought this is the difference between, for example, deciding to pass a soccer ball to a teammate who is open (you make the decision because conceptually you know that only open players can receive passes), and registering that the ball is at your feet when someone else passes it to you (you’re taking in information without making a decision about it).
“Symbolic AI has abstractions and representation based on logic that’s more humanly comprehensible,” Welsh said. To truly mimic human thinking, AI needs to be able to both perceive information and conceptualize it. An example of perception (deep learning) in an AI is recognizing numbers within an image, while conceptualization (symbolic learning) would give those numbers a hierarchical order and extract rules from the hierachy (4 is greater than 3, and 5 is greater than 4, therefore 5 is also greater than 3).
Explainability comes in when the system can say, “I saw a, b, and c, and based on that decided x, y, or z.” DeepMind and others have recently published papers emphasizing the need to fuse the two paradigms together.
Implications Across Industries
One of the most prominent fields where AI ethics will come into play, and where the transparency and accountability of AI systems will be crucial, is defense. Marcuse said, “We’re accountable beings, and we’re responsible for the choices we make. Bringing in tech or AI to a battlefield doesn’t strip away that meaning and accountability.”
In fact, he added, rather than worrying about how AI might degrade human values, people should be asking how the tech could be used to help us make better moral choices.
It’s also important not to conflate AI with autonomy—a worst-case scenario that springs to mind is an intelligent destructive machine on a rampage. But in fact, Marcuse said, in the defense space, “We have autonomous systems today that don’t rely on AI, and most of the AI systems we’re contemplating won’t be autonomous.”
The US Department of Defense released its 2018 artificial intelligence strategy last month. It includes developing a robust and transparent set of principles for defense AI, investing in research and development for AI that’s reliable and secure, continuing to fund research in explainability, advocating for a global set of military AI guidelines, and finding ways to use AI to reduce the risk of civilian casualties and other collateral damage.
Though these were designed with defense-specific aims in mind, Marcuse said, their implications extend across industries. “The defense community thinks of their problems as being unique, that no one deals with the stakes and complexity we deal with. That’s just wrong,” he said. Making high-stakes decisions with technology is widespread; safety-critical systems are key to aviation, medicine, and self-driving cars, to name a few.
Marcuse believes the Department of Defense can invest in AI safety in a way that has far-reaching benefits. “We all depend on technology to keep us alive and safe, and no one wants machines to harm us,” he said.
A Creation Superior to Its Creator
That said, we’ve come to expect technology to meet our needs in just the way we want, all the time—servers must never be down, GPS had better not take us on a longer route, Google must always produce the answer we’re looking for.
With AI, though, our expectations of perfection may be less reasonable.
“Right now we’re holding machines to superhuman standards,” Marcuse said. “We expect them to be perfect and infallible.” Take self-driving cars. They’re conceived of, built by, and programmed by people, and people as a whole generally aren’t great drivers—just look at traffic accident death rates to confirm that. But the few times self-driving cars have had fatal accidents, there’s been an ensuing uproar and backlash against the industry, as well as talk of implementing more restrictive regulations.
This can be extrapolated to ethics more generally. We as humans have the ability to explain our decisions, but many of us aren’t very good at doing so. As Marcuse put it, “People are emotional, they confabulate, they lie, they’re full of unconscious motivations. They don’t pass the explainability test.”
Why, then, should explainability be the standard for AI?
Even if humans aren’t good at explaining our choices, at least we can try, and we can answer questions that probe at our decision-making process. A deep learning system can’t do this yet, so working towards being able to identify which input data the systems are triggering on to make decisions—even if the decisions and the process aren’t perfect—is the direction we need to head.
Image Credit: a-image / Shutterstock.com Continue reading