This is part six of a six-part series on the history of natural language processing.
In February of this year, OpenAI, one of the foremost artificial intelligence labs in the world, announced that a team of researchers had built a powerful new text generator called the Generative Pre-Trained Transformer 2, or GPT-2 for short. The researchers used an unsupervised learning algorithm to train their system on a broad set of natural language processing (NLP) capabilities, including reading comprehension, machine translation, and the ability to generate long strings of coherent text.
But as is often the case with NLP technology, the tool held both great promise and great peril. Researchers and policy makers at the lab were concerned that their system, if widely released, could be exploited by bad actors and misappropriated for “malicious purposes.”
The people of OpenAI, which defines its mission as “discovering and enacting the path to safe artificial general intelligence,” were concerned that GPT-2 could be used to flood the Internet with fake text, thereby degrading an already fragile information ecosystem. For this reason, OpenAI decided that it would not release the full version of GPT-2 to the public or other researchers.
GPT-2 is an example of a technique in NLP called language modeling, whereby the computational system internalizes a statistical blueprint of a text so it’s able to mimic it. Just like the predictive text on your phone—which selects words based on words you’ve used before—GPT-2 can look at a string of text and then predict what the next word is likely to be based on the probabilities inherent in that text.
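To make the predictive-text comparison concrete, here is a minimal sketch of next-word prediction from simple word-pair counts. It is a toy stand-in for the idea of language modeling, not how GPT-2 itself works; the function names and the tiny corpus are invented for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count how often each word follows each other word."""
    words = text.lower().split()
    follow_counts = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follow_counts[current][nxt] += 1
    return follow_counts


def predict_next(model, word):
    """Return (most likely next word, estimated probability), or None if unseen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    best, count = followers.most_common(1)[0]
    return best, count / sum(followers.values())


# Toy corpus, invented for illustration.
corpus = "the cat sat on the mat and the cat slept on the sofa"
model = train_bigram_model(corpus)
print(predict_next(model, "the"))  # ('cat', 0.5)
```

In this corpus "the" is followed by "cat" twice, "mat" once, and "sofa" once, so the model guesses "cat" with probability 0.5. A system like GPT-2 conditions on far more context than one preceding word, but the goal is the same: estimate which word is likely to come next.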
GPT-2 can be seen as a descendant of the statistical language modeling that the Russian mathematician A. A. Markov developed in the early 20th century (covered in part three of this series).
What’s different with GPT-2, though, is the scale of the textual data modeled by the system. Whereas Markov analyzed a string of 20,000 letters to create a rudimentary model that could predict the likelihood of the next letter of a text being a consonant or a vowel, GPT-2 used 8 million web pages linked from Reddit posts to predict what the next word might be within that entire dataset.
And whereas Markov manually trained his model by counting only two parameters (vowels and consonants), GPT-2 used cutting-edge machine learning algorithms to do linguistic analysis with over 1.5 billion parameters, burning through huge amounts of computational power in the process.
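Markov's hand count can be approximated in a few lines. The sketch below labels each letter as a vowel or a consonant and estimates how likely each class is to follow the other. It assumes English vowels and a made-up sample word, whereas Markov worked by hand on Russian text, so treat it as an illustration of the counting idea only.

```python
from collections import Counter

VOWELS = set("aeiou")  # assumption: English vowels, for illustration


def letter_classes(text):
    """Map each alphabetic character to 'V' (vowel) or 'C' (consonant)."""
    return ["V" if ch in VOWELS else "C" for ch in text.lower() if ch.isalpha()]


def transition_probabilities(text):
    """Estimate P(next class | current class) from adjacent-letter counts."""
    classes = letter_classes(text)
    pair_counts = Counter(zip(classes, classes[1:]))
    totals = Counter(classes[:-1])  # every letter that has a successor
    return {
        (cur, nxt): count / totals[cur]
        for (cur, nxt), count in pair_counts.items()
    }


probs = transition_probabilities("banana")
print(probs)  # {('C', 'V'): 1.0, ('V', 'C'): 1.0}
```

In "banana" every consonant is followed by a vowel and every vowel by a consonant, so both transition probabilities come out as 1.0. On a long passage the numbers would settle between 0 and 1 and describe the rhythm of the language.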
The results were impressive. In their blog post, OpenAI reported that GPT-2 could generate synthetic text in response to prompts, mimicking whatever style of text it was shown. If you prompt the system with a line of William Blake’s poetry, it can generate a line back in the Romantic poet’s style. If you prompt the system with a cake recipe, you get a newly invented recipe in response.
Perhaps the most compelling feature of GPT-2 is that it can answer questions accurately. For example, when OpenAI researchers asked the system, “Who wrote the book The Origin of Species?”—it responded: “Charles Darwin.” While only able to respond accurately some of the time, the feature does seem to be a limited realization of Gottfried Leibniz’s dream of a language-generating machine that could answer any and all human questions (described in part two of this series).
After observing the power of the new system in practice, OpenAI elected not to release the fully trained model. In the lead up to its release in February, there had been heightened awareness about “deepfakes”—synthetic images and videos, generated via machine learning techniques, in which people do and say things they haven’t really done and said. Researchers at OpenAI worried that GPT-2 could be used to essentially create deepfake text, making it harder for people to trust textual information online.
Responses to this decision varied. On one hand, OpenAI’s caution prompted an overblown reaction in the media, with articles about the “dangerous” technology feeding into the Frankenstein narrative that often surrounds developments in AI.
Others took issue with OpenAI’s self-promotion, with some even suggesting that OpenAI purposefully exaggerated GPT-2’s power in order to create hype, while contravening a norm in the AI research community, where labs routinely share data, code, and pre-trained models. As machine learning researcher Zachary Lipton tweeted, “Perhaps what's *most remarkable* about the @OpenAI controversy is how *unremarkable* the technology is. Despite their outsize attention & budget, the research itself is perfectly ordinary—right in the main branch of deep learning NLP research.”
OpenAI stood by its decision to release only a limited version of GPT-2, but has since released larger models for other researchers and the public to experiment with. As yet, there has been no reported case of a widely distributed fake news article generated by the system. But there have been a number of interesting spin-off projects, including GPT-2 poetry and a webpage where you can prompt the system with questions yourself.
There’s even a Reddit group populated entirely with text produced by GPT-2-powered bots. Mimicking humans on Reddit, the bots have long conversations about a variety of topics, including conspiracy theories and Star Wars movies.
This bot-powered conversation may signify the new condition of life online, where language is increasingly created by a combination of human and non-human agents, and where maintaining the distinction between human and non-human, despite our best efforts, is increasingly difficult.
The idea of using rules, mechanisms, and algorithms to generate language has inspired people in many different cultures throughout history. But it’s in the online world that this powerful form of wordcraft may really find its natural milieu—in an environment where the identity of speakers becomes more ambiguous, and perhaps, less relevant. It remains to be seen what the consequences will be for language, communication, and our sense of human identity, which is so bound up with our ability to speak in natural language.
Last week’s post explained how an innocent Microsoft chatbot turned instantly racist on Twitter.
You can also check out our prior series on the untold history of AI.
Artificial intelligence is going to overhaul the way we live and work. But will the changes it brings be for the better? As the technology slowly develops (let’s remember that right now, we’re still very much in the narrow AI space and nowhere near an artificial general intelligence), whether it will end up doing us more harm than good is a question at the top of everyone’s mind.
What kind of response might we get if we posed this question to an AI itself?
Last week at the Cambridge Union in England, IBM did just that. Its Project Debater (an AI that narrowly lost a debate to human debating champion Harish Natarajan in February) gave the opening arguments in a debate about the promise and peril of artificial intelligence.
Critical thinking, linking different lines of thought, and anticipating counter-arguments are all valuable debating skills that humans can practice and refine. While these skills are tougher for an AI to get good at since they often require deeper contextual understanding, AI does have a major edge over humans in absorbing and analyzing information. In the February debate, Project Debater used IBM’s cloud computing infrastructure to read hundreds of millions of documents and extract relevant details to construct an argument.
This time around, Debater looked through 1,100 arguments for or against AI. The arguments were submitted to IBM by the public during the week prior to the debate, through a website set up for that purpose. Of the 1,100 submissions, the AI classified 570 as anti-AI, or of the opinion that the technology will bring more harm to humanity than good. Another 511 arguments were found to be pro-AI, and the rest were irrelevant to the topic at hand.
Debater grouped the arguments into five themes; the technology’s ability to take over dangerous or monotonous jobs was a pro-AI theme, and on the flip side was its potential to perpetuate the biases of its creators. “AI companies still have too little expertise on how to properly assess datasets and filter out bias,” the tall black box that houses Project Debater said. “AI will take human bias and will fixate it for generations.”
After Project Debater kicked off the debate by giving opening arguments for both sides, two teams of people took over, elaborating on its points and coming up with their own counter-arguments.
In the end, an audience poll voted in favor of the pro-AI side, but just barely; 51.2 percent of voters felt convinced that AI can help us more than it can hurt us.
The software’s natural language processing was able to identify racist, obscene, or otherwise inappropriate comments and weed them out as being irrelevant to the debate. But it also repeated the same arguments multiple times, and misclassified a statement about bias as pro-AI rather than anti-AI.
IBM has been working on Project Debater for over six years, and though it aims to iron out small glitches like these, the system’s goal isn’t to ultimately outwit and defeat humans. On the contrary, the AI is meant to support our decision-making by taking in and processing huge amounts of information in a nuanced way, more quickly than we ever could.
IBM engineer Noam Slonim envisions Project Debater’s tech being used, for example, by a government seeking citizens’ feedback about a new policy. “This technology can help to establish an interesting and effective communication channel between the decision maker and the people that are going to be impacted by the decision,” he said.
As for the question of whether AI will do more good or harm, perhaps Sylvie Delacroix put it best. A professor of law and ethics at the University of Birmingham who argued on the pro-AI side of the debate, she pointed out that the impact AI will have depends on the way we design it, saying “AI is only as good as the data it has been fed.”
She’s right; rather than asking what sort of impact AI will have on humanity, we should start by asking what sort of impact we want it to have. The people working on AI—not AIs themselves—are ultimately responsible for how much good or harm will be done.
Image Credit: IBM Project Debater at Cambridge Union Society, photo courtesy of IBM Research
Machine learning algorithms are starting to exceed human performance in many narrow and specific domains, such as image recognition and certain types of medical diagnoses. They’re also rapidly improving in more complex domains such as generating eerily human-like text. We increasingly rely on machine learning algorithms to make decisions on a wide range of topics, from what we collectively spend billions of hours watching to who gets the job.
But machine learning algorithms cannot explain the decisions they make.
How can we justify putting these systems in charge of decisions that affect people’s lives if we don’t understand how they’re arriving at those decisions?
This desire to get more than raw numbers from machine learning algorithms has led to a renewed focus on explainable AI: algorithms that can make a decision or take an action, and tell you the reasons behind it.
What Makes You Say That?
In some circumstances, you can see a road to explainable AI already. Take OpenAI’s GPT-2 model, or IBM’s Project Debater. Both of these generate text based on a large corpus of training data, and try to make it as relevant as possible to the prompt that’s given. If these models were also able to provide a quick run-down of the top few sources in that corpus of training data they were drawing information from, it might be easier to understand where the “argument” (or poetic essay about unicorns) was coming from.
This is similar to the approach Google is now looking at for its image classifiers. Many algorithms are more sensitive to textures and the relationship between adjacent pixels in an image, rather than recognizing objects by their outlines as humans do. This leads to strange results: some algorithms can happily identify a totally scrambled image of a polar bear, but not a polar bear silhouette.
Previous attempts to make image classifiers explainable relied on significance mapping, more commonly known as saliency mapping. In this method, the algorithm highlights the areas of the image that contributed the most statistical weight to the decision. This is usually determined by changing groups of pixels in the image and seeing which changes produce the biggest shift in the algorithm’s impression of what the image is. For example, if the algorithm is trying to recognize a stop sign, changing the background is unlikely to be as important as changing the sign.
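The pixel-changing procedure described above can be sketched as follows. The code blanks out one block of an image at a time and records how far the classifier's score falls. The scoring function here is a hypothetical stand-in that simply adds up the brightness of the central pixels; a real image classifier would take its place, and nothing here reflects any particular company's implementation.

```python
def occlusion_map(image, score_fn, patch=2, fill=0.0):
    """Score drop caused by blanking each patch x patch block of the image.

    image is a list of rows of floats; score_fn maps an image to a number.
    """
    baseline = score_fn(image)
    height, width = len(image), len(image[0])
    drops = {}
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            occluded = [row[:] for row in image]  # copy; original stays intact
            for r in range(top, min(top + patch, height)):
                for c in range(left, min(left + patch, width)):
                    occluded[r][c] = fill
            drops[(top, left)] = baseline - score_fn(occluded)
    return drops


def toy_score(image):
    """Hypothetical classifier stand-in: brightness of the central 2x2 block."""
    return sum(image[r][c] for r in (2, 3) for c in (2, 3))


# 6x6 image: dim background with a bright "sign" in the centre.
image = [[0.1] * 6 for _ in range(6)]
for r in (2, 3):
    for c in (2, 3):
        image[r][c] = 1.0

drops = occlusion_map(image, toy_score)
print(max(drops, key=drops.get))  # (2, 2)
```

Blanking the central block wipes out the score, while blanking any background block changes nothing, so the map points at the "sign" as the region that mattered.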
Google’s new approach changes the way that its algorithm recognizes objects, by examining them at several different resolutions and searching for matches to different “sub-objects” within the main object. You or I might recognize an ambulance from its flashing lights, its tires, and its logo; we might zoom in on the basketball held by an NBA player to deduce their occupation, and so on. By linking the overall categorization of an image to these “concepts,” the algorithm can explain its decision: I categorized this as a cat because of its tail and whiskers.
Even in this experiment, though, the “psychology” of the algorithm in decision-making is counter-intuitive. For example, in the basketball case, the most important factor in making the decision was actually the player’s jersey rather than the basketball.
Can You Explain What You Don’t Understand?
While it may seem trivial, the conflict here is a fundamental one in approaches to artificial intelligence. Namely, how far can you get with mere statistical associations between huge sets of data, and how much do you need to introduce abstract concepts for real intelligence to arise?
At one end of the spectrum, Good Old-Fashioned AI or GOFAI dreamed up machines that would be entirely based on symbolic logic. The machine would be hard-coded with the concept of a dog, a flower, cars, and so forth, alongside all of the symbolic “rules” which we internalize, allowing us to distinguish between dogs, flowers, and cars. (You can imagine a similar approach to a conversational AI would teach it words and strict grammatical structures from the top down, rather than “learning” languages from statistical associations between letters and words in training data, as GPT-2 broadly does.)
Such a system would be able to explain itself, because it would deal in high-level, human-understandable concepts. The equation is closer to: “ball” + “stitches” + “white” = “baseball”, rather than a set of millions of numbers linking various pathways together. There are elements of GOFAI in Google’s new approach to explaining its image recognition: the new algorithm can recognize objects based on the sub-objects they contain. To do this, it requires at least a rudimentary understanding of what those sub-objects look like, and the rules that link objects to sub-objects, such as “cats have whiskers.”
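A toy version of such a symbolic system shows why its explanations come for free. In the sketch below, each label is defined by a hand-written set of required features, and the explanation is just the rule that fired. The rules and feature names are invented for illustration and are not drawn from any real system.

```python
# Hand-written rules: each label lists the features that must all be present.
# Rules and feature names are illustrative assumptions.
RULES = {
    "baseball": {"ball", "stitches", "white"},
    "basketball": {"ball", "orange", "black lines"},
    "cat": {"tail", "whiskers"},
}


def classify_with_explanation(features):
    """Return (label, explanation) for the first rule whose features are all observed."""
    observed = set(features)
    for label, required in RULES.items():
        if required <= observed:  # every required feature was observed
            reason = " + ".join(sorted(required))
            return label, f"{label} because: {reason}"
    return None, "no rule matched"


print(classify_with_explanation(["white", "ball", "stitches"]))
# ('baseball', 'baseball because: ball + stitches + white')
```

The catch is visible even at this scale: every object and every rule has to be written by hand, and anything the rules do not anticipate falls through to "no rule matched".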
The issue, of course, is the labor-intensive and maybe impossible task of defining by hand all these symbolic concepts and every conceivable rule that could link them together. The difficulty of creating systems like this, which could handle the “combinatorial explosion” present in reality, contributed to the first AI winter.
Meanwhile, neural networks rely on training themselves on vast sets of data. Without the “labeling” of supervised learning, this process might bear no relation to any concepts a human could understand (and therefore be utterly inexplicable).
Somewhere between these two, hope explainable AI enthusiasts, is a happy medium that can crunch colossal amounts of data, giving us all of the benefits that recent, neural-network AI has bestowed, while showing its working in terms that humans can understand.
Image Credit: Image by Seanbatty from Pixabay