Tag Archives: russian
#436258 For Centuries, People Dreamed of a ...
This is part six of a six-part series on the history of natural language processing.
In February of this year, OpenAI, one of the foremost artificial intelligence labs in the world, announced that a team of researchers had built a powerful new text generator called the Generative Pre-Trained Transformer 2, or GPT-2 for short. The researchers used an unsupervised learning algorithm to train their system on a broad set of natural language processing (NLP) capabilities, including reading comprehension, machine translation, and the ability to generate long strings of coherent text.
But as is often the case with NLP technology, the tool held both great promise and great peril. Researchers and policy makers at the lab were concerned that their system, if widely released, could be exploited by bad actors and misappropriated for “malicious purposes.”
The people of OpenAI, which defines its mission as “discovering and enacting the path to safe artificial general intelligence,” were concerned that GPT-2 could be used to flood the Internet with fake text, thereby degrading an already fragile information ecosystem. For this reason, OpenAI decided that it would not release the full version of GPT-2 to the public or other researchers.
GPT-2 is an example of a technique in NLP called language modeling, whereby the computational system internalizes a statistical blueprint of a text so it’s able to mimic it. Just like the predictive text on your phone—which selects words based on words you’ve used before—GPT-2 can look at a string of text and then predict what the next word is likely to be based on the probabilities inherent in that text.
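To make the idea concrete, here is a minimal sketch of next-word prediction in Python. It is not how GPT-2 works internally (GPT-2 is a large neural network); it is a simple word-pair counter, closer in spirit to a phone's predictive text. The function names `train_bigram_model` and `predict_next` and the toy corpus are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count how often each word is followed by each other word."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        followers[current_word][next_word] += 1
    return followers


def predict_next(followers, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    counts = followers.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Toy corpus, invented for illustration only.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigram_model(corpus)

# In this corpus "cat" follows "the" twice and "mat" follows it once.
print(predict_next(model, "the"))
```

A model like this only ever looks one word back. Systems such as GPT-2 condition on much longer stretches of text, which is part of why their output reads as coherent.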
GPT-2 can be seen as a descendant of the statistical language modeling that the Russian mathematician A. A. Markov developed in the early 20th century (covered in part three of this series).
What’s different with GPT-2, though, is the scale of the textual data modeled by the system. Whereas Markov analyzed a string of 20,000 letters to create a rudimentary model that could predict the likelihood of the next letter of a text being a consonant or a vowel, GPT-2 used 8 million web pages linked from Reddit posts to predict what the next word might be within that entire dataset.
And whereas Markov manually trained his model by counting only two parameters (vowels and consonants), GPT-2 used cutting-edge machine learning algorithms to do linguistic analysis with over 1.5 billion parameters, burning through huge amounts of computational power in the process.
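Markov's vowel-and-consonant counting can be sketched in a few lines of Python. This is an illustrative reconstruction under assumptions, not Markov's actual procedure: it uses the English vowels a, e, i, o, and u, treats every other letter as a consonant, and the names `letter_classes` and `transition_probabilities` are hypothetical.

```python
from collections import Counter

# Assumption: a simplified English vowel set; every other letter is a consonant.
VOWELS = set("aeiou")


def letter_classes(text):
    """Map each alphabetic character to 'V' (vowel) or 'C' (consonant)."""
    return ["V" if ch in VOWELS else "C" for ch in text.lower() if ch.isalpha()]


def transition_probabilities(text):
    """Estimate P(next class | current class) from adjacent letter pairs."""
    classes = letter_classes(text)
    pair_counts = Counter(zip(classes, classes[1:]))
    # Every letter except the last one starts exactly one pair.
    first_counts = Counter(classes[:-1])
    return {
        (current, following): count / first_counts[current]
        for (current, following), count in pair_counts.items()
    }


# Toy input, invented for illustration only.
sample = "banana"
probs = transition_probabilities(sample)
for (current, following), p in sorted(probs.items()):
    print(f"P({following} | {current}) = {p:.2f}")
```

On a long text, the resulting table says how likely a vowel or a consonant is to come next given the current letter's class, which is the kind of estimate the paragraph above describes.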
The results were impressive. In their blog post, OpenAI reported that GPT-2 could generate synthetic text in response to prompts, mimicking whatever style of text it was shown. If you prompt the system with a line of William Blake’s poetry, it can generate a line back in the Romantic poet’s style. If you prompt the system with a cake recipe, you get a newly invented recipe in response.
Perhaps the most compelling feature of GPT-2 is that it can answer questions accurately. For example, when OpenAI researchers asked the system, “Who wrote the book The Origin of Species?”—it responded: “Charles Darwin.” While only able to respond accurately some of the time, the feature does seem to be a limited realization of Gottfried Leibniz’s dream of a language-generating machine that could answer any and all human questions (described in part two of this series).
After observing the power of the new system in practice, OpenAI elected not to release the fully trained model. In the lead up to its release in February, there had been heightened awareness about “deepfakes”—synthetic images and videos, generated via machine learning techniques, in which people do and say things they haven’t really done and said. Researchers at OpenAI worried that GPT-2 could be used to essentially create deepfake text, making it harder for people to trust textual information online.
Responses to this decision varied. On one hand, OpenAI’s caution prompted an overblown reaction in the media, with articles about the “dangerous” technology feeding into the Frankenstein narrative that often surrounds developments in AI.
Others took issue with OpenAI’s self-promotion, with some even suggesting that OpenAI purposefully exaggerated GPT-2’s power in order to create hype, while contravening a norm in the AI research community, where labs routinely share data, code, and pre-trained models. As machine learning researcher Zachary Lipton tweeted, “Perhaps what's *most remarkable* about the @OpenAI controversy is how *unremarkable* the technology is. Despite their outsize attention & budget, the research itself is perfectly ordinary—right in the main branch of deep learning NLP research.”
OpenAI stood by its decision to release only a limited version of GPT-2, but has since released larger models for other researchers and the public to experiment with. As yet, there has been no reported case of a widely distributed fake news article generated by the system. But there have been a number of interesting spin-off projects, including GPT-2 poetry and a webpage where you can prompt the system with questions yourself.
There’s even a Reddit group populated entirely with text produced by GPT-2-powered bots. Mimicking humans on Reddit, the bots have long conversations about a variety of topics, including conspiracy theories and Star Wars movies.
This bot-powered conversation may signify the new condition of life online, where language is increasingly created by a combination of human and non-human agents, and where maintaining the distinction between human and non-human, despite our best efforts, is increasingly difficult.
The idea of using rules, mechanisms, and algorithms to generate language has inspired people in many different cultures throughout history. But it’s in the online world that this powerful form of wordcraft may really find its natural milieu—in an environment where the identity of speakers becomes more ambiguous, and perhaps, less relevant. It remains to be seen what the consequences will be for language, communication, and our sense of human identity, which is so bound up with our ability to speak in natural language.
This is the sixth installment of a six-part series on the history of natural language processing. Last week’s post explained how an innocent Microsoft chatbot turned instantly racist on Twitter.
You can also check out our prior series on the untold history of AI.
#435512 Russian Humanoid Robot to Pilot Soyuz ...
Skybot F-850 will spend a week on the ISS charming astronauts with its sense of humor.
#435436 Undeclared Wars in Cyberspace Are ...
The US is at war. That’s probably not exactly news, as the country has been engaged in one type of conflict or another for most of its history. The last time we officially declared war was after Japan bombed Pearl Harbor in December 1941.
Our biggest undeclared war today is not being fought by drones in the mountains of Afghanistan or even through the less-lethal barrage of threats over the nuclear programs in North Korea and Iran. In this particular war, it is the US that is under attack and on the defensive.
This is cyberwarfare.
The definition of what constitutes a cyber attack is a broad one, according to Greg White, executive director of the Center for Infrastructure Assurance and Security (CIAS) at The University of Texas at San Antonio (UTSA).
At the level of nation-state attacks, cyberwarfare could involve “attacking systems during peacetime—such as our power grid or election systems—or it could be during war time in which case the attacks may be designed to cause destruction, damage, deception, or death,” he told Singularity Hub.
For the US, the Pearl Harbor of cyberwarfare occurred during 2016 with the Russian interference in the presidential election. However, according to White, an Air Force veteran who has been involved in computer and network security since 1986, the history of cyber war can be traced back much further, to at least the first Gulf War of the early 1990s.
“We started experimenting with cyber attacks during the first Gulf War, so this has been going on a long time,” he said. “Espionage was the prime reason before that. After the war, the possibility of expanding the types of targets utilized expanded somewhat. What is really interesting is the use of social media and things like websites for [psychological operation] purposes during a conflict.”
The 2008 conflict between Russia and the Republic of Georgia is often cited as a cyberwarfare case study due to the large scale and overt nature of the cyber attacks. Russian hackers managed to bring down more than 50 news, government, and financial websites through denial-of-service attacks. In addition, about 35 percent of Georgia’s internet networks suffered decreased functionality during the attacks, coinciding with the Russian invasion of South Ossetia.
The cyberwar also offers lessons for today on Russia’s approach to cyberspace as a tool for “holistic psychological manipulation and information warfare,” according to a 2018 report called Understanding Cyberwarfare from the Modern War Institute at West Point.
US Fights Back
News in recent years has highlighted how Russian hackers have attacked various US government entities and critical infrastructure such as energy and manufacturing. In particular, a shadowy group known as Unit 26165 within the country’s military intelligence directorate is believed to be behind the 2016 US election interference campaign.
However, the US hasn’t been standing idly by. Since at least 2012, the US has put reconnaissance probes into the control systems of the Russian electric grid, The New York Times reported. More recently, we learned that the US military has gone on the offensive, putting “crippling malware” inside the Russian power grid as US Cyber Command flexes its online muscles thanks to new authority granted to it last year.
“Access to the power grid that is obtained now could be used to shut something important down in the future when we are in a war,” White noted. “Espionage is part of the whole program. It is important to remember that cyber has just provided a new domain in which to conduct the types of activities we have been doing in the real world for years.”
The US is also beginning to pour more money into cybersecurity. The 2020 fiscal budget calls for spending $17.4 billion throughout the government on cyber-related activities, with the Department of Defense (DoD) alone earmarked for $9.6 billion.
Despite the growing emphasis on cybersecurity in the US and around the world, the demand for skilled security professionals is well outpacing the supply, with a projected shortfall of nearly three million open or unfilled positions according to the non-profit IT security organization (ISC)².
UTSA is rare among US educational institutions in that security courses and research are being conducted across three different colleges, according to White. About 10 percent of the school’s 30,000-plus students are enrolled in a cyber-related program, he added, and UTSA is one of only 21 schools that have received the Cyber Operations Center of Excellence designation from the National Security Agency.
“This track in the computer science program is specifically designed to prepare students for the type of jobs they might be involved in if they went to work for the DoD,” White said.
However, White is extremely doubtful there will ever be enough cyber security professionals to meet demand. “I’ve been preaching that we’ve got to worry about cybersecurity in the workforce, not just the cybersecurity workforce, not just cybersecurity professionals. Everybody has a responsibility for cybersecurity.”
Artificial Intelligence in Cybersecurity
Indeed, humans are often seen as the weak link in cybersecurity. That point was driven home at a cybersecurity roundtable discussion during this year’s Brainstorm Tech conference in Aspen, Colorado.
Participant Dorian Daley, general counsel at Oracle, said insider threats are at the top of the list when it comes to cybersecurity. “Sadly, I think some of the biggest challenges are people, and I mean that in a number of ways. A lot of the breaches really come from insiders. So the more that you can automate things and you can eliminate human malicious conduct, the better.”
White noted that automation is already the norm in cybersecurity. “Humans can’t react as fast as systems can launch attacks, so we need to rely on automated defenses as well,” he said. “This doesn’t mean that humans are not in the loop, but much of what is done these days is ‘scripted’.”
Artificial intelligence, machine learning, and other advanced automation techniques, such as pattern analysis to look for specific behaviors that might indicate an attack is underway, have been part of the cybersecurity conversation for quite some time, according to White.
“What we are seeing quite a bit of today falls under the heading of big data and data analytics,” he explained.
But there are signs that AI is going off-script when it comes to cyber attacks. In the hands of threat groups, AI applications could lead to an increase in the number of cyberattacks, wrote Michelle Cantos, a strategic intelligence analyst at cybersecurity firm FireEye.
“Current AI technology used by businesses to analyze consumer behavior and find new customer bases can be appropriated to help attackers find better targets,” she said. “Adversaries can use AI to analyze datasets and generate recommendations for high-value targets they think the adversary should hit.”
In fact, security researchers have already demonstrated how a machine learning system could be used for malicious purposes. The Social Network Automated Phishing with Reconnaissance system, or SNAP_R, generated more than four times as many spear-phishing tweets on Twitter as a human, and was just as successful at targeting victims in order to steal sensitive information.
Cyber war is upon us. And like the current war on terrorism, there are many battlefields from which the enemy can attack and then disappear. While total victory is highly unlikely in the traditional sense, innovations through AI and other technologies can help keep the lights on against the next cyber attack.
Image Credit: pinkeyes / Shutterstock.com
#434673 The World’s Most Valuable AI ...
It recognizes our faces. It knows the videos we might like. And it can even, perhaps, recommend the best course of action to take to maximize our personal health.
Artificial intelligence and its subset of disciplines—such as machine learning, natural language processing, and computer vision—are seemingly becoming integrated into our daily lives whether we like it or not. What was once sci-fi is now ubiquitous research and development in company and university labs around the world.
Similarly, the startups working on many of these AI technologies have seen their proverbial stock rise. More than 30 of these companies are now valued at over a billion dollars, according to data research firm CB Insights, which itself employs algorithms to provide insights into the tech business world.
Private companies with a billion-dollar valuation were so uncommon not that long ago that they were dubbed unicorns. Now there are 325 of these once-rare creatures, with a combined valuation north of a trillion dollars, according to CB Insights, which maintains a running count of this exclusive Unicorn Club.
The subset of AI startups accounts for about 10 percent of the total membership, growing rapidly in just 4 years from 0 to 32. Last year, an unprecedented 17 AI startups broke the billion-dollar barrier, with 2018 also a record year for venture capital into private US AI companies at $9.3 billion, CB Insights reported.
What exactly is all this money funding?
AI Keeps an Eye Out for You
Let’s start with the bad news first.
Facial recognition is probably one of the most ubiquitous applications of AI today. It’s actually a decades-old technology often credited to a man named Woodrow Bledsoe, who used an instrument called a RAND tablet that could semi-autonomously match faces from a database. That was in the 1960s.
Today, most of us are familiar with facial recognition as a way to unlock our smartphones. But the technology has gained notoriety as a surveillance tool of law enforcement, particularly in China.
It’s no secret that the facial recognition algorithms developed by several of the AI unicorns from China—SenseTime, CloudWalk, and Face++ (also known as Megvii)—are used to monitor the country’s 1.3 billion citizens. Police there are even equipped with AI-powered eyeglasses for such purposes.
A fourth billion-dollar Chinese startup, Yitu Technologies, also produces a platform for facial recognition in the security realm, and develops AI systems in healthcare on top of that. For example, its CARE.AI™ Intelligent 4D Imaging System for Chest CT can reputedly identify in real time a variety of lesions for the possible early detection of cancer.
The AI Doctor Is In
As Peter Diamandis recently noted, AI is rapidly augmenting healthcare and longevity. He mentioned another AI unicorn from China in this regard—iCarbonX, which plans to use machines to develop personalized health plans for every individual.
A couple of AI unicorns on the hardware side of healthcare are OrCam Technologies and Butterfly. The former, an Israeli company, has developed a wearable device for the visually impaired called MyEye that attaches to one’s eyeglasses. The device can identify people and products, as well as read text, conveying the information through discreet audio.
Butterfly Network, out of Connecticut, has completely upended the healthcare market with a handheld ultrasound machine that works with a smartphone.
“Orcam and Butterfly are amazing examples of how machine learning can be integrated into solutions that provide a step-function improvement over state of the art in ultra-competitive markets,” noted Andrew Byrnes, investment director at Comet Labs, a venture capital firm focused on AI and robotics, in an email exchange with Singularity Hub.
AI in the Driver’s Seat
Comet Labs’ portfolio includes two AI unicorns, Megvii and Pony.ai.
The latter is one of three billion-dollar startups developing the AI technology behind self-driving cars, with the other two being Momenta.ai and Zoox.
Founded in 2016 near San Francisco (with another headquarters in China), Pony.ai debuted its latest self-driving system, called PonyAlpha, last year. The platform uses multiple sensors (LiDAR, cameras, and radar) to navigate its environment, but its “sensor fusion technology” makes things simple by choosing the most reliable sensor data for any given driving scenario.
Zoox is another San Francisco area startup founded a couple of years earlier. In late 2018, it got the green light from the state of California to be the first autonomous vehicle company to transport a passenger as part of a pilot program. Meanwhile, China-based Momenta.ai is testing level four autonomy for its self-driving system. Autonomous driving levels are ranked zero to five, with level five being equal to a human behind the wheel.
The hype around autonomous driving is currently in overdrive, and Byrnes thinks regulatory roadblocks will keep most self-driving cars in idle for the foreseeable future. The exception, he said, is China, which is adopting a “systems” approach to autonomy for passenger transport.
“If [autonomous mobility] solves bigger problems like traffic that can elicit government backing, then that has the potential to go big fast,” he said. “This is why we believe Pony.ai will be a winner in the space.”
AI in the Back Office
An AI-powered technology that perhaps only fans of the cult classic Office Space might appreciate has suddenly taken the business world by storm—robotic process automation (RPA).
RPA companies take the mundane back office work, such as filling out invoices or processing insurance claims, and turn it over to bots. The intelligent part comes into play because these bots can tackle unstructured data, such as text in an email or even video and pictures, in order to accomplish an increasing variety of tasks.
Both Automation Anywhere and UiPath are older companies, founded in 2003 and 2005, respectively. However, since just 2017, they have raised a combined total of nearly $1 billion in disclosed capital.
Cybersecurity Embraces AI
Cybersecurity is another industry where AI is driving investment into startups. Sporting imposing names like CrowdStrike, Darktrace, and Tanium, these cybersecurity companies employ different machine-learning techniques to protect computers and other IT assets beyond the latest software update or virus scan.
Darktrace, for instance, takes its inspiration from the human immune system. Its algorithms can purportedly “learn” the unique pattern of each device and user on a network, detecting emerging problems before things spin out of control.
All three companies are used by major corporations and governments around the world. CrowdStrike itself made headlines a few years ago when it linked the hacking of the Democratic National Committee email servers to the Russian government.
Looking Forward
I could go on, and introduce you to the world’s most valuable startup, a Chinese company called Bytedance that is valued at $75 billion for news curation and an app to create 15-second viral videos. But that’s probably not where VC firms like Comet Labs are generally putting their money.
Byrnes sees real value in startups that are taking “data-driven approaches to problems specific to unique industries.” Take the example of Chicago-based unicorn Uptake Technologies, which analyzes incoming data from machines, from wind turbines to tractors, to predict problems before they occur with the machinery. A not-yet unicorn called PingThings in the Comet Labs portfolio does similar predictive analytics for the energy utilities sector.
“One question we like asking is, ‘What does the state of the art look like in your industry in three to five years?’” Byrnes said. “We ask that a lot, then we go out and find the technology-focused teams building those things.”
Image Credit: Andrey Suslov / Shutterstock.com
#433807 The How, Why, and Whether of Custom ...
A digital afterlife may soon be within reach, but it might not be for your benefit.
The reams of data we’re creating could soon make it possible to create digital avatars that live on after we die, aimed at comforting our loved ones or sharing our experience with future generations.
That may seem like a disappointing downgrade from the vision promised by the more optimistic futurists, where we upload our consciousness to the cloud and live forever in machines. But it might be a realistic possibility in the not-too-distant future—and the first steps have already been taken.
After her friend died in a car crash, Eugenia Kuyda, co-founder of Russian AI startup Luka, trained a neural network-powered chatbot on their shared message history to mimic him. Journalist and amateur coder James Vlahos took a more involved approach, carrying out extensive interviews with his terminally ill father so that he could create a digital clone of him when he died.
For those of us without the time or expertise to build our own artificial intelligence-powered avatar, startup Eternime is offering to take your social media posts and interactions as well as basic personal information to build a copy of you that could then interact with relatives once you’re gone. The service is so far only running a private beta with a handful of people, but with 40,000 on its waiting list, it’s clear there’s a market.
Comforting—Or Creepy?
The whole idea may seem eerily similar to the Black Mirror episode Be Right Back, in which a woman pays a company to create a digital copy of her deceased husband and eventually a realistic robot replica. And given the show’s focus on the emotional turmoil she goes through, people might question whether the idea is a sensible one.
But it’s hard to say at this stage whether being able to interact with an approximation of a deceased loved one would be a help or a hindrance in the grieving process. The fear is that it could make it harder for people to “let go” or “move on,” but others think it could play a useful therapeutic role, reminding people that just because someone is dead it doesn’t mean they’re gone, and providing a novel way for them to express and come to terms with their feelings.
While at present most envisage these digital resurrections as a way to memorialize loved ones, there are also more ambitious plans to use the technology as a way to preserve expertise and experience. A project at MIT called Augmented Eternity is investigating whether we could use AI to trawl through someone’s digital footprints and extract both their knowledge and elements of their personality.
Project leader Hossein Rahnama says he’s already working with a CEO who wants to leave behind a digital avatar that future executives could consult with after he’s gone. And you wouldn’t necessarily have to wait until you’re dead—experts could create virtual clones of themselves that could dispense advice on demand to far more people. These clones could soon be more than simple chatbots, too. Hollywood has already started spending millions of dollars to create 3D scans of its most bankable stars so that they can keep acting beyond the grave.
It’s easy to see the appeal of the idea; imagine if we could bring back Stephen Hawking or Steve Jobs to share their wisdom with us. And what if we could create a digital brain trust combining the experience and wisdom of all the world’s greatest thinkers, accessible on demand?
But there are still huge hurdles ahead before we could create truly accurate representations of people by simply trawling through their digital remains. The first problem is data. Most people’s digital footprints only started reaching significant proportions in the last decade or so, and cover a relatively small period of their lives. It could take many years before there’s enough data to create more than just a superficial imitation of someone.
And that’s assuming that the data we produce is truly representative of who we are. Carefully crafted Instagram profiles and cautiously worded work emails hardly capture the messy realities of most people’s lives.
Perhaps if the idea is simply to create a bank of someone’s knowledge and expertise, accurately capturing the essence of their character would be less important. But these clones would also be static. Real people continually learn and change, but a digital avatar is a snapshot of someone’s character and opinions at the point they died. An inability to adapt as the world around them changes could put a shelf life on the usefulness of these replicas.
Who’s Calling the (Digital) Shots?
It won’t stop people trying, though, and that raises a potentially more important question: Who gets to make the calls about our digital afterlife? The subjects, their families, or the companies that hold their data?
In most countries, the law is currently pretty hazy on this topic. Companies like Google and Facebook have processes to let you choose who should take control of your accounts in the event of your death. But if you’ve forgotten to do that, the fate of your virtual remains comes down to a tangle of federal law, local law, and tech company terms of service.
This lack of regulation could create incentives and opportunities for unscrupulous behavior. The voice of a deceased loved one could be a highly persuasive tool for exploitation, and digital replicas of respected experts could be powerful means of pushing a hidden agenda.
That means there’s a pressing need for clear and unambiguous rules. Researchers at Oxford University recently suggested ethical guidelines that would treat our digital remains the same way museums and archaeologists are required to treat mortal remains—with dignity and in the interest of society.
Whether those kinds of guidelines are ever enshrined in law remains to be seen, but ultimately they may decide whether the digital afterlife turns out to be heaven or hell.
Image Credit: frankie’s / Shutterstock.com