Tag Archives: conversations
#436258 For Centuries, People Dreamed of a ...
This is part six of a six-part series on the history of natural language processing.
In February of this year, OpenAI, one of the foremost artificial intelligence labs in the world, announced that a team of researchers had built a powerful new text generator called the Generative Pre-Trained Transformer 2, or GPT-2 for short. The researchers used an unsupervised learning algorithm to train their system on a broad set of natural language processing (NLP) capabilities, including reading comprehension, machine translation, and the ability to generate long strings of coherent text.
But as is often the case with NLP technology, the tool held both great promise and great peril. Researchers and policy makers at the lab were concerned that their system, if widely released, could be exploited by bad actors and misappropriated for “malicious purposes.”
The people of OpenAI, which defines its mission as “discovering and enacting the path to safe artificial general intelligence,” were concerned that GPT-2 could be used to flood the Internet with fake text, thereby degrading an already fragile information ecosystem. For this reason, OpenAI decided that it would not release the full version of GPT-2 to the public or other researchers.
GPT-2 is an example of a technique in NLP called language modeling, whereby the computational system internalizes a statistical blueprint of a text so it’s able to mimic it. Just like the predictive text on your phone—which selects words based on words you’ve used before—GPT-2 can look at a string of text and then predict what the next word is likely to be based on the probabilities inherent in that text.
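To make the idea of next-word prediction concrete, here is a minimal sketch of the simplest possible language model: one that counts which word follows which in a training text. This is not how GPT-2 works internally (GPT-2 is a large neural network), and the toy corpus and function names are invented for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        followers[current_word][next_word] += 1
    return followers


def predict_next(followers, word):
    """Return the most frequent follower of `word` and its estimated probability."""
    counts = followers.get(word.lower())
    if not counts:
        return None, 0.0  # word never seen with a follower
    best_word, best_count = counts.most_common(1)[0]
    return best_word, best_count / sum(counts.values())


# Toy training text, invented for this example.
corpus = "the cat sat on the mat and the cat slept on the sofa"
model = train_bigram_model(corpus)
next_word, probability = predict_next(model, "the")
print(next_word, probability)
```

In this toy corpus, "the" is followed by "cat" twice and by "mat" and "sofa" once each, so the model predicts "cat" with an estimated probability of one half. Phone keyboards and GPT-2 condition on far more context than a single preceding word, but the underlying question is the same.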
GPT-2 can be seen as a descendant of the statistical language modeling that the Russian mathematician A. A. Markov developed in the early 20th century (covered in part three of this series).
What’s different with GPT-2, though, is the scale of the textual data modeled by the system. Whereas Markov analyzed a string of 20,000 letters to create a rudimentary model that could predict the likelihood of the next letter of a text being a consonant or a vowel, GPT-2 used 8 million web pages linked from Reddit to predict what the next word might be within that entire dataset.
And whereas Markov manually trained his model by counting only two parameters (vowels and consonants), GPT-2 used cutting-edge machine learning algorithms to do linguistic analysis with over 1.5 billion parameters, burning through huge amounts of computational power in the process.
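A Markov-style vowel and consonant count is small enough to sketch in a few lines. The example below is an illustration rather than a reconstruction of Markov's procedure: it assumes English text, treats "y" as a consonant, and uses a short made-up sample instead of his 20,000 letters.

```python
from collections import Counter

VOWELS = set("aeiou")  # assumption: English vowels only, "y" counted as a consonant


def letter_classes(text):
    """Map each alphabetic character to 'V' (vowel) or 'C' (consonant)."""
    return ["V" if ch in VOWELS else "C" for ch in text.lower() if ch.isalpha()]


def transition_probabilities(text):
    """Estimate P(next class | current class) from adjacent letter pairs."""
    classes = letter_classes(text)
    pair_counts = Counter(zip(classes, classes[1:]))
    totals = Counter(classes[:-1])  # how often each class has a successor
    return {
        (current, following): count / totals[current]
        for (current, following), count in pair_counts.items()
    }


sample_probs = transition_probabilities("a statistical blueprint of a text")
for (current, following), p in sorted(sample_probs.items()):
    print(f"P({following} | {current}) = {p:.2f}")
```

The whole model is four numbers at most, one for each pairing of vowel and consonant. That is the contrast the article draws with the parameter count of GPT-2.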
The results were impressive. In their blog post, OpenAI reported that GPT-2 could generate synthetic text in response to prompts, mimicking whatever style of text it was shown. If you prompt the system with a line of William Blake’s poetry, it can generate a line back in the Romantic poet’s style. If you prompt the system with a cake recipe, you get a newly invented recipe in response.
Perhaps the most compelling feature of GPT-2 is that it can answer questions accurately. For example, when OpenAI researchers asked the system, “Who wrote the book The Origin of Species?”—it responded: “Charles Darwin.” While only able to respond accurately some of the time, the feature does seem to be a limited realization of Gottfried Leibniz’s dream of a language-generating machine that could answer any and all human questions (described in part two of this series).
After observing the power of the new system in practice, OpenAI elected not to release the fully trained model. In the lead up to its release in February, there had been heightened awareness about “deepfakes”—synthetic images and videos, generated via machine learning techniques, in which people do and say things they haven’t really done and said. Researchers at OpenAI worried that GPT-2 could be used to essentially create deepfake text, making it harder for people to trust textual information online.
Responses to this decision varied. On one hand, OpenAI’s caution prompted an overblown reaction in the media, with articles about the “dangerous” technology feeding into the Frankenstein narrative that often surrounds developments in AI.
Others took issue with OpenAI’s self-promotion, with some even suggesting that OpenAI purposefully exaggerated GPT-2’s power in order to create hype, while contravening a norm in the AI research community, where labs routinely share data, code, and pre-trained models. As machine learning researcher Zachary Lipton tweeted, “Perhaps what's *most remarkable* about the @OpenAI controversy is how *unremarkable* the technology is. Despite their outsize attention & budget, the research itself is perfectly ordinary, right in the main branch of deep learning NLP research.”
OpenAI stood by its decision to release only a limited version of GPT-2, but has since released larger models for other researchers and the public to experiment with. As yet, there has been no reported case of a widely distributed fake news article generated by the system. But there have been a number of interesting spin-off projects, including GPT-2 poetry and a webpage where you can prompt the system with questions yourself.
There’s even a Reddit group populated entirely with text produced by GPT-2-powered bots. Mimicking humans on Reddit, the bots have long conversations about a variety of topics, including conspiracy theories and Star Wars movies.
This bot-powered conversation may signify the new condition of life online, where language is increasingly created by a combination of human and non-human agents, and where maintaining the distinction between human and non-human, despite our best efforts, is increasingly difficult.
The idea of using rules, mechanisms, and algorithms to generate language has inspired people in many different cultures throughout history. But it’s in the online world that this powerful form of wordcraft may really find its natural milieu—in an environment where the identity of speakers becomes more ambiguous, and perhaps, less relevant. It remains to be seen what the consequences will be for language, communication, and our sense of human identity, which is so bound up with our ability to speak in natural language.
This is the sixth installment of a six-part series on the history of natural language processing. Last week’s post explained how an innocent Microsoft chatbot turned instantly racist on Twitter.
You can also check out our prior series on the untold history of AI.
#436044 Want a Really Hard Machine Learning ...
What’s the world’s hardest machine learning problem? Autonomous vehicles? Robots that can walk? Cancer detection?
Nope, says Julian Sanchez. It’s agriculture.
Sanchez might be a little biased. He is the director of precision agriculture for John Deere, and is in charge of adding intelligence to traditional farm vehicles. But he does have a little perspective, having spent time working on software for both medical devices and air traffic control systems.
I met with Sanchez and Alexey Rostapshov, head of digital innovation at John Deere Labs, at the organization’s San Francisco offices last month. Labs launched in 2017 to take advantage of the area’s tech expertise, both to apply machine learning to in-house agricultural problems and to work with partners to build technologies that play nicely with Deere’s big green machines. Deere’s neighbors in San Francisco’s tech-heavy South of Market are LinkedIn, Salesforce, and Planet Labs, which puts it in a good position for recruiting.
“We’ve literally had folks knock on the door and say, ‘What are you doing here?’” says Rostapshov, and some return to drop off resumes.
Here’s why Sanchez believes agriculture is such a big challenge for artificial intelligence.
“It’s not just about driving tractors around,” he says, although autonomous driving technologies are part of the mix. (John Deere is doing a lot of work with precision GPS to improve autonomous driving, for example, and allow tractors to plan their own routes around fields.)
But more complex than the driving problem, says Sanchez, are the classification problems.
Corn: A Classic Classification Problem
Photo: Tekla Perry
One key effort, Sanchez says, is AI systems “that allow me to tell whether grain being harvested is good quality or low quality and to make automatic adjustment systems for the harvester.” The company is already selling an early version of this image analysis technology. But the many differences between grain types, and grains grown under different conditions, make this task a tough one for machine learning.
“Take corn,” Sanchez says. “Let’s say we are building a deep learning algorithm to detect this corn. And we take lots of pictures of kernels to give it. Say we pick those kernels in central Illinois. But, one mile over, the farmer planted a slightly different hybrid which has slightly different coloration of yellow. Meanwhile, this other farm harvested three days later in a field five miles away; it’s the same hybrid, but it also looks different.
“It’s an overwhelming classification challenge, and that’s just for corn. But you are not only doing it for corn, you have to add 20 more varieties of grain to the mix; and some, like canola, are almost microscopic.”
Even the ground conditions vary dramatically—far more than road conditions, Sanchez points out.
“Let’s say we are building a deep learning algorithm to detect how much residue is left on the soil after a harvest, including stubble and some chaff. Let’s drive 2,000 acres of fields in the Midwest looking at residue. That’s great, but I guarantee that if you go drive those the next year, it will look significantly different.
“Deep learning is great at interpolating conditions between what it knows; it is not good at extrapolating to situations it hasn’t seen. And in agriculture, you always feel that there is a set of conditions that you haven’t yet classified.”
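The gap between interpolating and extrapolating can be seen even with a model far simpler than a deep network. The sketch below is an illustration under stated assumptions, not anything Deere uses: it fits a straight line to samples of a curve taken only between 0 and 1, then compares the prediction error inside that range with the error well outside it. The curve and the test points are arbitrary choices.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


def true_signal(x):
    """Stand-in for the real world: a curve the linear model cannot fully capture."""
    return x ** 2


# Training data covers only the range 0.0 to 1.0.
train_x = [i / 10 for i in range(11)]
train_y = [true_signal(x) for x in train_x]
slope, intercept = fit_line(train_x, train_y)


def prediction_error(x):
    """Absolute gap between the fitted line and the true curve at x."""
    return abs((slope * x + intercept) - true_signal(x))


inside_error = prediction_error(0.55)  # within the training range
outside_error = prediction_error(3.0)  # far outside the training range
print(f"error inside range:  {inside_error:.3f}")
print(f"error outside range: {outside_error:.3f}")
```

The model looks acceptable on conditions resembling its training data and fails badly on conditions it never saw, which is the situation Sanchez describes for a field that looks different the following year.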
A Flood of Big Data
The scale of the data is also daunting, Rostapshov points out. “We are one of the largest users of cloud computing services in the world,” he says. “We are gathering 5 to 15 million measurements per second from 130,000 connected machines globally. We have over 150 million acres in our databases, using petabytes and petabytes [of storage]. We process more data than Twitter does.”
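A quick back-of-the-envelope calculation puts those figures in perspective. It assumes, purely for illustration, that the quoted rate is sustained around the clock and spread evenly across all connected machines; neither assumption comes from Deere.

```python
MACHINES = 130_000
RATE_LOW = 5_000_000     # measurements per second, low end of the quoted range
RATE_HIGH = 15_000_000   # measurements per second, high end of the quoted range
SECONDS_PER_DAY = 24 * 60 * 60

# Average load per machine, assuming an even spread (an assumption).
per_machine_low = RATE_LOW / MACHINES
per_machine_high = RATE_HIGH / MACHINES

# Daily totals, assuming the rate holds all day (an assumption).
per_day_low = RATE_LOW * SECONDS_PER_DAY
per_day_high = RATE_HIGH * SECONDS_PER_DAY

print(f"per machine: {per_machine_low:.0f} to {per_machine_high:.0f} measurements per second")
print(f"per day: {per_day_low:.3e} to {per_day_high:.3e} measurements")
```

Under those assumptions each machine reports a few dozen to a little over a hundred measurements every second, and the fleet as a whole produces on the order of a trillion measurements a day.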
Much of this information is so-called dirty data, that is, it doesn’t share the same format or structure, because it’s coming not only from a wide variety of John Deere machines, but also includes data from some 100 other companies that have access to the platform, including weather information, aerial imagery, and soil analyses.
As a result, says Sanchez, Deere has had to make “tremendous investments in back-end data cleanup.”
“We have gotten progressively more skilled at that problem,” he says. “We started simply by cleaning up our own data. You’d think it would be nice and neat, since it’s coming from our own machines, but there is a wide variety of different models and different years. Then we started geospatially tagging the agronomic data—the information about where you are applying herbicides and fertilizer and the like—coming in from our vehicles. When we started bringing in other data, from drones, say, we were already good at cleaning it up.”
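The kind of cleanup Sanchez describes can be pictured as mapping records of different shapes onto one schema. The sketch below is hypothetical throughout: the field names, the two record formats, and the choice of kilograms per hectare as the common unit are invented for illustration and do not reflect Deere's actual data.

```python
KG_PER_LB = 0.45359237
HA_PER_ACRE = 0.40468564224


def normalize(record):
    """Convert one raw record into a common schema: lat, lon, rate_kg_per_ha."""
    if "position" in record:
        # Hypothetical newer format: nested position, metric units.
        lat = record["position"]["latitude"]
        lon = record["position"]["longitude"]
        rate = record["rate_kg_ha"]
    else:
        # Hypothetical older format: flat fields, strings, pounds per acre.
        lat, lon = record["lat"], record["lon"]
        rate = record["rate_lb_ac"] * KG_PER_LB / HA_PER_ACRE
    return {"lat": float(lat), "lon": float(lon), "rate_kg_per_ha": round(rate, 2)}


# Two invented records describing fertilizer application at nearby points.
raw_records = [
    {"lat": "40.1", "lon": "-88.2", "rate_lb_ac": 100},
    {"position": {"latitude": 40.2, "longitude": -88.3}, "rate_kg_ha": 112.0},
]
clean = [normalize(r) for r in raw_records]
for row in clean:
    print(row)
```

Once every record carries the same fields and units, tagged with a location, data from different machine models, model years, or outside sources can be compared directly.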
John Deere’s Hiring Pitch
Hard problems can be a good thing to have for a company looking to hire machine learning engineers.
“Our opening line to potential recruits,” Sanchez says, “is ‘This stuff matters.’ Then, if we get a chance to talk to them more, we follow up with ‘Not only does this stuff matter, but the problems are really hard and interesting.’ When we explain the variability in farming and how we have to apply all the latest tools to these problems, we get their attention.”
Software engineers “know that feeding a growing population is a massive problem and are excited about the prospect of making a difference,” Rostapshov says.
Only 20 engineers work in the San Francisco labs right now, and that’s on a busy day—some of the researchers spend part of their time at Blue River Technology, a startup based in Sunnyvale that was acquired by Deere in 2017. About half of the researchers are focusing on AI. The Lab is in the process of doubling its office space (no word on staffing plans for that expansion yet).
Company-wide, Deere has thousands of software engineers, with many using AI and machine learning tools in their work, and about the same number of mechanical and electrical engineers, Sanchez reports. “If you look at our hiring 10 years ago,” he says, “it was heavily weighted to mechanical engineers. But if you look at those numbers now, it is by a large majority [engineers working] in the software space. We still need mechanical engineers—we do build green machines—but if you go by our footprint of tech talent, it is pretty safe to call John Deere a software company. And if you follow the key conversations that are happening in the company right now, 95 percent of them are software-related.”
For now, these software engineers are focused on developing technologies that allow farmers to “do more with less,” Sanchez says. That means getting more and better crops from less fuel, less seed, less fertilizer, less pesticide, and fewer workers, while putting together building blocks that, he says, could eventually lead to fully autonomous farm vehicles. The data Deere collects today, for the most part, stays in silos (the virtual kind), with AI algorithms that analyze specific sets of data to provide guidance to individual farmers. At some point, however, with tools to anonymize data and buy-in from farmers, aggregating data could provide some powerful insights.
“We are not asking farmers for that yet,” Sanchez says. “We are not doing aggregation to look for patterns. We are focused on offering technology that allows an individual farmer to use less, on positioning ourselves to be in a neutral spot. We are not about selling you more seed or more fertilizer. So we are building up a good trust level. In the long term, we can have conversations about doing more with deep learning.”
#434580 How Genome Sequencing and Senolytics Can ...
The causes of aging are extremely complex and unclear. With the dramatic demonetization of genome reading and editing over the past decade, and Big Pharma, startups, and the FDA starting to face aging as a disease, we are starting to find practical ways to extend our healthspan.
Here, in Part 2 of a series of blogs on longevity and vitality, I explore how genome sequencing and editing, along with new classes of anti-aging drugs, are augmenting our biology to further extend our healthy lives.
In this blog I’ll cover two classes of emerging technologies:
Genome Sequencing and Editing;
Senolytics, Nutraceuticals & Pharmaceuticals.
Let’s dive in.
Genome Sequencing & Editing
Your genome is the software that runs your body.
A sequence of 3.2 billion letters makes you “you.” These base pairs of A’s, T’s, C’s, and G’s determine your hair color, your height, your personality, your propensity to disease, your lifespan, and so on.
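The software analogy can be taken fairly literally. With only four possible letters, each base fits in two bits, so a sequence can be packed into ordinary binary data. The sketch below is one simple encoding chosen for illustration; it is not a standard file format, and it ignores the ambiguity codes and quality scores that real sequencing data carries.

```python
BASES = "ACGT"
CODE = {base: index for index, base in enumerate(BASES)}  # A=0, C=1, G=2, T=3


def pack(sequence):
    """Pack a DNA string into an integer, two bits per base."""
    value = 0
    for base in sequence:
        value = (value << 2) | CODE[base]
    return value


def unpack(value, length):
    """Recover the DNA string of the given length from its packed integer."""
    bases = []
    for _ in range(length):
        bases.append(BASES[value & 0b11])  # lowest two bits hold the last base
        value >>= 2
    return "".join(reversed(bases))


round_trip = unpack(pack("GATTACA"), 7)

# Size of 3.2 billion letters at two bits each, in megabytes.
GENOME_LETTERS = 3_200_000_000
megabytes = GENOME_LETTERS * 2 / 8 / 1_000_000
print(round_trip, megabytes)
```

At two bits per letter, 3.2 billion letters come to 800 megabytes, roughly the capacity of a single CD.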
Until recently, it’s been very difficult to rapidly and cheaply “read” these letters—and even more difficult to understand what they mean.
Since 2001, the cost to sequence a whole human genome has plummeted exponentially, outpacing Moore’s Law threefold. From an initial cost of $3.7 billion, it dropped to $10 million in 2006, and to $5,000 in 2012.
Today, the cost of genome sequencing has dropped below $500, and according to Illumina, the world’s leading sequencing company, the process will soon cost about $100 and take about an hour to complete.
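The comparison with Moore's Law can be checked with a little arithmetic. The sketch below uses the figures quoted above and rests on two assumptions of mine: that the $3.7 billion figure applies to 2001, and that Moore's Law is read as a halving of cost every two years.

```python
import math


def halving_time_years(start_cost, end_cost, years):
    """Years needed for the cost to halve, assuming a steady exponential decline."""
    halvings = math.log2(start_cost / end_cost)
    return years / halvings


# Figures quoted above: $3.7 billion (assumed to be the 2001 cost) and $5,000 in 2012.
sequencing_halving = halving_time_years(3.7e9, 5_000, 2012 - 2001)

# Assumption: Moore's Law taken as one cost halving every two years.
MOORE_HALVING_YEARS = 2.0
speedup = MOORE_HALVING_YEARS / sequencing_halving

print(f"sequencing cost halved roughly every {sequencing_halving:.2f} years")
print(f"about {speedup:.1f} times faster than the assumed Moore's Law pace")
```

Under these assumptions the cost halved a little more often than every seven months, which is broadly in line with the "threefold" claim.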
This represents one of the most powerful and transformative technology revolutions in healthcare.
When we understand your genome, we’ll be able to understand how to optimize “you.”
We’ll know the perfect foods, the perfect drugs, the perfect exercise regimen, and the perfect supplements, just for you.
We’ll understand what microbiome types, or gut flora, are ideal for you (more on this in a later blog).
We’ll accurately predict how specific sedatives and medicines will impact you.
We’ll learn which diseases and illnesses you’re most likely to develop and, more importantly, how to best prevent them from developing in the first place (rather than trying to cure them after the fact).
CRISPR Gene Editing
In addition to reading the human genome, scientists can now edit a genome using a naturally-occurring biological system discovered in 1987 called CRISPR/Cas9.
Short for Clustered Regularly Interspaced Short Palindromic Repeats and CRISPR-associated protein 9, the editing system was adapted from a naturally-occurring defense system found in bacteria.
Here’s how it works:
The bacteria capture snippets of DNA from invading viruses (or bacteriophage) and use them to create DNA segments known as CRISPR arrays.
The CRISPR arrays allow the bacteria to “remember” the viruses (or closely related ones), and defend against future invasions.
If the viruses attack again, the bacteria produce RNA segments from the CRISPR arrays to target the viruses’ DNA. The bacteria then use Cas9 to cut the DNA apart, which disables the virus.
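The three steps above can be caricatured in code. This is a toy model that treats DNA as plain strings; the class name, the snippet length, and the cut position are invented for illustration, and real CRISPR biology is far more involved.

```python
class ToyBacterium:
    """String-based caricature of CRISPR immunity, for illustration only."""

    def __init__(self, spacer_length=8):
        self.spacer_length = spacer_length  # arbitrary snippet size
        self.crispr_array = []              # stored snippets of past invaders

    def capture(self, viral_dna):
        """Step 1: store a snippet of the invader's DNA in the CRISPR array."""
        snippet = viral_dna[: self.spacer_length]
        if snippet not in self.crispr_array:
            self.crispr_array.append(snippet)

    def defend(self, viral_dna):
        """Steps 2 and 3: find a remembered snippet and cut the DNA where it matches."""
        for guide in self.crispr_array:  # stands in for the RNA made from the array
            position = viral_dna.find(guide)
            if position != -1:
                cut = position + len(guide)  # stands in for the Cas9 cut (arbitrary spot)
                return viral_dna[:cut], viral_dna[cut:]
        return None  # no match: the virus is not recognized


phage = "ATGCGTACCGTTAGC"  # made-up viral sequence
cell = ToyBacterium()
first_encounter = cell.defend(phage)   # nothing remembered yet
cell.capture(phage)
second_encounter = cell.defend(phage)  # recognized and cut
print(first_encounter, second_encounter)
```

The first attack goes unrecognized, while the second is matched against the stored snippet and the viral sequence is cut in two. Gene editing borrows the same search-and-cut machinery and supplies a guide sequence of the researcher's choosing.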
Most importantly, CRISPR is cheap, quick, easy to use, and more accurate than all previous gene editing methods. As a result, CRISPR/Cas9 has swept through labs around the world as the way to edit a genome.
A short search in the literature will show an exponential rise in the number of CRISPR-related publications and patents.
2018: Filled With CRISPR Breakthroughs
Early results are impressive. Researchers from the University of Chicago recently used CRISPR to genetically engineer cocaine resistance into mice.
Researchers at the University of Texas Southwestern Medical Center used CRISPR to reverse the gene defect causing Duchenne muscular dystrophy (DMD) in dogs (DMD is the most common fatal genetic disease in children).
With great power comes great responsibility, and moral and ethical dilemmas.
In 2015, Chinese scientists sparked global controversy when they first edited human embryo cells in the lab with the goal of modifying genes that would make the child resistant to smallpox, HIV, and cholera.
Three years later, in November 2018, researcher He Jiankui informed the world that the first set of CRISPR-engineered female twins had been delivered.
To accomplish his goal, He Jiankui deleted a region of a receptor on the surface of white blood cells known as CCR5, introducing a rare, natural genetic variation that makes it more difficult for HIV to infect its favorite target, those same white blood cells.
Setting aside the significant ethical conversations, CRISPR will soon provide us the tools to eliminate diseases, create hardier offspring, produce new environmentally resistant crops, and even wipe out pathogens.
Senolytics, Nutraceuticals & Pharmaceuticals
Over the arc of your life, the cells in your body divide until they reach what is known as the Hayflick limit, or the number of times a normal human cell population will divide before cell division stops, which is typically about 50 divisions.
What normally follows next is programmed cell death or destruction by the immune system. A very small fraction of cells, however, become senescent cells and evade this fate to linger indefinitely.
These lingering cells secrete a potent mix of molecules that triggers chronic inflammation, damages the surrounding tissue structures, and changes the behavior of nearby cells for the worse.
Senescent cells appear to be one of the root causes of aging, causing everything from fibrosis and blood vessel calcification, to localized inflammatory conditions such as osteoarthritis, to diminished lung function.
Fortunately, both the scientific and entrepreneurial communities have begun to work on senolytic therapies, moving the technology for selectively destroying senescent cells out of the laboratory and into a half-dozen startup companies.
Prominent companies in the field include the following:
Unity Biotechnology is developing senolytic medicines to selectively eliminate senescent cells with an initial focus on delivering localized therapy in osteoarthritis, ophthalmology and pulmonary disease.
Oisin Biotechnologies is pioneering a programmable gene therapy that can destroy cells based on their internal biochemistry.
SIWA Therapeutics is working on an immunotherapy approach to the problem of senescent cells.
In recent years, researchers have identified or designed a handful of senolytic compounds that can curb aging by regulating senescent cells. Two of these drugs that have gained mainstay research traction are rapamycin and metformin.
Rapamycin
Originally extracted from bacteria found on Easter Island, rapamycin acts on the mTOR (mechanistic target of rapamycin) pathway to selectively block a key protein that facilitates cell division.
Currently, rapamycin derivatives are widely used as immunosuppressants in organ and bone marrow transplants. Research now suggests that their use results in prolonged lifespan and enhanced cognitive and immune function.
PureTech Health subsidiary resTORbio (which started 2018 by going public) is working on a rapamycin-based drug intended to enhance immunity and reduce infection. Their clinical-stage RTB101 drug works by inhibiting part of the mTOR pathway.
Results of the drug’s recent clinical trial include:
Decreased incidence of infection
Improved influenza vaccination response
A 30.6 percent decrease in respiratory tract infections
Impressive, to say the least.
Metformin
Metformin is a widely-used generic drug for mitigating liver sugar production in Type 2 diabetes patients.
Researchers have found that metformin also reduces oxidative stress and inflammation, which otherwise increase as we age.
There is strong evidence that metformin can augment cellular regeneration and dramatically mitigate cellular senescence by reducing both oxidative stress and inflammation.
Over 100 studies registered on ClinicalTrials.gov are currently following up on strong evidence of metformin’s protective effect against cancer.
Nutraceuticals and NAD+
Beyond cellular senescence, certain critical nutrients and proteins tend to decline as a function of age. Nutraceuticals combat aging by supplementing and replenishing these declining nutrient levels.
NAD+ exists in every cell, participating in every process from DNA repair to creating the energy vital for cellular processes. It’s been shown that NAD+ levels decline as we age.
The Elysium Health Basis supplement aims to elevate NAD+ levels in the body to extend one’s lifespan. Elysium’s clinical study reports that Basis increases NAD+ levels consistently by a sustained 40 percent.
Conclusion
This is just a taste of the tremendous momentum that longevity and aging technology has right now. As artificial intelligence and quantum computing transform how we decode our DNA and how we discover drugs, genetics and pharmaceuticals will become truly personalized.
The next blog in this series will demonstrate how artificial intelligence is converging with genetics and pharmaceuticals to transform how we approach longevity, aging, and vitality.
We are edging closer to a dramatically extended healthspan—where 100 is the new 60. What will you create, where will you explore, and how will you spend your time if you are able to add an additional 40 healthy years to your life?
Join Me
Abundance Digital is my online educational portal and community of abundance-minded entrepreneurs. You’ll find weekly video updates from Peter, a curated newsfeed of exponential news, and a place to share your bold ideas.
Image Credit: ktsdesign / Shutterstock.com