The boundaries between digital and physical space are disappearing at a breakneck pace. What was once static and boring is becoming dynamic and magical.
For all of human history, looking at the world through our eyes was the same experience for everyone. Beyond the bounds of an over-active imagination, what you see is the same as what I see.
But all of this is about to change. Over the next two to five years, the world around us is about to light up with layer upon layer of rich, fun, meaningful, engaging, and dynamic data. Data you can see and interact with.
This magical future ahead is called the Spatial Web and will transform every aspect of our lives, from retail and advertising, to work and education, to entertainment and social interaction.
Massive change is underway as a result of a series of converging technologies, from 5G global networks and ubiquitous artificial intelligence, to 30+ billion connected devices (known as the IoT), each of which will generate streams of real-world data every second, everywhere.
The current AI explosion will make everything smart, autonomous, and self-programming. Blockchain and cloud-enabled services will support a secure data layer, putting data back in the hands of users and allowing us to build complex rule-based infrastructure in tomorrow’s virtual worlds.
And with the rise of online-merge-offline (OMO) environments, two-dimensional screens will no longer serve as our exclusive portal to the web. Instead, virtual and augmented reality eyewear will allow us to interface with a digitally-mapped world, richly layered with visual data.
Welcome to the Spatial Web. Over the next few months, I’ll be doing a deep dive into the Spatial Web (a.k.a. Web 3.0), covering what it is, how it works, and its vast implications across industries, from real estate and healthcare to entertainment and the future of work. In this blog, I’ll discuss the what, how, and why of Web 3.0—humanity’s first major foray into our virtual-physical hybrid selves (BTW, this year at Abundance360, we’ll be doing a deep dive into the Spatial Web with the leaders of HTC, Magic Leap, and High-Fidelity).
Let’s dive in.
What is the Spatial Web?
While we humans exist in three dimensions, our web today is flat.
The web was designed for shared information, absorbed through a flat screen. But as proliferating sensors, ubiquitous AI, and interconnected networks blur the lines between our physical and online worlds, we need a spatial web to help us digitally map a three-dimensional world.
To put Web 3.0 in context, let’s take a trip down memory lane. In the early 1990s, the newly-birthed world wide web consisted of static web pages and one-way information: a monumental system of publishing and linking information unlike any unified data system before it. To connect, we had to dial up through unstable modems and struggle through insufferably slow connection speeds.
But emerging from this revolutionary (albeit non-interactive) infodump, Web 2.0 has connected the planet more in one decade than empires did in millennia.
Granting democratized participation through newly interactive sites and applications, today’s web era has turbocharged information-sharing and created ripple effects of scientific discovery, economic growth, and technological progress on an unprecedented scale.
We’ve seen the explosion of social networking sites, wikis, and online collaboration platforms. Consumers have become creators; physically isolated users have been handed a global microphone; and entrepreneurs can now access billions of potential customers.
But if Web 2.0 took the world by storm, the Spatial Web emerging today will leave it in the dust.
While there’s no clear consensus about its definition, the Spatial Web refers to a computing environment that exists in three-dimensional space—a twinning of real and virtual realities—enabled via billions of connected devices and accessed through the interfaces of virtual and augmented reality.
In this way, the Spatial Web will enable us to both build a twin of our physical reality in the virtual realm and bring the digital into our real environments.
It’s the next era of web-like technologies:
Spatial computing technologies, like augmented and virtual reality;
Physical computing technologies, like IoT and robotic sensors;
And decentralized computing: both blockchain—which enables greater security and data authentication—and edge computing, which pushes computing power to where it’s most needed, speeding everything up.
Geared with natural language search, data mining, machine learning, and AI recommendation agents, the Spatial Web is a growing expanse of services and information, navigable with the use of ever-more-sophisticated AI assistants and revolutionary new interfaces.
Where Web 1.0 consisted of static documents and read-only data, Web 2.0 introduced multimedia content, interactive web applications, and social media on two-dimensional screens. But converging technologies are quickly transcending the laptop, and will even disrupt the smartphone in the next decade.
With the rise of wearables, smart glasses, AR / VR interfaces, and the IoT, the Spatial Web will integrate seamlessly into our physical environment, overlaying every conversation, every road, every object, conference room, and classroom with intuitively-presented data and AI-aided interaction.
Think: the Oasis in Ready Player One, where anyone can create digital personas, build and invest in smart assets, do business, complete effortless peer-to-peer transactions, and collect real estate in a virtual world.
Or imagine a virtual replica or “digital twin” of your office, each conference room authenticated on the blockchain, requiring a cryptographic key for entry.
As I’ve discussed with my good friend and “VR guru” Philip Rosedale, I’m absolutely clear that in the not-too-distant future, every physical element of every building in the world is going to be fully digitized, existing as a virtual incarnation or even as N number of these. “Meet me at the top of the Empire State Building?” “Sure, which one?”
This digitization of life means that suddenly every piece of information can become spatial, every environment can be smarter by virtue of AI, and every data point about me and my assets—both virtual and physical—can be reliably stored, secured, enhanced, and monetized.
In essence, the Spatial Web lets us interface with digitally-enhanced versions of our physical environment and build out entirely fictional virtual worlds—capable of running simulations, supporting entire economies, and even birthing new political systems.
But while I’ll get into the weeds of different use cases next week, let’s first make this concrete.
How Does It Work?
Let’s start with the stack. In the PC days, we had a database accompanied by a program that could ingest that data and present it to us as digestible information on a screen.
Then, in the early days of the web, data migrated to servers. Information was fed through a website, with which you would interface via a browser—whether Mosaic or Mozilla.
And then came the cloud.
Resident at either the edge of the cloud or on your phone, today’s rapidly proliferating apps now allow us to interact with previously read-only data, interfacing through a smartphone. And the interfaces keep multiplying: Siri and Alexa have brought us verbal interfaces, AI-geared phone cameras can now determine your identity, and sensors are beginning to read our gestures.
And now we’re not only looking at our screens but through them, as the convergence of AI and AR begins to digitally populate our physical worlds.
While Pokémon Go sent millions of mobile game-players on virtual treasure hunts, IKEA is just one of the many companies letting you map virtual furniture within your physical home—simulating everything from cabinets to entire kitchens. No longer the one-sided recipients, we’re beginning to see through sensors, creatively inserting digital content in our everyday environments.
Let’s take a look at how the latest incarnation might work. In this new Web 3.0 stack, my personal AI would act as an intermediary, accessing public or privately-authorized data through the blockchain on my behalf, and then feed it through an interface layer composed of everything from my VR headset, to numerous wearables, to my smart environment (IoT-connected devices or even in-home robots).
But as we attempt to build a smart world with smart infrastructure, smart supply chains and smart everything else, we need a set of basic standards with addresses for people, places, and things. Just as our web today relies on the Internet protocol suite (TCP/IP) and other infrastructure, by which your computer is addressed and data packets are transferred, we need infrastructure for the Spatial Web.
And a select group of players is already stepping in to fill this void. Proposing new structural designs for Web 3.0, some are attempting to evolve today’s web model from text-based web pages in 2D to three-dimensional AR and VR web experiences located in both digitally-mapped physical worlds and newly-created virtual ones.
With a spatial programming language analogous to HTML, imagine building a linkable address for any physical or virtual space, granting it a format that then makes it interchangeable and interoperable with all other spaces.
But it doesn’t stop there.
As soon as we populate a virtual room with content, we then need to encode who sees it, who can buy it, who can move it…
And the Spatial Web’s eventual governing system (for posting content on a centralized grid) would allow us to address everything from the room you’re sitting in, to the chair on the other side of the table, to the building across the street.
Just as we have a DNS for the web and the purchasing of web domains, once we give addresses to spaces (akin to granting URLs), we then have the ability to identify and visit addressable locations, physical objects, individuals, or pieces of digital content in cyberspace.
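To make the DNS analogy a little more tangible, here is a minimal Python sketch of resolving addresses for spaces and the objects inside them. Everything in it is an assumption made for illustration: the `spatial://` scheme, the `SpatialRegistry` class, and the example addresses are hypothetical, and no actual Spatial Web addressing standard is implied.

```python
from dataclasses import dataclass, field


@dataclass
class SpatialRegistry:
    """Toy lookup table mapping hypothetical spatial addresses to records."""
    records: dict = field(default_factory=dict)

    def register(self, address: str, description: str) -> None:
        self.records[address] = description

    def resolve(self, address: str) -> str:
        """Return the record for an address, falling back to its parent space."""
        current = address
        while current:
            if current in self.records:
                return self.records[current]
            # Drop the last path segment: ".../room-12/chair-3" -> ".../room-12"
            current = current.rpartition("/")[0]
        raise KeyError(f"no record found for {address}")


registry = SpatialRegistry()
registry.register("spatial://hq/floor-2/room-12", "conference room")
registry.register("spatial://hq/floor-2/room-12/chair-3", "chair by the window")

print(registry.resolve("spatial://hq/floor-2/room-12/chair-3"))
print(registry.resolve("spatial://hq/floor-2/room-12/table-1"))
```

The second lookup has no record of its own, so it falls back to the enclosing room, loosely the way a lookup can fall back to a parent zone. A real system would also need ownership, versioning, and a link to physical coordinates, none of which is modeled here.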
And these not only apply to virtual worlds, but to the real world itself. As new mapping technologies emerge, we can now map rooms, objects, and large-scale environments into virtual space with increasing accuracy.
We might then dictate who gets to move your coffee mug in a virtual conference room, or when a team gets to use the room itself. Rules and permissions would be set in the grid, decentralized governance systems, or in the application layer.
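One way to picture such rules is as a permission check attached to each addressable object. The sketch below is purely illustrative: the `SpatialObject` class, the action names, and the users are hypothetical, and it assumes the simplest possible policy (the owner can do anything, everyone else needs an explicit grant).

```python
from dataclasses import dataclass, field


@dataclass
class SpatialObject:
    """A hypothetical addressable object with per-action permission sets."""
    address: str
    owner: str
    permissions: dict = field(default_factory=dict)  # action -> set of user names

    def grant(self, action: str, user: str) -> None:
        self.permissions.setdefault(action, set()).add(user)

    def is_allowed(self, action: str, user: str) -> bool:
        # The owner may do anything; everyone else needs an explicit grant.
        return user == self.owner or user in self.permissions.get(action, set())


mug = SpatialObject(address="spatial://hq/room-12/mug-1", owner="alice")
mug.grant("move", "bob")

print(mug.is_allowed("move", "alice"))  # owner
print(mug.is_allowed("move", "bob"))    # explicitly granted
print(mug.is_allowed("move", "carol"))  # no grant
```

Whether a check like this lives in the grid, in a decentralized governance system, or in the application layer is exactly the open design question the paragraph above raises; the sketch takes no position on it.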
Taken one step further, imagine then monetizing smart spaces and smart assets. If you have booked the virtual conference room, perhaps you’ll let me pay you 0.25 BTC to let me use it instead?
But given the Spatial Web’s enormous technological complexity, what’s allowing it to emerge now?
Why Is It Happening Now?
While countless entrepreneurs have already started harnessing blockchain technologies to build decentralized apps (or dApps), two major developments are allowing today’s birth of Web 3.0:
High-resolution wireless VR/AR headsets are finally catapulting virtual and augmented reality out of a prolonged winter.
The International Data Corporation (IDC) predicts the VR and AR headset market will reach 65.9 million units by 2022. Already in the next 18 months, 2 billion devices will be enabled with AR. And tech giants across the board have long begun investing heavy sums.
In early 2019, HTC is releasing the VIVE Focus, a wireless self-contained VR headset. At the same time, Facebook is charging ahead with its Project Santa Cruz—the Oculus division’s next-generation standalone, wireless VR headset. And Magic Leap has finally rolled out its long-awaited Magic Leap One mixed reality headset.
Mass deployment of 5G will drive 10 to 100-gigabit connection speeds in the next 6 years, matching hardware progress with the needed speed to create virtual worlds.
We’ve already seen tremendous leaps in display technology. But as connectivity speeds converge with accelerating GPUs, we’ll start to experience seamless VR and AR interfaces with ever-expanding virtual worlds.
And with such democratizing speeds, every user will be able to develop in VR.
But accompanying these two catalysts is also an important shift towards the decentralized web and a demand for user-controlled data.
Converging technologies, from immutable ledgers and blockchain to machine learning, are now enabling the more direct, decentralized use of web applications and creation of user content. With no central point of control, middlemen are removed from the equation and anyone can create an address, independently interacting with the network.
Enabled by a permission-less blockchain, any user—regardless of birthplace, gender, ethnicity, wealth, or citizenship—would thus be able to establish digital assets and transfer them seamlessly, granting us a more democratized Internet.
And with data stored on distributed nodes, this also means no single point of failure. One could have multiple backups, accessible only with digital authorization, leaving users immune to any single server failure.
Implications Abound–What’s Next…
With a newly-built stack and an interface built from numerous converging technologies, the Spatial Web will transform every facet of our everyday lives—from the way we organize and access our data, to our social and business interactions, to the way we train employees and educate our children.
We’re about to start spending more time in the virtual world than ever before. Beyond entertainment or gameplay, our livelihoods, work, and even personal decisions are already becoming mediated by a web electrified with AI and newly-emerging interfaces.
In our next blog on the Spatial Web, I’ll do a deep dive into the myriad industry implications of Web 3.0, offering tangible use cases across sectors.
Abundance-Digital Online Community: I’ve created a Digital/Online community of bold, abundance-minded entrepreneurs called Abundance-Digital. Abundance-Digital is my ‘on ramp’ for exponential entrepreneurs – those who want to get involved and play at a higher level. Click here to learn more.
A new technique using artificial intelligence to manipulate video content gives new meaning to the expression “talking head.”
An international team of researchers showcased the latest advancement in synthesizing facial expressions—including mouth, eyes, eyebrows, and even head position—in video at this month’s 2018 SIGGRAPH, a conference on innovations in computer graphics, animation, virtual reality, and other forms of digital wizardry.
The project is called Deep Video Portraits. It relies on a type of AI called generative adversarial networks (GANs) to modify a “target” actor based on the facial and head movement of a “source” actor. As the name implies, GANs pit two opposing neural networks against one another to create a realistic talking head, right down to the sneer or raised eyebrow.
In this case, the adversaries are actually working together: One neural network generates content, while the other rejects or approves each effort. The back-and-forth interplay between the two eventually produces a realistic result that can easily fool the human eye, including reproducing a static scene behind the head as it bobs back and forth.
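The generate-then-judge loop can be sketched in a few lines of Python. This is a deliberately tiny toy, not the Deep Video Portraits architecture: the "generator" only learns a single offset `b`, the "discriminator" is a one-variable logistic classifier, and the data, noise values, and learning rate are all assumptions chosen to keep the example deterministic.

```python
import math


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def train(rounds: int, lr: float = 0.1):
    """Alternate discriminator and generator updates on fixed toy data."""
    real = [3.5, 4.0, 4.5]    # "real" samples (assumed toy data)
    noise = [-0.5, 0.0, 0.5]  # fixed noise inputs, kept deterministic
    w, c = 0.0, 0.0           # discriminator D(x) = sigmoid(w * x + c)
    b = 0.0                   # generator G(z) = z + b

    for _ in range(rounds):
        fake = [z + b for z in noise]

        # Discriminator step: minimize -[log D(real) + log(1 - D(fake))],
        # i.e. approve real samples and reject generated ones.
        grad_w = (mean(-(1 - sigmoid(w * x + c)) * x for x in real)
                  + mean(sigmoid(w * x + c) * x for x in fake))
        grad_c = (mean(-(1 - sigmoid(w * x + c)) for x in real)
                  + mean(sigmoid(w * x + c) for x in fake))
        w -= lr * grad_w
        c -= lr * grad_c

        # Generator step: minimize -log D(G(z)), i.e. try to get approved.
        grad_b = mean(-(1 - sigmoid(w * x + c)) * w for x in fake)
        b -= lr * grad_b

    return w, c, b


w, c, b = train(rounds=1)
print(w, c, b)
```

After a single round the discriminator has learned to score the real samples higher than the generated ones, and the generator's offset has moved toward the real data. Over many rounds a toy this simple may oscillate rather than settle, which hints at why training real GANs is delicate.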
The researchers say the technique can be used by the film industry for a variety of purposes, from editing actors’ facial expressions to match dubbed voices to repositioning an actor’s head in post-production. According to the researchers, AI can produce results that are not only highly realistic but also much faster to obtain than with the manual processes used today. You can read the full paper of their work here.
“Deep Video Portraits shows how such a visual effect could be created with less effort in the future,” said Christian Richardt, from the University of Bath’s motion capture research center CAMERA, in a press release. “With our approach, even the positioning of an actor’s head and their facial expression could be easily edited to change camera angles or subtly change the framing of a scene to tell the story better.”
AI Tech Different Than So-Called “Deepfakes”
The work is far from the first to employ AI to manipulate video and audio. At last year’s SIGGRAPH conference, researchers from the University of Washington showcased their work using algorithms that inserted audio recordings from a person in one instance into a separate video of the same person in a different context.
In this case, they “faked” a video using a speech from former President Barack Obama addressing a mass shooting incident during his presidency. The AI-doctored video injects the audio into an unrelated video of the president while also blending the facial and mouth movements, doing a pretty credible job of lip synching.
A previous paper by many of the same scientists on the Deep Video Portraits project detailed how they were first able to manipulate a video in real time of a talking head (in this case, actor and former California governor Arnold Schwarzenegger). The Face2Face system pulled off this bit of digital trickery using a depth-sensing camera that tracked the facial expressions of an Asian female source actor.
A less sophisticated method of swapping faces using a machine learning software dubbed FakeApp emerged earlier this year. Predictably, the tech—requiring numerous photos of the source actor in order to train the neural network—was used for more juvenile pursuits, such as injecting a person’s face onto a porn star.
The application gave rise to the term “deepfakes,” which is now used somewhat ubiquitously to describe all such instances of AI-manipulated video—much to the chagrin of some of the researchers involved in more legitimate uses.
Fighting AI-Created Video Forgeries
However, the researchers are keenly aware that their work—intended for benign uses such as in the film industry or even to correct gaze and head positions for more natural interactions through video teleconferencing—could be used for nefarious purposes. Fake news is the most obvious concern.
“With ever-improving video editing technology, we must also start being more critical about the video content we consume every day, especially if there is no proof of origin,” said Michael Zollhöfer, a visiting assistant professor at Stanford University and member of the Deep Video Portraits team, in the press release.
Toward that end, the research team is training the same adversarial neural networks to spot video forgeries. They also strongly recommend that developers clearly watermark videos that are edited through AI or otherwise, and denote clearly what part and element of the scene was modified.
To catch less ethical users, the US Department of Defense, through the Defense Advanced Research Projects Agency (DARPA), is supporting a program called Media Forensics. This latest DARPA challenge enlists researchers to develop technologies to automatically assess the integrity of an image or video, as part of an end-to-end media forensics platform.
The DARPA official in charge of the program, Matthew Turek, told MIT Technology Review that so far the program has “discovered subtle cues in current GAN-manipulated images and videos that allow us to detect the presence of alterations.” In one reported example, researchers have targeted eyes, which rarely blink in the case of “deepfakes” like those created by FakeApp, because the AI is trained on still pictures. That method would seem to be less effective at spotting the sort of forgeries created by Deep Video Portraits, which appears to flawlessly match the entire facial and head movements between the source and target actors.
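The blink cue can be illustrated with a small Python heuristic. This is only a sketch of the idea, not the DARPA program's method: it assumes some upstream face tracker already supplies a per-frame "eye openness" value between 0 and 1, and the function names, the 0.2 closed-eye cutoff, and the 5 blinks-per-minute threshold are all hypothetical choices.

```python
def count_blinks(openness, closed_below=0.2):
    """Count closed-eye episodes in a per-frame eye-openness series (0 = shut, 1 = open)."""
    blinks = 0
    eyes_closed = False
    for value in openness:
        if value < closed_below and not eyes_closed:
            blinks += 1          # an open-to-closed transition starts a new blink
            eyes_closed = True
        elif value >= closed_below:
            eyes_closed = False
    return blinks


def looks_suspicious(openness, fps=30.0, min_blinks_per_minute=5.0):
    """Flag a clip whose blink rate falls below an assumed threshold."""
    minutes = len(openness) / fps / 60.0
    if minutes == 0:
        return False
    return count_blinks(openness) / minutes < min_blinks_per_minute


# Ten seconds of video at 30 fps: one clip blinks three times, the other never does.
natural = [0.05 if i % 100 in (50, 51, 52) else 1.0 for i in range(300)]
unblinking = [1.0] * 300

print(looks_suspicious(natural))
print(looks_suspicious(unblinking))
```

A heuristic like this only catches forgeries that fail to reproduce blinking, which is why it would likely miss techniques that transfer the full set of facial movements from a source actor.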
“We believe that the field of digital forensics should and will receive a lot more attention in the future to develop approaches that can automatically prove the authenticity of a video clip,” Zollhöfer said. “This will lead to ever-better approaches that can spot such modifications even if we humans might not be able to spot them with our own eyes.”
Recently, I picked up Kai-Fu Lee’s newest book, AI Superpowers.
Kai-Fu Lee is one of the most plugged-in AI investors on the planet, managing over $2 billion between six funds and over 300 portfolio companies in the US and China.
Drawing from his pioneering work in AI, executive leadership at Microsoft, Apple, and Google (where he served as founding president of Google China), and his founding of VC fund Sinovation Ventures, Lee shares invaluable insights about:
The four factors driving today’s AI ecosystems;
China’s extraordinary inroads in AI implementation;
Where autonomous systems are headed;
How we’ll need to adapt.
With a foothold in both Beijing and Silicon Valley, Lee looks at the power balance between Chinese and US tech behemoths—each turbocharging new applications of deep learning and sweeping up global markets in the process.
In this post, I’ll be discussing Lee’s “Four Waves of AI,” an excellent framework for discussing where AI is today and where it’s going. I’ll also be featuring some of the hottest Chinese tech companies leading the charge, worth watching right now.
I’m super excited that this Tuesday, I’ve scored the opportunity to sit down with Kai-Fu Lee to discuss his book in detail via a webinar.
With Sino-US competition heating up, who will own the future of technology?
Let’s dive in.
The First Wave: Internet AI
In this first stage of AI deployment, we’re dealing primarily with recommendation engines—algorithmic systems that learn from masses of user data to curate online content personalized to each one of us.
Think Amazon’s spot-on product recommendations, or that “Up Next” YouTube video you just have to watch before getting back to work, or Facebook ads that seem to know what you’ll buy before you do.
Powered by the data flowing through our networks, internet AI leverages the fact that users automatically label data as we browse. Clicking versus not clicking; lingering on a web page longer than we did on another; hovering over a Facebook video to see what happens at the end.
These cascades of labeled data build a detailed picture of our personalities, habits, demands, and desires: the perfect recipe for more tailored content to keep us on a given platform.
Currently, Lee estimates that Chinese and American companies stand head-to-head when it comes to deployment of internet AI. But given China’s data advantage, he predicts that Chinese tech giants will have a slight lead (60-40) over their US counterparts in the next five years.
While you’ve most definitely heard of Alibaba and Baidu, you’ve probably never stumbled upon Toutiao.
Starting out as a copycat of America’s wildly popular Buzzfeed, Toutiao reached a valuation of $20 billion by 2017, dwarfing Buzzfeed’s valuation by more than a factor of 10. But with almost 120 million daily active users, Toutiao doesn’t just stop at creating viral content.
Equipped with natural-language processing and computer vision, Toutiao’s AI engines survey a vast network of different sites and contributors, rewriting headlines to optimize for user engagement, and processing each user’s online behavior—clicks, comments, engagement time—to curate individualized news feeds for millions of consumers.
And as users grow more engaged with Toutiao’s content, the company’s algorithms get better and better at recommending content, optimizing headlines, and delivering a truly personalized feed.
It’s this kind of positive feedback loop that fuels today’s AI giants surfing the wave of internet AI.
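The feedback loop is easy to sketch in Python. The example below is a hypothetical toy, not Toutiao's system: the `FeedRanker` class, the users, and the topics are invented for illustration, and it assumes clicks are the only signal, scored as a smoothed click-through rate.

```python
from collections import defaultdict


class FeedRanker:
    """Rank topics per user from implicit click feedback (hypothetical sketch)."""

    def __init__(self):
        # (user, topic) -> [clicks, impressions]
        self.stats = defaultdict(lambda: [0, 0])

    def record(self, user, topic, clicked):
        entry = self.stats[(user, topic)]
        entry[0] += int(clicked)
        entry[1] += 1

    def score(self, user, topic):
        clicks, impressions = self.stats[(user, topic)]
        # Laplace smoothing: a never-shown topic scores 0.5 rather than 0.
        return (clicks + 1) / (impressions + 2)

    def rank(self, user, topics):
        return sorted(topics, key=lambda topic: self.score(user, topic), reverse=True)


ranker = FeedRanker()
for clicked in (True, True, False):
    ranker.record("ana", "sports", clicked)
for clicked in (False, False, False):
    ranker.record("ana", "finance", clicked)

print(ranker.rank("ana", ["finance", "travel", "sports"]))
# -> ['sports', 'travel', 'finance']
```

Every click or non-click is a free label, and each new label shifts the ranking, which in turn shapes what the user sees and clicks next. The untried topic lands in the middle because smoothing gives it the benefit of the doubt.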
The Second Wave: Business AI
While internet AI takes advantage of the fact that netizens are constantly labeling data via clicks and other engagement metrics, business AI jumps on the data that traditional companies have already labeled in the past.
Think banks issuing loans and recording repayment rates; hospitals archiving diagnoses, imaging data, and subsequent health outcomes; or courts noting conviction history, recidivism, and flight.
While we humans make predictions based on obvious root causes (strong features), AI algorithms can process thousands of weakly correlated variables (weak features) that may have much more to do with a given outcome than the usual suspects.
By scouting out hidden correlations that escape our linear cause-and-effect logic, business AI leverages labeled data to train algorithms that outperform even the most veteran of experts.
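A bit of arithmetic shows why many weak signals can add up to a strong prediction. The Python sketch below makes a strong simplifying assumption that real data rarely satisfies: every signal is independent and is right 55 percent of the time. Under that assumption, it computes how often a simple majority vote of the signals gets the answer right. The function name and the 55 percent figure are illustrative, not taken from Lee's book.

```python
from math import comb


def majority_accuracy(n_signals: int, p_correct: float) -> float:
    """Probability that a majority vote of n independent signals is right.

    Assumes an odd n_signals and that each signal is independently correct
    with probability p_correct (a strong simplification).
    """
    needed = n_signals // 2 + 1
    return sum(
        comb(n_signals, k) * p_correct**k * (1 - p_correct) ** (n_signals - k)
        for k in range(needed, n_signals + 1)
    )


for n in (1, 11, 101, 1001):
    print(n, round(majority_accuracy(n, 0.55), 3))
```

With one signal the vote is right 55 percent of the time; with a thousand such signals it is right almost always. Real features are correlated, so the gain in practice is smaller, but the direction of the effect is the same.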
Apply these data-trained AI engines to banking, insurance, and legal sentencing, and you get minimized default rates, optimized premiums, and plummeting recidivism rates.
While Lee confidently places America in the lead (90-10) for business AI, China’s substantial lag in structured industry data could actually work in its favor going forward.
In industries where Chinese startups can leapfrog over legacy systems, China has a major advantage.
Take Chinese app Smart Finance, for instance.
While Americans embraced credit and debit cards in the 1970s, China was still in the throes of its Cultural Revolution, largely missing the bus on this technology.
Fast forward to 2017, and China’s mobile payment spending exceeded that of the US by a ratio of 50 to 1. Without the competition of deeply entrenched credit cards, mobile payments were an obvious upgrade to China’s cash-heavy economy, embraced by 70 percent of China’s 753 million smartphone users by the end of 2017.
But by leapfrogging over credit cards and into mobile payments, China largely left behind the notion of credit.
And here’s where Smart Finance comes in.
An AI-powered app for microfinance, Smart Finance depends almost exclusively on its algorithms to make millions of microloans. For each potential borrower, the app simply requests access to a portion of the user’s phone data.
On the basis of variables as subtle as your typing speed and battery percentage, Smart Finance can predict with astounding accuracy your likelihood of repaying a $300 loan.
Such deployments of business AI and internet AI are already revolutionizing our industries and individual lifestyles. But still on the horizon lie two even more monumental waves: perception AI and autonomous AI.
The Third Wave: Perception AI
In this wave, AI gets an upgrade with eyes, ears, and myriad other senses, merging the digital world with our physical environments.
As sensors and smart devices proliferate through our homes and cities, we are on the verge of entering a trillion-sensor economy.
Companies like China’s Xiaomi are putting out millions of IoT-connected devices, and teams of researchers have already begun prototyping smart dust—solar cell- and sensor-geared particulates that can store and communicate troves of data anywhere, anytime.
As Kai-Fu explains, perception AI “will bring the convenience and abundance of the online world into our offline reality.” Sensor-enabled hardware devices will turn everything from hospitals to cars to schools into online-merge-offline (OMO) environments.
Imagine walking into a grocery store, scanning your face to pull up your most common purchases, and then picking up a virtual assistant (VA) shopping cart. Having pre-loaded your data, the cart adjusts your usual grocery list with voice input, reminds you to get your spouse’s favorite wine for an upcoming anniversary, and guides you through a personalized store route.
While we haven’t yet leveraged the full potential of perception AI, China and the US are already making incredible strides. Given China’s hardware advantage, Lee estimates China currently has a 60-40 edge over its American tech counterparts.
Now the go-to city for startups building robots, drones, wearable technology, and IoT infrastructure, Shenzhen has turned into a powerhouse for intelligent hardware, as I discussed last week. Turbocharging output of sensors and electronic parts via thousands of factories, Shenzhen’s skilled engineers can prototype and iterate new products at unprecedented scale and speed.
With the added fuel of Chinese government support and a relaxed Chinese attitude toward data privacy, China’s lead may even reach 80-20 in the next five years.
Jumping on this wave are companies like Xiaomi, which aims to turn bathrooms, kitchens, and living rooms into smart OMO environments. Having invested in 220 companies and incubated 29 startups that produce its products, Xiaomi surpassed 85 million intelligent home devices by the end of 2017, making it the world’s largest network of these connected products.
One KFC restaurant in China has even teamed up with Alipay (Alibaba’s mobile payments platform) to pioneer a ‘pay-with-your-face’ feature. Forget cash, cards, and cell phones, and let OMO do the work.
The Fourth Wave: Autonomous AI
But the most monumental—and unpredictable—wave is the fourth and final: autonomous AI.
Integrating all previous waves, autonomous AI gives machines the ability to sense and respond to the world around them, enabling AI to move and act productively.
While today’s machines can outperform us on repetitive tasks in structured and even unstructured environments (think Boston Dynamics’ humanoid Atlas or oncoming autonomous vehicles), machines with the power to see, hear, touch and optimize data will be a whole new ballgame.
Think: swarms of drones that can selectively spray and harvest entire farms with computer vision and remarkable dexterity, heat-resistant drones that can put out forest fires 100X more efficiently, or Level 5 autonomous vehicles that navigate smart roads and traffic systems all on their own.
While autonomous AI will first involve robots that create direct economic value—automating tasks on a one-to-one replacement basis—these intelligent machines will ultimately revamp entire industries from the ground up.
Kai-Fu Lee currently puts America in a commanding lead of 90-10 in autonomous AI, especially when it comes to self-driving vehicles. But Chinese government efforts are quickly ramping up the competition.
Already in China’s Zhejiang province, highway regulators and government officials have plans to build China’s first intelligent superhighway, outfitted with sensors, road-embedded solar panels and wireless communication between cars, roads and drivers.
Aimed at increasing transit efficiency by up to 30 percent while minimizing fatalities, the project may one day allow autonomous electric vehicles to continuously charge as they drive.
A similar government-fueled project involves Beijing’s new neighbor Xiong’an. Projected to take in over $580 billion in infrastructure spending over the next 20 years, Xiong’an New Area could one day become the world’s first city built around autonomous vehicles.
Baidu is already working with Xiong’an’s local government to build out this AI city with an environmental focus. Possibilities include sensor-geared cement, computer vision-enabled traffic lights, intersections with facial recognition, and parking lots-turned parks.
Lastly, Lee predicts China will almost certainly lead the charge in autonomous drones. Already, Shenzhen is home to premier drone maker DJI—a company I’ll be visiting with 24 top executives later this month as part of my annual China Platinum Trip.
Named “the best company I have ever encountered” by Chris Anderson, DJI owns an estimated 50 percent of the North American drone market, supercharged by Shenzhen’s extraordinary maker movement.
While the long-term Sino-US competitive balance in fourth wave AI remains to be seen, one thing is certain: in a matter of decades, we will witness the rise of AI-embedded cityscapes and autonomous machines that can interact with the real world and help solve today’s most pressing grand challenges.
Webinar with Dr. Kai-Fu Lee: Dr. Kai-Fu Lee — one of the world’s most respected experts on AI — and I will discuss his latest book AI Superpowers: China, Silicon Valley, and the New World Order. Artificial Intelligence is reshaping the world as we know it. With U.S.-Sino competition heating up, who will own the future of technology? Register here for the free webinar on September 4th, 2018 from 11:00am–12:30pm PST.
Image Credit: Elena11 / Shutterstock.com
By now, you’ve probably seen Google’s new Duplex software, which promises to call people on your behalf to book appointments for haircuts and the like. As yet, it only exists in demo form, but already it seems like Google has made a big stride towards capturing a market that plenty of companies have had their eye on for quite some time. This software is impressive, but it raises questions.
Many of you will be familiar with the stilted, robotic conversations you can have with early chatbots that are, essentially, glorified menus. Instead of pressing 1 to confirm or 2 to re-enter, some of these bots would allow for simple commands like “Yes” or “No,” replacing the buttons with limited ability to recognize a few words. Using them was often a far more frustrating experience than attempting to use a menu—there are few things more irritating than a robot saying, “Sorry, your response was not recognized.”
[Audio: Google Duplex scheduling a hair salon appointment]
[Audio: Google Duplex calling a restaurant]
Even getting the response recognized is hard enough. After all, there are countless different nuances and accents to baffle voice recognition software, and endless turns of phrase that amount to saying the same thing that can confound natural language processing (NLP), especially if you like your phrasing quirky.
You may think that standard customer-service type conversations all travel the same route, using similar words and phrasing. But when there are over 80,000 ways to order coffee, and making a mistake is frowned upon, even simple tasks require high accuracy over a huge dataset.
Advances in audio processing, neural networks, and NLP, as well as raw computing power, have meant that basic recognition of what someone is trying to say is less of an issue. SoundHound’s virtual assistant prides itself on being able to process complicated requests (perhaps needlessly complicated).
The deeper issue, as with all attempts to develop conversational machines, is one of understanding context. There are so many ways a conversation can go that attempting to construct a conversation two or three layers deep quickly runs into problems. Multiply the thousands of things people might say by the thousands they might say next, and the combinatorics of the challenge runs away from most chatbots, leaving them as either glorified menus, gimmicks, or rather bizarre to talk to.
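The arithmetic behind that runaway growth can be sketched in a few lines. This is only an illustration: the figure of 1,000 possible utterances per turn is an assumed stand-in for "thousands," and `conversation_paths` is a hypothetical helper, not anything from Duplex or any real chatbot.

```python
def conversation_paths(options_per_turn: int, turns: int) -> int:
    """Count distinct conversation paths, assuming every turn
    offers the same number of possible utterances."""
    return options_per_turn ** turns


# Assumed figure: 1,000 things a person might say at each turn.
OPTIONS_PER_TURN = 1000

for turns in (1, 2, 3):
    paths = conversation_paths(OPTIONS_PER_TURN, turns)
    print(f"{turns} turn(s) deep: {paths:,} possible paths")
```

Under that assumption, two turns already give a million paths and three give a billion, which is why hand-scripted bots rarely get past a layer or two.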
Yet Google, which surely remembers from Glass the risk of premature debuts for technology, especially the kind that asks you to rethink how you interact with or trust in software, must have faith in Duplex to show it on the world stage. We know that startups like Semantic Machines and x.ai have received serious funding to perform very similar functions, using natural-language conversations to perform computing tasks, schedule meetings, book hotels, or purchase items.
It’s no great leap to imagine Google will soon do the same, bringing us closer to a world of onboard computing, where Lens labels the world around us and its assistant arranges it for us (all the while gathering more and more data it can convert into personalized ads). The early demos showed some clever tricks for keeping the conversation within a fairly narrow realm where the AI should be comfortable and competent, and the blog post that accompanied the release shows just how much effort has gone into the technology.
Yet given the privacy and ethics funk the tech industry finds itself in, and people’s general unease about AI, the main reaction to Duplex’s impressive demo was concern. The voice sounded too natural, bringing to mind Lyrebird and their warnings of deepfakes. You might trust “Do the Right Thing” Google with this technology, but it could usher in an era when automated robo-callers are far more convincing.
A more human-like voice may sound like a perfectly innocuous improvement, but the fact that the assistant interjects naturalistic “umm” and “mm-hm” responses to more perfectly mimic a human rubbed a lot of people the wrong way. This wasn’t just a voice assistant trying to sound less grinding and robotic; it was actively trying to deceive people into thinking they were talking to a human.
Google is running the risk of trying to get to conversational AI by going straight through the uncanny valley.
“Google’s experiments do appear to have been designed to deceive,” said Dr. Thomas King of the Oxford Internet Institute’s Digital Ethics Lab, according to TechCrunch. “Their main hypothesis was ‘can you distinguish this from a real person?’ In this case it’s unclear why their hypothesis was about deception and not the user experience… there should be some kind of mechanism there to let people know what it is they are speaking to.”
From Google’s perspective, being able to say “90 percent of callers can’t tell the difference between this and a human personal assistant” is an excellent marketing ploy, even though statistics about how many interactions are successful might be more relevant.
In fact, Duplex runs contrary to pretty much every major recommendation about ethics for the use of robotics or artificial intelligence, not to mention certain eavesdropping laws. Transparency is key to holding machines (and the people who design them) accountable, especially when it comes to decision-making.
Then there are the more subtle social issues. One prominent effect social media has had is to allow people to silo themselves; in echo chambers of like-minded individuals, it’s hard to see how other opinions exist. Technology exacerbates this by removing the evolutionary cues that go along with face-to-face interaction. Confronted with a pair of human eyes, people are more generous. Confronted with a Twitter avatar or a Facebook interface, people hurl abuse and criticism they’d never dream of using in a public setting.
Now that we can use technology to interact with ever fewer people, will it change us? Is it fair to offload the burden of dealing with a robot onto the poor human at the other end of the line, who might have to deal with dozens of such calls a day? Google has said that if the AI is in trouble, it will put you through to a human, which might help save receptionists from the hell of trying to explain a concept to dozens of dumbfounded AI assistants all day. But there’s always the risk that failures will be blamed on the person and not the machine.
As AI advances, could we end up treating the dwindling number of people in these “customer-facing” roles as the buggiest part of a fully automatic service? Will people start accusing each other of being robots on the phone, as well as on Twitter?
Google has provided plenty of reassurances about how the system will be used. They have said they will ensure that the system is identified, and it’s hardly difficult to resolve this problem; a slight change in the script from their demo would do it. For now, consumers will likely appreciate moves that make it clear whether the “intelligent agents” that make major decisions for us, that we interact with daily, and that hide behind social media avatars or phone numbers are real or artificial.
Image Credit: Besjunior / Shutterstock.com
In March 2011, Japan was hit by a catastrophic earthquake that triggered a terrible tsunami. Thousands were killed and billions of dollars of damage was done in one of the worst disasters of modern times. For a few perilous weeks, though, the eyes of the world were focused on the Fukushima Daiichi nuclear power plant. Its safety systems were unable to cope with the tsunami damage, and there were widespread fears of another catastrophic meltdown that could spread radiation over several countries, like the Chernobyl disaster in the 1980s. A heroic effort that included dumping seawater into the reactor core prevented an even bigger catastrophe. As it is, a hundred thousand people are still evacuated from the area, and it will likely take many years and hundreds of billions of dollars before the region is safe.
Because radiation is so dangerous to humans, the natural solution to the Fukushima disaster was to send in robots to monitor levels of radiation and attempt to begin the clean-up process. The techno-optimists in Japan had discovered a challenge, deep in the heart of that reactor core, that even their optimism could not solve. The radiation fried the circuits of the robots that were sent in, even those specifically designed and built to deal with the Fukushima catastrophe. The power plant slowly became a vast robot graveyard. While some robots initially saw success in measuring radiation levels around the plant—and, recently, a robot was able to identify the melted uranium fuel at the heart of the disaster—hopes of them playing a substantial role in the clean-up are starting to diminish.
In Tokyo’s neon Shibuya district, it can sometimes seem like it’s brighter at night than it is during the daytime. In karaoke booths on the twelfth floor—because everything is on the twelfth floor—overlooking the brightly-lit streets, businessmen unwind by blasting out pop hits. It can feel like the most artificial place on Earth; your senses are dazzled by the futuristic techno-optimism. Stock footage of the area has become symbolic of futurism and modernity.
Japan has had a reputation for being a nation of futurists for a long time. We’ve already described how tech giant SoftBank, headed by visionary founder Masayoshi Son, is investing billions in a technological future, including plans for the world’s largest solar farm.
When Google sold pioneering robotics company Boston Dynamics in 2017, SoftBank added it to its portfolio, alongside the famous Nao and Pepper robots. Some may think that Son is taking a gamble in pursuing a robotics project even Google couldn’t succeed in, but this is a man who lost nearly everything in the dot-com crash of 2000. The fact that even this reversal didn’t dent his optimism and faith in technology is telling. But how long can it last?
The failure of Japan’s robots to deal with the immense challenge of Fukushima has sparked something of a crisis of conscience within the industry. Disaster response is an obvious stepping-stone technology for robots. Initially, producing a humanoid robot will be very costly, and the robot will be less capable than a human; building a robot to wait tables might not be particularly economical yet. Building a robot to do jobs that are too dangerous for humans is far more viable. Yet, at Fukushima, in one of the most advanced nations in the world, many of the robots weren’t up to the task.
Nowhere was this crisis more felt than Honda; the company had developed ASIMO, which stunned the world in 2000 and continues to fascinate as an iconic humanoid robot. Despite all this technological advancement, however, Honda knew that ASIMO was still too unreliable for the real world.
It was Fukushima that triggered a sea-change in Honda’s approach to robotics. Two years after the disaster, there were rumblings that Honda was developing a disaster robot, and in October 2017, the prototype was revealed to the public for the first time. It’s not yet ready for deployment in disaster zones, however. Interestingly, the creators chose not to give it dexterous hands but instead to assume that remotely-operated tools fitted to the robot would be a better solution for the range of circumstances it might encounter.
This shift in focus for humanoid robots away from entertainment and amusement like ASIMO, and towards being practically useful, has been mirrored across the world.
In 2015, the DARPA Robotics Challenge, also inspired by the Fukushima disaster and the lack of disaster-ready robots, tested humanoid robots on a range of tasks that might be needed in emergency response, such as driving cars, opening doors, and climbing stairs. The Terminator-like ATLAS robot from Boston Dynamics, alongside Korean robot HUBO, took many of the plaudits, and CHIMP also put in an impressive display by being able to right itself after falling.
Yet the DARPA Robotics Challenge showed us just how far the robots are from truly being as useful as we’d like, or maybe even as we would imagine. Many robots took hours to complete the tasks, which were highly idealized to suit them. Climbing stairs proved a particular challenge. Those who watched were more likely to see a robot that had fallen over and was struggling to get up than heroic superbots striding in to save the day. The “striding” proved a particular problem: the fastest robot, HUBO, got around it by resorting to wheels in its knees when the legs weren’t necessary.
Fukushima may have brought a sea-change over futuristic Japan, but before robots will really begin to enter our everyday lives, they will need to prove their worth. In the interim, aerial drone robots designed to examine infrastructure damage after disasters may well see earlier deployment and more success.
It’s a considerable challenge.
Building a humanoid robot is expensive; if these multi-million-dollar machines can’t help in a crisis, people may begin to question the worth of investing in them in the first place (unless your aim is just to make viral videos). This could lead to a further crisis of confidence among the Japanese, who are starting to rely on humanoid robotics as a solution to the crisis of the aging population. The Japanese government, as part of its robot strategy, has already invested $44 million in their development.
But if they continue to fail when put to the test, that will raise serious concerns. In Tokyo’s Akihabara district, you can see all kinds of flash robotic toys for sale in the neon-lit superstores, and dancing, acting robots like RoboThespian can entertain crowds all over the world. But if we want these machines to be anything more than toys (partners, helpers, even saviors), more work needs to be done.
At the same time, those who participated in the DARPA Robotics Challenge in 2015 won’t be too concerned if people were underwhelmed by the performance of their disaster relief robots. Back in 2004, nearly every participant in the DARPA Grand Challenge crashed, caught fire, or failed on the starting line. To an outside observer, the whole thing would have seemed like an unmitigated disaster, and a pointless investment. What was the task in 2004? Developing a self-driving car. A lot can change in a decade.
Image Credit: MARCUSZ2527 / Shutterstock.com