Tag Archives: Visual

#435172 DARPA’s New Project Is Investing ...

When Elon Musk and DARPA both hop aboard the cyborg hypetrain, you know brain-machine interfaces (BMIs) are about to achieve the impossible.

BMIs, already the stuff of science fiction, facilitate crosstalk between biological wetware with external computers, turning human users into literal cyborgs. Yet mind-controlled robotic arms, microelectrode “nerve patches”, or “memory Band-Aids” are still purely experimental medical treatments for those with nervous system impairments.

With the Next-Generation Nonsurgical Neurotechnology (N3) program, DARPA is looking to expand BMIs to the military. This month, the project tapped six academic teams to engineer radically different BMIs to hook up machines to the brains of able-bodied soldiers. The goal is to ditch surgery altogether—while minimizing any biological interventions—to link up brain and machine.

Rather than microelectrodes, which are currently surgically inserted into the brain to hijack neural communication, the project is looking to acoustic signals, electromagnetic waves, nanotechnology, genetically-enhanced neurons, and infrared beams for their next-gen BMIs.

It’s a radical departure from current protocol, with potentially thrilling—or devastating—impact. Wireless BMIs could dramatically boost bodily functions of veterans with neural damage or post-traumatic stress disorder (PTSD), or allow a single soldier to control swarms of AI-enabled drones with his or her mind. Or, similar to the Black Mirror episode Men Against Fire, it could cloud the perception of soldiers, distancing them from the emotional guilt of warfare.

When trickled down to civilian use, these new technologies are poised to revolutionize medical treatment. Or they could galvanize the transhumanist movement with an inconceivably powerful tool that fundamentally alters society—for better or worse.

Here’s what you need to know.

Radical Upgrades
The four-year N3 program focuses on two main aspects: noninvasive and “minutely” invasive neural interfaces to both read and write into the brain.

Because noninvasive technologies sit on the scalp, their sensors and stimulators will likely measure entire networks of neurons, such as those controlling movement. These systems could then allow soldiers to remotely pilot robots in the field—drones, rescue bots, or carriers like Boston Dynamics’ BigDog. The system could even boost multitasking prowess—mind-controlling multiple weapons at once—similar to how able-bodied humans can operate a third robotic arm in addition to their own two.

In contrast, minutely invasive technologies allow scientists to deliver nanotransducers without surgery: for example, an injection of a virus carrying light-sensitive sensors, or other chemical, biotech, or self-assembled nanobots that can reach individual neurons and control their activity independently without damaging sensitive tissue. The proposed use for these technologies isn’t yet well-specified, but as animal experiments have shown, controlling the activity of single neurons at multiple points is sufficient to program artificial memories of fear, desire, and experiences directly into the brain.

“A neural interface that enables fast, effective, and intuitive hands-free interaction with military systems by able-bodied warfighters is the ultimate program goal,” DARPA wrote in its funding brief, released early last year.

The only technologies that will be considered must have a viable path toward eventual use in healthy human subjects.

“Final N3 deliverables will include a complete integrated bidirectional brain-machine interface system,” the project description states. This doesn’t just include hardware, but also new algorithms tailored to these system, demonstrated in a “Department of Defense-relevant application.”

The Tools
Right off the bat, the usual tools of the BMI trade, including microelectrodes, MRI, or transcranial magnetic stimulation (TMS) are off the table. These popular technologies rely on surgery, heavy machinery, or personnel to sit very still—conditions unlikely in the real world.

The six teams will tap into three different kinds of natural phenomena for communication: magnetism, light beams, and acoustic waves.

Dr. Jacob Robinson at Rice University, for example, is combining genetic engineering, infrared laser beams, and nanomagnets for a bidirectional system. The $18 million project, MOANA (Magnetic, Optical and Acoustic Neural Access device) uses viruses to deliver two extra genes into the brain. One encodes a protein that sits on top of neurons and emits infrared light when the cell activates. Red and infrared light can penetrate through the skull. This lets a skull cap, embedded with light emitters and detectors, pick up these signals for subsequent decoding. Ultra-fast and utra-sensitvie photodetectors will further allow the cap to ignore scattered light and tease out relevant signals emanating from targeted portions of the brain, the team explained.

The other new gene helps write commands into the brain. This protein tethers iron nanoparticles to the neurons’ activation mechanism. Using magnetic coils on the headset, the team can then remotely stimulate magnetic super-neurons to fire while leaving others alone. Although the team plans to start in cell cultures and animals, their goal is to eventually transmit a visual image from one person to another. “In four years we hope to demonstrate direct, brain-to-brain communication at the speed of thought and without brain surgery,” said Robinson.

Other projects in N3 are just are ambitious.

The Carnegie Mellon team, for example, plans to use ultrasound waves to pinpoint light interaction in targeted brain regions, which can then be measured through a wearable “hat.” To write into the brain, they propose a flexible, wearable electrical mini-generator that counterbalances the noisy effect of the skull and scalp to target specific neural groups.

Similarly, a group at Johns Hopkins is also measuring light path changes in the brain to correlate them with regional brain activity to “read” wetware commands.

The Teledyne Scientific & Imaging group, in contrast, is turning to tiny light-powered “magnetometers” to detect small, localized magnetic fields that neurons generate when they fire, and match these signals to brain output.

The nonprofit Battelle team gets even fancier with their ”BrainSTORMS” nanotransducers: magnetic nanoparticles wrapped in a piezoelectric shell. The shell can convert electrical signals from neurons into magnetic ones and vice-versa. This allows external transceivers to wirelessly pick up the transformed signals and stimulate the brain through a bidirectional highway.

The magnetometers can be delivered into the brain through a nasal spray or other non-invasive methods, and magnetically guided towards targeted brain regions. When no longer needed, they can once again be steered out of the brain and into the bloodstream, where the body can excrete them without harm.

Four-Year Miracle
Mind-blown? Yeah, same. However, the challenges facing the teams are enormous.

DARPA’s stated goal is to hook up at least 16 sites in the brain with the BMI, with a lag of less than 50 milliseconds—on the scale of average human visual perception. That’s crazy high resolution for devices sitting outside the brain, both in space and time. Brain tissue, blood vessels, and the scalp and skull are all barriers that scatter and dissipate neural signals. All six teams will need to figure out the least computationally-intensive ways to fish out relevant brain signals from background noise, and triangulate them to the appropriate brain region to decipher intent.

In the long run, four years and an average $20 million per project isn’t much to potentially transform our relationship with machines—for better or worse. DARPA, to its credit, is keenly aware of potential misuse of remote brain control. The program is under the guidance of a panel of external advisors with expertise in bioethical issues. And although DARPA’s focus is on enabling able-bodied soldiers to better tackle combat challenges, it’s hard to argue that wireless, non-invasive BMIs will also benefit those most in need: veterans and other people with debilitating nerve damage. To this end, the program is heavily engaging the FDA to ensure it meets safety and efficacy regulations for human use.

Will we be there in just four years? I’m skeptical. But these electrical, optical, acoustic, magnetic, and genetic BMIs, as crazy as they sound, seem inevitable.

“DARPA is preparing for a future in which a combination of unmanned systems, AI, and cyber operations may cause conflicts to play out on timelines that are too short for humans to effectively manage with current technology alone,” said Al Emondi, the N3 program manager.

The question is, now that we know what’s in store, how should the rest of us prepare?

Image Credit: With permission from DARPA N3 project. Continue reading

Posted in Human Robots

#435127 Teaching AI the Concept of ‘Similar, ...

As a human you instinctively know that a leopard is closer to a cat than a motorbike, but the way we train most AI makes them oblivious to these kinds of relations. Building the concept of similarity into our algorithms could make them far more capable, writes the author of a new paper in Science Robotics.

Convolutional neural networks have revolutionized the field of computer vision to the point that machines are now outperforming humans on some of the most challenging visual tasks. But the way we train them to analyze images is very different from the way humans learn, says Atsuto Maki, an associate professor at KTH Royal Institute of Technology.

“Imagine that you are two years old and being quizzed on what you see in a photo of a leopard,” he writes. “You might answer ‘a cat’ and your parents might say, ‘yeah, not quite but similar’.”

In contrast, the way we train neural networks rarely gives that kind of partial credit. They are typically trained to have very high confidence in the correct label and consider all incorrect labels, whether ”cat” or “motorbike,” equally wrong. That’s a mistake, says Maki, because ignoring the fact that something can be “less wrong” means you’re not exploiting all of the information in the training data.

Even when models are trained this way, there will be small differences in the probabilities assigned to incorrect labels that can tell you a lot about how well the model can generalize what it has learned to unseen data.

If you show a model a picture of a leopard and it gives “cat” a probability of five percent and “motorbike” one percent, that suggests it picked up on the fact that a cat is closer to a leopard than a motorbike. In contrast, if the figures are the other way around it means the model hasn’t learned the broad features that make cats and leopards similar, something that could potentially be helpful when analyzing new data.

If we could boost this ability to identify similarities between classes we should be able to create more flexible models better able to generalize, says Maki. And recent research has demonstrated how variations of an approach called regularization might help us achieve that goal.

Neural networks are prone to a problem called “overfitting,” which refers to a tendency to pay too much attention to tiny details and noise specific to their training set. When that happens, models will perform excellently on their training data but poorly when applied to unseen test data without these particular quirks.

Regularization is used to circumvent this problem, typically by reducing the network’s capacity to learn all this unnecessary information and therefore boost its ability to generalize to new data. Techniques are varied, but generally involve modifying the network’s structure or the strength of the weights between artificial neurons.

More recently, though, researchers have suggested new regularization approaches that work by encouraging a broader spread of probabilities across all classes. This essentially helps them capture more of the class similarities, says Maki, and therefore boosts their ability to generalize.

One such approach was devised in 2017 by Google Brain researchers, led by deep learning pioneer Geoffrey Hinton. They introduced a penalty to their training process that directly punished overconfident predictions in the model’s outputs, and a technique called label smoothing that prevents the largest probability becoming much larger than all others. This meant the probabilities were lower for correct labels and higher for incorrect ones, which was found to boost performance of models on varied tasks from image classification to speech recognition.

Another came from Maki himself in 2017 and achieves the same goal, but by suppressing high values in the model’s feature vector—the mathematical construct that describes all of an object’s important characteristics. This has a knock-on effect on the spread of output probabilities and also helped boost performance on various image classification tasks.

While it’s still early days for the approach, the fact that humans are able to exploit these kinds of similarities to learn more efficiently suggests that models that incorporate them hold promise. Maki points out that it could be particularly useful in applications such as robotic grasping, where distinguishing various similar objects is important.

Image Credit: Marianna Kalashnyk / Shutterstock.com Continue reading

Posted in Human Robots

#435070 5 Breakthroughs Coming Soon in Augmented ...

Convergence is accelerating disruption… everywhere! Exponential technologies are colliding into each other, reinventing products, services, and industries.

In this third installment of my Convergence Catalyzer series, I’ll be synthesizing key insights from my annual entrepreneurs’ mastermind event, Abundance 360. This five-blog series looks at 3D printing, artificial intelligence, VR/AR, energy and transportation, and blockchain.

Today, let’s dive into virtual and augmented reality.

Today’s most prominent tech giants are leaping onto the VR/AR scene, each driving forward new and upcoming product lines. Think: Microsoft’s HoloLens, Facebook’s Oculus, Amazon’s Sumerian, and Google’s Cardboard (Apple plans to release a headset by 2021).

And as plummeting prices meet exponential advancements in VR/AR hardware, this burgeoning disruptor is on its way out of the early adopters’ market and into the majority of consumers’ homes.

My good friend Philip Rosedale is my go-to expert on AR/VR and one of the foremost creators of today’s most cutting-edge virtual worlds. After creating the virtual civilization Second Life in 2013, now populated by almost 1 million active users, Philip went on to co-found High Fidelity, which explores the future of next-generation shared VR.

In just the next five years, he predicts five emerging trends will take hold, together disrupting major players and birthing new ones.

Let’s dive in…

Top 5 Predictions for VR/AR Breakthroughs (2019-2024)
“If you think you kind of understand what’s going on with that tech today, you probably don’t,” says Philip. “We’re still in the middle of landing the airplane of all these new devices.”

(1) Transition from PC-based to standalone mobile VR devices

Historically, VR devices have relied on PC connections, usually involving wires and clunky hardware that restrict a user’s field of motion. However, as VR enters the dematerialization stage, we are about to witness the rapid rise of a standalone and highly mobile VR experience economy.

Oculus Go, the leading standalone mobile VR device on the market, requires only a mobile app for setup and can be transported anywhere with WiFi.

With a consumer audience in mind, the 32GB headset is priced at $200 and shares an app ecosystem with Samsung’s Gear VR. While Google Daydream are also standalone VR devices, they require a docked mobile phone instead of the built-in screen of Oculus Go.

In the AR space, Lenovo’s standalone Microsoft’s HoloLens 2 leads the way in providing tetherless experiences.

Freeing headsets from the constraints of heavy hardware will make VR/AR increasingly interactive and transportable, a seamless add-on whenever, wherever. Within a matter of years, it may be as simple as carrying lightweight VR goggles wherever you go and throwing them on at a moment’s notice.

(2) Wide field-of-view AR displays

Microsoft’s HoloLens 2 leads the AR industry in headset comfort and display quality. The most significant issue with their prior version was the limited rectangular field of view (FOV).

By implementing laser technology to create a microelectromechanical systems (MEMS) display, however, HoloLens 2 can position waveguides in front of users’ eyes, directed by mirrors. Subsequently enlarging images can be accomplished by shifting the angles of these mirrors. Coupled with a 47 pixel per degree resolution, HoloLens 2 has now doubled its predecessor’s FOV. Microsoft anticipates the release of its headset by the end of this year at a $3,500 price point, first targeting businesses and eventually rolling it out to consumers.

Magic Leap provides a similar FOV but with lower resolution than the HoloLens 2. The Meta 2 boasts an even wider 90-degree FOV, but requires a cable attachment. The race to achieve the natural human 120-degree horizontal FOV continues.

“The technology to expand the field of view is going to make those devices much more usable by giving you bigger than a small box to look through,” Rosedale explains.

(3) Mapping of real world to enable persistent AR ‘mirror worlds’

‘Mirror worlds’ are alternative dimensions of reality that can blanket a physical space. While seated in your office, the floor beneath you could dissolve into a calm lake and each desk into a sailboat. In the classroom, mirror worlds would convert pencils into magic wands and tabletops into touch screens.

Pokémon Go provides an introductory glimpse into the mirror world concept and its massive potential to unite people in real action.

To create these mirror worlds, AR headsets must precisely understand the architecture of the surrounding world. Rosedale predicts the scanning accuracy of devices will improve rapidly over the next five years to make these alternate dimensions possible.

(4) 5G mobile devices reduce latency to imperceptible levels

Verizon has already launched 5G networks in Minneapolis and Chicago, compatible with the Moto Z3. Sprint plans to follow with its own 5G launch in May. Samsung, LG, Huawei, and ZTE have all announced upcoming 5G devices.

“5G is rolling out this year and it’s going to materially affect particularly my work, which is making you feel like you’re talking to somebody else directly face to face,” explains Rosedale. “5G is critical because currently the cell devices impose too much delay, so it doesn’t feel real to talk to somebody face to face on these devices.”

To operate seamlessly from anywhere on the planet, standalone VR/AR devices will require a strong 5G network. Enhancing real-time connectivity in VR/AR will transform the communication methods of tomorrow.

(5) Eye-tracking and facial expressions built in for full natural communication

Companies like Pupil Labs and Tobii provide eye tracking hardware add-ons and software to VR/AR headsets. This technology allows for foveated rendering, which renders a given scene in high resolution only in the fovea region, while the peripheral regions appear in lower resolution, conserving processing power.

As seen in the HoloLens 2, eye tracking can also be used to identify users and customize lens widths to provide a comfortable, personalized experience for each individual.

According to Rosedale, “The fundamental opportunity for both VR and AR is to improve human communication.” He points out that current VR/AR headsets miss many of the subtle yet important aspects of communication. Eye movements and microexpressions provide valuable insight into a user’s emotions and desires.

Coupled with emotion-detecting AI software, such as Affectiva, VR/AR devices might soon convey much more richly textured and expressive interactions between any two people, transcending physical boundaries and even language gaps.

Final Thoughts
As these promising trends begin to transform the market, VR/AR will undoubtedly revolutionize our lives… possibly to the point at which our virtual worlds become just as consequential and enriching as our physical world.

A boon for next-gen education, VR/AR will empower youth and adults alike with holistic learning that incorporates social, emotional, and creative components through visceral experiences, storytelling, and simulation. Traveling to another time, manipulating the insides of a cell, or even designing a new city will become daily phenomena of tomorrow’s classrooms.

In real estate, buyers will increasingly make decisions through virtual tours. Corporate offices might evolve into spaces that only exist in ‘mirror worlds’ or grow virtual duplicates for remote workers.

In healthcare, accuracy of diagnosis will skyrocket, while surgeons gain access to digital aids as they conduct life-saving procedures. Or take manufacturing, wherein training and assembly will become exponentially more efficient as visual cues guide complex tasks.

In the mere matter of a decade, VR and AR will unlock limitless applications for new and converging industries. And as virtual worlds converge with AI, 3D printing, computing advancements and beyond, today’s experience economies will explode in scale and scope. Prepare yourself for the exciting disruption ahead!

Join Me
Abundance-Digital Online Community: Stay ahead of technological advancements, and turn your passion into action. Abundance Digital is now part of Singularity University. Learn more.

Image Credit: Mariia Korneeva / Shutterstock.com Continue reading

Posted in Human Robots

#435056 How Researchers Used AI to Better ...

A few years back, DeepMind’s Demis Hassabis famously prophesized that AI and neuroscience will positively feed into each other in a “virtuous circle.” If realized, this would fundamentally expand our insight into intelligence, both machine and human.

We’ve already seen some proofs of concept, at least in the brain-to-AI direction. For example, memory replay, a biological mechanism that fortifies our memories during sleep, also boosted AI learning when abstractly appropriated into deep learning models. Reinforcement learning, loosely based on our motivation circuits, is now behind some of AI’s most powerful tools.

Hassabis is about to be proven right again.

Last week, two studies independently tapped into the power of ANNs to solve a 70-year-old neuroscience mystery: how does our visual system perceive reality?

The first, published in Cell, used generative networks to evolve DeepDream-like images that hyper-activate complex visual neurons in monkeys. These machine artworks are pure nightmare fuel to the human eye; but together, they revealed a fundamental “visual hieroglyph” that may form a basic rule for how we piece together visual stimuli to process sight into perception.

In the second study, a team used a deep ANN model—one thought to mimic biological vision—to synthesize new patterns tailored to control certain networks of visual neurons in the monkey brain. When directly shown to monkeys, the team found that the machine-generated artworks could reliably activate predicted populations of neurons. Future improved ANN models could allow even better control, giving neuroscientists a powerful noninvasive tool to study the brain. The work was published in Science.

The individual results, though fascinating, aren’t necessarily the point. Rather, they illustrate how scientists are now striving to complete the virtuous circle: tapping AI to probe natural intelligence. Vision is only the beginning—the tools can potentially be expanded into other sensory domains. And the more we understand about natural brains, the better we can engineer artificial ones.

It’s a “great example of leveraging artificial intelligence to study organic intelligence,” commented Dr. Roman Sandler at Kernel.co on Twitter.

Why Vision?
ANNs and biological vision have quite the history.

In the late 1950s, the legendary neuroscientist duo David Hubel and Torsten Wiesel became some of the first to use mathematical equations to understand how neurons in the brain work together.

In a series of experiments—many using cats—the team carefully dissected the structure and function of the visual cortex. Using myriads of images, they revealed that vision is processed in a hierarchy: neurons in “earlier” brain regions, those closer to the eyes, tend to activate when they “see” simple patterns such as lines. As we move deeper into the brain, from the early V1 to a nub located slightly behind our ears, the IT cortex, neurons increasingly respond to more complex or abstract patterns, including faces, animals, and objects. The discovery led some scientists to call certain IT neurons “Jennifer Aniston cells,” which fire in response to pictures of the actress regardless of lighting, angle, or haircut. That is, IT neurons somehow extract visual information into the “gist” of things.

That’s not trivial. The complex neural connections that lead to increasing abstraction of what we see into what we think we see—what we perceive—is a central question in machine vision: how can we teach machines to transform numbers encoding stimuli into dots, lines, and angles that eventually form “perceptions” and “gists”? The answer could transform self-driving cars, facial recognition, and other computer vision applications as they learn to better generalize.

Hubel and Wiesel’s Nobel-prize-winning studies heavily influenced the birth of ANNs and deep learning. Much of earlier ANN “feed-forward” model structures are based on our visual system; even today, the idea of increasing layers of abstraction—for perception or reasoning—guide computer scientists to build AI that can better generalize. The early romance between vision and deep learning is perhaps the bond that kicked off our current AI revolution.

It only seems fair that AI would feed back into vision neuroscience.

Hieroglyphs and Controllers
In the Cell study, a team led by Dr. Margaret Livingstone at Harvard Medical School tapped into generative networks to unravel IT neurons’ complex visual alphabet.

Scientists have long known that neurons in earlier visual regions (V1) tend to fire in response to “grating patches” oriented in certain ways. Using a limited set of these patches like letters, V1 neurons can “express a visual sentence” and represent any image, said Dr. Arash Afraz at the National Institute of Health, who was not involved in the study.

But how IT neurons operate remained a mystery. Here, the team used a combination of genetic algorithms and deep generative networks to “evolve” computer art for every studied neuron. In seven monkeys, the team implanted electrodes into various parts of the visual IT region so that they could monitor the activity of a single neuron.

The team showed each monkey an initial set of 40 images. They then picked the top 10 images that stimulated the highest neural activity, and married them to 30 new images to “evolve” the next generation of images. After 250 generations, the technique, XDREAM, generated a slew of images that mashed up contorted face-like shapes with lines, gratings, and abstract shapes.

This image shows the evolution of an optimum image for stimulating a visual neuron in a monkey. Image Credit: Ponce, Xiao, and Schade et al. – Cell.
“The evolved images look quite counter-intuitive,” explained Afraz. Some clearly show detailed structures that resemble natural images, while others show complex structures that can’t be characterized by our puny human brains.

This figure shows natural images (right) and images evolved by neurons in the inferotemporal cortex of a monkey (left). Image Credit: Ponce, Xiao, and Schade et al. – Cell.
“What started to emerge during each experiment were pictures that were reminiscent of shapes in the world but were not actual objects in the world,” said study author Carlos Ponce. “We were seeing something that was more like the language cells use with each other.”

This image was evolved by a neuron in the inferotemporal cortex of a monkey using AI. Image Credit: Ponce, Xiao, and Schade et al. – Cell.
Although IT neurons don’t seem to use a simple letter alphabet, it does rely on a vast array of characters like hieroglyphs or Chinese characters, “each loaded with more information,” said Afraz.

The adaptive nature of XDREAM turns it into a powerful tool to probe the inner workings of our brains—particularly for revealing discrepancies between biology and models.

The Science study, led by Dr. James DiCarlo at MIT, takes a similar approach. Using ANNs to generate new patterns and images, the team was able to selectively predict and independently control neuron populations in a high-level visual region called V4.

“So far, what has been done with these models is predicting what the neural responses would be to other stimuli that they have not seen before,” said study author Dr. Pouya Bashivan. “The main difference here is that we are going one step further and using the models to drive the neurons into desired states.”

It suggests that our current ANN models for visual computation “implicitly capture a great deal of visual knowledge” which we can’t really describe, but which the brain uses to turn vision information into perception, the authors said. By testing AI-generated images on biological vision, however, the team concluded that today’s ANNs have a degree of understanding and generalization. The results could potentially help engineer even more accurate ANN models of biological vision, which in turn could feed back into machine vision.

“One thing is clear already: Improved ANN models … have led to control of a high-level neural population that was previously out of reach,” the authors said. “The results presented here have likely only scratched the surface of what is possible with such implemented characterizations of the brain’s neural networks.”

To Afraz, the power of AI here is to find cracks in human perception—both our computational models of sensory processes, as well as our evolved biological software itself. AI can be used “as a perfect adversarial tool to discover design cracks” of IT, said Afraz, such as finding computer art that “fools” a neuron into thinking the object is something else.

“As artificial intelligence researchers develop models that work as well as the brain does—or even better—we will still need to understand which networks are more likely to behave safely and further human goals,” said Ponce. “More efficient AI can be grounded by knowledge of how the brain works.”

Image Credit: Sangoiri / Shutterstock.com Continue reading

Posted in Human Robots

#434818 Watch These Robots Do Tasks You Thought ...

Robots have been masters of manufacturing at speed and precision for decades, but give them a seemingly simple task like stacking shelves, and they quickly get stuck. That’s changing, though, as engineers build systems that can take on the deceptively tricky tasks most humans can do with their eyes closed.

Boston Dynamics is famous for dramatic reveals of robots performing mind-blowing feats that also leave you scratching your head as to what the market is—think the bipedal Atlas doing backflips or Spot the galloping robot dog.

Last week, the company released a video of a robot called Handle that looks like an ostrich on wheels carrying out the seemingly mundane task of stacking boxes in a warehouse.

It might seem like a step backward, but this is exactly the kind of practical task robots have long struggled with. While the speed and precision of industrial robots has seen them take over many functions in modern factories, they’re generally limited to highly prescribed tasks carried out in meticulously-controlled environments.

That’s because despite their mechanical sophistication, most are still surprisingly dumb. They can carry out precision welding on a car or rapidly assemble electronics, but only by rigidly following a prescribed set of motions. Moving cardboard boxes around a warehouse might seem simple to a human, but it actually involves a variety of tasks machines still find pretty difficult—perceiving your surroundings, navigating, and interacting with objects in a dynamic environment.

But the release of this video suggests Boston Dynamics thinks these kinds of applications are close to prime time. Last week the company doubled down by announcing the acquisition of start-up Kinema Systems, which builds computer vision systems for robots working in warehouses.

It’s not the only company making strides in this area. On the same day the video went live, Google unveiled a robot arm called TossingBot that can pick random objects from a box and quickly toss them into another container beyond its reach, which could prove very useful for sorting items in a warehouse. The machine can train on new objects in just an hour or two, and can pick and toss up to 500 items an hour with better accuracy than any of the humans who tried the task.

And an apple-picking robot built by Abundant Robotics is currently on New Zealand farms navigating between rows of apple trees using LIDAR and computer vision to single out ripe apples before using a vacuum tube to suck them off the tree.

In most cases, advances in machine learning and computer vision brought about by the recent AI boom are the keys to these rapidly improving capabilities. Robots have historically had to be painstakingly programmed by humans to solve each new task, but deep learning is making it possible for them to quickly train themselves on a variety of perception, navigation, and dexterity tasks.

It’s not been simple, though, and the application of deep learning in robotics has lagged behind other areas. A major limitation is that the process typically requires huge amounts of training data. That’s fine when you’re dealing with image classification, but when that data needs to be generated by real-world robots it can make the approach impractical. Simulations offer the possibility to run this training faster than real time, but it’s proved difficult to translate policies learned in virtual environments into the real world.

Recent years have seen significant progress on these fronts, though, and the increasing integration of modern machine learning with robotics. In October, OpenAI imbued a robotic hand with human-level dexterity by training an algorithm in a simulation using reinforcement learning before transferring it to the real-world device. The key to ensuring the translation went smoothly was injecting random noise into the simulation to mimic some of the unpredictability of the real world.

And just a couple of weeks ago, MIT researchers demonstrated a new technique that let a robot arm learn to manipulate new objects with far less training data than is usually required. By getting the algorithm to focus on a few key points on the object necessary for picking it up, the system could learn to pick up a previously unseen object after seeing only a few dozen examples (rather than the hundreds or thousands typically required).

How quickly these innovations will trickle down to practical applications remains to be seen, but a number of startups as well as logistics behemoth Amazon are developing robots designed to flexibly pick and place the wide variety of items found in your average warehouse.

Whether the economics of using robots to replace humans at these kinds of menial tasks makes sense yet is still unclear. The collapse of collaborative robotics pioneer Rethink Robotics last year suggests there are still plenty of challenges.

But at the same time, the number of robotic warehouses is expected to leap from 4,000 today to 50,000 by 2025. It may not be long until robots are muscling in on tasks we’ve long assumed only humans could do.

Image Credit: Visual Generation / Shutterstock.com Continue reading

Posted in Human Robots