#430147 Deep Learning at the Speed of Light on ...
Deep learning has transformed the field of artificial intelligence, but the limitations of conventional computer hardware are already hindering progress. Researchers at MIT think their new “nanophotonic” processor could be the answer: it carries out deep learning computations at the speed of light.
In the 1980s, scientists and engineers hailed optical computing as the next great revolution in information technology, but it turned out that bulky components like fiber optic cables and lenses didn’t make for particularly robust or compact computers.
In particular, they found it extremely challenging to make scalable optical logic gates, and therefore impractical to make general optical computers, according to MIT physics post-doc Yichen Shen. One thing light is good at, though, is multiplying matrices, which are arrays of numbers arranged in rows and columns. The way a lens acts on a beam of light can, in fact, be described mathematically as a matrix multiplication.
Matrix multiplication also happens to be a core component of the calculations involved in deep learning. Combined with advances in nanophotonics (the study of light’s behavior at the nanometer scale), this has led to a resurgence of interest in optical computing.
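To make that connection concrete, here is a minimal Python sketch of a single neural network layer written as a matrix multiplication. The weights, inputs and function names are invented for illustration, and the ReLU step is just one common choice of non-linear response, not something taken from the MIT work.

```python
def matvec(matrix, vector):
    """Multiply a matrix (a list of rows) by a vector (a list of numbers)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def relu(values):
    """A common non-linear response: negative values are clipped to zero."""
    return [max(0.0, v) for v in values]


def layer(weights, inputs):
    """One layer: a matrix multiplication followed by a non-linear response."""
    return relu(matvec(weights, inputs))


# Hypothetical weights (2 neurons, 3 inputs each) and one input signal.
weights = [[0.5, -1.0, 0.25],
           [-1.5, 0.0, 0.5]]
inputs = [2.0, 1.0, 4.0]

weighted_sums = matvec(weights, inputs)  # the part light is naturally good at
outputs = layer(weights, inputs)
print(weighted_sums, outputs)
```

Almost all of the arithmetic sits in matvec, which is why hardware that multiplies matrices cheaply is attractive for deep learning.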
“Deep learning is mainly matrix multiplications, so it works very well with the nature of light,” says Shen. “With light you can make deep learning computing much faster and thousands of times more energy-efficient.”
To demonstrate this, Shen and his MIT colleagues have designed an all-optical chip that can implement artificial neural networks—the brain-inspired algorithms at the heart of deep learning.
In a recent paper in Nature Photonics, the group describes a chip made up of 56 interferometers, components that allow the researchers to control how beams of light interfere with each other to carry out mathematical operations.
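As a rough illustration of how an interferometer can act as a small programmable matrix, the Python sketch below models a textbook Mach-Zehnder-style interferometer: two 50:50 beam splitters with an adjustable phase shift between them. The article does not spell out the chip's internal design, so this layout, the function names and the parameters theta and phi are assumptions made for the example.

```python
import cmath
import math


def matmul(a, b):
    """Multiply two small matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def beam_splitter():
    """Idealized, lossless 50:50 beam splitter."""
    s = 1 / math.sqrt(2)
    return [[s, 1j * s], [1j * s, s]]


def phase_shifter(angle):
    """Delay the light in the upper arm by the given phase (radians)."""
    return [[cmath.exp(1j * angle), 0], [0, 1]]


def mzi(theta, phi):
    """2x2 matrix of the whole interferometer: phi, splitter, theta, splitter."""
    m = phase_shifter(phi)
    for stage in (beam_splitter(), phase_shifter(theta), beam_splitter()):
        m = matmul(stage, m)
    return m


def output_powers(matrix, amplitudes):
    """Optical power at each output port for the given input amplitudes."""
    out = [sum(m * a for m, a in zip(row, amplitudes)) for row in matrix]
    return [abs(x) ** 2 for x in out]


# Light enters the upper port only; theta steers it between the two outputs.
for theta in (0.0, math.pi / 2, math.pi):
    print(theta, output_powers(mzi(theta, 0.0), [1.0, 0.0]))
```

In this idealized model, changing theta changes the entries of the 2x2 matrix, and no light is lost along the way. Meshes of such elements can be combined to build larger matrices.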
The processor can be reprogrammed by applying a small voltage to the waveguides that direct beams of light around the processor, which heats them and causes them to change shape.
The chip is best suited to inference tasks, the researchers say, where the algorithm is put to practical use by applying a learned model to analyze new data, for instance to detect objects in an image.
It isn’t great at learning, because heating the waveguides is relatively slow compared to how electronic systems are reprogrammed. So, in their study, the researchers trained the algorithm on a computer before transferring the learned model to the nanophotonic processor to carry out the inference task.
That’s not a major issue. For many practical applications it’s not necessary to carry out learning and inference on the same chip. Google recently made headlines for designing its own deep learning chip, the TPU, which is also specifically designed for inference, and most companies that use a lot of machine learning split the two jobs.
“In many cases they update these models once every couple of months and the rest of the time the fixed model is just doing inference,” says Shen. “People usually separate these tasks. They typically have a server just doing training and another just doing inference, so I don’t see a big problem making a chip focused on inference.”
Once the model has been programmed into the chip, it can carry out computations at the speed of light while using less than one-thousandth of the energy per operation that conventional electronic chips need.
There are limitations, though. Because the chip deals with light waves that operate on the scale of a few microns, there are fundamental limits to how small these chips can get.
“The wavelength really sets the limit of how small the waveguides can be. We won’t be able to make devices significantly smaller. Maybe by a factor of four, but physics will ultimately stop us,” says MIT graduate student Nicholas Harris, who co-authored the paper.
That means it would be difficult to implement neural nets much larger than a few thousand neurons. However, the vast majority of current deep learning algorithms are well within that limit.
In the group’s test, a vowel-recognition task, the system was significantly less accurate than a standard computer running the same deep learning model, correctly identifying 76.7 percent of vowels compared to 91.7 percent.
But Harris says they think this was largely due to interference between the various heating elements used to program the waveguides, and that it should be easy to fix by using thermal isolation trenches or extra calibration steps.
Importantly, the chips are also built using the same fabrication technology as conventional computer chips, so scaling up production should be easy. Shen said the group has already had interest in their technology from prominent chipmakers.
Pierre-Alexandre Blanche, a professor of optics at the University of Arizona, said he’s very excited by the paper, which he said complements his own work. But he cautioned against getting too carried away.
“This is another milestone in the progress toward useful optical computing. But we are still far away to be competitive with electronics,” he told Singularity Hub in an email. “The argumentation about scalability, power consumption, speed etc. [in the paper] use a lot of conditional tense and assumptions which demonstrate that, if there is potential indeed, there is still a lot of research to be done.”
In particular, he pointed out that the system was only a partial solution to the problem. While the vast majority of neuronal computation involves multiplication of matrices, there is another component: calculating a non-linear response.
In the current paper this aspect of the computation was simulated on a regular computer. The researchers say in future models this function could be carried out by a so-called “saturable absorber” integrated into the waveguides that absorbs less light as the intensity increases.
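A simple way to picture that behavior is the Python sketch below, which uses a common textbook-style saturation model in which the absorbed fraction falls off as 1 / (1 + I / I_sat). The model choice, the function names and the parameter values are assumptions for illustration only and do not describe the device the researchers propose.

```python
def transmission(intensity, low_power_absorption=0.8, saturation_intensity=1.0):
    """Fraction of light that gets through at a given input intensity.

    Hypothetical parameters: 80 percent of very weak light is absorbed, and
    absorption halves when the intensity reaches saturation_intensity.
    """
    absorbed = low_power_absorption / (1.0 + intensity / saturation_intensity)
    return 1.0 - absorbed


def saturable_absorber(intensity, **params):
    """Output intensity: a non-linear function of the input intensity."""
    return intensity * transmission(intensity, **params)


# Weak light is mostly blocked; strong light mostly passes.
for intensity in (0.0, 0.5, 1.0, 5.0, 50.0):
    print(intensity, transmission(intensity), saturable_absorber(intensity))
```

Because the output is not simply proportional to the input, an element like this could in principle play the role of the non-linear response that sits between matrix multiplications in a neural network.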
But Blanche notes that this is not a trivial problem, and it is one his group is currently working on. “It is not like you can buy one at the drug store,” he says.
Bhavin Shastri, a post-doc at Princeton whose group is also working on nanophotonic chips for implementing neural networks, said the research was important because enabling matrix multiplications is a key step toward full-fledged photonic neural networks.
“Overall, this area of research is poised to usher in an exciting and promising field,” he added. “Neural networks implemented in photonic hardware could revolutionize how machines interact with ultrafast physical phenomena. Silicon photonics combines the analog device performance of photonics with the cost and scalability of silicon manufacturing.”