Tag Archives: voice recognition

#435593 AI at the Speed of Light

Neural networks shine for solving tough problems such as facial and voice recognition, but conventional electronic versions are limited in speed and hungry for power. In theory, optics could beat digital electronic computers in the matrix calculations used in neural networks. However, optics had been limited by their inability to do some complex calculations that had required electronics. Now new experiments show that all-optical neural networks can tackle those problems.

The key attraction of neural networks is their massive interconnections among processors, comparable to the complex interconnections among neurons in the brain. This lets them perform many operations simultaneously, like the human brain does when looking at faces or listening to speech, making them more efficient for facial and voice recognition than traditional electronic computers that execute one instruction at a time.

Today's electronic neural networks have reached eight million neurons, but their future use in artificial intelligence may be limited by their high power usage and limited parallelism in connections. Optical connections through lenses are inherently parallel. The lens in your eye simultaneously focuses light from across your field of view onto the retina in the back of your eye, where an array of light-detecting nerve cells detects the light. Each cell then relays the signal it receives to neurons in the brain that process the visual signals to show us an image.

Glass lenses process optical signals by focusing light, which performs a complex mathematical operation called a Fourier transform that preserves the information in the original scene but rearranges it completely. One use of Fourier transforms is converting time variations in signal intensity into a plot of the frequencies present in the signal. The military used this trick in the 1950s to convert raw radar return signals recorded by an aircraft in flight into a three-dimensional image of the landscape viewed by the plane. Today that conversion is done electronically, but the vacuum-tube computers of the 1950s were not up to the task.
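To make the time-to-frequency conversion concrete, here is a minimal sketch of a discrete Fourier transform in plain Python. The signal (a 5 Hz tone plus a weaker 12 Hz tone, sampled 64 times over one second) and the names `dft` and `SAMPLE_COUNT` are assumptions chosen for illustration; this is the electronic version of the calculation, not a model of how a lens does it.

```python
import cmath
import math


def dft(samples):
    """Naive discrete Fourier transform: O(n^2), fine for a small demo."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


# Hypothetical signal: 64 samples spanning one second, so bin k maps to k hertz.
SAMPLE_COUNT = 64
signal = [
    math.sin(2 * math.pi * 5 * t / SAMPLE_COUNT)
    + 0.5 * math.sin(2 * math.pi * 12 * t / SAMPLE_COUNT)
    for t in range(SAMPLE_COUNT)
]

spectrum = dft(signal)

# For a real-valued signal only the first half of the bins is unique.
magnitudes = [abs(value) for value in spectrum[: SAMPLE_COUNT // 2]]

# Pick the two strongest bins and report them in ascending order.
strongest = sorted(range(len(magnitudes)), key=lambda k: magnitudes[k], reverse=True)[:2]
dominant = sorted(strongest)
print(dominant)  # [5, 12]
```

The time-domain samples look like a wobbly curve, but the spectrum has just two peaks, one per tone. The information is the same, only rearranged.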

Development of neural networks for artificial intelligence started with electronics, but their AI applications have been limited by their slow processing and need for extensive computing resources. Some researchers have developed hybrid neural networks, in which optics perform simple linear operations, but electronics perform more complex nonlinear calculations. Now two groups have demonstrated simple all-optical neural networks that do all processing with light.
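The split between "linear" and "nonlinear" work can be sketched in a few lines of Python. In this toy layer, `linear` is the matrix-vector product that optics handle in the hybrid designs, and `sigmoid` stands in for the nonlinear step; the weights, inputs, and the choice of sigmoid are assumptions for illustration, not details from either experiment.

```python
import math


def linear(weights, inputs):
    """Matrix-vector product: the linear step of a neural-network layer."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def sigmoid(value):
    """A common nonlinear activation, used here only as an example."""
    return 1.0 / (1.0 + math.exp(-value))


def layer(weights, inputs):
    """One full layer: linear step followed by the nonlinear activation."""
    return [sigmoid(v) for v in linear(weights, inputs)]


# Hypothetical weights and inputs.
weights = [[1.0, -1.0], [0.5, 0.5]]
a, b = [1.0, 2.0], [3.0, 0.0]
combined = [x + y for x, y in zip(a, b)]

# Linear step: processing the sum equals summing the processed parts.
linear_sum = [p + q for p, q in zip(linear(weights, a), linear(weights, b))]
linear_holds = linear(weights, combined) == linear_sum
print(linear_holds)  # True

# Full layer: the activation breaks that property.
layer_sum = [p + q for p, q in zip(layer(weights, a), layer(weights, b))]
layer_holds = layer(weights, combined) == layer_sum
print(layer_holds)  # False
```

The first check passes because a linear operation treats overlapping signals independently. The second fails, and that broken superposition is exactly what a network needs from its nonlinear step.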

In May, Wolfram Pernice of the Institute of Physics at the University of Münster in Germany and colleagues reported testing an all-optical “neuron” in which signals change target materials between liquid and solid states, an effect that has been used for optical data storage. They demonstrated nonlinear processing, and produced output pulses like those from organic neurons. They then produced an integrated photonic circuit that incorporated four optical neurons operating at different wavelengths, each of which connected to 15 optical synapses. The photonic circuit contained more than 140 components and could recognize simple optical patterns. The group wrote that their device is scalable, and that the technology promises “access to the high speed and high bandwidth inherent to optical systems, thus enabling the direct processing of optical telecommunication and visual data.”

Now a group at the Hong Kong University of Science and Technology reports in Optica that they have made an all-optical neural network based on a different process, electromagnetically induced transparency, in which incident light affects how atoms shift between quantum-mechanical energy levels. The process is nonlinear and can be triggered by very weak light signals, says Shengwang Du, a physics professor and coauthor of the paper.

In their demonstration, they illuminated rubidium-85 atoms cooled by lasers to about 10 microkelvin (10 microdegrees above absolute zero). Although the technique may seem unusually complex, Du said the system was the most accessible one in the lab that could produce the desired effects. “As a pure quantum atomic system [it] is ideal for this proof-of-principle experiment,” he says.

Next, they plan to scale up the demonstration using a hot atomic vapor center, which is less expensive, does not require time-consuming preparation of cold atoms, and can be integrated with photonic chips. Du says the major challenges are reducing the cost of the nonlinear processing medium and increasing the scale of the all-optical neural network for more complex tasks.

“Their demonstration seems valid,” says Volker Sorger, an electrical engineer at George Washington University in Washington who was not involved in either demonstration. He says the all-optical approach is attractive because it offers very high parallelism, but the update rate is limited to about 100 hertz because of the liquid crystals used in their test, and he is not completely convinced their approach can be scaled error-free.

Posted in Human Robots

#432880 Google’s Duplex Raises the Question: ...

By now, you’ve probably seen Google’s new Duplex software, which promises to call people on your behalf to book appointments for haircuts and the like. As yet, it only exists in demo form, but already it seems like Google has made a big stride towards capturing a market that plenty of companies have had their eye on for quite some time. This software is impressive, but it raises questions.

Many of you will be familiar with the stilted, robotic conversations you can have with early chatbots that are, essentially, glorified menus. Instead of pressing 1 to confirm or 2 to re-enter, some of these bots would allow for simple commands like “Yes” or “No,” replacing the buttons with limited ability to recognize a few words. Using them was often a far more frustrating experience than attempting to use a menu—there are few things more irritating than a robot saying, “Sorry, your response was not recognized.”

Even getting the response recognized is hard enough. After all, there are countless different nuances and accents to baffle voice recognition software, and endless turns of phrase that amount to saying the same thing that can confound natural language processing (NLP), especially if you like your phrasing quirky.

You may think that standard customer-service type conversations all travel the same route, using similar words and phrasing. But when there are over 80,000 ways to order coffee, and making a mistake is frowned upon, even simple tasks require high accuracy over a huge dataset.

Advances in audio processing, neural networks, and NLP, as well as raw computing power, have meant that basic recognition of what someone is trying to say is less of an issue. SoundHound’s virtual assistant prides itself on being able to process complicated requests (perhaps needlessly complicated).

The deeper issue, as with all attempts to develop conversational machines, is one of understanding context. There are so many ways a conversation can go that attempting to construct a conversation two or three layers deep quickly runs into problems. Multiply the thousands of things people might say by the thousands they might say next, and the combinatorics of the challenge runs away from most chatbots, leaving them as either glorified menus, gimmicks, or rather bizarre to talk to.
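The arithmetic behind that runaway growth is easy to sketch. The figure of 1,000 possible utterances per turn is an assumption standing in for "thousands", and `conversation_paths` is a hypothetical helper name.

```python
def conversation_paths(options_per_turn, turns):
    """Number of distinct conversations if every turn has the same number of options."""
    return options_per_turn ** turns


# Assumed: roughly 1,000 things a person might say at each turn.
OPTIONS_PER_TURN = 1000

for turns in (1, 2, 3):
    print(turns, conversation_paths(OPTIONS_PER_TURN, turns))
```

Under that assumption, a conversation only three layers deep already has a billion possible paths, far too many to script by hand.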

Yet Google, who surely remembers from Glass the risk of premature debuts for technology, especially the kind that asks you to rethink how you interact with or trust in software, must have faith in Duplex to show it on the world stage. We know that startups like Semantic Machines and x.ai have received serious funding to perform very similar functions, using natural-language conversations to perform computing tasks, schedule meetings, book hotels, or purchase items.

It’s no great leap to imagine Google will soon do the same, bringing us closer to a world of onboard computing, where Lens labels the world around us and their assistant arranges it for us (all the while gathering more and more data it can convert into personalized ads). The early demos showed some clever tricks for keeping the conversation within a fairly narrow realm where the AI should be comfortable and competent, and the blog post that accompanied the release shows just how much effort has gone into the technology.

Yet given the privacy and ethics funk the tech industry finds itself in, and people’s general unease about AI, the main reaction to Duplex’s impressive demo was concern. The voice sounded too natural, bringing to mind Lyrebird and their warnings of deepfakes. You might trust “Do the Right Thing” Google with this technology, but it could usher in an era when automated robo-callers are far more convincing.

A more human-like voice may sound like a perfectly innocuous improvement, but the fact that the assistant interjects naturalistic “umm” and “mm-hm” responses to more perfectly mimic a human rubbed a lot of people the wrong way. This wasn’t just a voice assistant trying to sound less grinding and robotic; it was actively trying to deceive people into thinking they were talking to a human.

Google is running the risk of trying to get to conversational AI by going straight through the uncanny valley.

“Google’s experiments do appear to have been designed to deceive,” said Dr. Thomas King of the Oxford Internet Institute’s Digital Ethics Lab, according to TechCrunch. “Their main hypothesis was ‘can you distinguish this from a real person?’ In this case it’s unclear why their hypothesis was about deception and not the user experience… there should be some kind of mechanism there to let people know what it is they are speaking to.”

From Google’s perspective, being able to say “90 percent of callers can’t tell the difference between this and a human personal assistant” is an excellent marketing ploy, even though statistics about how many interactions are successful might be more relevant.

In fact, Duplex runs contrary to pretty much every major recommendation about ethics for the use of robotics or artificial intelligence, not to mention certain eavesdropping laws. Transparency is key to holding machines (and the people who design them) accountable, especially when it comes to decision-making.

Then there are the more subtle social issues. One prominent effect social media has had is to allow people to silo themselves; in echo chambers of like-minded individuals, it’s hard to see how other opinions exist. Technology exacerbates this by removing the evolutionary cues that go along with face-to-face interaction. Confronted with a pair of human eyes, people are more generous. Confronted with a Twitter avatar or a Facebook interface, people hurl abuse and criticism they’d never dream of using in a public setting.

Now that we can use technology to interact with ever fewer people, will it change us? Is it fair to offload the burden of dealing with a robot onto the poor human at the other end of the line, who might have to deal with dozens of such calls a day? Google has said that if the AI is in trouble, it will put you through to a human, which might help save receptionists from the hell of trying to explain a concept to dozens of dumbfounded AI assistants all day. But there’s always the risk that failures will be blamed on the person and not the machine.

As AI advances, could we end up treating the dwindling number of people in these “customer-facing” roles as the buggiest part of a fully automatic service? Will people start accusing each other of being robots on the phone, as well as on Twitter?

Google has provided plenty of reassurances about how the system will be used. They have said they will ensure that the system is identified, and it’s hardly difficult to resolve this problem; a slight change in the script from their demo would do it. For now, consumers will likely appreciate moves that make it clear whether the “intelligent agents” that make major decisions for us, that we interact with daily, and that hide behind social media avatars or phone numbers are real or artificial.

Image Credit: Besjunior / Shutterstock.com


#431315 Better Than Smart Speakers? Japan Is ...

While American internet giants are developing speakers, Japanese companies are working on robots and holograms. They all share a common goal: to create the future platform for the Internet of Things (IoT) and smart homes.
Names like Bocco, EMIEW3, Xperia Agent, and Gatebox may not ring a bell to most outside of Japan, but Sony, Hitachi, Sharp, and Softbank most certainly do. The companies, along with Japanese start-ups, have developed robots, robot concepts, and even holograms like the ones hiding behind the short list of names.
While there are distinct differences between the various systems, they share the potential to act as a remote control for IoT devices and smart homes. It is a very different direction than that taken by companies like Google, Amazon, and Apple, who have so far focused on building IoT speaker systems.
Bocco robot. Image Credit: Yukai Engineering
“Technology companies are pursuing the platform—or smartphone if you will—for IoT. My impression is that Japanese companies—and Japanese consumers—prefer that such a platform should not just be an object, but a companion,” says Kosuke Tatsumi, designer at Yukai Engineering, a startup that has developed the Bocco robot system.
At Hitachi, a spokesperson said that the company’s human symbiotic service robot, EMIEW3, is currently in the field, doing proof-of-value tests at customer sites to investigate needs and potential solutions. This could include working as an interactive control system for the Internet of Things:
“EMIEW3 is able to communicate with humans, thus receive instructions, and as it is connected to a robotics IT platform, it is very much capable of interacting with IoT-based systems,” the spokesperson said.
The power of speech is getting feet
Gartner analysis predicts that there will be 8.4 billion internet-connected devices—collectively making up the Internet of Things—by the end of 2017. 5.2 billion of those devices are in the consumer category. By the end of 2020, the number of IoT devices will rise to 12.8 billion—and that is just in the consumer category.
As a child of the 80s, I can vividly remember how fun it was to have separate remote controls for TV, video, and stereo. I can imagine a situation where my internet-connected refrigerator and ditto thermostat, television, and toaster try to work out who I’m talking to and what I want them to do.
Consensus seems to be that speech will be the way to interact with many/most IoT devices. The same goes for a form of virtual assistant functioning as the IoT platform—or remote control. Almost everything else is still an open ballgame, despite an early surge for speaker-based systems, like those from Amazon, Google, and Apple.
Why robots could rule
Famous android creator and robot scientist Dr. Hiroshi Ishiguro sees the interaction between humans and the AI embedded in speakers or robots as central to both approaches. From there, the approaches differ greatly.
Image Credit: Hiroshi Ishiguro Laboratories
“It is about more than the difference of form. Speaking to an Amazon Echo is not a natural kind of interaction for humans. That is part of what we in Japan are creating in many human-like robot systems,” he says. “The human brain is constructed to recognize and interact with humans. This is part of why it makes sense to focus on developing the body for the AI mind as well as the AI mind itself. In a way, you can describe it as the difference between developing an assistant, which could be said to be what many American companies are currently doing, and a companion, which is more the focus here in Japan.”
Another advantage is that robots are more kawaii—a multifaceted Japanese word that can be translated as “cute”—than speakers are. This makes it easy for people to relate to them and forgive them.
“People are more willing to forgive children when they make mistakes, and the same is true with a robot like Bocco, which is designed to look kawaii and childlike,” Kosuke Tatsumi explains.
Japanese robots and holograms with IoT-control capabilities
So, what exactly do these robot and hologram companions look like, what can they do, and who’s making them? Here are seven examples of Japanese companies working to go a step beyond smart speakers with personable robots and holograms.
1. In 2016 Sony’s mobile division demonstrated the Xperia Agent concept robot that recognizes individual users, is voice controlled, and can do things like control your television and receive calls from services like Skype.

2. Sharp launched its Home Assistant at CES 2016: a robot-like, voice-controlled assistant that can control, among other things, air conditioning units and televisions. Sharp has also launched a robotic phone called RoBoHon.
3. Gatebox has created a holographic virtual assistant. Evil tongues will say that it is primarily the expression of an otaku (Japanese for nerd) dream of living with a manga heroine. Gatebox is, however, able to control things like lights, TVs, and other systems through API integration. It also provides its owner with weather-related advice like “remember your umbrella, it looks like it will rain later.” Gatebox can be controlled by voice, gesture, or via an app.
4. Hitachi’s EMIEW3 robot is designed to assist people in businesses and public spaces. It is connected to a robot IT-platform via the cloud that acts as a “remote brain.” Hitachi is currently investigating the business use cases for EMIEW3. This could include the role of controlling platform for IoT devices.

5. Softbank’s Pepper robot has been used as a platform to control use of medical IoT devices such as smart thermometers by Avatarion. The company has also developed various in-house systems that enable Pepper to control IoT devices like a coffee machine. A user simply asks Pepper to brew a cup of coffee, and it starts the coffee machine for them.
6. Yukai Engineering’s Bocco registers when a person (e.g., young child) comes home and acts as a communication center between that person and other members of the household (e.g., parent still at work). The company is working on integrating voice recognition, voice control, and having Bocco control things like the lights and other connected IoT devices.
7. Last year Toyota launched the Kirobo Mini, a companion robot which aims to, among other things, help its owner by suggesting “places to visit, routes for travel, and music to listen to” during the drive.

Today, Japan. Tomorrow…?
One of the key questions is whether this emerging phenomenon is a purely Japanese thing, and whether the country’s love of robots makes it fundamentally different. Japan is, after all, a country where new units of Softbank’s Pepper robot routinely sell out in minutes and the RoBoHon robot-phone has its own cafe nights in Tokyo.
It is a country where TV introduces you to friendly, helpful robots like Doraemon and Astro Boy. I, on the other hand, first met robots in the shape of Arnold Schwarzenegger’s Terminator and struggled to work out why robots seemed intent on permanently borrowing things like clothes and motorcycles, not to mention why they hated people called Sarah.
However, research suggests that a big part of the reason why Japanese seem to like robots is a combination of exposure and positive experiences that leads to greater acceptance of them. As robots spread to more and more industries—and into our homes—our acceptance of them will grow.
The argument is also backed by a project by Avatarion, which used Softbank’s Nao-robot as a classroom representative for children who were in the hospital.
“What we found was that the other children quickly adapted to interacting with the robot and treating it as the physical representation of the child who was in hospital. They accepted it very quickly,” Thierry Perronnet, General Manager of Avatarion, explains.
His company has also developed solutions where Softbank’s Pepper robot is used as an in-home nurse and controls various medical IoT devices.
If robots end up becoming our preferred method for controlling IoT devices, it is by no means certain that said robots will be coming from Japan.
“I think that the goal for both Japanese and American companies—including the likes of Google, Amazon, Microsoft, and Apple—is to create human-like interaction. For this to happen, technology needs to evolve and adapt to us and how we are used to interacting with others, in other words, have a more human form. Humans’ speed of evolution cannot keep up with technology’s, so it must be the technology that changes,” Dr. Ishiguro says.
Image Credit: Sony Mobile Communications


#430579 What These Lifelike Androids Can Teach ...

For Dr. Hiroshi Ishiguro, one of the most interesting things about androids is the changing questions they pose us, their creators, as they evolve. Does it, for example, do something to the concept of being human if a human-made creation starts telling you about what kind of boys ‘she’ likes?
If you want to know the answer to the boys question, you need to ask ERICA, one of Dr. Ishiguro’s advanced androids. Beneath her plastic skull and silicone skin, wires connect to AI software systems that bring her to life. Her ability to respond goes far beyond standard inquiries. Spend a little time with her, and the feeling of a distinct personality starts to emerge. From time to time, she works as a receptionist at Dr. Ishiguro and his team’s Osaka University labs. One of her android sisters is an actor who has starred in plays and a film.

ERICA’s ‘brother’ is an android version of Dr. Ishiguro himself, which has represented its creator at various events while the biological Ishiguro can remain in his offices in Japan. Microphones and cameras capture Ishiguro’s voice and face movements, which are relayed to the android. Apart from mimicking its creator, the Geminoid™ android is also capable of lifelike blinking, fidgeting, and breathing movements.
Say hello to relaxation
As technological development continues to accelerate, so do the possibilities for androids. From a position as receptionist, ERICA may well branch out into many other professions in the coming years. Companion for the elderly, comic book storyteller (an ancient profession in Japan), pop star, conversational foreign language partner, and newscaster are some of the roles and responsibilities Dr. Ishiguro sees androids taking on in the near future.
“Androids are not uncanny anymore. Most people adapt to interacting with ERICA very quickly. Actually, I think that in interacting with androids, which are still different from us, we get a better appreciation of interacting with other cultures. In both cases, we are talking with someone who is different from us and learn to overcome those differences,” he says.
A lot has been written about how robots will take our jobs. Dr. Ishiguro believes these fears are blown somewhat out of proportion.
“Robots and androids will take over many simple jobs. Initially there might be some job-related issues, but new schemes, like for example a robot tax similar to the one described by Bill Gates, should help,” he says.
“Androids will make it possible for humans to relax and keep evolving. If we compare the time we spend studying now compared to 100 years ago, it has grown a lot. I think it needs to keep growing if we are to keep expanding our scientific and technological knowledge. In the future, we might end up spending 20 percent of our lifetime on work and 80 percent of the time on education and growing our skills.”
Android asks who you are
For Dr. Ishiguro, another aspect of robotics in general, and androids in particular, is how they question what it means to be human.
“Identity is a very difficult concept for humans sometimes. For example, I think clothes are part of our identity, in a way that is similar to our faces and bodies. We don’t change those from one day to the next, and that is why I have ten matching black outfits,” he says.
This link between physical appearance and perceived identity is one of the aspects Dr. Ishiguro is exploring. Another closely linked concept is the connection between body and feeling of self. The Ishiguro avatar was once giving a presentation in Austria. Its creator recalls how he felt distinctly like he was in Austria, even capable of feeling the sensation of touch on his own body when people laid their hands on the android. If he was distracted, he felt almost ‘sucked’ back into his body in Japan.
“I am constantly thinking about my life in this way, and I believe that androids are a unique mirror that helps us formulate questions about why we are here and why we have been so successful. I do not necessarily think I have found the answers to these questions, so if you have, please share,” he says with a laugh.
His work and these questions, while extremely interesting on their own, become extra poignant when considering the predicted melding of mind and machine in the near future.
The ability to be present in several locations through avatars—virtual or robotic—raises many questions of both philosophical and practical nature. Then add the hypotheticals, like why send a human out onto the hostile surface of Mars if you could send a remote-controlled android, capable of relaying everything it sees, hears and feels?
The two ways of robotics will meet
Dr. Ishiguro sees the world of AI-human interaction as currently roughly split into two. One is the chat-bot approach that companies like Amazon, Microsoft, Google, and recently Apple, employ using stationary objects like speakers. Androids like ERICA represent another approach.
“It is about more than the form factor. I think that the android approach is generally more story-based. We are integrating new conversation features based on assumptions about the situation and running different scenarios that expand the android’s vocabulary and interactions. Another aspect we are working on is giving androids desire and intention. Like with people, androids should have desires and intentions in order for you to want to interact with them over time,” Dr. Ishiguro explains.
This could be said to be part of a wider trend for Japan, where many companies are developing human-like robots that often have some Internet of Things capabilities, making them able to handle some of the same tasks as an Amazon Echo. The difference in approach could be summed up in the words ‘assistant’ (Apple, Amazon, etc.) and ‘companion’ (Japan).
Dr. Ishiguro sees this as partly linked to how Japanese, as a language and as a market, is somewhat limited. This has a direct impact on the viability and practicality of ‘pure’ voice recognition systems. At the same time, Japanese people have had greater exposure to positive images of robots, and have a different cultural / religious view of objects having a ‘soul’. However, it may also mean Japanese companies and android scientists are stealing a march on their Western counterparts.
“If you speak to an Amazon Echo, that is not a natural way to interact for humans. This is part of why we are making human-like robot systems. The human brain is set up to recognize and interact with humans. So, it makes sense to focus on developing the body for the AI mind, as well as the AI. I believe that the final goal for both Japanese and other companies and scientists is to create human-like interaction. Technology has to adapt to us, because we cannot adapt fast enough to it, as it develops so quickly,” he says.
Banner image courtesy of Hiroshi Ishiguro Laboratories, ATR all rights reserved.
Dr. Ishiguro’s team has collaborated with partners and developed a number of android systems:
Geminoid™ HI-2 has been developed by Hiroshi Ishiguro Laboratories and Advanced Telecommunications Research Institute International (ATR).
Geminoid™ F has been developed by Osaka University and Hiroshi Ishiguro Laboratories, Advanced Telecommunications Research Institute International (ATR).
ERICA has been developed by ERATO ISHIGURO Symbiotic Human-Robot Interaction Project
