HRI-JP Next Generation Intelligent Communication and Social Interaction Research Alliance Laboratory (ICSI)
We are conducting research on applying robot audition technology to ICT and mobility, mainly in collaboration with HRI-JP. As a device for this purpose, we are building an ultra-thin processor board on which the embedded version of HARK mentioned above runs in real time. For example, if you use this device to create a tablet case with microphones embedded around it, you can use HARK’s functionality on a commercially available tablet. This makes it possible to build tools that support hearing-impaired people or multilingual communication.
This can also be applied to cars. A typical car navigation system with voice recognition requires the user to press a button before speaking. Pressing the button mutes the music that was playing, reduces the airflow of the air conditioner, and sounds a beep. If the user starts speaking before this beep, voice recognition does not work well. With HARK, however, speech can be recognized robustly even while music is playing and even when the air conditioner is running at high airflow. In addition to making the system buttonless, this can realize car navigation with speech recognition that handles both the driver’s seat and the passenger’s seat. In recent years, AI speakers have improved user friendliness, but even AI speakers require specific keywords to operate them. They also basically assume that there is only a single user, so the benefit of the functions HARK provides is great.
In addition, we are conducting research and development on cloud services for HARK. Once HARK is offered as a cloud service, all computational resources can be placed on the cloud side, so users can enjoy the benefits of HARK with a simple system.
By analyzing recordings of a certain length, such as a conference, users can examine the utterances, for example the frequency and timing of each speaker’s utterances and the relationships between speakers.
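As a rough illustration of this kind of utterance analysis, the Python sketch below counts utterances, sums speaking time, and tallies speaker-to-speaker turn changes. It assumes that separation and localization have already produced a list of utterance segments labeled by speaker; the `Utterance` record, the `summarize` function, and the sample data are hypothetical and are not part of HARK or its cloud service.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Utterance(NamedTuple):
    """Hypothetical record for one separated utterance (not a HARK type)."""
    speaker: str  # speaker label assigned upstream
    start: float  # start time in seconds
    end: float    # end time in seconds


def summarize(utterances):
    """Return per-speaker counts, per-speaker talk time, and turn changes."""
    ordered = sorted(utterances, key=lambda u: u.start)

    # How often each speaker talks.
    counts = Counter(u.speaker for u in ordered)

    # How long each speaker talks in total.
    talk_time = defaultdict(float)
    for u in ordered:
        talk_time[u.speaker] += u.end - u.start

    # Who speaks after whom: a simple proxy for relations between speakers.
    transitions = Counter()
    for prev, nxt in zip(ordered, ordered[1:]):
        if prev.speaker != nxt.speaker:
            transitions[(prev.speaker, nxt.speaker)] += 1

    return counts, dict(talk_time), transitions


# Made-up sample data for illustration only.
meeting = [
    Utterance("A", 0.0, 4.0),
    Utterance("B", 4.5, 6.0),
    Utterance("A", 6.5, 9.0),
    Utterance("C", 9.0, 10.0),
    Utterance("A", 10.5, 12.0),
]
counts, talk_time, transitions = summarize(meeting)
```

Ordering the segments by start time is a simplification: overlapping speech, which is the case where sound source separation matters most, would need a more careful definition of a turn change.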