With the evolution of Natural Language Understanding, a number of voice assistants such as Alexa and Google Assistant have come to market. These devices are activated by a wake word. A wake word is a special word or phrase, such as “Hey Siri”, “OK Google”, or “Alexa”, that activates the device. Wake words, also known as ‘hotwords’, ‘trigger words’, or ‘wake-up words’, are the phrases end users say to start interacting with the device.
Once the wake word has been spoken, the voice assistant records the command from the invoker (the user), separates it from the background noise, and acts on the instruction.
Types of Wake Words:
There are two kinds of wake word detectors: Universal and Personal.
A universal wake word is the same for everyone: the detector is trained on a wide variety of voices, usually with a neural network, and can be activated by anyone who says the word. A personal wake word, by contrast, is customized and trained locally.
A universal detector does not allow arbitrary wake words, while a personal one does; both rely on machine learning.
Several technologies work together to detect the wake-up word.
- Listener – A number of microphones listen for the wake-up word. They filter out background noise, distinguish the wake-up word from other speech, and activate the device. After hearing the voice, the device also tracks the direction it is coming from.
- Built-in Memory – The device has a small memory that normally retains about three seconds of audio. It collects input from the user, processes it, and discards the data as new input arrives, so new audio overwrites the old.
- Data Processing – The input is fed to a series of neural nets that interpret what the user wants and respond accordingly. Every word the device hears passes through multiple layers of testing, which determine whether the word is the wake word. Only after the word has passed several layers of verification, and the device concludes that it really was the wake word, does it start recording.
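The built-in memory and the layered verification described above can be illustrated with a small sketch. This is only an illustration under stated assumptions, not how any particular device is implemented: the 16 kHz sample rate, the class and function names, and the two toy stages standing in for neural-network layers are all hypothetical.

```python
from collections import deque

SAMPLE_RATE = 16_000   # samples per second (assumed for illustration)
BUFFER_SECONDS = 3     # the roughly three-second window described above


class RollingAudioBuffer:
    """Keeps only the most recent few seconds of audio samples."""

    def __init__(self, sample_rate=SAMPLE_RATE, seconds=BUFFER_SECONDS):
        # A deque with maxlen drops the oldest samples once it is full,
        # so new input overwrites old data.
        self._samples = deque(maxlen=sample_rate * seconds)

    def write(self, frame):
        self._samples.extend(frame)

    def snapshot(self):
        return list(self._samples)


def passes_all_stages(window, stages):
    """Return True only if every verification stage accepts the window."""
    return all(stage(window) for stage in stages)


# Hypothetical stand-ins for the real neural-network stages.
def loud_enough(window):
    return len(window) > 0 and max(abs(s) for s in window) > 0.1


def long_enough(window):
    return len(window) >= 4


# Tiny buffer (4 samples) so the overwrite behaviour is easy to see.
buf = RollingAudioBuffer(sample_rate=4, seconds=1)
buf.write([0.0, 0.0, 0.2, 0.3])
buf.write([0.5, 0.6])  # the two oldest samples are discarded
window = buf.snapshot()
should_record = passes_all_stages(window, [loud_enough, long_enough])
```

Here `window` ends up holding only the four newest samples, and recording would start only because both stages accept it; a single rejecting stage is enough to keep the device idle.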
Performance of Wake word recognition:
The performance of wake word recognition is measured by its accuracy: the device should not start recording unless someone is speaking to it directly. Recognizing the wake word without confusing it with other words is the top priority. The software relies on data that represents as many ways of saying the wake-up word as possible. This data helps the device determine when the user is addressing it. The device also has to ensure that it does not activate when it hears the word in casual conversation or background noise.
Thanks to the neural network, the device does not start recording when the word “Alexa” occurs in background noise. If someone says the wake-up word during a television program, the network should recognize that a large number of devices are unlikely to hear the same word, in the same tone, at exactly the same time by chance. It will ignore the sound and not begin recording.
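The two kinds of error discussed above, missing a real wake word and activating when nobody said it, can be counted from labelled test data. The sketch below is a minimal illustration; the function name, the event format, and the sample data are assumptions made for this example, not part of any specific product's evaluation method.

```python
def wake_word_metrics(events, hours_of_audio):
    """Summarise detector errors.

    events: list of (wake_word_spoken, device_activated) boolean pairs.
    hours_of_audio: total length of the test audio, in hours.
    """
    false_rejects = sum(1 for spoken, activated in events if spoken and not activated)
    false_accepts = sum(1 for spoken, activated in events if not spoken and activated)
    spoken_total = sum(1 for spoken, _ in events if spoken)
    return {
        # Share of real wake words the device missed.
        "false_reject_rate": false_rejects / spoken_total if spoken_total else 0.0,
        # Activations triggered by conversation or background noise.
        "false_accepts_per_hour": false_accepts / hours_of_audio,
    }


# Made-up test log: four real wake words (one missed) and one false activation.
events = [
    (True, True), (True, True), (True, False),
    (False, False), (False, True), (True, True),
]
metrics = wake_word_metrics(events, hours_of_audio=2.0)
```

A good detector keeps both numbers low at once; lowering the detection threshold typically reduces missed wake words at the cost of more false activations.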
Available wake word systems in the market
a) Raven: This system is based on the Snips Personal Wakeword Detector and works by comparing incoming audio to several pre-recorded templates.
b) Porcupine: Porcupine is a highly accurate and lightweight wake word engine. It enables building always-listening voice-enabled applications. It is:
- powered by deep neural networks trained in real-world environments.
- compact and computationally efficient, making it well suited to IoT.
- cross-platform. It is implemented in fixed-point ANSI C. Raspberry Pi (all variants), BeagleBone, Android, iOS, watchOS, Linux (x86_64), Mac (x86_64), Windows (x86_64), and web browsers are supported. Support for various ARM Cortex-A microprocessors and ARM Cortex-M microcontrollers is also available for enterprise customers.
- scalable. It can detect multiple always-listening voice commands with no added CPU/memory footprint.
- self-service. Developers can train custom wake phrases using Picovoice Console.
c) Snowboy: Snowboy is a highly customizable hotword detection engine that is embedded, runs in real time, and is always listening (even when offline). It is compatible with Raspberry Pi, (Ubuntu) Linux, and Mac OS X.
Snowboy is:
- highly customizable allowing you to freely define your own magic hotword such as (but not limited to) “open sesame”, “garage door open”, or “hello dreamhouse”. If you can think it, you can hotword it!
- always listening but protects your privacy because Snowboy does not connect to the Internet or stream your voice anywhere.
- lightweight and embedded, allowing you to run it on Raspberry Pis, where it consumes less than 10% CPU even on the smallest models (single-core 700 MHz ARMv6).
- Apache licensed!
Currently, Snowboy supports:
- all versions of Raspberry Pi (with Raspbian based on Debian Jessie 8.0)
- 64-bit Mac OS X
- 64-bit Ubuntu (12.04 and 14.04)
- iOS
- Android with ARMv7 CPUs
- Pine 64 with Debian Jessie 8.5 (3.10.102)
- Intel Edison with Ubilinux (Debian Wheezy 7.8)
d) PocketSphinx: one of Carnegie Mellon University’s open source, large-vocabulary, speaker-independent continuous speech recognition engines. This is an early release of a research system.
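To make the template-based approach mentioned for Raven more concrete, here is a minimal sketch of comparing incoming audio features to pre-recorded templates. It uses dynamic time warping (DTW), a common technique for comparing sequences spoken at different speeds; whether Raven uses exactly this method is an assumption, and the function names, the one-dimensional features, and the threshold are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of the first i items of a with the first j of b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def matches_any_template(features, templates, threshold):
    """True if the incoming features are close enough to any stored template."""
    return any(dtw_distance(features, t) <= threshold for t in templates)


# Made-up templates; the second one is a faster rendition of the incoming sequence.
templates = [[5, 5, 5], [1, 2, 3]]
detected = matches_any_template([1, 2, 2, 3], templates, threshold=0.5)
```

Because DTW allows one sequence to stretch against the other, a slower or faster pronunciation of the same phrase can still match a stored template. Real systems compare multi-dimensional acoustic features rather than single numbers.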