With advances in natural language understanding, a number of voice assistants have come to market, such as Alexa and Google Assistant. These devices are activated by a wake word. A wake word is a special word or phrase, such as “Hey Siri”, “OK Google”, or “Alexa”, that activates the device. Wake words are also known as ‘hotwords’, ‘trigger words’, or ‘wake-up words’; they let end users start an interaction with the device.

After the wake word is invoked, the voice assistant records the command from the invoker (the user) and acts on it. It separates the speech from background noise and acts as instructed.

Types of Wake Words:

There are two kinds of wake word detectors: Universal and Personal.

The Universal wake word is common to all users because its model is trained on a wide variety of voices. The model is usually a neural network. It can be activated by anyone who speaks it. Personal wake-up words, by comparison, are customized and trained locally.

The Universal type does not allow arbitrary wake words, while the Personal type does, though both are fed into a machine learning model.

Several technologies work together to detect the wake-up word.

  1. Listener – A number of microphones listen for the wake-up word. They filter out background noise, distinguish the wake-up word from other speech, and activate the device. After hearing the voice, the device also tracks the direction it is coming from.
  2. Built-in Memory – The device has a limited memory that normally retains about three seconds of input from the user. It processes that input and discards it when new input arrives, so new data overwrites the old.
  3. Data Processing – The input is fed to a series of neural networks that interpret what the user wants and respond accordingly. Every word the device hears passes through multiple layers of testing, which determine whether it is the wake word. Only after the word passes these layers of verification, and the device concludes that it really was the wake word, does it start recording.
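
The built-in memory and the layered checks described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample rate, the class name `RollingAudioBuffer`, and the two toy stage functions are hypothetical stand-ins, and a real device would run neural networks where the simple checks appear here.

```python
from collections import deque

SAMPLE_RATE = 16_000   # samples per second (an assumed value)
BUFFER_SECONDS = 3     # retention window mentioned in the text


class RollingAudioBuffer:
    """Keeps only the most recent few seconds of audio samples."""

    def __init__(self, sample_rate=SAMPLE_RATE, seconds=BUFFER_SECONDS):
        # A deque with maxlen drops the oldest items once it is full,
        # so new input overwrites old input automatically.
        self._samples = deque(maxlen=sample_rate * seconds)

    def write(self, chunk):
        """Append new samples, discarding the oldest if the buffer is full."""
        self._samples.extend(chunk)

    def snapshot(self):
        """Return the buffered samples, oldest first."""
        return list(self._samples)


def passes_all_stages(window, stages):
    """Return True only if every verification stage accepts the window."""
    return all(stage(window) for stage in stages)


# Hypothetical stand-ins for the neural-network verification layers.
def loud_enough(window):
    return bool(window) and max(abs(s) for s in window) > 0.1


def long_enough(window):
    return len(window) >= 4


# Tiny demo: a buffer that holds 2 samples/second * 3 seconds = 6 samples.
buf = RollingAudioBuffer(sample_rate=2, seconds=3)
buf.write([0.0, 0.0, 0.0, 0.0])
buf.write([0.5, 0.6, 0.7, 0.8])
print(buf.snapshot())  # [0.0, 0.0, 0.5, 0.6, 0.7, 0.8]
print(passes_all_stages(buf.snapshot(), [loud_enough, long_enough]))  # True
```

Because `all` stops at the first stage that rejects the window, cheap checks can be placed first so that the more expensive ones run only on promising audio.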

Performance of Wake word recognition:

The accuracy of wake word recognition can be measured by checking that the device does not start recording unless someone is speaking to it directly. Recognizing the wake word and not confusing it with other words is a top priority. The software relies on data that represents as many ways of saying the wake-up word as possible. This data helps the device determine when the user is speaking to it. The device must also ensure that it does not activate when it hears the word in casual conversation or background noise.
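
One common way to put numbers on this is to count false accepts (the device woke up when nobody said the wake word) and false rejects (the device missed a genuine wake word) over a labeled test set. The sketch below assumes such a test set exists; the function name `wake_word_error_rates` and the sample data are made up for illustration.

```python
def wake_word_error_rates(labels, predictions):
    """Compute (false accept rate, false reject rate).

    labels:      True where the wake word was really spoken.
    predictions: True where the detector fired.
    """
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")

    pairs = list(zip(labels, predictions))
    false_accepts = sum(1 for truth, fired in pairs if not truth and fired)
    false_rejects = sum(1 for truth, fired in pairs if truth and not fired)
    negatives = sum(1 for truth in labels if not truth)
    positives = len(labels) - negatives

    # Guard against empty classes to avoid dividing by zero.
    far = false_accepts / negatives if negatives else 0.0
    frr = false_rejects / positives if positives else 0.0
    return far, frr


# Made-up test set: four clips with the wake word, four without.
labels      = [True, True, True, True,  False, False, False, False]
predictions = [True, True, True, False, False, False, False, True]
print(wake_word_error_rates(labels, predictions))  # (0.25, 0.25)
```

A low false accept rate corresponds to the requirement that the device should not start recording on casual conversation or background noise; a low false reject rate means users rarely have to repeat themselves.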

With a neural network, the device does not start recording when the word “Alexa” appears in background noise. If someone says the wake-up word during a television program, the network should recognize that it is unlikely for a large number of devices to hear the same word, in the same tone, at exactly the same time. It will ignore the sound and not begin recording.
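
The idea of ignoring a wake word that many devices hear at once can be illustrated with a simple server-side check. This is a hypothetical sketch, not a description of how any vendor does it: it assumes each activation is reported as a device ID, a timestamp in seconds, and some audio fingerprint string, and the function name, the one-second window, and the three-device threshold are all invented for the example.

```python
from collections import defaultdict


def find_broadcast_activations(events, window_seconds=1.0, min_devices=3):
    """Return the device IDs whose activation looks like a broadcast.

    events: iterable of (device_id, timestamp_seconds, audio_fingerprint).
    An activation is flagged when at least `min_devices` distinct devices
    report the same fingerprint within `window_seconds` of each other.
    """
    by_fingerprint = defaultdict(list)
    for device_id, timestamp, fingerprint in events:
        by_fingerprint[fingerprint].append((timestamp, device_id))

    suppressed = set()
    for group in by_fingerprint.values():
        group.sort()  # order by timestamp
        start = 0
        for end in range(len(group)):
            # Slide the left edge until the window spans at most window_seconds.
            while group[end][0] - group[start][0] > window_seconds:
                start += 1
            devices = {device for _, device in group[start:end + 1]}
            if len(devices) >= min_devices:
                suppressed.update(devices)
    return suppressed


events = [
    ("kitchen", 10.00, "abc"),   # three devices hear the same TV advert
    ("lounge", 10.05, "abc"),
    ("office", 10.10, "abc"),
    ("bedroom", 42.00, "xyz"),   # a single, genuine request
]
print(sorted(find_broadcast_activations(events)))
# ['kitchen', 'lounge', 'office']
```

The devices in the returned set would ignore the sound, while the lone activation proceeds to recording as usual.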

Available wake word systems in the market

a) Raven: This system is based on the Snips Personal Wakeword Detector and works by comparing incoming audio to several pre-recorded templates.

b) Porcupine: Porcupine is a highly accurate and lightweight wake word engine. It enables building always-listening voice-enabled applications.

c) Snowboy: Snowboy is a highly customizable hotword detection engine that runs embedded and in real time and is always listening (even when offline). It is compatible with Raspberry Pi, (Ubuntu) Linux, and Mac OS X.

    Snowboy is:

Currently, Snowboy supports:

d) PocketSphinx: One of Carnegie Mellon University’s open-source, large-vocabulary, speaker-independent continuous speech recognition engines. This is an early release of a research system.
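
The template comparison mentioned for Raven can be illustrated with dynamic time warping (DTW), a classic way to compare two sequences that are spoken at different speeds. This is a minimal sketch, not Raven's actual code: it assumes the audio has already been reduced to a one-dimensional sequence of feature values, and the function names and threshold are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of the first i items of a
    # with the first j items of b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def matches_any_template(features, templates, threshold):
    """True if the input is within `threshold` of at least one template."""
    return any(dtw_distance(features, t) <= threshold for t in templates)


# Two recorded templates; the second is a slower version of the input.
templates = [[5, 5, 5], [1, 2, 2, 3]]
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))                    # 0.0
print(matches_any_template([1, 2, 3], templates, threshold=0.5))  # True
```

Because the warping lets one sequence stretch against the other, the same word spoken slightly faster or slower can still match a stored template, which is what makes a handful of recordings enough for a personal wake word.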
