A smart speaker is a speaker equipped with voice recognition, one of the applications of AI (artificial intelligence) technology. It can understand the intent of the speaker’s words, find the right app on a linked device according to the instructions, and play music or make a call.
In this article, I explain how smart speakers work and introduce some typical products.
What is a smart speaker?
Smart speakers are voice assistant devices for the home that have appeared in the last few years. They are said to be an extension of the technology behind the voice assistant “Siri”, which is familiar to iPhone and Apple Watch users. A simple way to think of a smart speaker is as the voice assistant from a smartphone, given its own dedicated ears, mouth, and brain so that it can work as a standalone device.
What is inside a smart speaker
People use their ears, mouth, and brain when talking face-to-face or over the phone.
Smart speakers are equipped with a microphone, a speaker, and network functions. Compared with a human conversation, the microphone corresponds to the ears that listen, the speaker corresponds to the mouth that talks, and the network function corresponds to the brain. The AI (artificial intelligence) in that “brain” consists of speech recognition technology and question answering technology.
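To make the division of roles concrete, here is a minimal sketch of one request passing through the three parts. The class name `SmartSpeaker` and the fields `listen`, `think`, and `speak` are hypothetical names chosen for illustration, and the stand-in functions do no real audio processing.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SmartSpeaker:
    """Hypothetical model of the three parts of a smart speaker."""
    listen: Callable[[], bytes]    # microphone ("ears"): returns recorded audio
    think: Callable[[bytes], str]  # network/AI ("brain"): audio in, answer text out
    speak: Callable[[str], None]   # loudspeaker ("mouth"): plays the answer

    def handle_one_request(self) -> str:
        audio = self.listen()       # 1. capture the user's voice
        answer = self.think(audio)  # 2. recognize the speech and find an answer
        self.speak(answer)          # 3. say the answer out loud
        return answer


# Stand-ins for real hardware and cloud services (assumptions for this demo).
spoken = []
demo = SmartSpeaker(
    listen=lambda: b"fake-audio",
    think=lambda audio: f"received {len(audio)} bytes",
    speak=spoken.append,
)
result = demo.handle_one_request()
```

In a real product the `think` step would usually run on servers reached over the network, which is why the network function plays the role of the brain.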
Speech recognition technology and question answering technology
Speech recognition technology
Speech recognition technology converts the waveform produced when someone speaks into text. Instead of typing on a keyboard, you give input by voice, and your voice is turned into text data that is easy for a computer to handle.
A typical speech recognition system consists of the following main components.
- Acoustic model: models the sound characteristics of the language
- Language model: models grammatical features such as word order, as well as the semantic features of sentences
- Acoustic feature extraction (acoustic analysis): converts the input audio data into a form that is easier for the acoustic model to handle
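As a rough illustration of how the acoustic model and the language model work together, the sketch below picks the transcript with the best combined score. All of the numbers are made up for this example, and the names `ACOUSTIC_LOGP`, `BIGRAM_LOGP`, `language_logp`, and `decode` are hypothetical; a real recognizer would compute these scores from audio features and from a model trained on large amounts of text.

```python
# Made-up acoustic scores: log P(audio | transcript) for two candidates
# that sound almost the same.
ACOUSTIC_LOGP = {
    "recognize speech": -12.0,
    "wreck a nice beach": -11.5,
}

# Made-up bigram language model: log P(word | previous word).
BIGRAM_LOGP = {
    ("<s>", "recognize"): -2.0,
    ("recognize", "speech"): -1.0,
    ("<s>", "wreck"): -5.0,
    ("wreck", "a"): -2.0,
    ("a", "nice"): -2.5,
    ("nice", "beach"): -3.0,
}
UNSEEN_LOGP = -10.0  # penalty for word pairs the model has never seen


def language_logp(transcript):
    """Score how plausible the word order is, according to the bigram table."""
    words = ["<s>"] + transcript.split()
    return sum(BIGRAM_LOGP.get(pair, UNSEEN_LOGP) for pair in zip(words, words[1:]))


def decode(candidates, lm_weight=1.0):
    """Return the candidate with the highest acoustic + language model score."""
    return max(
        candidates,
        key=lambda t: ACOUSTIC_LOGP[t] + lm_weight * language_logp(t),
    )


best = decode(list(ACOUSTIC_LOGP))
acoustic_only = decode(list(ACOUSTIC_LOGP), lm_weight=0.0)
```

With these numbers the acoustic model alone slightly prefers “wreck a nice beach”, but once the language model is added the more natural “recognize speech” wins.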
Question answering technology
Question answering technology returns answers to questions asked in natural language. Broadly speaking, it consists of the following four steps.
- Interpretation of the question
- Database and Internet search (information retrieval)
- Answer extraction
- Generation of answer text
For example, let’s say you ask a smart speaker, “How high is the Sky Tree?”
When a smart speaker receives a question from a person, it first interprets the meaning of the question text. For “How high is the Sky Tree?”, it analyzes the sentence and finds that the word “Sky Tree” depends on the word “height”, and that “how high” (that is, “how many meters”) refers to that “height”. In this way the smart speaker understands that it is being asked for the “height of the Sky Tree”, and it searches databases, the Internet, and other sources for information that is likely to state the “height of the Sky Tree”.
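The four steps can be sketched in a few lines of code. This is only a toy under stated assumptions: the function names (`interpret`, `retrieve`, `extract`, `generate`, `answer`) are hypothetical, the “database” is a two-sentence list written for this example, and the question is interpreted with a single regular expression rather than real language analysis.

```python
import re

# Tiny in-memory "database" standing in for a real knowledge source or web search.
DOCUMENTS = [
    "Tokyo Skytree is a broadcasting tower in Tokyo with a height of 634 meters.",
    "Tokyo Tower has a height of 333 meters.",
]


def interpret(question):
    """Step 1: work out which attribute of which entity is being asked about."""
    match = re.match(
        r"how (high|tall) is (?:the )?(.+?)\??$", question.strip(), re.IGNORECASE
    )
    if not match:
        return None
    return {"attribute": "height", "entity": match.group(2)}


def retrieve(query):
    """Step 2: keep documents that mention both the entity and the attribute."""
    entity = query["entity"].lower().replace(" ", "")
    return [
        doc for doc in DOCUMENTS
        if entity in doc.lower().replace(" ", "") and query["attribute"] in doc.lower()
    ]


def extract(documents):
    """Step 3: pull the number of meters out of the first document that has one."""
    for doc in documents:
        found = re.search(r"(\d+(?:\.\d+)?) meters", doc)
        if found:
            return found.group(1)
    return None


def generate(query, value):
    """Step 4: wrap the extracted value in an answer sentence."""
    if value is None:
        return "Sorry, I could not find that."
    return f"The {query['attribute']} of the {query['entity']} is {value} meters."


def answer(question):
    query = interpret(question)
    if query is None:
        return "Sorry, I did not understand the question."
    return generate(query, extract(retrieve(query)))


reply = answer("How high is the Sky Tree?")
```

Here `reply` is “The height of the Sky Tree is 634 meters.” A question that the pattern cannot interpret, or an entity that is missing from the list, produces one of the two apology messages instead.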
Typical smart speakers
Smart speakers for home use are already on sale all over the world. Some representative products are introduced below.
“Amazon Echo” is the world’s best-selling smart speaker, launched in 2014 by Amazon, which is headquartered in the United States. It uses Amazon’s voice assistant technology, “Alexa”. It was released in Japan in 2017, and many people probably know it best from the commercials in which someone speaks to an Amazon Echo, saying things like “Alexa, order XX.”
In addition to shopping on Amazon and using services such as Amazon Music, users can also select and add functions called “Skills” that are developed by third parties.
“Google Home” is a smart speaker equipped with Google’s voice assistant, “Google Assistant”. The speaker was released in the fall of 2016, but the Google Assistant itself is also built into Android devices such as Google’s own smartphone, the “Google Pixel”. When you say “OK Google” to an Android smartphone and get a reply, that reply comes from the Google Assistant. Because the Google Assistant is linked to the information in the user’s account, you can use various services such as schedule management and voice search.
“HomePod” was announced by Apple in 2017. Apple calls it a “home wireless speaker.” It is equipped with Apple’s voice assistant “Siri”, which Apple introduced on the iPhone in 2011. What sets it apart from other smart speakers is that it is strongly positioned as a speaker for music.
Incidentally, “Siri” is considered to be the origin of smart speaker technology. It was developed on the basis of military technology that was state of the art at the time of its release, and because it supports many languages, it quickly won over users all over the world. Siri is also said to have been designed with a thorough focus on usability, for example by giving it a familiar-sounding voice and doing away with text input.
“Clova” is a cloud AI platform announced by LINE and its parent company NAVER in March 2017. Clova is built into a smart speaker called “WAVE”, and the lineup is notably varied, including, for example, a trapezoidal model and models shaped like LINE’s mascot characters.
Like other smart speakers, it activates voice recognition when you speak to Clova. In addition, by working together with the various services operated by LINE and with the LINE app on a smartphone, it lets you have chat messages read aloud and reply to them entirely by voice. The fact that a company’s own services can be used from a smart speaker through voice recognition technology developed in-house is similar to the relationship between Alexa and the Echo at Amazon.