• John Ennis

Eye on AI - November 22nd, 2021

Welcome to Aigora's "Eye on AI" series, where we round up exciting news at the intersection of consumer science and artificial intelligence!

This week, we’ll be addressing two important advances in AI speech: synthetic speech’s evolving clarity, functionality, and use cases, and healthcare’s use of voice recognition and natural language processing (NLP) to reduce inefficiencies.


Enjoy!


What Is Synthetic Speech, And How Is It Created By AI?



If you’ve ever asked Siri or Alexa a question or listened to a Stephen Hawking lecture, you’re familiar with synthetic speech. Put simply, synthetic speech is the computer-generated production of human words. It’s what our smartphones and smart speakers use to answer voice queries, and what those old Macintosh computers used to say ‘hello’. If you’re thinking of that robotic voice void of human emotion or inflection, that’s the one. And for a long while, that voice, or some version of it, was the only synthetic voice available.

Recently, though, synthetic speech has taken an evolutionary leap forward, which is the topic of Bernard Marr’s recent article, “How AI Creates Synthetic Speech”.


“Traditional text-to-speech robotic voices you hear on software or hardware products like Amazon Echo, Google Home, your GPS, or your ebook reader are fast and cheap for companies to create, but they can also be unoriginal and unrealistic,” writes Marr. “Artificial intelligence or AI voice operates a little differently. AI voice uses deep learning to create higher-quality synthetic speech that more accurately mimics the pitch, tone, and pace of a real human voice.”

Why is this change significant? Suppose you were looking for new ways to increase exposure to an article on your website. Traditionally, visitors could only read the article, which limits how and when it can be consumed (it’s difficult to read on a jog or in the car). Synthetic voice gives visitors a second option, listening, which widens the situations in which the article can be consumed. The problem is that, until recently, most synthetic voices had that same emotionless, inflectionless, mechanical tone and were available in only a single language, neither of which is good for information retention or exposure.


Now, using AI programs like LOVO AI, users can choose synthetic voice options such as language, style, and character to better represent the content being shared. They can even upload voice samples, including their own, for LOVO to mimic with near-lifelike accuracy (see video in link for a real-to-synthetic comparison; it’s pretty impressive stuff). The potential uses for human-like synthetic voices are practically infinite, from video translations and video or audio ads to e-learning lessons and AR and VR experiences, where scripts can be written without the additional step of recording them. Celebrities are even embracing the synthetic voice and are now being paid royalties for synthetic reproductions of their voices. Crazy, I know, but it’s true. That tells you just how good AI synthetic voices have become.


How AI, Synthetic Speech, and NLP Are Improving Healthcare Call Centers



Like synthetic speech, medical call centers are also getting a voicelift with the help of AI. And yes, I’m talking about those call centers, the ones that on average disconnect 13% of users before connecting them with an agent and cause 60% of users to hang up in frustration. According to the article “How AI and NLP are helping healthcare call centers to be more efficient”, AI and NLP are significantly reducing those inefficiencies.


“To meet high customer service demands, healthcare providers are turning to automation technologies like voice recognition to strengthen efficiencies, improve performance, reduce costs and improve the patient experience,” writes TechRepublic contributor Mary Shacklett. “One of the technologies they are implementing in their call centers is context artificial intelligence-based speech recognition.”

By training automated call center systems with better natural language processing (a form of AI), researchers have given them a more nuanced understanding of language, inflection, and emotional escalation than ever before. These systems can now better understand users and offer informative and emotionally relevant responses to their questions. Using voice recognition, they can remember users and pull up information quickly to resolve simple requests, freeing up human agents to respond to more complex inquiries. Add AI-driven synthetic voices to that, and one can almost imagine having a pleasant medical call center experience. Maybe.





That's it for now. Thanks for stopping by!