Voice assistants like Google Assistant and Alexa are part of everyday life. They’re on phones, laptops, and walls, and they control smart homes. But they can be difficult to use, especially for anyone who speaks “nonstandard” English. Companies are trying to fix that problem, but what if that’s a bad thing?
By making voice assistants in smart homes and on smartphones easier to use, companies may actually be decreasing their users’ ability to function in the wider world. There are around 1.35 billion English speakers globally, of whom 400 million are “native speakers.”
So, it’s safe to assume that approximately two-thirds of English speakers have some degree of accent purely because English is not their first language (1.35 billion minus 400 million leaves roughly 950 million, or about 70% of the total). Among the 400 million people who do speak English as a first language, you have multiple national accents (British, Canadian, American, Australian, et al.). Within each country, you have regional dialects, and so forth.
If companies were to pick and perfect a single dialect, say Standard American English, their product would only be usable by a tiny fraction of English speakers. Conversely, if they accommodate every way of speaking perfectly, they could rob people of what could be a very useful tool for developing their communication skills.
How Are Tech Companies Trying to Improve Things?
Voice assistants have been working to better understand their users’ commands for as long as they have existed. Microsoft, Apple, Google, and Amazon are among the big names that have pumped a ton of resources into their respective voice assistants, and they want to make them as accessible and frustration-free as possible for as many people as possible.
This has involved hiring people with particular accents to record hundreds of voice commands and conversations, which can then be used to teach the AI those dialects. During one of my leaner months, I decided to cash in on my sexy Northern accent and spent hours recording hundreds of seemingly random words and phrases for a company called Appen.
That company then took my recordings and sent them off to Amazon, Google, Microsoft, or whoever else was paying them. The voice snippets are then theoretically used to improve whatever AI the company that bought them is developing.
Some voice assistants can even be trained to better understand the exact voice of the person using them. Unlike waiting for big tech to up its game, this produces immediate results and can improve your voice assistant’s accuracy significantly. It also allows multiple users to access their smart home profiles without having to switch manually.
So, Why Might This Be a Bad Thing?
I might get away with saying: “Alexer, serruz an alarm for eight o’clock tomorrer, will yer,” but trying to request songs is where the struggle really begins. It took around three months of communicating with Amazon Music and a few thousand frustrated expletives, but I can now say “play Happy Hour by The Housemartins” as clearly as a 1980s BBC newsreader. There are still occasions when I ask for Paul Weller and somehow end up with Ella Fitzgerald, but there’s always room to improve.
The silver lining of the accent struggles is the fact that my English has improved. I can now communicate more clearly than ever before. This is useful because technology may improve to the point where the AI on my smartphone can understand me, but that won’t do me much good when I’m using said phone to talk to another human being.
Another benefit is that I haven’t utterly butchered my accent in the process. If I’d opted to shell out for elocution lessons instead, I might be rattling off sentences in Received Pronunciation. Identity is important; accents are a key part of someone’s culture and background.
The United Kingdom, for example, has a distinct accent every few miles. There’s a map that has been flying around the internet for a few years that looks extensive but still barely scratches the surface. A tiny part of the North East is labeled as having a “Teesside” accent, but the natives of each town in that area (Middlesbrough, Hartlepool, Stockton, and Billingham) all speak differently.
Now imagine the variations in a county the size of Yorkshire. People also tend to identify with where they’re from and to preserve a lot of their culture. Accents are a large part of that; not everyone wants to sound like Hugh Grant.

We may have been spoiled in recent years, as many people are now happy to sit back and wait for technology to make up for their shortcomings, and in a lot of cases, it will do just that. But sometimes, meeting tech in the middle is both quicker and better in the long run.
Voice assistants do need to be made accessible to as many people as possible. If you had to speak in perfect Received Pronunciation before Siri would give you the time of day, one of Apple’s most significant successes would be useless to over 99.9% of English speakers.
Even something like a standard American accent would rule out the majority of users in the United States, never mind worldwide. Hence, it’s obvious why companies are putting a lot of effort into teaching their software to understand as many dialects as they possibly can. And so they should. But they should only go so far.
It would be better if Apple, Google, et al. avoided adopting a perfectionist mentality and instead aimed for a standard that allows for accessibility but still requires a bit of care on the users’ part. On a personal note, Alexa’s unwillingness to listen to anything beyond clear speech forced me to think about how I pronounce things.
My speech is undoubtedly clearer than before I had to deal with a voice assistant multiple times a day. It wasn’t something I set out to do; it was an unintended and very beneficial side effect. And if it worked for me, it might work for other people, too.