I came across this video on YouTube the other day, so I thought I'd share it with you guys.
Basically, there's an AI startup called Lyrebird, based in Montreal, Canada, which has figured out a way to clone any voice with frightening precision. Not only that, it achieves this using just a tiny sample of your voice, and the cloned voice can even say words that you never spoke during training.
This is pretty remarkable considering that this sort of thing was impossible just a few years ago. But now with the advent of Deep Learning, a type of Machine Learning that mimics (albeit on a small scale) the inner workings of the human brain, companies like Google and now Lyrebird have been able to create very convincing, life-like synthesized voices.
What makes Lyrebird stand out, however, is the small amount of time it takes to train its algorithm on your voice. It takes just 60 seconds of voice recording, during which you read 30 seemingly random sentences out loud. In comparison, Adobe's Project VoCo takes at least 20 minutes of sample audio to effectively achieve the same outcome. So you can see just how powerful and advanced Lyrebird's software is.
Granted, the results aren't perfect and they certainly aren't indistinguishable from human speech, but it's still pretty impressive, and you can easily see a future where synthesized voices will become indistinguishable from the real thing, which of course brings with it a whole host of potential uses (and problems).
Check out the sound clip below to hear the synthesized voices of Donald Trump, Barack Obama, and Hillary Clinton discussing the startup:
Lyrebird claims its algorithms can infuse the speech it creates with emotion. In other words, it will allow customers to create voices that sound angry, sympathetic, or stressed out.
The startup also points out a number of good uses for this kind of technology, such as "reading of audio books with famous voices, for connected devices of any kind, for speech synthesis for people with disabilities, for animation movies or for video game studios."
Of course, there are some more sinister uses that immediately come to mind. It's not hard to imagine algorithms like this being used to trick the voice biometric systems used in banks, for example. Nor does it take a great deal of imagination to see how this could be combined with other deep learning technologies to create viral videos of famous politicians, making fake news even more realistic.
But for now, we don't need to worry. Although Lyrebird does do a good Trump impression, its other voices are definitely more robotic sounding:
If you want to check this out for yourself, just head over to the Lyrebird website, sign up and start recording. You can clone your own voice by reading 30 different sentences out loud, which are recorded and used to train the algorithm on your voice. Behind the scenes, it takes quite a bit of computing power to generate a voice-print, but once the algorithm is trained, the speech is easy to make. You simply type what you want it to say and, within a second, your text is turned into speech, in your voice of course.
I had some great fun with this and my daughter certainly enjoyed it too - if a little bemused by it at first ;)