Elon Musk’s xAI project has added custom voice models to its expanding feature set, enabling users to generate audio samples that replicate their own voice, based on just a few seconds of audio.
The functionality, now available within xAI’s management tools, will provide a new way to add a human touch to digital audio, by replicating any person’s voice for use in other applications.
This could be a little concerning with regard to potentially misrepresenting what people have or haven’t said. But xAI says it has a process in place to limit misuse, and to ensure that its voice replicas are only applied in approved ways.
That could facilitate custom customer support bots, enhanced content narration in a user’s own voice, and improved accessibility features, among other uses.

To counter potential misuse of the option, xAI says that every voice recording will go through a two-step verification process before a custom voice can be created.
As per xAI: “First, the speaker reads a verification phrase that our STT engine transcribes and matches in real time, confirming intent and presence. Then we compute speaker embeddings from the verification clip and the full recording to confirm they belong to the same person.”
The idea is that this ensures the voice being replicated belongs to a person who has actually spoken the text, and has thereby approved such usage.
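As a rough illustration of the two checks xAI describes, the logic can be sketched as a transcript match followed by an embedding-similarity comparison. This is a minimal, hypothetical sketch: the function names, the toy embedding vectors, and the similarity threshold are all assumptions for illustration, not xAI's actual implementation, which would use a real STT engine and a trained speaker-embedding model.

```python
import math
import re

def normalize(text: str) -> str:
    # Lowercase and strip punctuation so "Hello, world!" matches "hello world"
    return re.sub(r"[^a-z0-9 ]", "", text.lower()).strip()

def transcript_matches(expected_phrase: str, stt_transcript: str) -> bool:
    # Step 1: the verification phrase the speaker reads must match
    # what the STT engine transcribed, confirming intent and presence
    return normalize(expected_phrase) == normalize(stt_transcript)

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def same_speaker(emb_verification: list[float],
                 emb_recording: list[float],
                 threshold: float = 0.8) -> bool:
    # Step 2: embeddings computed from the verification clip and the
    # full recording must be close enough to attribute to one speaker
    # (threshold is an illustrative value, not xAI's)
    return cosine_similarity(emb_verification, emb_recording) >= threshold

# Toy vectors standing in for real speaker embeddings
emb_clip = [0.9, 0.1, 0.4]
emb_full = [0.88, 0.15, 0.38]

approved = (transcript_matches("please verify my voice",
                               "Please verify my voice.")
            and same_speaker(emb_clip, emb_full))
print(approved)  # True for these toy inputs
```

In a production system, the embedding vectors would come from a speaker-verification model and the threshold would be tuned against false-accept and false-reject rates, but the gating structure (both checks must pass) is the same.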
This isn’t foolproof, and the tool could still be misused to misrepresent what a person has said. There’s also a question about what happens to those voice recordings in future, and how they might be used after, say, an employee leaves a business.
But xAI believes this process will help to ensure safety in the use of the tool, and limit the capacity for people to make replica voices based on recordings, or from unapproved sources.
It remains to be seen how that works in practice.
In addition to this, xAI has also expanded its built-in voice catalog to more than 80 voices across 28 languages, giving users plenty of options for generating audio samples.
AI tools are inevitably going to facilitate more deepfakes and misinformation, and in that sense, this process doesn’t add any major new safety risk. Indeed, xAI could argue that it enhances safety on this front, by ensuring that a real person has supplied and approved the initial recording. Even so, it feels likely to see misuse, and could lead to problems in future.
But maybe voice cloning like this is inevitable, and the best-case scenario here is that the big tech platforms will enact some level of verification to protect against misuse.

