dot.LA

Vocal Deepfakes Are Here To Make You Question Who You're Listening To

Midjourney

This is the web version of dot.LA’s daily newsletter. Sign up to get the latest news on Southern California’s tech, startup and venture capital scene.

This week's newsletter sponsor is WeWork. Unlock coworking spaces near you with WeWork All Access. Get 25% off WeWork All Access monthly membership fees for 5 months. Terms apply. Visit wework.com to get started.

Last week, a team from Microsoft posted a new voice synthesis machine learning model called VALL-E to GitHub. That’s a lot of words, but in simpler terms, the idea is basically vocal deepfakes, or software capable of accurately imitating the real sound of a human voice, even a specific human voice.

'You Talking to Me?' This New Startup Uses Deepfake Technology for Movie Dubbing

assets.rbl.ms

Robert De Niro's performance seemed totally off. In fact, nothing about the movie was working.

The problem, as Scott Mann saw it, was that the dubbing process had created a dissonance between the original film and the translated version he was watching, the cumulative effect of which was that it had "become a different film." That was Mann's opinion when he first saw a translated version of "Heist," the 2015 film about a casino robbery that he directed.

"I saw the foreign translations of my film, and I was horrified," said Mann, who lives in the Hollywood Hills. "As filmmakers we don't tend to watch our own movies in foreign languages and when I was made aware of it, it really shocked me."

Film director Scott Mann (left) and Robert De Niro on the set of "Heist".

International markets have become more important than ever for film and television. Projects rely on foreign audiences to recoup their costs; as the U.S. market grows saturated, streaming services are increasingly competing for overseas viewers.

Netflix, for instance, consistently sees foreign-produced projects like "Money Heist" (aka "La Casa de Papel") and "Lupin" as among its most popular shows. Last year "Parasite" became the first foreign-language film to win an Oscar for Best Picture.

As Mann sought a better solution for his films, he says he came across a research paper that described a new artificial intelligence solution to the dubbing problem. On Monday, he launched a company that uses the technology, and which he hopes will become the new standard in Hollywood and beyond.

Flawless, which Mann co-founded with Nick Lynes of his native England, uses technology that was developed by a team of researchers at Germany's Max Planck Institute. In the paper detailing their "style-preserving" approach, the group explains how their technique improves upon the dubbing technology the industry relies on.

Traditional dubbing techniques can be categorized into two groups, they write. In the audio-only approach, a foreign-language "source" actor recites the translated script and the resulting audio file is merged with the original video of the "target" actor. The translated script must be rewritten, of course, and is often done so to try to preserve alignment between the lip movements of the target actor and the audio footage from the source's translation. For example, when the target actor pronounces a close-mouthed "b", "m" or "p" sound in the original, the translated script aims to use a foreign word that requires a similar lip movement. But working within these constraints can cause the script's nuances to get literally lost in translation.

The second approach, called "visual dubbing," changes the visual footage of the original performance, so that the actor's lip movements align with the translated audio. The downside, write the researchers, is that doing so "removes the person-specific idiosyncrasies and the style of the target actor, and makes the target actor's face move unnaturally like the source actor."

The new technology uses artificial intelligence to transfer the source actor's lip movements so that, by contrast, it matches the target actor's characteristic style. In doing so, the company hopes it will appear to viewers that De Niro himself is actually speaking the dubbed foreign language.

How Flawless Works

The startup uses technology similar to "deepfake" techniques that manipulate video footage to make a subject appear to be saying something he or she is not. In Flawless' case the technology is used in service of a more realistic translation of the film, rather than, say, creating an embarrassing moment for a politician or celebrity.

To achieve that, its artificial intelligence and machine learning model analyzes video footage of the actor – De Niro, in this example – to learn the subtle nuances of how his lip movements relate to other features of his face, including head position and eye motion. On the other end, it analyzes the translator's performance, then generates new footage that maintains De Niro's style while incorporating the lip movements of the translator. The new footage is rendered exclusively on the actor's (De Niro's) face, and keeps the original background unchanged.

The original researchers write that five-minute videos are sufficient for training their "style translator." Mann added that "for 'locked/completed' productions, the process takes around 10 weeks from ingestion to finals."

The company has signed its first contract but is "sworn to secrecy on the details," Mann said, adding that pricing will be based on the number of translations and the length of the film.

The company has so far been self-financed. It received some external funding in 2020, but won't disclose how much. Lynes wrote it will be raising a Series A round of funding later this year.

In their paper, the researchers cautioned that their technology could be used in pernicious ways as well. They say they have been working on AI technology "to detect synthetically generated or edited video at high precision to make it easier to spot forgeries."

Despite that potential, Mann thinks the technology could "completely change filmmaking" – well beyond dubbing.

"What you're constantly doing as a filmmaker is you're doing the same thing again and again from different angles," he said."From a production point of view, that's very time consuming and expensive. What this allows, is you could get one great take, from one angle and then essentially have it recreated from other angles."

He also thinks it could make remaking foreign films a thing of the past.

"Rather than having to remake a movie in English and all these different languages, we'll be able to enjoy the original," he said. "I think that will have an effect on the international community of filmmaking, and all types of actors that really should be on the world stage but currently aren't because no one speaks that language."

"I do think this is the tip of the iceberg on this technology in the film field," he added.

From Your Site Articles

deepfake

How the LA Public Library Is Leveraging TikTok To Bring People Back

LA Venture: Dangerous Ventures’ Gaby Darbyshire On ‘Shining a Bright Light’ on Difficult Problems

What We’re Reading...

@deeptomcruise

'You Talking to Me?' This New Startup Uses Deepfake Technology for Movie Dubbing

How Flawless Works

How Will LA Look in 2028? A Look at the City's Plan To Embrace Transformational Tech

The LAPD Spends Millions on Spy Tech. Here’s What They’re Buying

FCTRY LAb Raised $6M To Continue Expansion

This Digital 'Mirror' Measures Every Aspect of Your Body, Making It Easier To Try On Clothes Online