1. 18

  2. 9

    Obviously, this should be combined with Face2Face.

    1. 3

      Photo- and audio-realistic VR celebrity porn is sure to come soon.

    2. 6

      Voice is another type of biometric. It has been well established that biometrics should only be used as ‘usernames and not passwords’. So the idea of assuming trust based on evaluating voice should be deprecated as soon as possible.

      I do wonder, though, whether there will be any implicit way to guarantee identity over digital media in the future. An in-person meeting is one thing, but maybe everyone is just going to need some sort of private key that is used in every transaction requiring trust.
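
      For what it's worth, the "private key" idea can be sketched with nothing but a hash function. Below is a toy Lamport one-time signature; the function names (`keygen`, `sign`, `verify`) are my own, each key pair may safely sign only a single message, and a real system would presumably use a vetted library and a reusable scheme rather than this.

      ```python
      import hashlib
      import secrets

      BITS = 256  # one pair of secrets per bit of a SHA-256 digest


      def keygen():
          """Return (private, public): 256 pairs of random secrets and their hashes."""
          private = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(BITS)]
          public = [(hashlib.sha256(a).digest(), hashlib.sha256(b).digest()) for a, b in private]
          return private, public


      def _digest_bits(message: bytes):
          """Hash the message and return its 256 digest bits as a list of 0/1."""
          digest = int.from_bytes(hashlib.sha256(message).digest(), "big")
          return [(digest >> i) & 1 for i in range(BITS)]


      def sign(private, message: bytes):
          """Reveal one secret per digest bit. Never sign two messages with one key."""
          return [private[i][bit] for i, bit in enumerate(_digest_bits(message))]


      def verify(public, message: bytes, signature) -> bool:
          """Check that each revealed secret hashes to the matching public value."""
          if len(signature) != BITS:
              return False
          bits = _digest_bits(message)
          return all(
              hashlib.sha256(sig).digest() == public[i][bit]
              for i, (sig, bit) in enumerate(zip(signature, bits))
          )
      ```

      Anyone holding the public half can check a message with `verify(public, message, signature)`, but only the holder of the private half could have produced the signature, which is the property a voice no longer gives you.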

      1. 4

        There are more legit domains for this sort of tech. (I hope this isn’t too much of a diversion to the topic.)

        I’m thinking of the Kemper Profiling Amp, specifically. It creates incredibly accurate simulations of guitar amplifiers and the corresponding speaker cabinets by profiling an amp. Basically, it sends a bunch of sounds into an amp + cab, then compares how different they sound coming out. The result of this is a snapshot of an amp + cab at a particular set of settings. It sounds kind of silly, but it is a vastly different approach from other modeling techniques, which focus on modeling the discrete components used in amps and simulating the complex harmonics that emerge from distortion.
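
        To make the "send sounds in, compare what comes out" idea concrete, here is a minimal black-box sketch. It is not Kemper's actual algorithm, which I don't know; `toy_amp` is a made-up stand-in for the amp + cab (soft clipping plus a simple low-pass), and `profile` only records output level versus input level per probe frequency, which is far less than a real profiler would capture.

        ```python
        import math

        SAMPLE_RATE = 48_000


        def toy_amp(samples, smoothing=0.2, drive=2.0):
            """Hypothetical amp + cab: tanh soft clipping, then a one-pole low-pass."""
            out, state = [], 0.0
            for x in samples:
                clipped = math.tanh(drive * x)
                state += smoothing * (clipped - state)
                out.append(state)
            return out


        def rms(samples):
            """Root-mean-square level of a signal."""
            return math.sqrt(sum(s * s for s in samples) / len(samples))


        def profile(device, freqs_hz, amplitude=0.1, seconds=0.1):
            """Send one sine probe per frequency; record output level / input level."""
            n = int(SAMPLE_RATE * seconds)
            gains = {}
            for f in freqs_hz:
                probe = [amplitude * math.sin(2 * math.pi * f * t / SAMPLE_RATE) for t in range(n)]
                response = device(probe)  # the device is treated purely as a black box
                gains[f] = rms(response) / rms(probe)
            return gains


        # A "snapshot" of the device at one particular set of settings.
        snapshot = profile(toy_amp, [100, 1_000, 5_000, 15_000])
        ```

        Note that `profile` never looks inside `toy_amp`, which is the contrast with component-level modeling. Change `drive` or `smoothing` and you need a new snapshot, just like changing the knobs on the real amp.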

        The KPA itself is not cheap, but there is definitely a market for a software-based solution that gets 80% or 90% of the way there using similar tech to this.

        1. 3

          This will be used in a ton of scams and make social engineering a walk in the park.

          1. 3

            Did you actually listen to the demos?

            They are somewhat recognizable as the person but a long way from sounding natural and making scams a “walk in the park”.

            1. 2

              Their samples say they are not cherry-picked, so I’d believe it could be more convincing with the current technology. And in security, we say “attacks only get better”.

              1. 1

                They’re not perfect, but I’d imagine they’d sound a lot more realistic if you had to listen to them over the phone.

              2. 1

                Good. Maybe as this technology becomes more accessible, we’ll move on to crypto authentication.

              3. 2

                I suspect we will soon return to the evidentiary state of affairs of the 19th century; the only admissible forms of evidence will be physical or testimonial. Anything digital can be faked, now cheaply. (Except non-repudiable crypto signatures.)

                1. 2

                  “Copy” with a sufficiently loose definition of copy. It’s a great idea, with okay execution for text-to-speech, but it’s a gross overstatement to call this a copy lol.

                  1. 2

                    The quality of their actual synthesis seems to be 10 years behind the curve, but I expect that will change sooner or later. The intonation is pretty good, and the prosody is good except when it’s terrible.

                    Given how they emphasize the way that it produces subtly different results every time, I’m going to bet they’re using a GAN. Lots of other generative techniques, when trained on data that has a lot of diversity in it, try to minimize the loss across the whole dataset by “shooting right down the middle”, which produces output that’s boring and often not even that good (e.g. a font generator that can’t decide between a one-story and a two-story “a” will generate a half-breed version that’s actually completely implausible). But GANs seem to be much better at actually modeling the diversity, so that drawing randomly from the latent space gives generated outputs that are varied but still plausible.
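
                    The “shooting right down the middle” effect is easy to show with toy numbers. This sketch is not a GAN and says nothing about what they actually use; it just assumes a made-up one-dimensional “style” feature with two valid modes and shows that the single output minimizing squared error lands between them, where no real sample lives.

                    ```python
                    import random
                    import statistics

                    random.seed(0)

                    # Hypothetical feature: -1.0 for a one-story "a", +1.0 for a two-story "a",
                    # plus a little noise around each mode.
                    data = [random.choice([-1.0, 1.0]) + random.gauss(0, 0.05) for _ in range(10_000)]


                    def mse(prediction, samples):
                        """Mean squared error of one fixed prediction against all samples."""
                        return sum((prediction - s) ** 2 for s in samples) / len(samples)


                    # The single output that minimizes squared error is the dataset mean...
                    best_single_output = statistics.fmean(data)

                    # ...which sits between the modes, far from every real sample.
                    nearest_real_sample = min(abs(s - best_single_output) for s in data)


                    def sample_plausible():
                        """Pick a mode first, then add noise: varied output that still looks real."""
                        return random.choice([-1.0, 1.0]) + random.gauss(0, 0.05)
                    ```

                    `best_single_output` beats either mode on average loss, yet it is the half-breed “a”. `sample_plausible` is the behavior you want from a generator: different every time, but always near something that exists in the data.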

                    1. 2

                      I’m going to have to file this release under “grossly irresponsible”.

                      That said, I think they raise an interesting approach in their “Ethics” section:

                      “By releasing our technology publicly and making it available to anyone, we want to ensure that there will be no such risks. We hope that everyone will soon be aware that such technology exists and that copying the voice of someone else is possible. More generally, we want to raise attention about the lack of evidence that audio recordings may represent in the near future.”

                      I mean, they’re a startup and a business and hence full of shit, but it’s a nice idea.