When I was very, very young I had an Apple II, and a few scattered magazines. One of the letters-to-the-editor asked for help getting a text-to-speech engine called “SAM” to work, and it stood out in my memory because where the rest of the magazine was interesting and informative, the editors’ reply to that letter was just “Never heard of it, maybe one of our readers knows something?” It was incredibly frustrating to my young mind, because they didn’t know the answer, and I didn’t know the answer, and I had no access to later issues of the magazine so I could never find out if anybody else knew the answer.
And here we are 30 years later and now I do know the answer, but I’ve forgotten what magazine it was, so I can’t write back and tell them.
Ah, another C=64 kid here: I plugged a carbon microphone from a discarded telephone to the potentiometer input in the game port (used for paddles) so that whenever anyone went to the computer and said anything, the C=64 would reply with “That’s very interesting!” (in Spanish, mind you, using trial and error to find letter combinations that, when spoken by SAM as English, would produce Spanish-like phonemes), which kept people entertained for a few interactions until they figured out that it was a fixed answer.
Another program I made was “The Polite Questioner” (“El preguntón educado”) which posed questions to you and checked your typed responses, and either congratulated you if correct or berated you (and gave you a second chance!) if not, but always insulted you regardless. The insult collection included one from the Ghostbusters movie.
The only speech synthesis engine that can synthesize Polish without giving the listener an “uncanny valley” feeling, like Ivo Software’s Ivona (it’s now been bought by Amazon, whaaat?!) does. And it runs on a Commodore 64!
For a related tangent, check out Bill Lange’s blog post about the Alien Group Voice Box for Atari 400/800 personal computers. It’s one of several voice synthesizer devices for the 8-bit micro market. Toward the bottom of the post, there’s also a link to an episode of Kevin Savetz’ podcast with Mike Matthews of Electro Harmonix, who offers some more background and context for the peripheral.
Is it just me or is the quality really bad? Or is that to be expected?
The speed parameter behaves very unpredictably:
$ time -p ./sam -speed 100 'hello world'
real 1.40
$ time -p ./sam -speed 200 'hello world'
real 2.40
$ time -p ./sam -speed 280 'hello world'
real 0.42
Yes, that’s the quality I remember, which was amazing at the time. :-D
Just in case we might be getting different results, here is what it’s supposed to sound like:
https://matracas.org/tmp/sam-hello-world.wav
If you put a full-stop at the end, the intonation sounds a bit more natural but also unenthusiastic.
As for the speed parameter, that’s 8-bit overflow: the slowest is at 256, and if you roll it over to 257 it’ll speak so fast it’ll just be a blip.
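To make the wraparound concrete, here is a minimal sketch. It assumes the port keeps the speed in a single unsigned byte, so only the low 8 bits of whatever you pass survive; `wrap_speed` is a name made up for this illustration, not something from the SAM source.

```python
def wrap_speed(requested: int) -> int:
    """Return the value left after squeezing `requested` into one unsigned byte.

    Assumption: the speed is stored as an unsigned 8-bit value,
    so anything above 255 wraps around (modulo 256).
    """
    return requested & 0xFF


if __name__ == "__main__":
    # The three speeds from the timings earlier in the thread, plus 257.
    for requested in (100, 200, 280, 257):
        print(f"-speed {requested} is stored as {wrap_speed(requested)}")
```

Under that assumption, `-speed 280` ends up as 24, which would fit with it finishing much faster than `-speed 100` or `-speed 200`.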
Thanks! That explains it. I’m not bothered by the intonation, much more by the audio artifacts when it “screeches”.
At least that problem doesn’t get worse at higher speeds (around 20) and it’s still understandable. I know this is old (though I never had a C64 to experience it with), but I couldn’t tell if it’s a porting bug or not. Maybe I could have worded my last comment better.
It ran on a system with a 1MHz CPU and 64KB of RAM during the years they were cranking out state-of-the-art experiences like this one. Relative to the time, it had a great sound-quality-to-hardware ratio. Today, not so much, but that comparison is cheating. ;)
Looks like before Moore’s law really kicked in, you just had to make do with what you had. And they did an amazing job with that hardware from the looks of it!
Though I was comparing it to eSpeak, which is still pretty small but I don’t know its CPU and memory footprint. (That page says it was originally for Acorn/RISC_OS computers but then relaxed those constraints.)
Interestingly a 1930s analog system sounds a lot smoother. Partly I think because it doesn’t have to drive digital audio output; you effectively get a very high “clock rate” with a specialized analog circuit. E.g. Wikipedia says this 1939 device is running 10 bandpass filters, which a C64 probably can’t do in realtime. Kind of interesting that you could do it with 1930s tech though, just not the same kind of tech. (The Voder also cheats compared to a TTS system by having a trained human key in the phonemes.)
That was really neat! It was human controlled, though. I think that disqualifies it for automated synthesis. The fact that it was analog certainly gives it an advantage since analog doesn’t pause (clockless realtime), uses a fraction of the power, and operates on mediums closer to vocals. I’ll do a submission close to Tuesday afternoon that you’ll like on them.
I haven’t heard this voice in decades. I didn’t know I missed it. Now that I played with it, I’m filled with nostalgia. I loved this robotic voice then, still love it now. I think I’ll make a few ringtones with it…
That was a great way to learn about computers.
Here’s the proof
Note that Google Assistant / Google Maps TTS is just based upon pre-recorded strings composed nicely.
I remember that I used it on my 8-bit Atari 65XE - indeed it was the same engine. At the time it was an amazing experience.
I remember using this (pretty sure it was SAM, at least) to make prank calls using a Commodore 64.
this is pretty awesome, the distorted quality of the vocals sounds amazing to my ears =)
https://matracas.org/tmp/sam-lobsters-reply.mp3
That looks cool. Didn’t know about it. Thanks for the link!
Super cool! Too bad it needs months of training to operate.
That submission is here.
Totally expected. Go read the Wikipedia article - this code was released in 1982.
Making no assumptions about the reader, were you born yet? :) In any case, hopefully that will give you some context.