My favorite excerpt from this: “It’s just lazyness on the part of the player developers that they rely on the old methods, I guess they think bits are bits.”
I mean, they aren’t wrong.
They anticipated rowhammer by years ;)
Like many others who went to engineering school in the ’80s, I appreciate high-fidelity audio gear to the point of being a little bit of an audiophile. Also, as anyone who knew electrical engineers from that era can tell you, the amount of bad advice you can find in “audiophile” circles is astounding. And don’t get me wrong: I’ll tell you to your face that popular music sounds better today on vinyl than on CD. What I won’t do is try to convince you that the reason is that digital is an inferior technology incapable of meeting audiophile standards, because that’s not even close to the reason.
(What’s the real reason? :-))
As far as your ears are concerned, louder sounds better. CD and digital have far greater fidelity to the original waveform than vinyl does. Vinyl also has less dynamic range: the difference between the loudest sound you can record and the softest sound you can record.

When music was regularly sold in both formats in the late eighties, the mastering process for popular music was the same for both. In the early ’90s people discovered that the common mastering process for CD and LP was leaving a lot of dynamic range unused when the music was pressed onto CDs. As the vinyl LP was falling out of favor, people discovered that if you reduce the dynamic range of the music, you can master it at a higher sound level, or loudness, on CD without generating distortion. To people listening to the music, these louder pressings initially sound better. Every rock and pop artist on the planet wants their song to sound the best when played alongside other music, so artists started asking for this as part of the mastering process.

The problem with doing this is that everything begins to take on a wall-of-sound feeling. By making the soft parts louder and the loud parts softer so you can make the whole thing louder, you take away some of the impact the music would have had with its original dynamic range.

When vinyl records started coming back into favor, the music destined for LP was mastered the way it used to be for vinyl back in the ’80s. If you listen to two versions of a song, one mastered for LP and the other mastered for CD, the LP will sound better in the first few plays, before vinyl’s entropic nature ruins it, because the LP version will have more dynamic range. The same is true of two pressings of Pink Floyd’s “Dark Side of the Moon” if you compare an early-1980s issue CD to a late-’90s CD reissue. This is really only true of rock, pop, and R&B; classical and jazz were unaffected by the Loudness War because fans of those genres put fidelity highest among their desired traits.
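To put rough numbers on the dynamic-range comparison above (standard reference figures, not claims from the parent comment; the vinyl figure in particular varies by source and pressing):

#include <cmath>
#include <cstdio>

int main() {
    // Theoretical dynamic range of 16-bit PCM (CD audio):
    // 20 * log10(2^16) ~= 96 dB between full scale and the quantization floor.
    std::printf("16-bit PCM: %.1f dB\n", 20.0 * std::log10(65536.0));
    // Vinyl playback is commonly quoted at roughly 60-70 dB, which matches
    // the "60 dB or so" figure cited later in this thread.
    return 0;
}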
Summarizing: when you say you prefer vinyl over CD, you are really saying that you prefer ’80s-style mastering over the overly compressed late-’90s-and-later mastering.
It’s interesting that the extra headroom on CDs sparked the loudness war, instead of resulting in better dynamics. And now that people expect music to have a certain loudness, I guess we can’t go back.
Perhaps one day we could get a new wave of artists mastering their 320 kbps MP3s ’80s-style?
A loud mix makes sense if you’re listening to music in a noisy environment. (On your commute, say.) But I’d rather have the ability to compress the dynamic range at the time of playing, so I can adjust it to suit the environment.
I used to have a decent, though inexpensive, stereo system setup. Back when I would sit down just to listen to music, with no other distractions, like the Internet.
But when was the last time I really sat down to listen to music? For me it is usually in the car, or through a pair of earbuds. Or maybe washing the dishes.
The extra headroom mostly provided the opportunity, alongside the fidelity and lack of physical limitations of a CD: on vinyl, if you try to brickwall the master, you end up with unusable media.
What sparked the loudness war is the usual prisoner’s dilemma, where one producer asks for more volume in order to stand out, leading the next producer to do the same, until you end up with tons of compression and no dynamic range left. Radio was a big contributor, as stations tend(ed?) to do peak normalisation[0], so if you make wide use of dynamic range you end up very quiet next to the pieces played before and after.
To an extent it’s already been happening for about a decade: every streaming service does loudness normalisation[1], so by over-compressing you end up with a track that’s no louder than your neighbours, but it clips and sounds dead.
Lots of “legacy” media companies (software developers, production companies, distributors, …) have also been using loudness normalisation for about a decade, following the spread of EBU R 128 (a European recommendation for loudness normalisation), for the same reason.
[0] where the highest sound of every track is set to the same level
[1] where the target is a perceived overall loudness for the entire track
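A minimal sketch of the difference between those two footnotes, using plain RMS as a stand-in for perceived loudness (real EBU R 128 loudness is K-weighted and gated, so treat this as an illustration of the idea, not the actual algorithm):

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Peak normalisation: scale so the single loudest sample hits the target.
// A brickwalled track and a dynamic track end up with the same peak, but
// the brickwalled one is perceived as much louder.
void normalizeToPeak(std::vector<float>& samples, float targetPeak) {
    float peak = 0.0f;
    for (float s : samples) peak = std::max(peak, std::fabs(s));
    if (peak > 0.0f) {
        float gain = targetPeak / peak;
        for (float& s : samples) s *= gain;
    }
}

// "Loudness" normalisation (crudely approximated by RMS): scale so the
// average energy hits the target. An over-compressed track gains nothing
// here, which is why streaming normalisation removed the incentive to
// brickwall in the first place.
void normalizeToRms(std::vector<float>& samples, float targetRms) {
    double sumSq = 0.0;
    for (float s : samples) sumSq += double(s) * double(s);
    double rms = std::sqrt(sumSq / std::max<std::size_t>(samples.size(), 1));
    if (rms > 0.0) {
        float gain = float(targetRms / rms);
        for (float& s : samples) s *= gain;
    }
}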
That’s me. When I buy rock music on LP, I’m purchasing music mastered to 1980s LP standards. I do that because rock music works well with the 60 dB or so of dynamic range that a vinyl LP offers.
It really is quite shocking to take a CD mastered in the early 90s and another in the late 90s-early 2000s and play them at the same volume settings.
This is an excellent explanation. Being able to explain things clearly without hiding behind technical terms like “compression” is a strong indicator to me that you are a true expert in this field.
For drummers and bassists, “compression” is a well-known term, because compressing dynamic range is almost required in order to record them faithfully. The typical gigging bassist will have a compressor pedal in their effects chain for live performance, too.
I do appreciate it when a digital release is mastered in a way that preserves dynamic range; play it after any typical digital release in the affected genres and it will sound really quiet.
Some bands have demonstrated to me that you can be a loud rock band with dynamic range mostly intact.
I think it’s fun to watch the record spin around. :-)
I listen to 90% of stuff on vinyl, and I have no rational explanation beyond yours as to why I like it more than streaming.
I heard on the radio that Metallica are investing in their own vinyl-pressing plant.
I stream way more than I use my turntable, basically for the same reasons @fs111 mentions. But I definitely prefer vinyl because while streaming is pure consumption, vinyl is participatory. I enjoy handling the vinyl and really taking care of it (cleaning it when I get it/before I play it, taking care of the jacket, etc.). It makes me feel like a caretaker of music that’s important to me - a participant in the process, instead of just a consumer.
On my phone I listen to music. On my turntable I play music.
I like the physicality of it, too, and I also love the actual artifacts, the records and their sleeves and such.
While I can see the appeal most of my music consumption is while working. I would not like getting up constantly to switch records.
Possibly: I’ve read that over time many pop songs get remastered with more and more dynamic range compression. This makes all parts of the song sound similar in loudness, but it also removes some musical features (dynamics) and, depending on the DRC method (fast attack/decay, slow attack/decay, manual envelope adjustment; see the sketch below), can introduce audible distortion.
Older vinyl record and CD releases are from earlier masters. Albeit some records are newly manufactured, so some will be based on newer remasters anyway.
Cannot confirm or deny, I don’t buy or listen to pop :/
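As an aside on the attack/decay behavior mentioned above, here is a minimal feed-forward compressor sketch in C++ (an illustration of the general technique, not any particular mastering tool; the parameter values are arbitrary):

#include <cmath>
#include <cstddef>
#include <vector>

// A toy peak-following compressor. Gain is reduced above `threshold` by
// `ratio`; `attackMs`/`releaseMs` control how fast the level estimate
// reacts. A fast attack clamps transients (drum hits lose their snap);
// a slow attack lets them through but "pumps" on sustained material.
std::vector<float> compress(const std::vector<float>& in, float sampleRate,
                            float threshold, float ratio,
                            float attackMs, float releaseMs) {
    float aAtt = std::exp(-1.0f / (sampleRate * attackMs * 0.001f));
    float aRel = std::exp(-1.0f / (sampleRate * releaseMs * 0.001f));
    std::vector<float> out(in.size());
    float env = 0.0f; // smoothed level estimate
    for (std::size_t i = 0; i < in.size(); ++i) {
        float level = std::fabs(in[i]);
        float a = (level > env) ? aAtt : aRel;
        env = a * env + (1.0f - a) * level; // one-pole envelope follower
        float gain = 1.0f;
        if (env > threshold)
            gain = (threshold + (env - threshold) / ratio) / env;
        out[i] = in[i] * gain;
    }
    return out;
}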
This is called the loudness war. This site collects the dynamic range for albums: https://dr.loudness-war.info
In addition to the loudness wars people have been talking about, certain technical restrictions limit what can accurately be recorded on vinyl: deep bass has to be kept close to mono, and overall level and high-frequency energy are constrained, or the stylus can literally be thrown out of the groove. This leads to a subtle “sound” that people get used to and prefer. It could be reproduced when mastering for digital audio formats, but people either don’t do that processing or “audiophiles” claim that it gets lost in translation somehow.
TL;DR: bytes are bytes, but the linked-to post is actually about latency, not data integrity.
I’m disappointed by the arrogance of users on this forum. I’ve never had great luck being 100% confident about the behavior of code without running it, but in this case, reading the actual code posted will show you that using a bad memcpy could potentially have caused a problem with audio quality. The user ran the code on their system and noticed a difference in audio quality. Can you be confident that this code, written by someone who may not be well versed in C++ and the Windows APIs, wouldn’t produce different results based on the memcpy implementation used in a specific line?
The code:

Let’s look at a cleaned-up section of the code where the memcpy takes place:

// Read audio data in wav format into memory and store it at `sound_buffer`
BYTE *sound_buffer = new BYTE [sizeof(BYTE) * nBytesInFile];
hr = mmioRead(hFile, (HPSTR)sound_buffer, nBytesInFile);
mmioClose(hFile, 0);

// Send audio data to sound card/kernel
do {
    WaitForSingleObject(hNeedDataEvent, INFINITE);
    pAudioRenderClient->ReleaseBuffer(nFramesInBuffer, 0);
    pAudioRenderClient->GetBuffer(nFramesInBuffer, &pData);
    A_memcpy (pData, sound_buffer + nBytesToSkip, nBytesThisPass);
    nBytesToSkip += nBytesThisPass;
} while (--nBuffersPlayed);
Lower down in the code we also find the following lines:
As far as I can tell these are just extraneous fluff.
Sound card buffers and the need for low latency

Now let’s take a moment to understand what is happening here. The programmer is using the Windows IAudioRenderClient. This allows the user to send little snippets of sound to the sound card by copying them into a buffer provided by the Windows kernel. The sound card in question (we don’t know the specifications of his system) probably has two buffers:

The source buffer: which it reads in from a serial connection.

The sink: a loop buffer which it reads continuously into a digital-to-analog conversion mechanism. This loop buffer is fed by firmware from the source buffer.
If new messages to the source buffer don’t come in fast enough, the loop buffer will restart the loop. You’ll hear a clicking noise every time the loop restarts before a new source buffer comes in. You have almost certainly heard this clicking at some point in your lives, it is a rather hard problem to solve while still allowing for low latency sound when playing video games or musical instruments. https://www.youtube.com/watch?v=5m8GoJeqras
Going back to the code, the loop looks like this:

Wait for hNeedDataEvent, which gets triggered only after the Windows kernel audio buffer is exhausted.

Release the previous kernel buffer with ReleaseBuffer, acquire a new one with GetBuffer, and memcpy the next chunk of audio into it.

Now looking at the Windows docs, the programmer is doing this wrong:
“The client is responsible for writing a sufficient amount of data to the buffer to prevent glitches from occurring in the audio stream. For more information about buffering requirements, see IAudioClient::Initialize.
After obtaining a data packet by calling GetBuffer, the client fills the packet with rendering data and issues the packet to the audio engine by calling the IAudioRenderClient::ReleaseBuffer method.”
This is not their fault; the docs for ReleaseBuffer are also misleading: https://learn.microsoft.com/en-us/windows/win32/api/audioclient/nf-audioclient-iaudiorenderclient-releasebuffer

The top of that page says only: “The ReleaseBuffer method releases the buffer space acquired in the previous call to the IAudioRenderClient::GetBuffer method.”
Only later in the docs do we see:
“Clients should avoid excessive delays between the GetBuffer call that acquires a buffer and the ReleaseBuffer call that releases the buffer. The implementation of the audio engine assumes that the GetBuffer call and the corresponding ReleaseBuffer call occur within the same buffer-processing period. Clients that delay releasing a buffer for more than one period risk losing sample data.”
I think the code should look like:

do {
    pAudioRenderClient->GetBuffer(nFramesInBuffer, &pData);
    A_memcpy (pData, sound_buffer + nBytesToSkip, nBytesThisPass);
    pAudioRenderClient->ReleaseBuffer(nFramesInBuffer, 0);
    nBytesToSkip += nBytesThisPass;
} while (--nBuffersPlayed);
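For reference, here is my reconstruction of the event-driven pattern the WASAPI docs describe (a sketch only: error handling, cleanup, and the actual sample-filling are omitted, and the variable names are mine). The key difference from the code above is querying GetCurrentPadding so you only write into the space the engine has actually freed:

#include <windows.h>
#include <mmdeviceapi.h>
#include <audioclient.h>

// Link against ole32.lib for the COM calls.
int main() {
    CoInitializeEx(nullptr, COINIT_MULTITHREADED);

    // Get the default render endpoint and an IAudioClient on it.
    IMMDeviceEnumerator *pEnum = nullptr;
    CoCreateInstance(__uuidof(MMDeviceEnumerator), nullptr, CLSCTX_ALL,
                     __uuidof(IMMDeviceEnumerator), (void **)&pEnum);
    IMMDevice *pDevice = nullptr;
    pEnum->GetDefaultAudioEndpoint(eRender, eConsole, &pDevice);
    IAudioClient *pClient = nullptr;
    pDevice->Activate(__uuidof(IAudioClient), CLSCTX_ALL, nullptr,
                      (void **)&pClient);

    // Shared-mode, event-driven stream in the engine's mix format.
    WAVEFORMATEX *pwfx = nullptr;
    pClient->GetMixFormat(&pwfx);
    pClient->Initialize(AUDCLNT_SHAREMODE_SHARED,
                        AUDCLNT_STREAMFLAGS_EVENTCALLBACK, 0, 0, pwfx, nullptr);
    HANDLE hEvent = CreateEventW(nullptr, FALSE, FALSE, nullptr);
    pClient->SetEventHandle(hEvent);

    UINT32 bufferFrames = 0;
    pClient->GetBufferSize(&bufferFrames);
    IAudioRenderClient *pRender = nullptr;
    pClient->GetService(__uuidof(IAudioRenderClient), (void **)&pRender);
    pClient->Start();

    for (;;) { // real code needs an exit condition
        WaitForSingleObject(hEvent, INFINITE);
        // Ask how many frames the engine still holds, and only write into
        // the space it has already consumed.
        UINT32 padding = 0;
        pClient->GetCurrentPadding(&padding);
        UINT32 framesFree = bufferFrames - padding;
        BYTE *pData = nullptr;
        pRender->GetBuffer(framesFree, &pData);
        // ... fill pData with framesFree frames here; GetBuffer and
        // ReleaseBuffer should happen within one buffer period ...
        pRender->ReleaseBuffer(framesFree, AUDCLNT_BUFFERFLAGS_SILENT);
    }
}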
I’m about 30% sure that the call to WaitForSingleObject(hNeedDataEvent, INFINITE); is totally extraneous.

Now there is also another problem with the Windows API: it does not match buffer sizes between the kernel buffer and the sound card buffer. This buffer-size misalignment could mean that the buffer size on the sound card is larger than nBytesThisPass. That means that in order to send the source buffer to the sound card, it is necessary to go through the loop multiple times. All of a sudden we’re not doing just one memcpy but several, and since we’ve already exhausted the audio data sent to the card by waiting for hNeedDataEvent, this has to happen extremely fast in order not to glitch. Audio is being sampled at 48 kHz, or once every ~0.02 ms. If these memcpys and memory allocations and everything else take longer than about 0.01 ms (since we have to account for data transfer to the sound card as well), we will get glitching. It seems quite likely that even a tiny amount of extra time in the memcpy call could end up making the glitching worse.

Edit: I was wondering myself how two memcpy implementations could have a significant speed difference, but then I noticed his further comments:
“Here are the optimisation settings for x64 Intel processor, /MT, /fp:fast, and the /O2 /Ob2 /Oi /Ot /Oy switches made the difference
c/c++ section
/Zi /nologo /W3 /WX- /O2 /Ob2 /Oi /Ot /Oy /GL /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /Gm- /EHsc /MT /GS /fp:fast /Zc:wchar_t /Zc:forScope /Fp"x64\Release\playerextreme.pch" /Fa"x64\Release" /Fo"x64\Release" /Fd"x64\Release\vc100.pdb" /Gd /errorReport:queue
-DUNICODE -D_UNICODE /Og /favor:INTEL64
linker section
/OUT:"C:\vs2010\playerextreme - mem - one file\source\x64\Release\playerextreme.exe" /INCREMENTAL /NOLOGO "libad64.lib" "libacof64.lib" "libacof64o.lib" "kernel32.lib" "user32.lib" "gdi32.lib" "winspool.lib" "comdlg32.lib" "advapi32.lib" "shell32.lib" "ole32.lib" "oleaut32.lib" "uuid.lib" "odbc32.lib" "odbccp32.lib" /MANIFEST /ManifestFile:"x64\Release\playerextreme.exe.intermediate.manifest" /ALLOWISOLATION /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /DEBUG /PDB:"C:\vs2010\playerextreme - mem - one file\source\x64\Release\playerextreme.pdb" /SUBSYSTEM:CONSOLE /OPT:REF /OPT:ICF /PGD:"C:\vs2010\playerextreme - mem - one file\source\x64\Release\playerextreme.pgd" /LTCG /TLBID:1 /DYNAMICBASE /NXCOMPAT /MACHINE:X64 /ERRORREPORT:QUEUE
I also made it a single thread which again made a small improvement, also fixed the sizes of buffers etc so that numbers were used in the code. This means that different code would need to be used for different sampling rates. Also worked out a way to play gapless (next release) where the wav data is just appended to the buffer. Means that it won’t be able to play different sample rates gapless, but it suits my purposes.”
Basically, I think that he was moving from a Win32 memcpy (copying 32 bits per loop iteration) to a 64-bit C++ memcpy copying 128-bit(?) vectors per iteration, if I recall correctly. That should end up being something like a 5x speedup.
I think the most important takeaway is actually that the Windows docs on ReleaseBuffer have been unclear. While at the very bottom of the ReleaseBuffer docs we find:

“Clients should avoid excessive delays between the GetBuffer call that acquires a buffer and the ReleaseBuffer call that releases the buffer. The implementation of the audio engine assumes that the GetBuffer call and the corresponding ReleaseBuffer call occur within the same buffer-processing period. Clients that delay releasing a buffer for more than one period risk losing sample data.”

the top says “The ReleaseBuffer method releases the buffer space acquired in the previous call to the IAudioRenderClient::GetBuffer method.”, which makes it sound like it’s just a memory free.

Thank you for writing all this up. I’ve been bugged by the feeling of people punching down in this thread, along with the fact that the poster is actually talking about implementing an audio playback engine, in which case these small differences can manifest as semi-perceptible aural differences due to latency, jitter, and scheduling.
I’m disappointed by the arrogance of users on this forum.

This is a programmer community, what’d you expect? Arrogance almost comes with the territory.
Things like rowhammer/ECCploit, van Eck phreaking and other such side channel attacks show that yes, we live in a physical world and this actually has an impact on the pristine and pure thought-stuff that we manipulate, and still people here refuse to believe that a different algorithm with different CPU branching patterns and RAM access patterns might have observable effects on audio output. I suppose we’re so used to abstracting all that stuff away that we get offended by the very idea that differences in how software utilizes hardware components inside the computer can actually have an impact on “unrelated” hardware.
I don’t understand. A sound card has a fixed sample rate, as long as the engine is filling the buffer fast enough*, it should be equivalent. If it isn’t, there will be obvious pops and clicks when the buffer underruns. If it is, the output is equivalent. I don’t see how some subtle sound quality difference could arise.
A fancy sound card has a sample rate of 48 kHz; at 3 bytes per sample, that’s copying around 144 kbytes per second per channel. On a memory bus. Even a naive 8-bit implementation on a Gateway 2000 could do that.
If you read the code, the author is literally waiting for the buffer to become exhausted before sending the data, of course that could lead to subtle and not so subtle sound quality differences.
The sound card sample clock is not attached to the OS at all, as long as the buffer isn’t underrunning and the samples are the same, the result is the same.
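To put rough numbers on both sides of this exchange, here is a back-of-the-envelope sketch (the 480-frame buffer size is a made-up example; we don’t know the real one):

#include <cstdio>

int main() {
    // Throughput side: 48 kHz, 24-bit (3-byte) samples, stereo.
    const double sampleRate = 48000.0;
    const double bytesPerSample = 3.0;
    const double channels = 2.0;
    std::printf("throughput: %.0f bytes/s\n",
                sampleRate * bytesPerSample * channels); // 288000: trivial for any memcpy

    // Deadline side: with a hypothetical 480-frame kernel buffer, a full
    // buffer holds 10 ms of audio. The copy itself is cheap; the risk is
    // being scheduled out at the wrong moment, or waiting until the buffer
    // is already empty (as the code above does) and missing the refill.
    const double bufferFrames = 480.0;
    std::printf("deadline: %.1f ms per buffer\n",
                1000.0 * bufferFrames / sampleRate);
    return 0;
}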
The things you describe result in defects noticeable by anyone though. That’s different from the allegedly minor differences you’d get due to using ‘gold plated cables’, which is obviously the kind of thing being referenced here. So although you very nicely explain how a memcpy implementation could influence sound, I don’t think it excuses the original inspiration for the OP.
But then there’s this line in the first post linked to:

“Um … doesn’t new() basically call malloc() under the hood? How can using new() ‘sound better’ than using malloc()?”

That is a mystery to me. I really don’t know what is going on there, but it appears, from the extremely low quality of the code, that we shouldn’t really trust the author to tell us which exact changes made a difference. My point was to point out that in this specific broken bit of code the speed of malloc could potentially be important, and thus the people making fun of the concept were being presumptive.
I think if the author of the code had demonstrated that they can tell the difference in a true double-blind test, or had elaborated on their testing methodology for determining that there existed a difference, people would take it more seriously.
As is, it’s just that the author claims it sounds different and we’re expected to believe it. More likely, IMO, the author is hearing a difference because they expect to, has made a fool of themselves in public by posting about it and blaming libc, and now all the townspeople are pointing and laughing.
I was discussing this with a friend the other day, and as he said, “how did the fools get their money in the first place?”
Having known a bunch of people who tended to fall into these kinds of pricey rabbit holes, my surprising conclusion is that they are not particularly rich. They are just very, very bad at making financial decisions. You can expect a lot of these people to have maxed-out credit cards, a remortgage, and even a few personal bankruptcies to their name.

In one particular case, the guy had undiagnosed ADHD and was constantly making impulse purchases of new Magic: The Gathering cards, despite barely being able to afford his rent.

I’m guessing audiophiles are the very same type of people.
You can be an outstanding dentist but an atrocious engineer.
They inherited it from parents or grandparents who worked hard and were lucky, or did something vile and were lucky; occasionally all three.
This is even better than gold-plated optical connectors.
You will want your memcpy() to stay in L1, L2 or L3 cache if your RAM doesn’t have gold plated (ENIG) contacts.
Wait no… chiplet cache in CPUs! Do they use gold or aluminium bond wires?
Oh wow, the files are still available, is anybody else morbidly curious? https://drive.google.com/folderview?id=0B3vvH5WBfg8PR0lHZzR6d012OXM&usp=sharing
Also, from reading the thread, it sounds like the guy didn’t know how to program before starting this project and is entirely self-taught. Sure, he picked up some weird ideas, but I gotta admire the work!
An interesting counterpoint, if not quite a rebuttal, to just mocking the audiophiles: https://fediscience.org/@marcbrooker/110017159893728862
For those not wanting to click on the link, the possible explanation was that different memcpy implementations use different pipelines and so may trigger different power states. This is possibly more plausible than you might at first imagine. I had a computer some years ago where I could hear SSE-heavy workloads because the EM leakage from the CPU hitting those pipelines was picked up in the sound card, so an SSE-based memcpy would give worse sound than one using integer pipelines (possibly - I doubt SSE loads and stores were actually audible on that machine). This is why sensible audiophiles that I’ve known recommend a cheap USB audio adaptor over any expensive sound card for analogue output: simply moving the DAC a metre away from the electrical noise does more good than anything else. Outputting optical audio to an external decoder and amplifier can be better because then your audio system can be completely electrically isolated from the computer (there are a lot of chips in a typical computer case that are little radio broadcasters, and any nearby wire may accidentally act as an antenna receiving the signal).
The other plausible explanation that I considered was jitter. Probably not the case 10 years ago, but 20 years ago the ring buffers that sound cards could DMA from were quite small and you could quite clearly hear when the CPU wasn’t filling them fast enough. A slow memcpy being preempted at the wrong time may well have caused artefacts. If you can fill the buffer easily within a single scheduling quantum and yield then you’ll be priority boosted the next time and so you’ll easily keep up playing music. If you’re occasionally being preempted mid-copy, then you’ll suffer from weird artefacts as the card overtakes the CPU and plays a few ms of the previous sample instead of the new one.
I don’t think either of these were the case here though, and this is why sensible audiophiles do random double-blind tests on themselves: play the thing n times, where half are with one version, half with the other. If you don’t get a statistically significant number of guesses as to which is the better one landing on the same implementation, you’re just kidding yourself (which is very easy to do with audio).
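A minimal sketch of the arithmetic behind such a blind test, assuming the simplest protocol (n independent trials, a pure guesser is right 50% of the time, one-sided binomial test; the example numbers and the 0.05 cutoff are illustrative):

#include <cstdio>

// Probability of getting k or more correct out of n trials by pure
// guessing (one-sided binomial tail with p = 0.5).
double pValue(int n, int k) {
    // Build C(n, i) iteratively to avoid factorial overflow for small n.
    double tail = 0.0, coeff = 1.0; // coeff starts as C(n, 0)
    for (int i = 0; i <= n; ++i) {
        if (i >= k) tail += coeff;
        coeff = coeff * (n - i) / (i + 1); // advance to C(n, i+1)
    }
    for (int i = 0; i < n; ++i) tail /= 2.0; // divide by 2^n
    return tail;
}

int main() {
    // 12 correct out of 16 trials: p ~= 0.038, below a 0.05 cutoff, so the
    // listener is probably hearing a real difference.
    std::printf("12/16: p = %.3f\n", pValue(16, 12));
    // 10 out of 16 (p ~= 0.227) is entirely consistent with guessing.
    std::printf("10/16: p = %.3f\n", pValue(16, 10));
    return 0;
}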
Those counterpoints are good. And he doesn’t much mock the audiophiles. High-end audio is a funny game. Since it involves chasing the limb of the diminishing-returns curve, it naturally attracts eccentrics: the kind of people who take joy in the fact that a brake job on their Porsche costs ten times as much as it does on a Toyota. For these folks, gold-plated TOSLINK cables, $500 ethernet cables, and high-fidelity audio-grade ethernet switches are the stuff of which happy dreams are made. If that’s how eccentric people enjoy their money, more power to them.

What’s truly sad about this is that it ruins things for the people who have some sort of balance in their lives, because there is a point to climbing the diminishing-returns curve in A/V equipment as a means of improving sound quality. The trick is that you don’t have to climb too far to greatly improve your experience.
I’m not involved in high-end audio, but it feels like it has overlap with another field I’m tangentially aware of: high-end watch collecting. If they’re similar, then the motivation driving participants is less the end goal of a perfect sound system or the perfect watch than learning about the stuff that’s out there, trading for new used stuff, and generally being part of a community.

Specifically, it’s not that hard to incrementally grow one’s collection by starting small, looking for used instances of the stuff you’re interested in, saving some discretionary income, buying up, etc. And for many people, that’s the actual fun part!
And while there’s an element of snake oil[1] and grift, in general both HE audio and watches are physical objects that can be inspected for quality. I.e. even if paying 6 figures or more for a turntable with a cast iron bed[2] is a lot of money in absolute terms, you are getting a physical object that has intrinsic worth, if only to other HE audio enthusiasts.
[1] I tried to find some examples and ran across this piece about expensive audio ethernet cables: https://arstechnica.com/gadgets/2015/07/gallery-we-tear-apart-a-340-audiophile-ethernet-cable-and-look-inside/. While excessively priced they are not outright fraudulent.
[2] price source: https://harpers.org/archive/2022/12/corner-club-cathedral-cocoon-audiophilia-and-its-discontents/
Having read through a dozen or so posts in that thread, I really think the whole thing is a parody, not an actual debate. So much sarcasm, and only the craziest possible statements, without even any fluff of normal programming discussion standing in the way.
Nice humor, and at some points even brilliant parody, but it would take a lot to convince me that was real. Too many people at once saying they did specific things that aren’t actually possible…
OMG of course it is. I’m stunned to see people taking it so seriously. Surely it was all ridiculous humour 10+ years ago and it still is now? Follow the thread, read the posts, it’s just the most wonderful piss-take all the way through. I’m prepared to be totally wrong and if this guy genuinely thinks the things he’s saying then OK, but seriously, just read the thread with a smirk and oh dear it’s just hilarious. Not meant to be any more nor any less. Just excellent piss-take fun.
You can download the final program here: https://drive.google.com/drive/folders/0B3vvH5WBfg8PR0lHZzR6d012OXM
Code can be parody too.
I read his other posts on the forum to confirm he learned programming just for this project. He’s not a parody.
That’s amazing. And I hadn’t seen it!
Thanks for a good laugh, and for showing me several people I need to follow in the fediverse.
I love that this was a Mastodon/Fediverse link; I originally saw it from @DanCrossNYC.