1. 7

Hi all - I’m interested in learning more about codecs and/or signal processing. My background is CS (degree), and I’ve mostly spent time on compilers and operating systems.

Does anyone have any thoughts on introductory texts or sites/wikis on codecs and signal processing? I’ve found some books, but a lot of them are either very basic (aimed at broadcasters or streamers) or assume a fair amount of background I don’t have.

I’m happy to read up on anything considered more foundational if needed, I’m just not sure where to start.



  2. 7

    I have done some audio programming, and am studying engineering, so I guess I have some knowledge about it. There are many who are better than me, though. I hope this isn’t too mathematical, but you need to have some grasp on differentiation, integration, complex numbers and linear algebra anyway. Here’s a ‘short’ overview of the basics:

    First of all, you need to know what happens when an analog, continuous signal is converted to digital data and back. The A->D direction is called sampling. The amount of times the data can be read out per second (the sampling rate) and the accuracy (bit depth) are limited for obvious reasons, and this needs to be taken into account.
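    If it helps to see this in code, here’s a toy sketch (the function name and the numbers are my own invention, not from any particular ADC) of sampling a sine at a fixed rate and rounding each sample to what a given bit depth can store:

```python
import math

def sample_and_quantize(f, sample_rate, bit_depth, duration):
    """Sample a continuous signal f(t) at sample_rate Hz, and round each
    sample to the nearest level the bit depth can represent
    (signal assumed to stay within [-1, 1])."""
    levels = 2 ** (bit_depth - 1)  # quantization steps per unit amplitude
    n = int(sample_rate * duration)
    return [round(f(t / sample_rate) * levels) / levels for t in range(n)]

# A 440 Hz sine sampled at 8 kHz with 8-bit depth: the samples only
# approximate the original curve, both in time and in amplitude.
samples = sample_and_quantize(lambda t: math.sin(2 * math.pi * 440 * t),
                              sample_rate=8000, bit_depth=8, duration=0.01)
```

Each sample is off from the true value by at most half a quantization step; raising the bit depth shrinks that error, raising the sample rate shrinks the gaps in time.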

    Secondly, analysing a signal in the time domain doesn’t yield much interesting information; it’s much more useful to analyse the frequencies in the signal instead.

    Fourier’s theorem states that every signal can be represented as a sum of (co)sines. Getting the amplitude of a given frequency is done through the Fourier transform (F(omega) = integrate(lambda t: f(t) * e^(-j*omega*t), -infinity, infinity)). It works a bit like the following:

    1. Draw the function on a long ribbon
    2. Twist the ribbon along its longest axis, with an angle proportional to the desired frequency you want the amplitude of (multiplying f(t) by e^(-j*omega*t), where omega is the angular frequency (pulsation) of the desired frequency, i.e. omega = 2*pi*f, and j is the imaginary unit. j is used more often than i in engineering.)
    3. Now smash it flat. In the resulting (complex) plane, take the average of all the points (i.e. complex numbers). (This is the integration step.)
    4. The sines will cancel themselves out, except for the one with the desired frequency. The resulting complex number’s magnitude is the amplitude of the sine, and its angle is the sine’s phase.
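    The four steps above can be sketched in a few lines of Python (a discrete average standing in for the integral; the names are mine):

```python
import cmath
import math

def fourier_amplitude(samples, freq, sample_rate):
    """Steps 2-4 above for a sampled signal: multiply each sample by
    e^(-j*omega*t) (the 'twist'), then average the resulting complex
    points (the 'smash flat and average')."""
    omega = 2 * math.pi * freq
    n = len(samples)
    mean = sum(s * cmath.exp(-1j * omega * (k / sample_rate))
               for k, s in enumerate(samples)) / n
    # For a real sine the average comes out at half the amplitude, because
    # the mirrored negative frequency holds the other half.
    return 2 * abs(mean), cmath.phase(mean)

rate = 1000
sig = [0.7 * math.sin(2 * math.pi * 50 * t / rate) for t in range(rate)]
amp, phase = fourier_amplitude(sig, 50, rate)  # amp comes out close to 0.7
```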

    (Note: the Fourier transform is closely related to the Laplace transform, which you get by substituting omega*j with s (or p, or z; they’re “implicitly” complex variables), and to the Z-transform, when dealing with discrete signals. It’s still basically the same idea, though, and I’ll be using the terms pretty much interchangeably. The Laplace transform is also used when analyzing linear differential equations, which is, under the hood, what we’re doing here anyway. If you really want to understand most/everything, you need to grok the Laplace transform first, and how it’s used to deal with differential equations.)

    Now, doing a Fourier transform (and an inverse afterwards) can be costly, so it’s better to use the information gained from a Fourier transform while writing code that modifies a signal (i.e. amplifies some frequencies while attenuating others, or adding a delay, etc.), and works only (or most of the time) in the time domain. Components like these are often called filters.

    Filters are linear systems (they can be nonlinear as well, but that complicates things). They are best thought of as components that scale, add, or delay signals, combined in a block diagram. (A z^-1 box is a delay of one sample; the Z-transform of f(t-1) is equal to the Z-transform of f(t), divided by z.)
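    As a minimal made-up example of such a component, here’s a filter built from nothing but one scale, one add, and one z^-1 delay box (the coefficient 0.9 is arbitrary):

```python
def one_pole_lowpass(xs, a=0.9):
    """A filter made only of scale, add, and a single z^-1 delay:
    y[t] = a*y[t-1] + (1 - a)*x[t].
    The delay box's contents are simply the previous output."""
    y_prev = 0.0  # contents of the z^-1 box
    out = []
    for x in xs:
        y = a * y_prev + (1 - a) * x
        out.append(y)
        y_prev = y
    return out

# A step input settles smoothly toward 1 instead of jumping: the sharp
# edge (high frequencies) is attenuated, while DC passes through.
step_response = one_pole_lowpass([1.0] * 100)
```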

    If the system is linear, such a diagram can be ‘transformed’ into a bunch of matrix multiplications (A, B, C and D are matrices):

    • state[t+1] = A*state[t] + B*input[t]
    • output[t]  = C*state[t] + D*input[t]

    with state[t] a vector containing the state of the delays at t.
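    A quick sketch of those two equations in Python (plain lists standing in for matrices and vectors; the coefficients are a made-up single-delay example, with the pole at 0.9):

```python
def state_space_step(A, B, C, D, state, x):
    """One tick of  state[t+1] = A*state[t] + B*input[t]
                    output[t]  = C*state[t] + D*input[t]."""
    def matvec(M, v):
        return [sum(m * vi for m, vi in zip(row, v)) for row in M]
    y = matvec(C, state)[0] + matvec(D, [x])[0]
    new_state = [a + b for a, b in zip(matvec(A, state), matvec(B, [x]))]
    return new_state, y

# A system with one delay whose state is fed back with coefficient 0.9:
A, B, C, D = [[0.9]], [[0.1]], [[0.9]], [[0.1]]
state, ys = [0.0], []
for x in [1.0] * 50:          # feed in a constant (step) input
    state, y = state_space_step(A, B, C, D, state, x)
    ys.append(y)
# ys rises smoothly toward 1, like a simple lowpass.
```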

    Analyzing them happens as follows:

    1. Take the Z-transform of the input signal (Z{x(t)}=X(z)) and the output signal (Z{y(t)}=Y(z)).
    2. The ratio between Y and X is a (rational) function of z, the transfer function H(z) = Y(z)/X(z).
    3. Now find the zeros of the numerator and denominator. The zeros of the latter are called the poles; signals at (or near) those frequencies are amplified. Zeros of the numerator are (boringly) called zeros, and they attenuate signals. These poles and zeros are also related to the eigenvectors and -values of the matrix A.
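    Here’s a sketch of that, using a textbook two-pole resonator as a made-up example: place a conjugate pair of poles at radius r and angle theta, recover them with the quadratic formula, and check the gain on the unit circle:

```python
import cmath
import math

# Two-pole resonator: denominator 1 - 2*r*cos(theta)*z^-1 + r^2*z^-2,
# which puts a conjugate pole pair at r*e^(+/- j*theta).
r, theta = 0.95, 2 * math.pi * 0.1       # pole radius and pole angle
b1, b2 = -2 * r * math.cos(theta), r * r # denominator coefficients

# Step 3 above: the poles are the roots of z^2 + b1*z + b2.
disc = cmath.sqrt(b1 * b1 - 4 * b2)
poles = [(-b1 + disc) / 2, (-b1 - disc) / 2]

def gain(w):
    """|H| evaluated on the unit circle z = e^(j*w)."""
    z = cmath.exp(1j * w)
    return abs(1 / (1 + b1 / z + b2 / z ** 2))

# The gain near the pole angle dwarfs the gain far away from it.
near, far = gain(theta), gain(theta + 1.0)
```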

    However, if the poles are outside of the unit circle, the system is ‘unstable’: the output will grow exponentially (i.e. “explode”). If a pole is complex or negative, the output will oscillate (this corresponds to complex eigenvalues, and complex solutions to the characteristic equation of the linear differential equation).
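    You can watch this happen with a one-tap feedback loop (my own toy example), whose single pole sits at z = a:

```python
def impulse_response(a, n):
    """Feed a unit impulse into y[t] = a*y[t-1] + x[t], whose one pole
    is at z = a, and record the first n outputs."""
    y, out = 0.0, []
    x = 1.0  # unit impulse at t = 0
    for _ in range(n):
        y = a * y + x
        x = 0.0
        out.append(y)
    return out

stable   = impulse_response(0.9, 50)  # pole inside the unit circle: dies out
unstable = impulse_response(1.1, 50)  # pole outside: grows without bound
```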

    What most often is done, though, is making filters using some given poles and zeros. Then you just need to perform the steps in reverse direction.

    Finally, codecs simply use that knowledge to throw away uninteresting stuff. (E.g. data is stored in the frequency domain, and very soft sines, or sines outside the audible range, are discarded. With images and video, it’s the same thing, but in two dimensions.) I don’t know anything specific about them, though, so you should look up some stuff about them yourself.
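    As an illustration of that idea (a toy sketch, not any real codec), here’s a ‘compressor’ that transforms a signal to the frequency domain, drops every bin below a loudness threshold, and reconstructs:

```python
import cmath
import math

def dft(xs):
    n = len(xs)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(xs)) for k in range(n)]

def idft(Xs):
    n = len(Xs)
    return [sum(X * cmath.exp(2j * math.pi * k * t / n)
                for k, X in enumerate(Xs)).real / n for t in range(n)]

# A loud sine plus a very soft one.  The toy 'codec' keeps only frequency
# bins whose magnitude clears a threshold and throws the rest away.
n = 64
signal = [math.sin(2 * math.pi * 4 * t / n)
          + 0.01 * math.sin(2 * math.pi * 13 * t / n) for t in range(n)]
compressed = [X if abs(X) > 0.5 else 0 for X in dft(signal)]
decoded = idft(compressed)
# Only the loud sine survives; the reconstruction barely differs from it.
```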

    Hopefully, this wasn’t too overwhelming :). I suggest reading Yehar’s DSP tutorial for the braindead to get some more information (but it doesn’t become too technical), and you can use the Audio EQ Cookbook if you want to implement some filters. [This is a personal mirror, as the original seems to be down - 509.]

    There’s also a copy of Think DSP lying on my HDD, but I never read it, so I don’t know if it’s any good.

    1. 3

      The amount of times the data can be read out per second (the sampling rate) and the accuracy (bit depth) are limited for obvious reasons

      Interesting post. I wanted to highlight the part where you say it’s limited for “obvious reasons.” It’s probably better to explain that, since it might not be obvious to folks trained to think transistors are free, CPUs are doing billions of ops a second, and everything is working instantly down to the nanosecond scale. “How could such machines not see and process about everything?” I thought.

      What I learned studying hardware design at a high level, especially the tools and processes, was that the digital cells appeared to be asleep a good chunk of the time. From a software guy’s view, it’s like the clock signal comes as a wave, starts lighting the cells up to do their thing, leaves, and then they’re doing nothing. The analog circuits, by contrast, worked non-stop. If it’s a sensor, it’s like the digital circuits kept closing their eyes periodically, where they’d miss stuff. The analog circuits never blinked.

      After that, the ADC and DAC tutorials would explain how the system would go from continuous to discrete using the choppers or whatever. My interpretation was that the digital cells were grabbing a snapshot of the electrical state as bit-based input, kind of like requesting a picture of what a fast-moving database contains. It might even change a bit between cycles. I’m still not sure about that part, since I didn’t learn it hands-on where I could experiment. So, they’d have to design it to work with whatever its sampling rate/size was.

      Also, the mixed-signal people told me they’d do some components in analog specifically to take advantage of full-speed, non-blinking, and/or low-energy operation. Especially non-blinking, though, for detecting things like electrical problems that can negatively impact the digital chips. Analog could respond faster, too. Some entire designs, like control systems or at least checking systems in safety-critical applications, stuck with analog, since the components directly implemented mathematical functions well understood in terms of signal processing. More stuff could go wrong in a complex digital chip, they’d say. Maybe they just understood the older stuff better, too.

      So, that’s some of what I learned dipping my toes into this stuff. I don’t do hardware development or anything. I did find all of that really enlightening when looking at the ways hardware might fail or be subverted. That the digital stuff was an illusion built on lego-like analog circuits was pretty mind-blowing. The analog wasn’t dead: it just got tamed into a regular, synthesizable, and manageable form that was then deployed all over the place. Many of the SoCs still had to have analog components for signal processing and/or power competitiveness, though.

      1. 3

        You’re right, of course. On the other hand, I intended to keep it short (even though that didn’t work out as intended). I don’t know much about how CPUs work, though; I’m only in my first year.

        I remember an exercise during maths class, in what’s probably the equivalent of middle or early high school, where multiple people were measuring the sea level at certain intervals. To one, the level remained flat; to another, it was wildly fluctuating; and to a third person, it was fluctuating only slightly, and at a different frequency.

        Because of the reasons you described, the ADC can’t keep up when the signal’s frequency is above half the sampling frequency (i.e. the Nyquist frequency).
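        A quick way to see this (my own toy numbers): sample a sine above the Nyquist frequency and compare it with its ‘reflection’ below it; the two produce exactly the same samples (up to sign):

```python
import math

rate = 100  # samples per second, so the Nyquist frequency is 50 Hz

def sampled_sine(freq, n=100):
    return [math.sin(2 * math.pi * freq * t / rate) for t in range(n)]

# A 70 Hz sine is above Nyquist.  Its samples are indistinguishable from
# those of a negated 30 Hz sine: 70 Hz 'reflects' about 50 Hz to 30 Hz.
hi = sampled_sine(70)
lo = sampled_sine(30)
alias_matches = all(abs(h + l) < 1e-9 for h, l in zip(hi, lo))
```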

        (Interestingly, this causes the Fourier transform of the signal to be ‘reflected’ at the Nyquist frequency. There’s a graph that makes this clear, but I can’t find it. Here’s a replacement I quickly hacked together using Inkscape. [Welp, the text is jumping around a little. I’m too tired to fix it.])

        The “changing a bit between cycles” might happen because the conversion doesn’t happen instantaneously, so the value can change during the conversion as well. Or, when converting multiple values that should happen “instantaneously” (such as taking a picture), the last part will be converted a little bit later than the first part, which sounds analogous to screen tearing to me. Then again, I might be wrong.

        P.S. I’ll take “interesting” as a compliment, I just finished my last exam when I wrote that, so I’m a little tired now. Some errors are very probably lurking in my replies.

        1. 3

          I’ll take “interesting” as a compliment

          You were trying to explain some hard concepts. I enjoy reading these summaries since I’m an outsider to these fields. I learn lots of stuff by reading and comparing explanations from both students and veterans. Yeah, it was a compliment for the effort. :)

      2. 3

        Even though I learned about the Fourier transform at university, this video gave me a new intuition: https://www.youtube.com/watch?v=spUNpyF58BY

        1. 2

          Thanks very much for your detailed reply :). The math doesn’t scare me, it’s just very rusty for me since a lot of what I do doesn’t have as much pure math in it.

          I appreciate the time you put into it.

          1. 2

            Speaking specifically of Fourier transform: it behaves well for infinite signals and for whole numbers of periods of strictly periodic signals.

            But in reality the period usually doesn’t divide the finite fragment we have evenly (and also there are different components with different periods). If we ignore this, we effectively multiply the signal by a rectangle function (0… 1 in the interval… 0…) — and the Fourier transform converts pointwise multiplication into convolution (an operation similar to blur). Hard edges are bad: the rectangle has a rather nasty spectrum, with large amplitudes pretty far from zero, and it is better to avoid convolution with that — it would mix even frequencies very far from each other rather strongly.

            This is the reason why window functions are used: the signal is multiplied by something that goes smoothly to zero at the edges. A good window has a Fourier transform that falls off very quickly as you move away from zero, but this usually requires the spectrum to have high intensity in a wide band near zero. This tradeoff means that if you want less leakage between vastly different frequencies, you need to mix similar frequencies more. It is also one of the illustrations of why a long recording is needed to separate close frequencies.
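            A small sketch of that effect (my own numbers, using a Hann window as the goes-smoothly-to-zero example): a sine whose frequency falls between DFT bins leaks badly with the implicit rectangle window, and far less after windowing:

```python
import cmath
import math

n = 64
# 4.5 cycles per frame: the frequency falls between bins, so the
# implicit rectangle window makes it leak across the whole spectrum.
sig = [math.sin(2 * math.pi * 4.5 * t / n) for t in range(n)]
hann = [0.5 - 0.5 * math.cos(2 * math.pi * t / n) for t in range(n)]

def bin_mag(xs, k):
    """Magnitude of DFT bin k of xs."""
    return abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                   for t, x in enumerate(xs)))

# Leakage far from the peak (bin 20, ~15 bins away from the sine):
rect_leak = bin_mag(sig, 20)
hann_leak = bin_mag([x * w for x, w in zip(sig, hann)], 20)
# hann_leak is far smaller, at the cost of a wider peak near bin 4.5.
```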

          2. 2

            I tried several DSP texts, and (by far) liked this one the most: http://dspguide.com/

            There’s a newer version (https://www.amazon.com/Digital-Signal-Processing-Practical-Scientists/dp/B00KEVJG2S/), but I haven’t read it.

            1. 1

              A colleague has written a great interactive introduction to a lot of DSP topics that might prove useful. You can navigate through it by hitting “next” at the bottom or using the menu at the top. The section on phasors is really fun.

              1. 1

                Despite the dopey cover, Smith’s book is IMO the best introductory text in the subject.

                1. 2

                  Based on the index and introduction, that definitely looks like a good starting point. I will have a look. Thanks!

                2. 1

                  I like Understanding Digital Signal Processing, but it doesn’t say anything about compression. I have Introduction to Data Compression, which does cover audio compression, but I haven’t read that far yet, and the bits I have read have been incredibly dry and not super useful, so I don’t recommend it.

                  Not sure if it helps but a couple of my “aha” moments learning signal processing:

                  If you know about projecting a vector onto a set of basis vectors from linear algebra, the DFT is projecting your signal onto a set of vectors containing sine waves of different frequencies. Coming from a compsci background, that was the easiest explanation for me to understand.
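                  That projection view can be written out directly (a toy sketch with n = 8; the names are mine):

```python
import cmath
import math

n = 8
# The DFT basis: n vectors, the k-th one sampling e^(j*2*pi*k*t/n).
basis = [[cmath.exp(2j * math.pi * k * t / n) for t in range(n)]
         for k in range(n)]

def project(signal, k):
    """Inner product of the signal with basis vector k -- exactly the
    'project onto this basis vector' operation from linear algebra."""
    return sum(x * b.conjugate() for x, b in zip(signal, basis[k]))

# A cosine at frequency 3 projects only onto basis vectors 3 and n-3
# (the real cosine splits across the two mirrored complex frequencies).
sig = [b.real for b in basis[3]]  # cos(2*pi*3*t/8)
coeffs = [abs(project(sig, k)) for k in range(n)]
```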

                  A lowpass filter is like multiplying the frequency domain version of your signal by [1, 1, 1, ..., 0, 0, 0], where the change from 1 to 0 happens at the cutoff frequency. Multiplication in the frequency domain is equivalent to convolution in the time domain, so sure enough you can take the IDFT of [1, 1, 1, ..., 0, 0, 0] and convolve that with things to get a (crappy) lowpass filter.