@majke thank you for a very interesting article and analysis!

I’m trying to reproduce and follow your analysis on my mac and would appreciate some help. On my mac I can get the code to compile by removing sections 1 and 2 and by removing the `MAP_POPULATE` and `MAP_LOCKED` flags. Is it OK to do so? (revised code)

When I run this on my mac, the mode of the loop durations is 0 ns, followed by 1000 ns, with an occasional 3000 ns or higher. This pattern is much less smooth than yours and I wonder why.

I’m trying to follow your analysis. As far as I can follow, the loop duration variable is redundant, since you have a timestamp and the loop duration merely stores the diff to the last timestamp. So you have point events with their times. You’ve tried to smooth these point events by ~~convolving a triangular window to them~~ well, doing something slightly unorthodox, which amounts to literal linear interpolation between points.

The linear interpolation generates a lot of high frequencies in the Fourier transform because of the corners.

In neuroscience (see, I knew that esoteric training would come in useful someday) we have a similar data set that comes from neuronal firing. A popular way to perform Fourier analysis on them is to convolve the train of deltas with gaussians. This is like dropping a cloth over a set of spikes - you get pointy heads where the spikes are and a graceful tail where they are not. This leads to smooth curves which behave more politely in the frequency domain.

There is a hypothesis behind doing this in neuroscience (neurons act in concert, with a slight jitter between them, blah blah) but basically smoothing delta trains is a dirty deed most practical scientists will let you do by looking the other way.
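To make the idea concrete, here is a minimal sketch of Gaussian smoothing of a delta train followed by an FFT. It is not the code from the article or from any notebook mentioned in this thread. The helper names, `bin_ns`, `n_bins` and `sigma_bins` are all made up for illustration, the event times are synthetic, and only the Python standard library is used (so the FFT is a small hand-written radix-2 one).

```python
import cmath
import math


def delta_train(event_times_ns, bin_ns, n_bins):
    """Bin event timestamps into a train of unit impulses."""
    train = [0.0] * n_bins
    for t in event_times_ns:
        i = int(t // bin_ns)
        if 0 <= i < n_bins:
            train[i] += 1.0
    return train


def gaussian_kernel(sigma_bins):
    """Unit-area Gaussian sampled out to +/- 4 sigma."""
    half = int(math.ceil(4 * sigma_bins))
    k = [math.exp(-0.5 * (i / sigma_bins) ** 2) for i in range(-half, half + 1)]
    total = sum(k)
    return [v / total for v in k]


def smooth(train, kernel):
    """Convolve the train with the kernel (same length, zero padded)."""
    half = len(kernel) // 2
    out = [0.0] * len(train)
    for i, v in enumerate(train):
        if v == 0.0:
            continue  # only the impulses contribute
        for j, w in enumerate(kernel):
            p = i + j - half
            if 0 <= p < len(out):
                out[p] += v * w
    return out


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def dominant_period_ns(event_times_ns, bin_ns=100, n_bins=4096, sigma_bins=3.0):
    """Period (ns) of the strongest non-DC component of the smoothed train."""
    signal = smooth(delta_train(event_times_ns, bin_ns, n_bins),
                    gaussian_kernel(sigma_bins))
    mean = sum(signal) / len(signal)
    spectrum = fft([v - mean for v in signal])
    # Skip the DC bin; only the first half of the spectrum is unique.
    k = max(range(1, n_bins // 2), key=lambda i: abs(spectrum[i]))
    return n_bins * bin_ns / k


# Synthetic events, one every 6400 ns (an assumed value, not measured data).
events = [3200 + 6400 * i for i in range(64)]
print(dominant_period_ns(events))
```

A wider kernel (larger `sigma_bins`) suppresses the higher harmonics more strongly, at the cost of blurring events that sit close together.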

Since I can’t regenerate the data I’m requesting you to retry your analysis with gaussian smoothing of your delta train and/or give me some pointers as to how to get proper results out of my mac.

Thank you very kindly!

-Kaushik

PS. If you are not inclined to help figure out how it would work on a mac, you could send me your data set and I could try out the gaussian convolution and send the results back to you.

~/2018-11-memory-refresh$ cat example-data.csv |python3 ./analyze-dram.py
[*] Input data: min=111 avg=176 med=167 max=11909 items=131072
[*] Cutoff range 212-inf
[ ] 127893 items below cutoff, 0 items above cutoff, 3179 items non-zero
[*] Running FFT
[*] Top frequency above 2kHz below 350kHz has magnitude of 7590
[+] Top frequency spikes above 2kHz are at:
127884Hz 4544
127927Hz 5295
255812Hz 7590
383739Hz 5799
511624Hz 6932
639551Hz 5911
767436Hz 6001
895363Hz 5682
1023248Hz 4774
1151175Hz 5107
1406987Hz 4263

(a) trying to run it on mac: why not, but the power saving settings may introduce even more jitter. Also - is there a reliable fast clock_gettime(CLOCK_MONOTONIC) on mac these days?

(b) I’m very much not an expert on DSP and signal analysis. Please do explain why and how to use the suggested gaussian smoothing.
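On point (a), one rough way to see what the monotonic clock actually delivers on a given machine is to read it in a tight loop and record the smallest non-zero step. This is only a sketch: it uses Python's `time.monotonic_ns()` instead of the C `clock_gettime` call from the article, `smallest_clock_step_ns` is a made-up helper, and interpreter overhead means the result is an upper bound on the true granularity.

```python
import time


def smallest_clock_step_ns(samples=200_000):
    """Smallest non-zero difference seen between consecutive clock reads."""
    best = None
    prev = time.monotonic_ns()
    for _ in range(samples):
        now = time.monotonic_ns()
        step = now - prev
        if step > 0 and (best is None or step < best):
            best = step
        prev = now
    return best


# What the runtime reports about the clock, then what we actually observe.
print(time.get_clock_info("monotonic"))
print("smallest observed step:", smallest_clock_step_ns(), "ns")
```

If the smallest step comes out near 1000 ns, readings that are exact multiples of 1000 ns would point at the clock rather than at the memory.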

My interpretation of your data is in this notebook

In brief, I did a simple time domain analysis first by plotting the interval histogram. The histogram shows a prominent periodicity at 16.7 us with some slower components.

When I do a frequency domain analysis by smoothing the delta train with a gaussian I see this prominent period with higher harmonics. I’ve forgotten how to interpret the higher harmonics, but the base frequency is consistent with the 16.7 us periodicity.

This is roughly twice the 7.8 us you report in your article.

Treating this whole thing as a black box, I’d say it typically takes 16.7us to complete one cycle of operations. There are instances when things take a lot longer, but these are less common by a factor of about 100.

It looks like you’re interpreting the data differently from @majke; one analysis is on duration of each event (167ns); the other is on the delta between events (7818ns).

Every cycle only one time stamp is dropped (rt1 = realtime_now();). There is no differentiation made between the duration of the event and the delta between them. It’s not a square wave with a duty cycle.

Well. I don’t care how long the long stall was. All I care about is the gap between long stalls. I think if you make a histogram of durations between long stalls (where long is avg*1.4 or higher), you will indeed find the 7.8us period with a simple histogram. Having said that, this will depend on the noise in the data. I’ve had some runs over which the simpler analysis failed.
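The histogram described above can be sketched in a few lines. This assumes each sample is a `(timestamp_ns, duration_ns)` pair, which may not match the real CSV layout; `long_stall_gaps` and `histogram` are illustrative names, and the data is synthetic.

```python
from collections import Counter


def long_stall_gaps(samples, factor=1.4):
    """Gaps (ns) between consecutive loop runs that took >= factor * average."""
    durations = [d for _, d in samples]
    cutoff = factor * sum(durations) / len(durations)
    stall_times = [t for t, d in samples if d >= cutoff]
    return [b - a for a, b in zip(stall_times, stall_times[1:])]


def histogram(gaps, bin_ns=100):
    """Count gaps per bin, keyed by the lower edge of the bin in ns."""
    return Counter((g // bin_ns) * bin_ns for g in gaps)


# Synthetic check: 45 fast runs of 160 ns, then one 600 ns stall, repeated.
samples, now = [], 0
for _ in range(10):
    for d in [160] * 45 + [600]:
        now += d
        samples.append((now, d))

gaps = long_stall_gaps(samples)
print(histogram(gaps).most_common(3))
```

On noisy data the cutoff factor and the bin width both matter, so the most common bin is only a first guess at the period.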

To diminish the harmonics in the FFT output, instead of fitting your data to a window (applying a rectangular filter, as you noted) you can apply a Hamming window. This should have fewer harmonics (though having some will be a necessary evil).
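For reference, a Hamming window is just a raised cosine that tapers both ends of the record before the FFT, which reduces the leakage around each peak. A minimal standard-library sketch, with `hamming` and `apply_window` as illustrative names rather than anything from the article's script:

```python
import math


def hamming(n):
    """Hamming window of length n: 0.54 - 0.46 * cos(2*pi*i / (n - 1))."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def apply_window(signal):
    """Subtract the mean, then taper the signal with a Hamming window."""
    mean = sum(signal) / len(signal)
    return [(v - mean) * w for v, w in zip(signal, hamming(len(signal)))]


print(hamming(5))
```

The tapered signal would then be passed to the FFT in place of the raw one.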

Raw data at your service: https://raw.githubusercontent.com/cloudflare/cloudflare-blog/master/2018-11-memory-refresh/example-data.csv

@majke very cool, thanks!

Tag! You’re it :)

@majke ah very interesting! Thanks again for a very educational article.

Here’s a histogram of “durations between long loop runs”

https://uploads.disquscdn.com/images/4d3d1472115285539d85539d3c84b8f5e8e56821cd2e8d84686c7031b51335b2.png

You can definitely see the spike at 7800ns, but I’m not sure how to extract it algorithmically without cheating.

I’m no expert, but I would assume that ASLR might screw with your results.

This was really interesting! Thanks.

No you can’t, this is Ubuntu-specific.

It doesn’t work on Ubuntu either.

Wait, really? At all?
