1. 65
  1.  

  2. 38

    Since scientific code is all the rage on the front-page today, let me tell you a funny story about how, many, many years ago, in a former lifetime, back when I was a very clueless undergrad, I played a stupid game with the decimals of pi and therefore won a very stupid prize. The decimals of pi are involved.

    Many years ago, when nobody had yet figured out that I’d cleverly infiltrated an EE program despite being a computer nerd, I was working on some number-crunching code. The code in question attempted to tl;dr figure out how a weird piece of hardware worked when exposed to some high-frequency electrical signals, and generate an equivalent SPICE model that behaved in the same way when exposed to any high-frequency signal, except it wouldn’t take a few days to simulate.

    I had a reference Matlab implementation that worked, slowly, on “coarse” models. I was trying to write a fast program that worked quickly on “fine” models. I was armed, of course, with said Matlab code, and some sample code from a very fast linear equation solver than happened to solve a similar enough class of problems. And right at the top of the main header was a #define MATH_PI 3.14159265.

    Now I was going through one of those wannabe nerd phases where I was memorising the digits of pi so of course I couldn’t just let that be and fixed it to say `#define MATH_PI 3.1415926535 instead. (Or something like that. I don’t recall how many digits were involved, but I am certain I added a few more).

    The first runs were okay but as I worked on larger and larger models, they started to slowly drift away from the numbers that the reference model gave (and I mean slowly – on large enough models, I’d start the program on Monday afternoon and have the results on Tuesday morning). Not by much. The errors were no more than 5%-10% or something like that on the larger models but it was enough for me to notice it, and they were perfectly reproducible. And the reference model was definitely good, it had been validated against experimental results with real-life devices, so there was no doubt there. My program had a bug.

    Trouble was, this was the third or fourth linear solver I was trying. Not because I liked playing with linear equation solvers but because the matrices were very ill-conditioned so a lot of solvers choked on them (earlier attempts at fixing the “it takes days to run the sim” by employing these new fancy GPGPU things had quickly ran into the problem that iterative solvers wouldn’t ever converge). I’d also done a bunch of other things that could upset the numbers – I was running with a custom-compiled, super-optimised BLAS, for example, so I wasn’t “replicating” the original Matlab environment precisely. So there was always the possibility that the program really couldn’t get me the correct results.

    But I knew this couldn’t be the case. Not only did I have good reasons to suspect this particular solver (I don’t know if it was UMFPACK but it may have been UMFPACK? Or maybe MUMPS?) should have worked, but the sample code I had dealt just fine with systems that were even worse.

    I pored over the code for days, I suspected all sorts of things – maybe I wasn’t reading the data correctly, maybe I was overwriting memory, maybe I was printing out the results incorrectly, maybe my fancy BLAS and LAPACK versions had the wrong compile flags so of course they didn’t give the same results that Matlab’s BLAS code gave, maybe…

    …it was days before I figured out maybe adding two stupid extra digits to a constant that would then be multiplied by anything from 3e9 to 6e10 (frequencies of 3 to 60 GHz, \omega is 2 * \pi that) would have been enough to knock off my results ever so slightly.

    1. 11

      That’s fantastic, thank you for that.

      For my understanding, may I ask whether the results with more digits were more or less real-world accurate? You mention that the reference model was experimentally validated and your results were 5-10% off.

      Was it a case of “your results were different to experimentally-correct reference by more than the experimental error, and thus ‘wrong’” or “your results caused you concern by being different to reference (due to more accuracy), but were actually ‘more right’”?

      If the former, I’m not understanding how increased accuracy introduced deviation from the “right answer”.

      1. 17

        (Sorry I kept editing this answer: this was a very long time ago and some of my memories are foggy, so I had to look over my old notes. I lost some of them, and some are less detailed than they should be, so it took me a while to fill in all the blanks…)

        You have no idea how tough this question is :-).

        Lemme explain the context a bit – as in, explain what the problem being solved was and how. At high enough frequencies, complex enough structures (even straightforward ones, like conductors!) behave in all sorts of “odd” ways. Treating them as “circuit” elements, the way you know them from Kirchhoff’s equations, doesn’t work. You have to run “full” EM field simulations on them. That, in turn, poses a problem: when you’re designing, say, a radio receiver, you’re designing a circuit. You want to be able to reason, and simulate, in terms of inductors and resistors and transistors. Furthermore, it takes a very long time to run a “full” EM field simulation on a tiny structure, like an interconnect or a resistor (hours, or even days). Discretising an entire IC and running a field simulator on it was hopeless. You could isolate the “interesting” modules away, of course – simulate them separately from the rest of the circuit, and plug the results of the whole field sim in SPICE – but then it would take days to do a SPICE test run instead of minutes and nobody wanted that.

        So this program basically took a tiny, problematic structure, like an interconnect or a planar inductor, it ran a number of simulations in order to check how that structure behaved when exposed to signals of various frequencies over a very wide range, and then generated an equivalent SPICE model that behaved as closely as possible to the original. It wasn’t 100% identical but it was close enough for engineering purposes, and it was something that you could plug into your circuit model. The idea was that you (or the semiconductor fab) would run this for all the standard blocks non-standard blocks, duh!, once, which would take a few weeks, and then you’d just give the equivalent models to the people doing the circuit design. Once the design was reasonably close to a final form, you would run it against full field simulations as needed, too. It was expected that the results you got in this manner wouldn’t match real-life data. The point wasn’t to get something that was 100% accurate, but to get something accurate enough to work with.

        This whole thing had two parts then. So the first part was generating an equivalent SPICE model based on a series of simulations. The other part was doing as few simulations as possible. When I’m saying “an equivalent SPICE model”, I mean “a tiny circuit that has the same frequency characteristics as the original”. Trouble is, the frequency characteristics had to cover a huge range (3-60 GHz). If you run the sim even at 100 MHz intervals (so 3.1, 3.2, 3.3 GHz and so on) you still had to run 570 simulations – which could take days – and then you’d figure out that the frequency response was mostly flat between 3 and 57 GHz and 57.5 and 60 GHz, whereas the 100 MHz interval was too coarse between 57 and 57.5.

        So basically this whole thing tried to run as few simulations as possible while still yielding useful results, interpolate the frequency response, and generate a SPICE circuit that had the same frequency response.

        The reference model did all of that stuff but it used a different scheme for “run as few simulations as possible”. It produced pretty accurate results but we wanted to come up with a way to run even fewer simulations. As in, where the reference model gave you good results with 2-300 test runs, the new one could manage with 40-50.

        So now for your question, let me start with the easy answer: they were definitely ‘wrong’, as in, the reference model matched experimental data more closely, and that’s what got my attention (edit: but see below, that was purely accidental). The reference model itself wasn’t overlapping the experimental curve 100% of the time, which was of course to be expected because it was interpolating from simulation data after all, but it was closer to real-life measurements.

        That being said, while I expected some differences between the two models, the two models should have agreed on certain points. For example, while the interpolation curves could be different, the frequency characteristics should have been identical in the points where you ran the simulations. By picking the test problems carefully, I could coax the new model into running simulations at the same frequency points as the other model would use for some of its runs, and I could see they didn’t agree.

        (Edit: it’s worth pointing out that the “disagreement” observed in this manner was completely accidental in terms of physical reality. Adding two more digits to pi didn’t distort the physical reality one bit, it was still the same pi after all :). The values that the two models obtained by running a full simulation for the same point were virtually identical, they differed by very tiny values if at all, i.e. they differed at the 4th or 5th decimal. However, these were 20-30 sample points that covered a 30-60 GHz interval, so the interpolated curves could be significantly more different. The fact that it was further from real-life experiments was just happenstance, it could’ve been closer, depending on discretisation parameters, for example. But the two models used the same interpolation method. If they ran the full simulations for the same frequencies – i.e. if they extrapolated values from the same set of simulations – then they should have obtained the same curves.

        Had the two models ran simulations at “every point” from the frequency range, they would have both overlapped the experimentally-determined frequency characteristic very closely, with barely any difference between them at all, and the interpolated curves would have been pretty much identical, as expected. But the whole point of this was to not run them at every point.)

        More generally: the fancy name “reference model” hides a very mundane reality. The underlying basis of both models was the same – Maxwell’s equations, a specific discretisation scheme, and a way to couple the 3D + time space of the “EM field” world to the purely topological space of a circuit (i.e. where it doesn’t matter where each element is placed, all that matters is how the elements are connected). They ran the same equations on the same numbers. The reference model had been validated on real-life devices because we wanted it to have real-life applications and we wanted to be confident about it but early on, it had also been validated under some simplified conditions, on simple structures, where Maxwell’s equations have analytical solutions that you can work out using nothing but pen, paper, and more calculus than I could recall from my first year. So even without experimental data, I had very good reasons to suspect that the reference model was sound, and that if my results didn’t agree with its results, mine had a bug.

        There is, of course, the related problem: the reference model may have been wrong. That is absolutely correct, and in fact this is one of the reasons why testing this kind of software is such a weird thing. When you develop application software according to a spec, the spec is authoritative. When you do a test run, you check against the spec. If your program doesn’t do what the spec says it does, then it’s wrong. Sometimes testing can reveal that the spec can be improved, so the spec gets amended.

        This isn’t how the process of writing most scientific code works. I mean, yes, the first time around, when you get unexpected results, you check your program for bugs. But it routinely turns out that it’s the model that needs amending, not the program. It’s not at all like Agile though and I can’t think of any good analogy – the best way to describe it is that you write the program in order to test the spec. You don’t know if the spec (i.e. the model, let alone your algorithm to implement the model, or the program that implements the algorithm) is correct. Most of the time, by the time you get to the software writing part, you have very good mathematical/physical reasons to believe that the model is good but you do sometimes end up throwing it away.

    2. 14

      As a good rule of thumb, I’d always define constants with as many significant digits as my machine number can express. It’s no coincidence that the JPL is using π with 16 significant digits (because it corresponds to what float64 (i.e. double) can roughly express, which cannot be stated precisely due to subnormal numbers and other factors):

      The context given by Dr. Rayman, which correctly and elegantly shows that you don’t need many significant digits for pi to calculate even astronomically large things, namely misses one crucial aspect: Most calculations are iterative in nature. Calculating the circumference is just one multiplication, but if you do a lot, inaccuracies can quickly pile up and you wouldn’t want to have any more imprecision than what your machine number system already imposes on you on each operation. If you’re interested in more and want to see pathological examples, check out chapter 2.4 of my thesis on this topic.

      And who knows if we, some day, make the move to float128 oder float256 (even though I’d favour a move to posits (see this for an introduction))? Of course, C code would not be affected, because double will always be float64, same with Java and other languages. But if we’re talking about numerical code where you are somehow defining π for some reason, I’d better make sure and just go all out on it. Even if you specify 200 digits, there will be no slowdown of any kind as the compiler will just take as much as it needs and convert it to the internal machine number representation at compile-time. Even when parsing a long constant at runtime, discarding superfluous digits is very easy, and the added space in the binary/script is negligible considering the massive amounts of bloat we pile on in other places.

      1. 3

        I wonder how big the error would be if we assumed Pi to be exactly 3

        1. 1

          Okay, so on duck duck go: “pi * 25000000000” results in 78539816339.7 And when setting pi to 3 we get 75000000000

          So this is a pretty big error: 3539816339.699997

          Did I calculate this correctly?

          1. 4

            You calculated correctly, but the answer to “how big the error would be” is it depends :). There are two ways to look at the error of a measurement or a computation. The first one is the one you tried above – the absolute magnitude – which tells you half the picture.

            The other half is the relative error. The difference between 3 and 3.141592 is 0.141592 or about 4.5%. A 4.5% error is good enough for some things (e.g. weighing scales at the farmers market) and really bad for some other things (e.g. high-precision scales for the chemical industry).

            Both of these tell you useful things – you generally want to keep both in mind when making any kind of assessment.

            (Edit: oh yeah, one other interesting thing. Relative error is adimensional, it’s just a percentage. Absolute error has whatever unit you’re measuring, and sometimes that tells you a bit about how big an error is in physical terms. In your case, if we’re talking, say, 3 539 816 339 molecules of water, that’s a tiny fraction of a water drop. If we’re talking 3 539 816 339 grains of rice, that’s about 410,000 cups of rice which is, like, a lot of rice).

            1. 2

              Ah! That’s good to know thanks :) So generally I would want to additionally examine the relative error and see how it relates to the absolute magnitude between two values.

        2. 2

          Well there you have it, the precise accuracy that’s “good enough for government work!”

          1. 1

            Well there you have it, the precise accuracy that’s “good enough for government work!”

            1. 3

              I think this is a dupe

              1. 8

                Well there you have it, the precise accuracy that’s “good enough for government work!”

            2. 1

              If I see correctly, that’s the precision a double can hold.

              1. 1

                I really liked the article. It gives some good intuition on why you don’t need that many digits.

                Wouldn’t another argument be that the significant digits in a calculation should match (or at least be close to) the least precise measurement in the model? If that’s the case, how many significant digits does our measurement of the distance for voyager 1 have? I would doubt that it is close to 17 digits, so there is really no need for more than 16 decimal places for pi. By the way, the blog post gives the distance to 3 significance digits, but I suspect that they have better measurements for it.

                1. 0

                  ALL OF THEM !