1. 4

    For all the shitstorm, I see no actual bug report invalidating results in the open issues. Can anyone please point one out?

    Otherwise it feels like all the testards and drive-by team leaders will teach researchers better than to open up their code.

    1. 7

      This pretty much matches my impression as well – it’s hard not to wonder whether any of the people who wrote those “analyses” ever used – let alone wrote – simulation software.

      There’s enough incorrect material in them that writing a rebuttal would be tedious (plus I really ought to stop wasting time on Lobste.rs and get back to work…). But I just want to point out that the rhetoric matches the analysis.

      For example, in this article you see things like:

      “A few people have claimed I don’t understand models, as if Google has no experience with them.”

      Unless the author of the article is Google, that’s pretty much irrelevant. Google has hundreds of thousands of employees, and I bet quite a few of them don’t understand models. (The author of this article is definitely one of them, by the way.)

      Edit: it’s nothing to be ashamed of, at any given moment there’s an infinite amount of things any of us doesn’t understand. But ranting about things one does understand usually gives better results.

      1. 4

        Are you saying that nondeterminism doesn’t matter because the model is supposed to be nondeterministic? Then why are they nevertheless fixing the nondeterminism bugs?

        Do you understand the value of reproducible research, which logically implies making the source code open in this case? Are you aware that Ferguson’s code wasn’t open source for over a decade, and that is part of the problem?

        1. 8

          To answer the nondeterminism part, normally you take a large set of runs and analyze them as a group.

          For example, a Monte Carlo simulation of a gamma particle tunneling through radiation shielding is inherently nondeterministic; however, a large number of runs allows you to find the distance necessary for most if not all particles to be stopped safely. Nondeterminism is not an issue if the behaviors involved allow you to derive reproducible results from the aggregate.
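          A toy sketch of that idea in Python – the attenuation model (exponentially distributed free paths) and all the numbers are purely illustrative, not the actual shielding physics:

```python
import random
import statistics

def max_penetration(n_particles, mean_free_path, rng):
    """Deepest penetration depth among n simulated particles.
    Free paths are drawn from an exponential distribution, a
    standard toy model for attenuation."""
    return max(rng.expovariate(1.0 / mean_free_path) for _ in range(n_particles))

rng = random.Random()  # deliberately unseeded: every run differs

# 200 independent runs of 10,000 particles each.
runs = [max_penetration(10_000, 1.0, rng) for _ in range(200)]

# Individual runs disagree, but the aggregate is stable: a shield
# somewhat thicker than the typical maximum stops essentially all
# particles, and that conclusion reproduces across re-runs.
print(statistics.mean(runs), statistics.stdev(runs))
```

          No single run is reproducible here, but the group statistics are – which is the sense in which nondeterministic simulations still yield reproducible science.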

          That said, software bugs like incorrect branching can also be nondeterministic. The degree to which they affected the simulation is often assessed through error propagation analysis or by comparing the results before and after. Not all bugs are created equal - many can be obviously wrong but not “infect” the results enough to trash them. They can still muddy things, though.

          That’s why, yes, you fix bugs in nondeterministic models: the model is meant to be the only source of nondeterminism. Bugs have to be reduced enough to avoid tainting the result set.

          1. 3

            To answer the nondeterminism part, normally you take a large set of runs and analyze them as a group.

            If your simulation is meant to be nondeterministic, then good reproducible science uses a strong PRNG and takes its seed from configuration. You run it with a fixed set of seeds and can then reproduce the same results by providing the same set of seeds. If it’s not meant to be nondeterministic, then it’s a bug, and it’s impossible to know its severity without knowing more (but in C++ it can be any kind of undefined behaviour, so the end result can be complete nonsense).
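            A minimal sketch of that pattern in Python (a C++ version with std::mt19937_64 looks much the same); the seed values here are made up, and in practice they would come from a config file:

```python
import random

def simulate(seed):
    """One stochastic run, fully determined by its seed."""
    rng = random.Random(seed)  # dedicated generator; no hidden global state
    return sum(rng.random() for _ in range(1000))

seeds = [101, 202, 303]  # illustrative; normally read from configuration
results = [simulate(s) for s in seeds]

# Re-running with the same seeds reproduces the results exactly,
# so anyone with the config can check the published numbers.
assert results == [simulate(s) for s in seeds]
```

            The key design choice is that each run gets its own explicitly seeded generator – any code path that quietly pulls from a shared global RNG breaks reproducibility.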

            1. 2

              For example, a monte carlo of a gamma particle tunneling through radiation shielding is inherently non deterministic, however a large number of runs allows you to find the distance necessary for most if not all particles to be stopped safely.

              Sorry if I misunderstand, but surely being careful about when, and by whom, your PRNG is called helps limit this, especially in the single-threaded case?

              Over in game development the issues around non-determinism are a pretty well-known, if not always well-solved, problem and have been for nearly two decades, at least.

              1. 8

                (Note: not parent).

                There are processes – I’m not sure if gamma particle tunneling is one of them because solid-state physics isn’t exactly my field, but if I recall things correctly, it is – which are inherently probabilistic. It’s supposed to give different results each time you run it, otherwise it’s not a very useful simulator, and I’m pretty sure I read at least one paper discussing various approaches to getting a useful source of randomness for this sort of software.

                (Edit: there are various ways to cope with this and reconcile the inherent determinism of a machine with the inherent probabilistic character of a physical process, assuming you really do have one that’s probabilistic. It’s not as simple as yeah, we just write simulators that give different results each time you run them.)

                In this particular (i.e. Ferguson’s code) case, the non-determinism (fancy name for a bug. It’s a bug) manifests itself as a constant-ish extra error term – you get curves that have the same shape but don’t coincide exactly, at least not over the duration where the model is likely to give useful results.

                Unfortunately, that’s exactly what you expect to get when doing stochastic process simulation, which is a plausible reason why it wasn’t caught for a long time. This kind of error gets “folded” under the expected variation. That can have two outcomes:

                • If the errors are random, then averaging several runs will indeed cancel them out
                • If the errors are systematic, then averaging several runs will yield an extra (likely time-dependent) error factor, but it’s hard to say if that actually changes the simulation outcome significantly without doing an actual analysis.

                Thing is, the latter case is usually swept under the rug because these models are meant to investigate trends, not exact values. If you look at the two graphs ( https://github.com/mrc-ide/covid-sim/issues/116#issuecomment-617304550 – that’s actually the only substantial example of “non-determinancy” that the article cites), both of them say pretty much the same thing: there’s a period of modest, then accelerated, growth that settles into linear growth after 50-60 days.

                It’s not really relevant if you reach 200,000 deaths in 62 or in 68 days – not because “reproducible outcomes don’t matter” but because there is an inherent expectation that a model that’s supposed to tell you how a flu will spread over 90-150 days in a non-homogenous population of 40,000,000 people is not going to be accurate down to a few days.

                Edit: to clarify – I’m not saying that’s not a bug, it is. But it’s definitely not clear that its impact over the simulation results is enough to invalidate them – in fact, if I were to speculate (which is exactly what the authors of these critical articles do, since they don’t actually run any numbers, either) I’d say they probably don’t. The one bug report shows only two curves, and that’s not even enough to refute the authors’ argument that averaging enough runs will cancel out these errors.

                Edit: also to clarify – what parent comment is saying is, IMHO, completely correct. The only source of non-determinism in the result should be the non-determinism in the model, and bugs that introduce extra error factors should absolutely be fixed. However – and this is the erroneous message that these articles are sending – tainted result sets can still provide valid conclusions. In fact, many result sets from actual, physical measurements – let alone simulations – are tainted, and we still use them to make decisions every day.

            2. 3

              Do you understand the value of reproducible research, which logically implies making the source code open in this case? Are you aware that Ferguson’s code wasn’t open source for over a decade, and that is part of the problem?

              There is a culture problem in academia around this, but it is getting better and more journals are requiring source code with paper submissions.

              In this case, the model has been reproduced by researchers using different probabilistic programming languages (Turing.jl and Stan), which is the bar it needed to reach. Discussion of the implementation quality isn’t really useful or scientifically interesting. It’s the inputs and modelling assumptions that are interesting.

              (Draft?) replication post here: https://turing.ml/dev/posts/2020-05-04-Imperial-Report13-analysis

              Code for that post is here: https://github.com/cambridge-mlg/Covid19

          2. 4

            There’s coverage from the first link in the submission.

            1. 0

              “Lockdown sceptics”, seriously? “Stay sceptical, but presuppose the conclusion you want to reach and find facts in support of it”?

              1. 6

                That’s neither here nor there, let’s stay on discussion about the issues they’ve found.

                1. -2

                  Yeah, they may have a perfectly good breakdown of issues in the simulation which affects results, I’m not discussing that. I didn’t take the time to read it (and probably won’t; the topic doesn’t interest me that much), and I should’ve been more clear that I’m not saying their findings are invalid. I just thought it was worth pointing out, and probably should be something people keep in mind while reading their review.

          1. 2

            I wonder how this privacy bug arose. Was it perhaps a developer deciding to serialize information about the users’ friends as JSON/XML “subobjects”, to be “helpful”, without considering the permissions that the friend had given to the app? In other words, was it a Confused Deputy bug?

            1. 2

              It’s a little too close to what happened with Cambridge Analytica for comfort.

            1. 3

              The end-to-end argument challenges this optimism directly. No matter the sophistication of the underlying building blocks, it argues, we’ll always have to define and enforce the essential correctness properties of our system at the topmost end-to-end layer of design. We can’t trivially derive correctness from the correctness of our subsystems: we must always consider it as an end-to-end property.

              Mathematically, this is completely untrue. Many desirable properties can be specified mathematically, and at least in principle, proven of some system in a proof assistant like Coq or Isabelle or Idris. The reason you can’t compose some correct TCP stack with some correct application-level code which assumes the absence of network partitions and have it be correct, is not because “correctness doesn’t compose”, but because TCP does not and cannot guarantee the absence of network partitions in the first place.

              1. 13

                I agree with almost everything the OP has said.

                Where there is confusion is around the case that this is a “quiet crisis”. I used to think that software managers didn’t know that open-plan offices and ageism produced a low quality of result. The older I get, the more I’m aware that they do know. They just don’t care.

                Business has this anti-intellectual culture and the flip side of that is that not knowing how this technical “voodoo” works is a point of pride, because only the low-status peons actually know that “mechanical” stuff. This also makes it really easy to blame “tech” when things go wrong. Executives are expected to be on top of things like the latest corporate logo redesign or the press coverage they’re getting, but technical excellence isn’t valued and technical failure can always be blamed on the programmers– even if it’s a software company– with no consequences for the individual executive. Thus, conditions for programmers will deteriorate. Sticking the programmers in an open-plan cattle pen saves money on paper (that’s a bonus for some cost-cutting shithead who doesn’t actually do anything). When the programmers all become less productive, the blame can be thrown on them as individuals.

                The OP is the rare software manager who actually cares more about doing his job than promoting himself and climbing the ranks. Very few do.

                Is this a “quiet crisis”? I don’t know. I mean, technical excellence isn’t rewarded in the corporate world, and companies are still profitable. I might personally think it sucks that there’s so much tolerance of imprecision-of-thought and half-assed work, but we’re not actually seeing these executives bear the consequences of their decisions, so I’m increasingly convinced that everything we fight for, as principled technologists, actually doesn’t matter to the global economy.

                1. 6

                  “Doesn’t matter to corporations” is very different from “doesn’t matter to the global economy”. Robin Hanson, who is a professor of economics at GMU, has uncovered numerous ways in which people behave, to put it bluntly, irrationally and hypocritically, and some of these relate to the corporate world in particular. For example, Hanson claims (I believe correctly) that if corporations really wanted to hear an unbiased view of how likely a project was to come in on time and on budget, they’d get feedback from the people who actually have insight into this, i.e. the front-line employees - in our case, software engineers. Hanson believes that internal prediction markets are a good way of obtaining accurate feedback - I’m afraid (I know you disagree with me here) I prefer Scrum - but regardless of what you think about that, the general principle is sound. But getting accurate feedback from front-line employees is a relatively uncommon activity in the corporate world. Why is this? Well, we can conclude one of two things: either corporations by and large do not want to hear accurate estimates - they instead want to hear political estimates manufactured by managers and executives (and in some cases individual engineers) to make them sound good - or, alternatively, there are a large number of middle managers and executives who genuinely believe that a military-style command-and-control hierarchy is the best way to obtain accurate estimates. I’m not sure which possibility is worse.

                  But neither of these possibilities imply that this local “political” optimum is necessarily economically optimal, either for the corporation itself or for the economy as a whole. Traditional corporate organisation is something humans fell into - it was copied from the military, and we shouldn’t expect that it will continue to be seen as the most effective organisational form forever.

                  1. 3

                    You’re correct. Also “corporations” don’t have discernible wills. People within corporations do. Executives don’t actually care if projects succeed or fail– only how it will affect them. They’d rather have a major failure that they can blame on a rival than a success that puts them at risk later on.

                  2. 3

                    What are the options for the peon?

                    1. Go join another corporate
                    2. Work at a younger corporate
                    3. Become an executive
                    4. Join other peons and start a peon collective and wait till the peon leader sells out in the end to become another executive.
                    5. Join peon anonymous
                    6. Quit / Die

                    Some want to work within the system.
                    Some outside of it.
                    The input of the System is your soul. The output is profit.

                    Monsters exist because we lack the discipline to defeat them. That’s the first assumption made by the monster.

                    1. 2

                      I think we can start by not thinking of ourselves as peons, and by demanding that our employers treat us as trusted professionals. Organizing around our interests can help us deny talent to employers who refuse to do so, and we can make this a global effort.

                      So #4 seems like the strongest suggestion. And sure, a “peon leader” is a SPOF. That’s why there needs to be at least the threat of competition. If one programmer collective turns corrupt, then another should replace it.

                      The best strategy that I can see is to create an exam system like what the actuarial sciences have, and build up a professional society from that.

                      1. 1

                        Fair enough. The best-case scenario I can see is for ACM / IEEE like organisations to do it.

                        The challenge I see is selling that idea to the industry and getting a vast number of hippy-dippy-dropout programmers to actually write it.

                        Culturally I suppose the best we can do is to

                        1. Increase Awareness
                        2. Encourage corporates that do better things.

                        I have an idea, let’s build a website that reviews ….. oh wait that’s glassdoor.

                        1. 1

                          I think sites like glassdoor are the best hope/tool we have. I left my old job out of boredom with the grind, and after realising that, having been there for 8 years without a significant pay rise, I was now being underpaid. When I asked around my own workplace, I found out that even the new graduates straight out of uni were paid more. I had slipped through the cracks and was stuck in a rut - half my fault, half my boss’s. It took glassdoor or something similar to prompt me initially.