1. 3
  1.  

  2. 2

    I rather vehemently disagree with the article. I think they misidentified the problem and proposed a terrible “solution”.

    The root problem feels like it is simply ‘poor engineering’, but they don’t give enough context for me to be sure. They created a non-deterministic system… how the heck did they test it, or even know it worked? I honestly have so many unanswered questions (@haxor, can you give us some more info, specifically on why “The non-determinism made sense.”?).

    You can’t future-proof (don’t feel bad, no one else can either) – it is just a fantastic waste of time, and it upgrades “problems you might have someday” to “problems you have right now”. Worst of all, “future-proofing” leads to premature abstraction… which is of negative value, because you will be wrong about what future-you needs anyway, and bad abstractions are insanely resistant to refactoring.

    Again, it feels more like an issue of not being data-oriented (Mike Acton’s definition) and reasonable, and less like a failure to “future-proof” (a failure which, of course, I applaud).

    1. 2

      It’s hard to describe, which is why I left it out of the post. The core of the system is deterministic. We have unit tests and end-to-end tests to verify aggregation. But there’s an intermediate step that happens in production, where we take the raw data and pass it through what is essentially a machine learning classifier. Then we take the ML output and the raw data together to produce the aggregated results.

      The ML classifications change slightly over time. The model gets retrained, parameters change, etc. We’d only aggregate data a single time, so these variations didn’t matter before. In a few instances, for bug fixes, we’d have to reaggregate a second time, but it would be within a few days of the original aggregation, so the output would only change slightly.

      Now we’re trying to reaggregate the data all the time. The slight changes in the ML classifier day to day compound over days and weeks into large differences. What we should have done is store the ML classifications back into another data file so it was set in stone. Then the final process of aggregation would be fully deterministic. We didn’t do this because the day-to-day changes were so slight. We never imagined we’d be reaggregating years later.
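      To make that concrete, here’s roughly the shape of what we should have done. This is a hypothetical Python sketch, not our actual code; the function names and file format are invented for illustration:

      ```python
      # Hypothetical sketch (invented names): freeze the ML output next to the raw data,
      # so reaggregation never depends on whichever model happens to be live today.
      import json

      def classify_and_freeze(raw_records, classifier, labels_path):
          """Run the classifier once and persist its labels alongside the raw data."""
          labels = [classifier.predict(record) for record in raw_records]
          with open(labels_path, "w") as f:
              json.dump(labels, f)
          return labels

      def aggregate(raw_records, labels_path):
          """Aggregation reads only raw data plus the frozen labels, so it is deterministic."""
          with open(labels_path) as f:
              labels = json.load(f)
          totals = {}
          for record, label in zip(raw_records, labels):
              totals[label] = totals.get(label, 0) + record["value"]
          return totals
      ```

      With the labels frozen on disk, rerunning the aggregation years later would produce identical output no matter how much the classifier drifts in the meantime.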

      I agree that future-proofing leads to a lot of problems. But in this case I think we should have been more diligent in making sure our entire data transformation was deterministic for every stage in the pipeline.

      1. 2

        I agree that future-proofing leads to a lot of problems. But in this case I think we should have been more diligent in making sure our entire data transformation was deterministic for every stage in the pipeline.

        That has absolutely nothing to do with future-proofing! That is just good engineering; it requires NO thought about the future, not even the next day. At some point, a bunch of engineers in a room said “Ohh, but it only changes a little, so let’s ignore it” – which is where the problem came from. That is the root, and it should NEVER have been allowed to happen. It wasn’t a lack of thinking about the future; it was not caring about correctness, overvaluing code, and undervaluing data (LIE #3).

        If you focus on good engineering principles (and by the way, reproducible results are rather high up that list), you will be in a better position at every point in the future, the ones you can foresee and the ones you can’t.

        I really think there is a good story to be told here, but it has nothing to do with future-proofing. Talk about the dangers of non-deterministic code, talk about the dangers of focusing on the code rather than the data, talk about the dangers of letting your production and test drift (non-reproducible results), talk about the dangers of not remembering “where there is one, there are many” in terms of data – there are a LOT of good lessons I think you could have extracted and shared with people.
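        A reproducibility check, for example, costs almost nothing and requires zero guessing about the future. A toy sketch in Python – `run_pipeline` and the sample input are placeholders for whatever your system actually does:

        ```python
        # Toy sketch: same input, two runs, identical fingerprint – or the build fails today.
        # `run_pipeline` and `sample_input` are placeholders, not anyone's real system.
        import hashlib
        import json

        def output_fingerprint(result) -> str:
            """Stable hash of a pipeline result (assumes the output is JSON-serialisable)."""
            canonical = json.dumps(result, sort_keys=True).encode()
            return hashlib.sha256(canonical).hexdigest()

        def test_pipeline_is_deterministic(run_pipeline, sample_input):
            first = output_fingerprint(run_pipeline(sample_input))
            second = output_fingerprint(run_pipeline(sample_input))
            assert first == second, "pipeline output changed between identical runs"
        ```

        That test needs no crystal ball; it only asks whether today’s system does today’s job reproducibly.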

        Instead, you have missed that opportunity and created a post that is bluntly harmful to the community and every engineer who follows it.

        1. 2

          I agree that reproducibility is good engineering practice. To be clear: Within a day, our system is deterministic. The ML classifier is stable, everything is fine. The same inputs generate the same outputs. The system was built to do the aggregation a single time within a short timespan, so from our perspective it was good enough and well-made.

          We never thought we’d want to reaggregate the data years later. It wasn’t one of our requirements. We saved a bunch of time by not enabling that to happen. I regret that we didn’t consider the possibility more seriously. If we had known that it would become a requirement someday, we would have paid the cost of future-proofing. It would have slowed our implementation down and hurt our time to market, but it would have been worth it.

          I think it’s a bit extreme to say this is “bluntly harmful”. Hidden dependencies and unknown requirements are common problems we all face in writing software. Minimal future-proofing is one reasonable way to deal with them.

          1. 1

            I think it’s a bit extreme to say this is “bluntly harmful”.

            That was me trying to be really polite, using toned-down language; my feelings are far stronger than that implies. I think “future-proofing” is basically evil, a plague on our industry, and responsible for countless failures of companies and projects. Future discounting is vital for healthy projects.

            Hidden dependencies and unknown requirements are common problems we all face in writing software. Minimal future-proofing is one reasonable way to deal with them.

            Specifically, what is minimal future-proofing? How do I know when I have done enough to qualify as minimally future-proofed? Since by definition it is unbounded, how many hours a week should my team be spending on future-proofing? During this time, what activities should we be doing? How does this allow me to know the unknowns and see what is hidden?


            What if it becomes possible someday to run our system on thousands of ARM chips for better results or cost? What design decisions would we change?

            What if it becomes possible someday to run our system on a quantum computer for better results or cost? What design decisions would we change?

            … just those two questions could eat massive amounts of time and debate … and they are just two reasonable futures out of the infinite possibilities that could happen in the “future”.

    2. 1

      @haxor, how do you feel this message interplays with your response to the Static/Dynamic thread here? Specifically, the implied message in that response is that your survival depends on always running as fast as possible, whereas the message here is to sit down and think a bit harder. Do you think you could have convinced 1-year-ago you that he should slow down a bit and consider things?

      The future is very hard to predict; do you think you are feeling a bit of “hindsight is always 20-20”? Could you have really predicted that these decisions would negatively affect you?

      Finally, you’re a bit vague on the actual problem. You say:

      The problem is that this data transformation is non-deterministic and may produce slightly different results each time it runs.

      Would it be possible to simply make it deterministic? Would customers notice if all of a sudden things were always the same?

      1. 1

        Thanks for the close read of this post and that other thread.

        Do you think you could have convinced 1-year-ago you that he should slow down a bit and consider things?

        Probably not :) So I’m trying to get better at taking a small amount of extra time to consider the downsides while moving fast.

        Do you think you are feeling a bit of “hindsight is always 20-20”?

        Definitely.

        Could you have really predicted that these decisions would negatively affect you?

        No. But what I appreciate now, which I didn’t before, is that there’s some minimal cost of future-proofing you can pay up front that may return large dividends later on. So it’s worth considering those possibilities more deliberately, even in the beginning when you’re moving fast. It’s worth taking a small amount of time to step back.

        Would it be possible to simply make it deterministic? Would customers notice if all of a sudden things were always the same?

        It should be, going forward. That’s exactly what we’re working on now. But all the data of the past may be stuck in time.
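        Concretely, the direction we’re heading is to record exactly which model produced each run, so a future rerun can load that artifact instead of whatever is live that day. A rough sketch with made-up names, not our production code:

        ```python
        # Rough sketch (made-up names): write a manifest recording which model artifact
        # and which frozen labels a given aggregation run used.
        import hashlib
        import json
        from datetime import datetime, timezone

        def record_run_manifest(model_path, labels_path, manifest_path):
            with open(model_path, "rb") as f:
                model_sha256 = hashlib.sha256(f.read()).hexdigest()
            manifest = {
                "model_path": model_path,
                "model_sha256": model_sha256,
                "labels_path": labels_path,
                "run_at": datetime.now(timezone.utc).isoformat(),
            }
            with open(manifest_path, "w") as f:
                json.dump(manifest, f, indent=2)
        ```

        For the historical data we can’t recover the original model, but at least every run from here on will say exactly what produced it.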

        1. 2

          Do you think you could have convinced 1-year-ago you that he should slow down a bit and consider things?

          Probably not :) So I’m trying to get better at taking a small amount of extra time to consider the downsides while moving fast.

          This is a constant problem I run into. For some reason, the software industry is very reluctant to learn from those who came before it. Most teams I work with fight really hard against thinking a little bit harder. While the confirmation bias is probably strong with me, I have not seen many examples where stopping and thinking a little bit harder was not the right choice. This has often been visible in missed deadlines, because the team did not think hard enough about what they were actually trying to do and how to accomplish it. Missed deadlines almost always translate into a scramble to catch up, which then translates into technical debt.

          How do you think you would most successfully communicate what you’ve learned to the incoming generations?

          1. 1

            The advice I’ve heard here is to have a checklist of everything you know that could go wrong. It’s just a list of possibilities. Many won’t apply (for instance, scrolling behavior in an app doesn’t apply to backend scaling). When you’ve finished scoping out a project, doing design, or building a prototype and are going to build “the real thing”, you can go through the checklist and see if there’s anything you forgot to think about. This is especially helpful with productionizing, where you want to ensure you’ve thought through all of the failure modes of your system.

            Another thing I’ve seen is to enumerate risks. Define all of the assets your system has. Talk about the risks those assets face from the standpoint of security, scalability, etc. Then talk about how you can mitigate each risk by making changes to the system.
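            As a toy illustration of the checklist idea (the items here are generic examples, not a complete or authoritative list):

            ```python
            # Toy illustration: a pre-production checklist kept as plain data and walked
            # before building "the real thing". Items are generic examples only.
            PRODUCTION_CHECKLIST = [
                "Is every stage of the pipeline deterministic for the same inputs?",
                "Are external models/datasets versioned, and is that version recorded per run?",
                "Can historical data be replayed or backfilled, and has that been tested?",
                "What happens when an upstream dependency is unavailable?",
                "Which assets (data, models, credentials) exist, and what risks does each carry?",
            ]

            def unresolved_items(answers):
                """Return checklist items that were skipped or answered 'no'."""
                return [item for item in PRODUCTION_CHECKLIST if not answers.get(item, False)]
            ```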

            1. 1

              The problem I have experienced is not figuring out what to do; it is convincing people that it is a good idea to do it.

      2. 1

        Without a really good crystal ball, how do you know what future-proofing you are going to need? It’s really hard to tell from your post whether this problem you’ve identified today was completely obvious to everyone but you. Personally, I watch out for things like cross-cutting concerns and leaky abstractions, so that whatever terrible choices were made N units of time ago are encapsulated well enough that “fixing” them doesn’t require a whole rewrite.

        1. 1

          I’m not sure. Predicting the future is hard. Two other great engineers built the system with me. They also missed this nuance. Since then we’ve had a dozen different people work on it, and this shortcoming wasn’t obviously a problem until now.

          1. 1

            Which is why future-proofing is impossible, regardless of the amount of money you throw at the “task” of “future-proofing”. By the way – since you had this breakthrough – how many hours a week are you going to spend attacking this problem of “not enough future-proofing”?