1. 76

  2. 44

    Let’s see. I’ve:

    • brought down credit card processing for a national retailer
    • cut off all Internet access to a major United States military installation (twice!)
    • helped bring down an entire automobile manufacturing facility
    • helped break all internet access via cell phone for one of the major US cell networks
    • shipped incorrectly optimized code that caused the (important, air-gapped) system to become accidentally quadratic

    So, you know. Be careful.

    1. 21

      I can relate to this, though I wouldn’t say I’m on the same level as you; I still have a long way to go. Two of my most significant achievements:

      • Brought down debit card processing for a multinational bank issuer (only in one country, sadly, and just for 2 hours).
      • Deleted a local private university’s entire payment records database for a semester; nobody noticed, and I managed to restore it using the only backup in existence: an Excel sheet I was using to verify data.
      1. 9

        I managed to restore it using the only backup in existence: an Excel sheet I was using to verify data.

        Deeply in awe right now.

        1. 7

          To make things more exciting, that Excel file was only in memory, so a complete computer crash would have left me helpless. I learned my lesson and now double-check my data edits; it’s funny how careful I am about doing things right these days, even when I’m just changing a single parameter value in a file.

      2. 14

        Sounds like you’ve had a productive and interesting career, then!

        1. 15

          plot twist: he’s an intern

        2. 5

          Did you face hard consequences?

          1. 15

            brought down credit card processing for a national retailer

            No. I fixed it before it became a problem and explained to the rest of my team what had happened.

            cut off all Internet access to a major United States military installation (twice!)

            First time: no, because they installed the update without testing it in their environment first. Resulted in a lot of paperwork on their end, though.

            Second time: whoo boy. I had written the compiler that turned network intrusion signatures into code that could run on our devices. I messed up the code generator for one part, so that in certain rare circumstances, an and would be an or…which meant that certain signatures would suddenly start matching just about everything. Some customers had it set up that certain signature matches would result in blocked traffic. You can see where this is headed.

            The compiler had actually been pretty extensively tested, but the problem manifested on an older configuration that didn’t have a device still in our testing pool (I know, I know).

            I had to spend a couple of days doing non-stop calls with my boss to various impacted customers, apologizing, answering their questions, and basically (and deservedly) eating crow.
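
            For the curious, here’s a rough, hypothetical sketch of how a single flipped boolean operator in a signature compiler’s code generator can make rules match almost everything. None of the names or the signature format below come from the actual product; they’re made up purely to illustrate the failure mode.

            ```python
            # Hypothetical sketch, not the real compiler: a signature is a list of
            # conditions that must ALL hold for a packet to match.
            def compile_signature(conditions, buggy=False):
                combine = any if buggy else all   # the bug: one code path joined terms with OR
                def matcher(packet):
                    return combine(check(packet, field, value) for field, value in conditions)
                return matcher

            def check(packet, field, value):
                if field == "port":
                    return packet["port"] == value
                if field == "payload_contains":
                    return value in packet["payload"]
                return False

            signature = [("port", 80), ("payload_contains", "exploit-marker")]
            ordinary_request = {"port": 80, "payload": "GET /index.html HTTP/1.1"}

            print(compile_signature(signature)(ordinary_request))              # False: both terms required
            print(compile_signature(signature, buggy=True)(ordinary_request))  # True: port 80 alone matches,
            # so a customer policy of "block on match" starts blocking most web traffic
            ```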

            helped bring down an entire automobile manufacturing facility

            helped break all internet access via cell phone for one of the major US cell networks

            These two ended up being a lot of noise and phone calls but, ultimately, the customer had deployed stuff into their environment without testing it first. The issues on our side were from being too aggressive with what we defined as “malicious network traffic”.

            shipped incorrectly optimized code that caused the (important, air-gapped) system to become accidentally quadratic

            Not from the customer or my company, no, but from myself, very much so. I just about had a nervous breakdown, seriously. It got bad enough that I had resolved to quit as soon as I figured out what the problem was (I certainly wasn’t going to quit and leave the problem for someone else), and had convinced myself that I was just terrible at my job and had been faking it all these years. I was miserable, working long hours every night for weeks trying to figure out the problem, constantly preoccupied, not enjoying time with my family.

            Finally figured out the problem, got the fix in, and ended up staying and being reassured that I didn’t suck, which was nice.

            (Moral of this last story: database query optimizers can sometimes make the wrong decision, so don’t assume that the plans they pick in your test environment are the ones they’re going to pick in the field…)
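
            To make that moral concrete, here’s a minimal, hypothetical sketch using SQLite as a stand-in (the poster’s actual database isn’t named): the same join gets an index-driven plan when the index exists, and a nested per-row scan, effectively quadratic, when it doesn’t. A missing index or different table statistics in production can flip the plan the same way.

            ```python
            # Hypothetical illustration only -- SQLite stands in for whatever database
            # the original system used; the schema and index names are made up.
            import sqlite3

            def join_plan(with_index):
                con = sqlite3.connect(":memory:")
                con.execute("PRAGMA automatic_index = OFF")  # keep the comparison simple
                con.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER)")
                con.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
                if with_index:
                    con.execute("CREATE INDEX idx_customers_id ON customers(id)")
                rows = con.execute(
                    "EXPLAIN QUERY PLAN "
                    "SELECT * FROM orders JOIN customers ON orders.customer_id = customers.id"
                ).fetchall()
                return [detail for *_ignored, detail in rows]

            print(join_plan(with_index=True))   # inner table typically: SEARCH customers USING INDEX ...
            print(join_plan(with_index=False))  # inner table: SCAN, repeated once per outer row,
                                                # i.e. O(n*m) -- "accidentally quadratic"
            ```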

            I actually got a job offer during that time from what might have been my dream job. I turned it down because I didn’t want to leave my current company in a bad state. I don’t know if I made the right decision, but I’m happy, so I suppose that all worked out…

            1. 2

              Thanks for sharing.

        3. 14

          In case you dislike reading twitter threads (as I do), here’s a slightly better version that was linked to at the end.

          1. 10

            It should be done automatically for every twitter link posted on Lobsters

          2. 14

            My own story: there was a triangle of 3 people working with a new customer, each of whom thought the next was the one handling communication with the customer. Turns out, the person who was actually supposed to be handling it was me. Upshot was, the customer didn’t get any of their time-sensitive requests handled and ditched us. My first job, and I lost us a customer paying more than my salary was worth.

            Not computers, but still technology: I was working on an oil rig drilling a natural gas well. I’d worked with the client before, but it was a new supervisor, a new rig, and my first job as the senior person. My job was “geosteering”: basically being the on-site geologist and letting the drill crew know where in the target rock formation they were, to keep up with the minor weaves and wobbles of the layer of rock they wanted to be in. It is as much art as anything else, and I was pretty okay at it, but I screwed up badly – the target formation dipped down in a way that didn’t show up on any of the nearby wells or seismic surveys, and I misinterpreted what was going on and didn’t notice for nearly a day. I looked dumb to my superior, who’d suggested earlier that this might be the case, I had to call up the head of the drill crew, who had 20 years of experience on me, to admit I’d screwed up, etc. I was just lucky my mistake had taken us above the target area, since right below it was a very hard rock formation that would have taken days to drill out of. IIRC a day of drill rig time starts at 5 figures and goes up from there.

            Kinda surprised I don’t have more of these, in retrospect. Plenty of more minor ones, like pulling the wrong drive out of a RAID array and needing to restore a customer’s server from backups, but those are less dramatic. Has any data center tech of any experience not done that once?

            Edit: shout out to people having really bad days, like dropping a satellite on the floor: https://upload.wikimedia.org/wikipedia/commons/4/43/NOAA-N'_accident.jpg

            1. 5

              Reminded me of one of my big screwups in my oilfield days. One of the things our crews did when we got to the rig was to install a special pressure sensor on the main drilling mud line, which is pressurized to 1500-3000 psi or so during normal operations. The rig manager assigned the least experienced rig worker at the site to help me, as it normally goes. He showed me where their mud line hookup was and where the rig’s pile of spare parts and adapters was, and I grabbed an adapter that looked like it would fit, and we proceeded to get it all screwed together. Note that these are 2” NPT connections that require enthusiastic action on a 4-foot pipe wrench to install or remove.

              Anyways, everything seemed to work, and we went on our way, drilling according to the plan. Then 3 days later, the rig manager came back to our trailer with the adapter I had used to tell me that it was an adapter meant for water pipe, and only rated to 200psi. We had been drilling with it in place, at 2500psi, the whole time, over 10 times the rated pressure. That thing could have let go at any moment, and could have easily killed somebody if they happened to be in the way at the time.

              The rig manager promptly found an adapter with the correct rating, and we reinstalled and got back to drilling. That was quite the reality check for me. I learned to be much more careful about verifying the ratings of adapters and slings and other kinds of parts. There’s a reason why every company in the industry has hardcore policies about things like throwing out and destroying things that don’t have easily visible ratings.

              Speaking of safety policies, at least whoever was working on that satellite had correctly roped off the area where it would hit if it fell. Fortunately, that thing didn’t hurt anybody when it fell, besides somebody’s pride, budget, and timeline.

              1. 2

                Oof. That sounds like an oilfield story, yeah. Now that I’ve unrepressed the memories, I’ve screwed up multiple other things too, but not quite like that. I know a LOT more about machinery now than I did at the time, so all I really knew then was that if I ever needed to touch something mechanical, I had to get the rig crew to help. Ideally by asking someone who knew what they were doing, as your tale demonstrates.

                Out of curiosity, can I ask what you did in the oilfield? The first things that come to mind would be MWD or whatever the IT system that reports all the rig data is called… I’ve forgotten so much of the random-ass terminology. But there’s so many other things going on that you could have been doing something I’ve never heard of.

                1. 2

                  That was MWD all right. We needed special pressure sensors that were sensitive to pressure changes in the frequency ranges our tools operated in. Funny career path: I ended up later working with the people who specced out and designed those sensors and the software that demodulated the digital data being sent. Data management on the rig tends to be a hodgepodge of companies all measuring slightly different things for different reasons and reporting that data in different ways.

                  I’ve screwed up plenty of software stuff too over the years. Alas, nothing comes to mind that makes a good story - mostly not too dramatic consequences, usually too deep in the weeds of some piece of technology to explain simply.

            2. 7

              This thread has surfaced some interesting stories, and it’s a good reminder that everyone makes mistakes; the trick is learning from them and trying to ensure that processes are in place to prevent the same mistakes in the future.