1. 3

    “Teach Ansible to talk to Github on your behalf” enables all your servers to establish arbitrary SSH connections using the private key(s) in your agent. That’s pretty terrible unless you explicitly and consciously decide to have zero isolation between your hosts. Similarly for storing all secrets in one file: you’re sharing all the secrets with all the hosts.

    If the author does think that’s fine, then I think the article should at least clearly state the implications, as certainly not everyone agrees. I, for example, manage a bunch of servers with ansible where maybe a dozen people have root access. I don’t want those people to be able to SSH anywhere with my keys.
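    One hedged alternative (the key name and host name here are illustrative, not from the article): give each host its own read-only deploy key instead of forwarding your personal agent, so a compromised host can only read the repositories you explicitly granted it.

    ```shell
    # Generate a per-host deploy key; the filename and comment are illustrative.
    ssh-keygen -t ed25519 -N '' -C 'deploy@web01 (read-only)' -f ./web01_deploy_key

    # Then register web01_deploy_key.pub as a read-only deploy key on the
    # repository, and install the private half only on web01. Your personal
    # keys never leave your laptop.
    ```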

    As for “Add Github to known_hosts properly and securely”, doing the keyscan on your laptop does nothing to prevent MITM attacks, neither on first use nor on subsequent executions of the task. It will just write whatever ssh-keyscan returns this time into known_hosts. The bit about having to write another play for updating seems wrong to me. Since talking to Github via ssh would blow up badly if Github ever changed their host key, I think hardcoding it is a fine solution. (If we ignore the general inadequacy of transport security for authenticating code you’re about to execute, but that’s a different discussion.)
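    Hardcoding can be sketched like this (the key string in the usage comment is a placeholder; take the real value from Github’s published fingerprints, verified out of band, never from a live ssh-keyscan):

    ```shell
    # Append a pinned host key to a known_hosts file, idempotently.
    pin_host_key() {
      key="$1"; file="$2"
      touch "$file"
      grep -qxF "$key" "$file" || echo "$key" >> "$file"
    }

    # Usage (placeholder key -- substitute the one Github publishes):
    #   pin_host_key 'github.com ssh-ed25519 AAAAC3...placeholder...' ~/.ssh/known_hosts
    ```

    Because the function checks before appending, re-running it (e.g. from a play) is a no-op once the key is present.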

    1. 1

      I agree with this. I thought the points were overall pretty good, especially the variable handling, vagrant setup, and error handling. All points I came to through some hard experience. I don’t think a lot of this is easy to understand when getting started and I like to see articles like this.

      But the SSH connections are pretty important and it involves the security engineering of the application architecture. I do some not-great-things as well but they are all an element of the politics behind the infrastructure rather than designing a secure system. Telling the difference is not obvious.

      I’d take the points on SSH with a grain of salt and not architect like this unless you are sure it has to be done. I understand it is a trade-off. A good self-test is to name the trade-offs and discuss them with your team.

    1. 2

      For tech books, I like to get a feel for the book by reading parts of it in a sample chapter (or on Safari Books, for which I have a personal account). If I think the book is good and feel it has lasting value, then I prefer to read it in print. In print, I find it much easier to commit the things that matter to memory, including remembering to keep reading it. Ideally, I’d just buy them all in print, but that can get out of hand :)

      I have a Safari Books Online account for the times when I need to read up on something quickly, e.g. if I need to figure out how to pull something together and get it running fast; having access to lots of resources of which I only need a small amount of info, I think that’s great. But in-depth analysis requires dedicating time to the material, otherwise I don’t learn.

      For non-tech books: absolutely in print. For a lot of the same reasons as tech books: I can recall more and I can get engrossed in the material. I think online reading can be OK, but it’s the stuff that goes along with being online that’s the issue. There’s always something else to do on my iPad.

      1. [Comment removed by author]

        1. 3

          You can never truly trust a method called distributefundsequally, because there is no mathematical truth or concept behind it.

          I 100% agree with this statement.

          I’d prefer to say something like “Naming is Documentation.”

          I’ve learned to treat documentation like I treat the news; that it’s made up of statements which may or may not be true, that may help or confuse your view of the world, so take them with some amount of inherent uncertainty.

          1. 1

            I think we’re in agreement on the spirit of what I intended the post to say, but differ on how to say it. Communication is hard.

            I also think that my post intends to address your last sentence there: badly written methods will have a name that leaves even more uncertainty about what lives within, and the better the naming, the less reason anyone has to be suspicious of its contents.

            At the end of the day, however, if you have behavior inside such a method that doesn’t match up with its name, you don’t have a naming problem; you have a people problem.

          2. 2

            I think that’s a fair thing to say. Also, yes, the naming is useful to a team. As soon as I read the parent article I immediately reached into my own code base and began to update an interface where the common method was ‘execute’, and I’ve been telling my team that this really means “send a request external to our network”. The generic name was left over from bottom-up design. I didn’t know what it was doing when I initially wrote it.

            I’m getting to the end of an effort to allow for generic integrations between companies, and I chose to use this as an experiment to deploy a minimalistic replication log on top of postgres, taking many of the ideas laid out in Martin Kleppmann’s blog and Jay Kreps’ writing.

            So far I think it’s been successful, but the biggest challenge I’ve been facing is solidifying a useful abstraction. I have a generic way to append actions onto a log, and processing that log has its own abstraction where we can just hook in one or more commands and have a queue run things for us (that’s where the ‘execute’ method was). The main functions of this were to enable reliable re-processing of events (e.g. re-sending a failed API request), and to isolate the processor inside of a monolithic application in order to see the benefits of beginning to break the application up into smaller microservices (hence, getting fault tolerance and resilience around the log abstraction). I’m finding that these abstractions are not trivial. Go figure.

          1. 5

            Love this article! I’m embarrassed to admit I’ve never thought about collecting the data with tcpdump (which I’ve been using for decades at this point :) and viewing it in Wireshark.

            1. 3

              Probably safer to capture it with tcpdump than Wireshark - and then you can view the capture file offline as well.
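              For example (interface, filter, and paths are illustrative; the command is echoed here rather than executed, so it can be reviewed before running it as root on a server):

              ```shell
              # Capture on the server with tcpdump; -s 0 grabs full packets,
              # the filter keeps the file small.
              CAPTURE="tcpdump -i eth0 -s 0 -w /tmp/capture.pcap tcp port 443"
              echo "$CAPTURE"

              # Afterwards, copy the file to your workstation and open it offline:
              #   scp server:/tmp/capture.pcap .
              #   wireshark capture.pcap    # or: tshark -r capture.pcap
              ```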

              1. 5

                I think that was the main thing I realised reading this; at first I thought “why wouldn’t I just collect with wireshark as well?”, but duh, I need to capture on my servers too!

                1. 4

                  I didn’t follow, what’s wrong with using tshark on servers?

                  I fully admit that I use tshark more often than tcpdump only because I find tshark a little easier for getting what I’m looking for. I’m trying to decide if I should look into tcpdump more for safety considerations in production environments.

                  Some of the things I consider when capturing on servers (using tcpdump or tshark):

                  • I may wait to do a capture until load is lower (so as not to impact back pressure or the timing of events in a brittle system)
                  • If there is redundancy in place I may decide to go ahead and capture immediately if it’s a high priority (hopefully we tested said redundancy!)
                  • If we are actively failing stuff then I may just jump in and capture without regard to safety (accepting even more failure in the face of failure)
                  • If I’m in a non-production environment then I can accept failure conditions more readily (depends on your team)
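                  One way to bound a capture’s footprint on a busy box (these are standard tcpdump flags; the values are illustrative, and the command is echoed rather than run so it can be reviewed first):

                  ```shell
                  # -s 96      : snaplen -- capture headers only, not full payloads
                  # -c 50000   : stop after a fixed packet count instead of running forever
                  # -C 10 -W 5 : rotate ~10 MB files and keep at most 5 (bounded disk use)
                  SAFE_CAPTURE="tcpdump -i eth0 -s 96 -c 50000 -C 10 -W 5 -w /tmp/ring.pcap"
                  echo "$SAFE_CAPTURE"
                  ```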
                  1. 2

                    tshark is totally fine! I just didn’t know it existed, nor did I know exporting tcpdump output into Wireshark was a thing (etc. etc.), so it was always “do I install Wireshark (and as a result, X and half of the world) onto this server?”, and the answer was usually “heck no, keep X away from that”.

                    Using tshark directly I have no problems with :) I’m just super familiar with Wireshark’s GUI, so being able to use that even with data collected from servers is a big positive.

                    1. 2

                      Ah, ha. I understand, and you’re totally right to stay away from installing unnecessary X packages on a server. I know for sure that tshark installs with limited dependencies (no X) on Debian.