1. 9
  1.  

  2. [Comment removed by author]

    1. 3

      Complete disagreement on disabling ssh. I have no idea why anyone would do this. AWS console tools are relatively weak, and can’t provide anywhere near the control of a shell.

      The author’s point is that if you’re thinking about SSHing into your AWS instances, you’ve got a potential weakness in your infrastructure.

      To frame this in terms of the author’s point: why would you want to SSH into an AWS instance? If something goes wrong, you should be looking at the logs on your central syslog server to see what went wrong and using automation to stop that instance/start a replacement. If you have some sort of administrative task to perform (hotfix, security patch, etc.), you should write a deployment task and execute that instead. All your instances are stateless, so this shouldn’t be a problem…

      That’s the dream, at least.
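
      (To make that a bit more concrete: “stop that instance/start a replacement” can be as little as marking the node unhealthy and letting its Auto Scaling group do the rest. A minimal boto3 sketch, not the author’s actual tooling; it assumes the node belongs to an ASG, and the instance id is made up.)

      ```python
      import boto3

      autoscaling = boto3.client("autoscaling")

      # Mark the sick node unhealthy; the Auto Scaling group terminates it
      # and launches a fresh instance in its place, so there is nothing on
      # the old box worth logging in for.
      autoscaling.set_instance_health(
          InstanceId="i-0123456789abcdef0",   # hypothetical instance id
          HealthStatus="Unhealthy",
          ShouldRespectGracePeriod=False,
      )
      ```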

      1. [Comment removed by author]

        1. 4

          I don’t think you’re seeing the forest for the trees. If your logging infrastructure and tooling don’t let you identify, after the fact, why a node failed and subsequently why a new node couldn’t spin up to take over, that’s a problem with the logging tool set. The goal is to never need to log in via SSH. If you can gather the data relevant to the problem at hand better/faster/whatever by looping over all the servers involved and manually awk/grep/sed-ing through the log files, the problem isn’t your skill set: it’s that the monitoring/logging/tooling isn’t surfacing the actually interesting information in a useful form.
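
          (A sketch of the “ask the log system, not the boxes” idea, using CloudWatch Logs Insights purely as a stand-in for whatever central log store is in play; the log group name and query are invented.)

          ```python
          import time
          import boto3

          logs = boto3.client("logs")

          def why_did_it_die(instance_id: str):
              """Pull the last hour of log lines mentioning a dead node."""
              query = logs.start_query(
                  logGroupName="/my-app/syslog",      # hypothetical log group
                  startTime=int(time.time()) - 3600,
                  endTime=int(time.time()),
                  queryString=(
                      f"fields @timestamp, @message"
                      f" | filter @message like /{instance_id}/"
                      f" | sort @timestamp desc | limit 50"
                  ),
              )
              # Insights queries run asynchronously; poll until complete.
              while True:
                  result = logs.get_query_results(queryId=query["queryId"])
                  if result["status"] == "Complete":
                      return result["results"]
                  time.sleep(1)
          ```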

          1. [Comment removed by author]

            1. 2

              Also, there is a huge difference between preferring not to use a shell and disabling it entirely.

              The author addressed this in a subsequent edit: the reason he disabled SSH was to keep himself from cheating and reaching for SSH when he should be improving his toolchain instead. He noted that when he’s debugging an issue on an individual node, SSH gets turned back on.

              In general, when I read about successful AWS deployments, I read about people who understand that running instances can die without warning, that keeping little state on the machines is the best way to mitigate that risk, and that the real reason for using the system is to scale resources up & down in response to external load. In contrast, the AWS “horror stories” I read almost always come from people who don’t know the E in EC2 stands for “elastic”. They overspend on EC2 instances and pretend they’re just another box in their rack.

              That’s the main thrust of the article: if you treat your AWS infrastructure like you treat your existing server farm, you’re probably not getting as many benefits as you’d think - and you’re probably at risk of a big headache when AWS does something particularly AWS-y, like announcing one of your instances will be retired in the next ten minutes…

              I can say I’ve had really good luck with the author’s approach on a small scale. AWS does a good job at scaling up & down quickly and at a price that a small startup can handle. But I’ve never lived in a world with thousands of servers - what would your approach be?
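
              (The “E is for elastic” point in code form: a minimal sketch of a target-tracking scaling policy, so the fleet grows and shrinks with load instead of sitting there like rack servers. The group and policy names are invented.)

              ```python
              import boto3

              autoscaling = boto3.client("autoscaling")

              # Add capacity when average CPU climbs past the target and
              # shed it when load drops off.
              autoscaling.put_scaling_policy(
                  AutoScalingGroupName="web-workers",
                  PolicyName="track-cpu-50",
                  PolicyType="TargetTrackingScaling",
                  TargetTrackingConfiguration={
                      "PredefinedMetricSpecification": {
                          "PredefinedMetricType": "ASGAverageCPUUtilization"
                      },
                      "TargetValue": 50.0,
                  },
              )
              ```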

              1. [Comment removed by author]

                1. 1

                  I’ve got to say this conversation is really neat. It’s easy for me to forget that I use these tools in my little startup bubble, but so do companies like Netflix - on a scale I find unimaginable.

                  I totally get that EC2 instances are ephemeral…indeed, most of the code I build lives on spot instances, which are even more iffy. Graceful shutdown is still needed…there’s really no value in architecting a system with absolutely zero local state just because your cloud vendor refuses to provide a reasonable shutdown window.

                  For example, I’ve had good luck building on spot instances before, but I can guarantee that’s all just the law of averages. I guess it’s easy for someone in my shoes to make a stateless instance when it’s going to gracefully shut down for lack of load long before it gets killed due to bid constraints.
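
                  (For what it’s worth, “graceful shutdown” on spot can be as simple as watching for the interruption notice in instance metadata, which gives roughly a two-minute warning. A rough sketch; drain() stands in for whatever cleanup the workload actually needs, and IMDSv2 would additionally require a session token header.)

                  ```python
                  import time
                  import requests

                  # EC2 serves a two-minute warning at this path once a spot
                  # interruption is scheduled; until then it returns 404.
                  NOTICE_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"

                  def drain():
                      # Hypothetical: stop taking new work, flush buffers,
                      # deregister from the load balancer, etc.
                      pass

                  while True:
                      if requests.get(NOTICE_URL, timeout=2).status_code == 200:
                          drain()
                          break
                      time.sleep(5)
                  ```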

                  Another example…let’s say we’re processing log files stored on S3. Well, we could put each individual log line in SQS and always be stateless! Except now your AWS bill for this process will be multiplied by ten…SQS is more expensive than S3 byte-for-byte. There’s always a trade-off…

                  Done that too, and it works really well when you can log at full bore for a few bucks a month ;-)
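
                  (Back-of-envelope on why the line-per-message version balloons: the per-request prices below are placeholders rather than current AWS list prices, but the shape of the math holds because SQS charges per request while S3 lets you batch thousands of lines into one object.)

                  ```python
                  # Illustrative prices only; check current AWS pricing.
                  SQS_PER_MILLION_REQUESTS = 0.40   # assumed $/million standard requests
                  S3_PUT_PER_THOUSAND = 0.005       # assumed $/thousand PUTs
                  S3_GET_PER_THOUSAND = 0.0004      # assumed $/thousand GETs

                  log_lines = 100_000_000           # say, 100M log lines per day
                  lines_per_object = 100_000        # batched into ~1,000 S3 objects

                  # One message per line means roughly send + receive + delete per line.
                  sqs_cost = 3 * log_lines / 1_000_000 * SQS_PER_MILLION_REQUESTS

                  # Batched in S3: one PUT to store and one GET to process each object.
                  objects = log_lines / lines_per_object
                  s3_cost = objects / 1_000 * (S3_PUT_PER_THOUSAND + S3_GET_PER_THOUSAND)

                  print(f"per-line SQS: ${sqs_cost:,.2f}/day   batched S3: ${s3_cost:,.4f}/day")
                  ```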

                  All this leads me to ask:

                  > you can go broke doing things the AWS way if you’re not careful.

                  What did the cost analysis look like for a big environment like yours? I’ve worked in small to mid-size companies that have evaluated AWS, and they’ve never seemed to find the price sweet spot that justified the change.

    2. 2

      Amusing (maybe? I was amused) fact I recently discovered: I once put a file in S3 and then deleted it. Amazon still charges me $0.01 per month because, as near as I can tell, they execute a ListAllMyBuckets call on my behalf when I sign into the console. Or maybe somebody else is listing my buckets? I’m sure not. At least, I see one or two of those requests in the usage report and nothing else, and Amazon rounds $0.00005 up to $0.01. In principle I’d like for this to be correct, but as a practical matter my principles don’t activate for at least another order of magnitude of money.