1. 31

I know the best answer is: get your hands dirty, but I would like to start with some foundational knowledge first: books and courses. What do you recommend?

  1. 45

    I learned AWS on the job.

    1. 8

      I am learning on the job at the moment, and this YouTube channel is the best resource I’ve found.

      He’s got a nice style and a knack for cutting through irrelevant details to expose the essence of the services. You can try this excellent 18-minute overview of the core AWS services to see if you like it.

      I’ve also done (or partially done) a few courses on LinkedIn Learning and on another site I’ve forgotten, and they weren’t as good as this free resource.

      1. 1

        Very strange list of “most important” IMO. But what is most important to one person will not be most important to another.

      2. 1

        +1. I did a 3-day course offered by AWS. It was helpful for learning the foundational blocks like access management and ARNs, and the course taught us how to build a web service.

        For any new services, I’ve found their docs helpful and there are usually tutorial style documents that teach you how to do a specific thing with the service.

        1. 1

          I did this as well, but I recommend not having to learn how to deploy, configure, and maintain Hadoop clusters at the same time, as I had to.

        2. 21

          Full disclosure: I work for Azure, but the advice here is identical for all the big three.

          The foundations are compute, storage, and networking. You create virtual networks and network storage volumes, and then you create virtual machines and attach the storage volumes and connect them to the virtual networks. You design in terms of these. Then you start looking to see if some of the services on top of that are worth outsourcing parts of your design to. AWS IAM/Azure AAD/Google Cloud IAM is one of the most common first ones because it gets secrets off your machines. DNS is the other one.

          So what do you need to learn first? IP networking and the basic ideas behind network storage. You should be able to sketch out a layout in terms of these three fundamentals independent of a cloud provider or running your own hardware. Then spend time with the docs for networking (AWS VPC/Azure Networking/Google VPC), network storage volumes (AWS EBS/Azure Managed Disks/Google Persistent Disk), and compute (AWS EC2/Azure Compute/Google Compute Engine) to learn how to do the things you want to do in your particular cloud provider.
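
          A provider-independent layout of this kind can be written down as plain data before any cloud is chosen. The following is only a sketch in TypeScript: the type names (`NetworkSpec`, `VolumeSpec`, `MachineSpec`, `LayoutSpec`), the `checkLayout` helper, and the example values are all hypothetical and do not correspond to any provider's API.

          ```typescript
          // Hypothetical, provider-neutral description of a deployment in terms of
          // the three fundamentals: networking, storage, and compute.
          interface NetworkSpec { name: string; cidr: string }
          interface VolumeSpec { name: string; sizeGiB: number }
          interface MachineSpec {
            name: string;
            network: string;   // name of the network this machine is connected to
            volumes: string[]; // names of the volumes attached to this machine
          }
          interface LayoutSpec {
            networks: NetworkSpec[];
            volumes: VolumeSpec[];
            machines: MachineSpec[];
          }

          // Lists every machine reference to a network or volume that is not declared.
          function checkLayout(layout: LayoutSpec): string[] {
            const problems: string[] = [];
            for (const m of layout.machines) {
              if (!layout.networks.some((n) => n.name === m.network)) {
                problems.push(`${m.name}: unknown network ${m.network}`);
              }
              for (const v of m.volumes) {
                if (!layout.volumes.some((vol) => vol.name === v)) {
                  problems.push(`${m.name}: unknown volume ${v}`);
                }
              }
            }
            return problems;
          }

          // Example sketch: one private network, one data volume, two machines.
          const sketch: LayoutSpec = {
            networks: [{ name: "private", cidr: "10.0.0.0/24" }],
            volumes: [{ name: "db-data", sizeGiB: 100 }],
            machines: [
              { name: "web", network: "private", volumes: [] },
              { name: "db", network: "private", volumes: ["db-data"] },
            ],
          };
          console.log(checkLayout(sketch)); // empty list: every reference resolves
          ```

          Once a sketch like this holds together, mapping each entry to a concrete VPC, volume, or VM type in the chosen provider is mostly a matter of reading that provider's docs.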

          Once you have that, it’s worth exploring the other services provided. Blob storage (AWS S3/Azure Storage/Google Cloud Storage) is often high value. Managed databases are often a high-value add: if you have a small team and database operations would cost more in admin overhead than the vendor markup, they can be a good investment. Azure and Google have high-quality ML services if those are useful to you. This is a question of poking around and seeing what’s there.

          It can be tempting to go overboard and say, “Oh, I need this, so I’ll grab this service, and that, so another service…” Every service you add is one more dependency that you have to understand and one more interface where you need to marshal and manage data. If you only need a queue in a local program, don’t use Kinesis, just write a queue in your program. If you just need to store a couple of megabytes of temporary files that a program can access, put them on the machine’s disk instead of using S3.
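
          For the local-queue case, a few lines of code are usually enough. This is a minimal sketch in TypeScript, assuming a single process and no durability requirement; the `LocalQueue` class and the job names are made up for illustration.

          ```typescript
          // Minimal in-process FIFO queue: no external service, no network hop,
          // no serialization. Items are lost if the process exits.
          class LocalQueue<T> {
            private items: T[] = [];

            enqueue(item: T): void {
              this.items.push(item);
            }

            // Returns the oldest item, or undefined when the queue is empty.
            dequeue(): T | undefined {
              return this.items.shift();
            }

            size(): number {
              return this.items.length;
            }
          }

          const jobs = new LocalQueue<string>();
          jobs.enqueue("resize-image");
          jobs.enqueue("send-email");
          console.log(jobs.dequeue()); // prints "resize-image"
          ```

          A managed queue starts to earn its keep only when several machines need to share the work or the items must survive a crash.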

          This is more or less how I learned AWS (back before I worked for Microsoft). I learned the fundamentals in the 1990s, before there were big cloud providers, and when it was time to learn the providers, I just went hunting for how to accomplish specific tasks.

          1. 6

            > Full disclosure: I work for Azure, but the advice here is identical for all the big three.

            In the same spirit, I work for MSR in collaboration with some Azure folks.

            > The foundations are compute, storage, and networking. You create virtual networks and network storage volumes, and then you create virtual machines and attach the storage volumes and connect them to the virtual networks. You design in terms of these.

            I think there’s a mindset shift here that’s quite important. You can treat the cloud as a way of cheaply acquiring computers, or you can treat it as a mainframe that someone else manages.

            The former approach is very easy: if you’re familiar with *NIX systems, you can spin up a load of them on demand with a little bit of orchestration. Storage, networking, and so on are exactly the same as if you bought a pile of computers, except that you can provision or deprovision them very cheaply.

            The latter approach gives you a lot more value, at the expense of some lock-in. For example, if you buy VMs from Azure, the unit of accounting is a multiple of reserved vCPUs and GiBs of RAM per hour. If you use Azure Functions, the unit of accounting is CPU-seconds and MiBs of RAM per second. Scaling down is as important as scaling up in a lot of cases: you want your deployment to cost as close to zero as possible while not in use. If you provision a load of VMs that are only used intermittently, then this is going to cost more than just buying the hardware. If you use Azure Functions or Azure Container Instances, then this will be a lot cheaper for bursty workloads.
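
            The difference between the two accounting units can be made concrete with a little arithmetic. The sketch below is in TypeScript, and every price in it is a made-up placeholder, not a real Azure (or any other) rate; the function and field names are likewise hypothetical.

            ```typescript
            // All prices are invented placeholders for illustration only.
            interface VmPricing { perVcpuHour: number; perGiBHour: number }
            interface FunctionPricing { perCpuSecond: number; perMiBSecond: number }

            // Reserved VM: billed for every hour it exists, busy or idle.
            function vmMonthlyCost(p: VmPricing, vcpus: number, ramGiB: number, hours: number): number {
              return hours * (vcpus * p.perVcpuHour + ramGiB * p.perGiBHour);
            }

            // Pay-per-use function: billed only for the seconds it actually runs.
            function functionMonthlyCost(p: FunctionPricing, busySeconds: number, cpus: number, ramMiB: number): number {
              return busySeconds * (cpus * p.perCpuSecond + ramMiB * p.perMiBSecond);
            }

            // Placeholder rates: the per-use unit price is deliberately set higher
            // than the reserved one, as an assumption.
            const vmPrices: VmPricing = { perVcpuHour: 0.04, perGiBHour: 0.005 };
            const fnPrices: FunctionPricing = { perCpuSecond: 0.00002, perMiBSecond: 0.000000003 };

            const hoursPerMonth = 730;
            const reserved = vmMonthlyCost(vmPrices, 2, 4, hoursPerMonth);
            // Bursty workload: busy one hour a day for 30 days.
            const bursty = functionMonthlyCost(fnPrices, 30 * 3600, 2, 4096);
            // Same resources kept busy for the whole month.
            const saturated = functionMonthlyCost(fnPrices, hoursPerMonth * 3600, 2, 4096);
            console.log({ reserved, bursty, saturated });
            ```

            With these placeholder rates the bursty workload is far cheaper on the per-second model, while the saturated one is cheaper on the reserved VM; where the crossover falls depends entirely on the real prices and the real utilisation.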

            This applies to other resources as well. If you’re writing a FaaS-based service, then you can use something like Azure Data Lake for storage very easily and pay per GiB for the data that you actually store, and this will grow on demand. If you write a *NIX application and deploy it in a VM, then you’ll be using managed disk storage and paying per GiB for the size of the disk. You’ll need to manually resize the disk once it starts to get full, and if you want to scale up, you need to either move to explicitly using cloud storage (or database) APIs rather than filesystem ones, or build some other communication mechanism.

            1. 4

              > I think there’s a mindset shift here that’s quite important. You can treat the cloud as a way of cheaply acquiring computers, or you can treat it as a mainframe that someone else manages.

              This is true, and I think it’s important not to learn it this way. If the mental building blocks you have are based on the service catalog, then you are intrinsically limited in how you can conceptualize systems. Compare it to circuit design in electrical engineering. An electrical engineer works out of large catalogs of premade components, but those are chosen based on a framework of fundamentals, and dropping down to those fundamentals and building your own circuitry is always an option.

              1. 2

                > This is true, and I think it’s important not to learn it this way. If the mental building blocks you have are based on the service catalog, then you are intrinsically limited in how you can conceptualize systems.

                I don’t think I agree. It’s important not to be limited by the specific set of services, true, but a lot of the benefits of using the cloud don’t appear if you think of it purely in terms of IaaS offerings. A lot of the services that exist can’t be built directly on top of IaaS with the same scalability or reliability (at least, not for the same price) and that’s only going to increase over time. Consider something like Azure Durable Functions. You could build something like this on top of VMs but the reserved capacity that you’d require would make the cost an order of magnitude higher (not to mention the fact that the core ideas behind it required an entire PhD to develop).

                > Compare it to circuit design in electrical engineering. An electrical engineer works out of large catalogs of premade components, but those are chosen based on a framework of fundamentals, and dropping down to those fundamentals and building your own circuitry is always an option.

                I think that’s a very good analogy. The folks who think about gate-level layout are very important but if you’re working in SystemVerilog or some higher-level abstraction then that impinges on you only when you’re considering critical path length. The kinds of useful abstractions that you build when constructing something like a CPU or an accelerator core are so far removed from individual transistors that understanding that level of detail can be more of an impediment than a help. The simplest component that you’ll deal with is something like an adder or a mux, and that will come from a cell library that someone else has carefully optimised.

                Knowing how to build one is intellectually interesting but can be a problem, because the way that you learned about building them at university is not how they work at 7nm (for example, in a university VLSI course, wires are just plumbing that you largely ignore and can put as many of as you like in your design for free; at 7nm they’re incredibly expensive). The low-level detail will change slightly every process generation and someone working at the higher levels of abstraction doesn’t have the time to learn precisely how it’s changed with each iteration. They need to understand how the costs of components have changed but not why.

                The cloud is precisely the same. Looking at hardware trends and roadmaps for the next 5-10 years, the underlying building blocks are going to change considerably. If you focus on VMs running *NIX and Ethernet networks as building blocks then you’re going to have a mental model that doesn’t adapt to these changes.

          2. 15

            Try specifying infrastructure in a tool like Terraform and see how things work. It’s a way to “get your hands dirty” without ending up with a bunch of garbage resources messing up your mental model. You can tear everything down with one command. Also, the Terraform docs for AWS serve as a concise summary of all the types of resources and how they fit together.

            1. 8

              This is how I use AWS. Period. I have almost never touched the web console, except to manage IAM and Route53 bases that I treat as data and not resources in Terraform. Oh, and I think I’ve created a few S3 buckets by hand so that they’re treated similarly. That is, so that a terraform destroy doesn’t inadvertently cause me to, uh, activate my backup restoration procedures in a flurry of expletives.

              1. 2

                Do you have any thoughts on Pulumi? It’s been recommended to me, but at this point I barely understand the difference between it and Terraform (I haven’t used either). I’m trying to pick a tool to do DevOpsy stuff and was going to go with Terraform mostly on the basis that I’ve heard of it and it’s not Chef 😅

                1. 1

                  I’ve not used Pulumi yet. It was in its infancy when I made a significant investment in Terraform.

                  I came up with https://github.com/colindean/terraforming-with-types at about the same time I heard of Pulumi. I left the org wherein I was using Terraform heavily shortly thereafter and never picked it back up seriously. I just started using Terraform at my current job about three weeks ago to manage our GitHub sprawl.

                  1. 1

                    Interesting about Terraforming-with-types. One thing that appeals about Pulumi is its support for TypeScript, which we’ll be using across the rest of the codebase.

              2. 4

                +1 for not using the UI

                I tried Terraform superficially and was a bit surprised by its restrictions. Maybe what I tried was a bit too complicated.

                CDK allows you to create the declarative resources spec in a programming language.

                I am mostly using CDK now but, ironically, with quite simple projects which would have also worked with Terraform, I guess.

                https://docs.aws.amazon.com/cdk/v2/guide/home.html

                1. 4

                  We’re also using CDK at work. I quite like it. But is it possible to explore AWS with CDK as a beginner? I always take the opposite approach: I look at the service I want to use in the web interface, and once I understand the service, I write my CDK code.

                  1. 1

                    I think the documentation is often too focused on the UI, but I pretty much always use CDK or something else in version control. Otherwise I quickly forget how I set it up.

                    I like UIs for inspection but hate them for setting up things.

                2. 1

                  To those wanting to go down this path: there is a very approachable series on YouTube on managing AWS with Terraform.

                3. 4

                  I work for AWS, my opinions are my own. I also used to be an instructor at Amazon Technical Academy so I have experience teaching, seeing teachers in action, and writing curricula.

                  Thank you for asking this question. This is a tough question for me because there are so many angles that personally involve me.

                  RE: the term “AWS”, just like the IT industry as a whole, AWS means different things to different people. Some people are data engineers, some are networking gurus, some just want to do X or Y. Sure, use A Cloud Guru and Solutions Architect courses to get a broad understanding of AWS. But even then you will continue to feel left out, small, and ignorant, because there is always someone else solving some other problem (witness the long thread from Azure engineers about how important networking is, whereas many teams in AWS never encounter networking as a concept).

                  RE: the term “learn”, I highly recommend the O’Reilly book Apprenticeship Patterns. I personally love making “Breakable Toys”: solving a problem you are comfortable with, but in a new toolset, language, or framework you are not comfortable with. This gives you permission to fail and hones your attention because you are not worried about solving a novel problem. What is your go-to breakable toy? What problems are you comfortable solving?

                  RE: “foundational knowledge”, I recommend trying out the newest version of AWS CDK, following a tutorial, then trying a breakable toy.

                  I love making silly Alexa skills (did you know they reimburse you $25 a month if you make an Alexa skill that is used at least once per month?). So to learn the Rust runtime for Lambda I made a Lambda function that calls a weather API for air pollution data, dumps the info to S3, then exposes that data for Alexa. For me, calling an API from code, parsing the response, and putting it into S3 wasn’t a big deal. But using a new Lambda runtime was mysterious and took me a while to figure out.

                  I could talk about this topic for years, write a book, anything. I love teaching, I love helping learners, I like AWS, and (to be frank) I do not know how easy or hard it is to learn AWS. I’d like to learn more about your and others’ difficulties.

                  1. 4

                    At first I played around with some simple things on the free budget, just to learn what AWS offers. But apart from that I use normal hosting for my private stuff.

                    Then I joined a company hosting on AWS and learned it on the job. My company also had an account at acloudguru. Since I’m thinking about doing the AWS Solution Architect Associate, I watched the material there. Haven’t done the exam yet, but it’s part of my 2022 goals (among many others).

                    My recommendation would be:

                    • get to know the names, core features and naming conventions of the most essential products (e.g. EC2, S3, RDS)
                    • approach everything else from a concept perspective, i.e. think about what you need in general and then research what AWS offers in that regard and whether it works for you

                    As part of our AWS contract (because we’re a startup?) we also did a Well-Architected Review together with an AWS consulting company and got some info on services we did not use yet. With this I just want to say: no need to learn everything at once. Focus on the essential services. Even the “AWS Solution Architect Associate” certificate seems to consist of only a few core services.

                    1. 4

                      I mean, their docs are surprisingly good. They’re all I’ve really needed. To start, I’d suggest reading up on IAM, ECS, and probably S3. From there, just read the docs on whatever you need as you realize you need it.

                      EDIT: Since you’ve mentioned wanting a course, they offer courses you can take at your own pace: https://explore.skillbuilder.aws/learn

                      1. 2

                        What have you tried so far?

                        1. 3

                          Official docs, watching other people doing things, clicking around without any clue, and The Good Parts of AWS book by Daniel Vassallo (and I didn’t enjoy it).

                          I feel like I need some kind of course where you build some real parts of the infrastructure.

                        2. 2

                          If you have access to O’Reilly’s learning site, they have a couple of instructor-led AWS courses worth taking. They take you through creating a root account, securing it, and some basics. Then you can dig in to specific services. I don’t work for them, just a customer.

                          1. 2

                            The best AWS person I know (he works as a networks specialist in their Professional Services team, and he is fucking amazing to work with) recommended:

                            • A Cloud Guru for the course materials
                            • WhizLabs for practice tests

                            Apparently these are the recommendations internal to AWS!

                            The second-best AWS specialist I know is a huge fan of https://learn.cantrill.io courses.

                            I have subscribed to both, because I am just not price sensitive for major strategic investments in my skills, and am making slow progress. That’s mainly on me trying to be a good dad to a toddler, rather than the courses themselves!

                            1. 1

                              Honestly I learned through documentation. Was told we were moving our entire infrastructure to serverless functions and spent months reading docs and reimplementing everything. The documentation on AWS Lambda is some of the best documentation I’ve ever read.

                              1. 1

                                My employer paid me to take an AWS Certification class, letting me take however much time I needed to study. Practical experience filled in the gaps.

                                1. 1

                                  Not AWS, but I learned Azure on the job. We’re in the process of migrating our infra from on-prem to Azure, so there’s a bunch of hands-on learning. Occasionally I’ll need to refer to the official docs, or ask an MSFT employee if it’s urgent.

                                  1. 1

                                    FWIW, AWS provides a huge variety of services, and even the “lower-level” services like EC2 and S3 provide a lot of configurable knobs to turn. I don’t know a lot of folks who’ve “learned AWS” at a broad level, but I know many who understand the particular corners they needed to do their work.

                                    The training and certification options from AWS are structured around either what you want to work with (ML, data analytics, databases, etc) or the role you want to play on a team (developer, ops, architect, etc). I think their materials are only ok (not bad, not amazing) but they do at least provide a number of different structured learning paths depending on what you want to work on.

                                    1. 3

                                      Yes, I think OP is asking the wrong question; he should think about what his end goal is. It would be a fool’s errand to try to learn AWS broadly without a plan.

                                    2. 1

                                      I used https://learn.cantrill.io/ , and passed my Certified Solutions Architect (Associate) late last year. Even if certifications aren’t your main goal, the SAA-C02 syllabus covers core AWS services to a good depth, and a lot of other services at a “what does this do, and what’s its high-level architecture?” level.

                                      1. 1

                                        I’d like to turn the question around. How did you learn something you have high proficiency in? Follow that same path here.

                                        I learn a bunch by doing and I also worked on a large team in AWS. Because of that, it was fairly easy to nibble off chunks of stuff I could model with my old knowledge (oh, this is like a circuit, this looks like a VPN, oh, this is like a GRE tunnel) that applied to the corner of the system I was caring for. Because of this, I don’t know some foundational aspects of AWS, but that’s true of most systems I’ve interacted with. “Oh, I guess I know very little about filesystems, now that I’m faced with performance issues”.

                                        1. 1

                                          Combination of OJT and CloudGuru