1. 3

    Hey @patrickdlogan, @GeoffWozniak, @nickpsecurity and all the others, thanks to your valuable feedback I finally improved the article by adding a section about grid computing and improving the section dedicated to virtualization.

    Let me know if they sound fair to you :)

    Thanks again for the support!

    1. 3

      Ok, maybe I need to try to write up a thorough history for new researchers to draw on, since there’s a lot of it. For one, there’s nothing new about the main concepts of cloud computing: mainframes were already doing them. They had big machines running virtualized workloads with I/O accelerators that users connected to with dumb terminals. They were centrally managed. The machines were leased, with usage charged by the minute or something like that (I can’t recall exactly). Here’s both a description of mainframe virtualization features that are cloudlike and a rant on how the wheel is reinvented:

      http://www.clipper.com/research/TCG2014009.pdf

      http://www.winestockwebdesign.com/Essays/Eternal_Mainframe.html

      Multics also tried to make computing a utility, but it cost too much ($7+ million per installation). It had better security and usability than a lot of systems.

      http://multicians.org/history.html

      https://www.acsac.org/2002/papers/classic-multics.pdf

      Another factor in the concept of moving workloads onto multiple, external machines was distributed operating systems. They were mainly used for distributing workloads among commodity computers in one location (i.e. a server room). That’s what grid computing does, though. A lot of the tech for Single-System Image in clusters is similar to capabilities cloud vendors developed. These two fields stayed really siloed for some reason, though.

      https://en.wikipedia.org/wiki/Distributed_operating_system

      https://en.wikipedia.org/wiki/Amoeba_(operating_system)

      https://en.wikipedia.org/wiki/MOSIX

      https://en.wikipedia.org/wiki/Convergent_Technologies_Operating_System

      Then you have the grid computing platforms. For availability and harnessing various supercomputers, the concept of metacomputing was born. I’m not sure how far back it goes since I didn’t research it. I didn’t think it would be easy to get a lot of raw performance or low latency out of the concept when I studied Beowulf Clusters and grids. I’ll leave that one to you. :)

      Later, people got tired of the complexity and costs that came with the freedom of managing their own infrastructure. There were solutions to different problems that basically simplified things with a bunch of automation. The industry didn’t do that, for the usual political reasons, outside a few companies here and there trying to play it smart. The result was a move back to the mainframe model, but on commodity hardware and FOSS tech. Definitely an improvement in cost, flexibility, and avoiding lock-in, so long as one keeps their code portable and data onsite. Like mainframe vendors, cloud vendors sneakily encouraged an all-in approach by letting incoming data be free while outgoing data costs money. (wink) Mainframes still have security advantages for LPARs, with availability advantages on top of that. That they change very carefully and slowly, with lots of testing, helps a lot. There are mainframes that haven’t accidentally gone down in decades. That, plus total backward compatibility with new stuff incrementally added, is why big businesses love them.

      So, you might want to research mainframes, distributed OSes, and single-system-image tech before putting together your final concept of what happened. The big picture will be spread among them.

      1. 2

        That’s another awesome insight @nickpsecurity, I’ll make sure to research this a bit more! It’s funny how big inventions are pretty often old things redesigned to address more specific problems :)

      2. 2

        Nice work putting this together.

      1. 11

        Hey @loige, nice writeup! I’ve been aching to ask a few questions to someone ‘in the know’ for a while, so here goes:

        How do serverless developers ensure their code performs to spec (local testing), handles anticipated load (stress testing), and degrades deterministically under adverse network conditions (Jepsen-style or chaos testing)? How do you implement backpressure? Load shedding? What about logging? Configuration? Continuous Integration?

        All instances of applications written in a serverless style that I’ve come across so far (admittedly not too many) seemed to offer a Faustian bargain: “hello world” is super easy, but when stuff breaks, your only recourse is $BIGCO support. Additionally, your business is now non-trivially coupled to the $BIGCO and at the mercy of their decisions.

        Can anyone with production experience chime in on the above issues?

        1. 8

          Great questions!

          How do serverless developers ensure their code performs to spec (local testing)

          AWS, for example, provides a local implementation of Lambda for testing. Otherwise normal testing applies: abstract business logic into testable units that don’t depend on the transport layer.

          handles anticipated load (stress testing)

          Staging environment.

          and degrades deterministically under adverse network conditions (Jepsen-style or chaos testing)?

          Trust Amazon / Microsoft / Google. Exporting this problem to your provider is one of the major value adds of serverless architecture.

          How do you implement backpressure? Load shedding?

          Providers usually have features for this, like rate limiting for different events. But it’s not turtles all the way down; eventually your code will touch a real datastore that can overload, and you have to detect and propagate that condition the same as in any other architecture.
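
          A minimal sketch of that detect-and-propagate step (the injected `queryDatabase` dependency and the error shape are hypothetical, not a real client API):

```javascript
// Translate a downstream throttling error into backpressure for the caller,
// instead of retrying blindly and amplifying the overload.
async function handleRequest(req, queryDatabase) {
  try {
    return { statusCode: 200, body: await queryDatabase(req.id) };
  } catch (err) {
    if (err.code === 'ThrottlingError') {
      // Shed load: tell the client to back off and try again later.
      return { statusCode: 429, headers: { 'Retry-After': '1' }, body: 'overloaded' };
    }
    throw err; // Anything else is a genuine failure.
  }
}
```

          The key point is that the overload signal crosses the function boundary as a first-class response (429 plus Retry-After) rather than dying inside a retry loop.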

          What about logging?

          Also a provider value add.

          Configuration?

          Providers have environment variables or something spiritually similar.

          Continuous Integration?

          Same as local testing, but automated?

          but when stuff breaks, your only recourse is $BIGCO support

          If their underlying infrastructure breaks, yep. But every architecture has this problem, it just depends on who your provider is. When your PaaS provider breaks, when your IaaS provider breaks, when your colo provider breaks, when your datacenter breaks, when your electrical provider blacks out, when your fuel provider misses a delivery, when your fuel mines have an accident. The only difference is how big the provider is, and how much money its customers pay it not to break. Serverless is at the bottom of the money food chain; if you want fewer problems, then you take on more responsibility and spend the money to do it better than the provider for your use case, or use more than one provider.

          Additionally, your business is now non-trivially coupled to the $BIGCO and at the mercy of their decisions.

          Double-edged sword. You’ve non-trivially coupled to $BIGCO because you want them to make a lot of architectural decisions for you. So again, do it yourself, or use more than one provider.

          1. 4

            And great answers, thank you ;)

            Having skimmed the SAM Local doc, it looks like they took the same approach as they did with DynamoDB local. I think this alleviates a lot of the practical issues around integrated testing. DynamoDB Local is great, but it’s still impossible to toggle throttling errors and other adverse conditions to check how the system handles these, end-to-end.

            The staging-env and CI solution seems to be a natural extension of server-full development, fair enough. For stress testing specifically, though, it’s great to have full access to the SUT, and to be able to diagnose which components break (and why) as the load increases. This approach goes contrary to the opaque nature of the serverless substrate. You only get the metrics AWS/Google/etc. can provide you. I presume dtrace and friends are not welcome residents.

            If their underlying infrastructure breaks, yep. But every architecture has this problem, it just depends on who your provider is. When your PaaS provider breaks, when your IaaS provider breaks, when your colo provider breaks, when your datacenter breaks, (…)

            Well, there’s something to be said for being able to abstract away the service provider and just assume that there are simply nodes in a network. I want to know the ways in which a distributed system can fail – actually recreating the failing state is one way to find out and understand how the system behaves and what kind of countermeasures can be taken.

            if you want less problems then you take on more responsibility

            This is something of a pet peeve of mine. Because people delegate so much trust to cloud providers, individual engineers building software on top of these clouds are held to a lower and lower standard. If there is a hiccup, they can always blame “AWS issues”[1]. Rank-and-file developers won’t get asked why their software was not designed to gracefully handle these elusive “issues”. I think the learned word for this is the deskilling of the workforce.

            [1] The lack of transparency on the part of the cloud providers around minor issues doesn’t help.

            1. 3

              For stress testing specifically, though, it’s great to have full access to the SUT, and to be able to diagnose which components break (and why) as the load increases.

              It is great, and if you need it enough you’ll pay for it. If you won’t pay for it, you don’t need it, you just want it. If you can’t pay for it, and actually do need it, then that’s not a new problem either. Plenty of businesses fail because they don’t have enough money to pay for what they need.

              This is something of a pet peeve of mine. Because people delegate so much trust to cloud providers, individual engineers building software on top of these clouds are held to a lower and lower standard. If there is a hiccup, they can always blame “AWS issues”[1]. Rank-and-file developers won’t get asked why their software was not designed to gracefully handle these elusive “issues”

              I just meant to say you don’t have access to your provider’s infrastructure. But building more resilient systems takes more time, more skill, or both. In other words, money. Probably you’re right to a certain extent, but a lot of the time the money just isn’t there to build out that kind of resiliency. Businesses invest in however much resiliency will make them the most money for the cost.

              So when you see that happening, ask yourself “would the engineering cost required to prevent this hiccup provide more business value than spending the same amount of money elsewhere?”

          2. 4

            @pzel You’ve hit the nail on the head here. See this post on AWS Lambda Reserved Concurrency for some of the issues you still face with Serverless style applications.

            The serverless architecture style makes a ton of sense for a lot of applications; however, there are lots of missing pieces operationally. Things like the Serverless Framework fill in some of these gaps, but not all of them. In 5 years’ time I’m sure a lot of these problems will have been solved and questions of best practices will have some good answers, but right now it is very early.

            1. 1

              I agree with @danielcompton that serverless is still a pretty new practice in the market and we are still lacking an ecosystem able to support all the possible use cases. Time will pass and it will get better, but having spent the last 2 years building enterprise serverless applications, I have to say that the ecosystem is not that immature, and it can already be used today with some extra effort. I believe that in most cases the benefits (not having to worry too much about the underlying infrastructure, not paying for idle, higher focus on business logic, high availability and auto-scalability) far outweigh the extra effort needed to learn and use serverless today.

            2. 3

              Even though @peter already gave you some great answers, I will try to complement them with my personal experience/knowledge (I have used serverless on AWS for almost 2 years now, building fairly complex enterprise apps).

              How do serverless developers ensure their code performs to spec (local testing)

              The way I do is a combination of the following practices:

              • unit testing
              • acceptance testing (with mocked services)
              • local testing (manual, mostly using the serverless framework’s invoke local functionality, but pretty much equivalent to SAM). Not everything can be tested locally, depending on which services you use.
              • remote testing environment (to test things that are hard to test locally)
              • CI pipeline with multiple environments (run automated and manual tests in QA before deploying to production)
              • smoke testing
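
              For the “acceptance testing (with mocked services)” point, one common pattern is to inject the service client into the handler, so an in-memory fake can stand in for the managed datastore. All names below are illustrative, not from a real project:

```javascript
// Handler factory: the datastore is injected, so tests can swap it out.
function makeHandler(store) {
  return async (event) => {
    const item = JSON.parse(event.body);
    await store.put(item.id, item);
    return { statusCode: 201 };
  };
}

// In-memory fake standing in for DynamoDB/S3/etc. during acceptance tests.
function fakeStore() {
  const data = new Map();
  return {
    put: async (key, value) => { data.set(key, value); },
    data,
  };
}
```

              An acceptance test builds `makeHandler(fakeStore())`, fires a sample event at it, and asserts on both the response and the fake’s contents; the production wiring passes in the real client instead.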

              What about logging?

              In AWS you can use CloudWatch very easily. You can also integrate third parties like Loggly. I am sure other cloud providers have their own facilities around logging.

              Configuration?

              In AWS you can use Parameter Store to hold sensitive variables and propagate them to your Lambda functions using environment variables. In terms of infrastructure as code (which you can include in the broad definition of “configuration”), you can adopt tools like Terraform or CloudFormation (the latter being the predefined choice of the Serverless Framework on AWS).
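
              As a sketch of the environment-variable side (the variable names here are made up): it helps to centralize configuration reads in one function, with defaults for local runs, so the rest of the code never touches `process.env` directly.

```javascript
// Read configuration from environment variables with local-dev fallbacks.
// TABLE_NAME and TIMEOUT_MS are illustrative names, not a real convention.
function loadConfig(env = process.env) {
  return {
    tableName: env.TABLE_NAME || 'dev-table',
    timeoutMs: parseInt(env.TIMEOUT_MS || '3000', 10),
  };
}
```

              Taking the environment as a parameter also makes the config loader itself trivially testable.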

              Continuous Integration?

              I have used serverless successfully with both Jenkins and CircleCI, but I guess almost any CI tool will do. You just need to configure your testing steps and your deployment strategy into a CI pipeline.

              when stuff breaks, your only recourse is $BIGCO support

              Sure. But chances are that a hand-rolled solution will be more likely to break than the one provided by any major cloud provider. Also, those cloud providers very often provide refunds if you have outages caused by the provider’s infrastructure (assuming you followed their best practices for high-availability setups).

              your business is now non-trivially coupled to the $BIGCO

              This is my favourite as I have a very opinionated view on this matter. I simply believe it’s not possible to avoid vendor lock-in. Of course vendor lock-in comes in many shapes and forms and at different layers, but my point is that it’s fairly impractical to come up with an architecture so generic that it isn’t affected by any kind of vendor lock-in. When you use a cloud provider and a methodology like serverless, it’s totally true that you have very high vendor lock-in, as you will be using specific services (e.g. API Gateway, Lambda, DynamoDB, S3 in AWS) that are unique to that provider, and equivalent services will have very different interfaces with other providers. But I believe the question should be: is it more convenient/practical to accept the risk of vendor lock-in, rather than spending a decent amount of extra time and effort to come up with a more abstracted infrastructure/app that allows switching cloud providers if needed? In my experience, it’s very rarely a good idea to over-abstract solutions only to reduce vendor lock-in.

              I hope this can add another perspective to the discussion and enrich it a little bit. Feel free to ask more questions if you think my answer wasn’t sufficient here :)

              1. 6

                This is my favourite as I have a very opinionated view on this matter. I simply believe it’s not possible to avoid vendor lock-in. Of course vendor lock-in comes in many shapes and forms and at different layers, but my point is that it’s fairly impractical to come up with an architecture so generic that it isn’t affected by any kind of vendor lock-in.

                Really? I find it quite easy to avoid vendor lock-in – simply running open-source tools on a VPS or dedicated server almost completely eliminates it. Even if a tool you use is discontinued, you can still use it, and you have the option of maintaining it yourself. That’s not at all the case with AWS Lambda/etc. Is there some form of vendor lock-in I should be worried about here, or do you simply consider this an impractical architecture?

                When you use a cloud provider and a methodology like serverless, it’s totally true that you have very high vendor lock-in, as you will be using specific services (e.g. API Gateway, Lambda, DynamoDB, S3 in AWS) that are unique to that provider, and equivalent services will have very different interfaces with other providers. But I believe the question should be: is it more convenient/practical to accept the risk of vendor lock-in, rather than spending a decent amount of extra time and effort to come up with a more abstracted infrastructure/app that allows switching cloud providers if needed? In my experience, it’s very rarely a good idea to over-abstract solutions only to reduce vendor lock-in.

                The thing about vendor lock-in is that there’s a quite low probability that you will pay an extremely high price (for example, the API/service you’re using being shut down). Even if it’s been amazing in all the cases you’ve used it in, it’s still entirely possible for the expected value of using these services to be negative, due to the possibility of vendor lock-in issues. Thus, I don’t buy that it’s worth the risk – you’re free to do your own risk/benefit calculations though :)

                1. 1

                  I probably have to clarify that for me “vendor lock-in” is a very high-level concept that includes every sort of “tech lock-in” (which would probably be a better buzzword!).

                  My view is that even if you use open-source tech and host it yourself, you end up making a lot of complex tech decisions that are going to be difficult (and expensive!) to move away from.

                  Have you ever tried to migrate from Redis to Memcached (or vice versa)? Even though the two systems are quite similar and a migration might seem trivial, in a complex infrastructure moving from one system to the other is still going to be a fairly complex operation with a lot of implications (code changes, language-driver changes, different interfaces, data migration, provisioning changes, etc.).

                  Also, another thing I am very opinionated about is what’s valuable when developing a tech product (especially in a startup context). I believe delivering value to the customers/stakeholders is the most important thing while building a product. Whatever abstraction makes it easier for the team to focus on business value deserves my attention. In that respect I found serverless to be a very good abstraction, so I am happy to accept some tradeoffs in having less “tech freedom” (I have to stick to the solutions given by my cloud provider) and higher vendor lock-in.

                2. 2

                  I simply believe it’s not possible to avoid vendor lock-in.

                  Well, there is vendor lock-in and vendor lock-in… Ever heard of Oracle Forms?

              1. 18

                This history fails to acknowledge the grid computing era, which predates cloud computing by several years. One of the goals of grid computing was much like “serverless” functions, i.e. the ability to have a function run on demand, on any available node in the grid.

                1. 22

                  History begins with the Internet in the world of computing these days. It is an inconvenient truth that virtualization has existed in mainframes since the 1960s.

                  1. 3

                    Definitely true. Maybe I can mention this as well :)

                    1. 2

                      Really?

                      1. 7

                        CP-40 was a research project in 1964 that ran on the System/360. IBM released a product from that called VM in 1972. I wouldn’t doubt you could still run it on a z machine. This eventually turned into z/VM, which has a long line of products behind it.

                        Edit: Here’s an article from 2009 about it, interviewing one of the people who worked on it.

                      2. 2

                        Any good books/sites/anything to read about this? That’s absolutely fascinating!

                      3. 3

                        Thank you for your comment @patrickdlogan. This is definitely a good hint for improving the article; maybe I can add an extra section to provide this bit of history. I will start digging up some info, so feel free to send me any link you think might be relevant for this section :)

                        1. 3

                          Probably as good a place as any is this Wikipedia article.

                          https://en.m.wikipedia.org/wiki/Grid_computing

                          1. 1

                            thanks!

                          2. 3

                            Here’s a link to one of the old ones that were easy to acquire:

                            http://toolkit.globus.org/toolkit/

                            Click What Is Globus at bottom left to see some familiar-looking concepts in a chart.

                            1. 2

                              BOINC is a similar thing, also open source.

                              1. 2

                                Thanks :)

                          1. 3

                            Wow, congrats to this amazing community, its founders and maintainers :)

                            1. 2

                              Not sure why 3 people flagged it as spam. This is a non-profit initiative for web devs, which I think is really relevant for this website. Can you please explain why you think this is spam? Am I violating any rule of this website?

                              1. 1

                                /me raises hand

                                I know what NPM is because people complain about it. Why would I want gulping tasks to be more modular?

                                1. 5

                                  gulp is a build tool. Typically what happens is a gulpfile starts out small and grows into a giant file with tasks depending on other tasks, etc. This tool looks like it solidifies the convention of putting each task in a separate file, which makes it easier to edit each command independently, etc.

                                  1. 1

                                    That’s exactly the point, thanks a million for pointing it out!

                                    Let me know if you try it and what your thoughts are! ;)

                                    1. 1

                                      np. Unfortunately I don’t use gulp anymore (webpack and npm/make/shell lately) so I no longer have a need for this tool, although I remember the pain point.

                                  2. 3

                                    To transpose a famous Rasmus Lerdorf quote: “There are only two kinds of tools: the ones people don’t use and the ones people complain about”…

                                    Apart from that, I built the tool because my gulpfiles often grow wild, and I am experimenting with this solution to try to deal with the problem using a different approach. Any good reason why this wouldn’t make sense to you (apart from it resembling NPM)?

                                    Thanks, by the way, for giving the article a shot!

                                    1. 1

                                      I don’t know if your cozy gulp files resemble NPM. I’ve never seen either! The title of this article doesn’t contain any information that helps me understand what it is. If not for the javascript flag, it might well not have been English. I understand now that gulp is a build tool, but I have one of those already, and I have a good sense of what it’s good and poor at doing. ^_^

                                  1. 3

                                    I’m confused, is he basically saying that working on open source stuff got him a job offer to work on closed source stuff and now he’s making lots of money and working from home?

                                    1. 1

                                      I think the lesson here is that if you have passion and use it a lot, then great things might happen. In the IT business it’s even easier, thanks to the great exposure that open source can give you. But anyway, when you have passion you don’t do things strictly for the exposure or for the money…

                                      1. 1

                                        Thanks for your comment :) I know rsync, but I am not in control of the server in this specific project, and the only way I have to update the code is through FTP… I am trying to make it as explicit as possible that this is not a “new fancy deploy technique”, but people still seem to complain. Probably they just stop at the title, or maybe I still need to change something in the very first paragraphs… Any suggestions on this front are really, really appreciated!

                                        1. 2

                                          Not complaining so much as puzzled. The reason you provided makes sense. I haven’t seen people use (S)FTP in ages. There are ways to provide restricted SSH access for the purposes of rsync, btw.

                                          1. 1

                                            Sorry, it was not an attack on you, of course. I just wanted to figure out why people still tell me to use a more modern approach when I am stating as clearly as I can that I can’t use anything apart from plain FTP in this very specific case, and I wasn’t able to figure out a better FTP-based solution. A lot of people are even insulting me on Reddit, but that’s not a big deal for me honestly, it just makes me laugh :) What concerns me is that maybe I am missing a chance to improve the post or to learn about some alternative solution, which would be a shame!

                                      1. 1

                                        Didn’t even know this was a possibility! Thanks for the in depth writeup on the methods used. Might have to use this to replace my deployHQ setup.

                                        1. 1

                                          Glad you appreciated the post, but please don’t use this as a standard deployment mechanism for new projects. There are far better ways to deploy code (your DeployHQ setup is probably already a better alternative), so use this only when you have a legacy project and you are not in control of the development/deployment environment.

                                          1. 1

                                            Yep. Just static sites that don’t justify the deployHQ cost. For a new project I’m using a gulp plugin to upload Shopify theme changes as I make them through their API. Has made the development process so much easier.