1. 3

    Be careful before using this: it’s trivially vulnerable to server-side request forgery (i.e. if your server fetches a user-submitted URL, the user can point it at URLs or IP addresses on your private network).
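    To make that concrete, here’s a hedged sketch (the function name is mine, purely illustrative) of why “it parses as a URL” is not a sufficient check; every one of these internal targets passes a naive validation:

    ```javascript
    // Hypothetical sketch: a naive metadata fetcher that only checks
    // that the input parses as an http(s) URL will happily request
    // internal addresses.
    function isNaivelyAccepted(input) {
      try {
        const url = new URL(input);
        return url.protocol === 'http:' || url.protocol === 'https:';
      } catch (e) {
        return false; // not a parseable URL at all
      }
    }

    // All of these pass the naive check, yet none should ever be
    // fetched from inside your network:
    const probes = [
      'http://192.168.1.1/admin',                  // router admin panel
      'http://169.254.169.254/latest/meta-data/',  // cloud metadata service
      'http://localhost:6379/',                    // Redis, a non-HTTP service
    ];
    probes.every(isNaivelyAccepted); // true: all three are accepted
    ```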

    1. 2

      Good point. Some care should definitely be taken with the URLs. I don’t think it’s that much of an issue in this case, though: jsdom only accepts HTML- and XML-formatted responses. Worst case scenario, it returns all the Open Graph and Twitter tags from an internal document. Accepted URLs should be monitored closely, but sensitive information really shouldn’t be placed in Open Graph tags anyway.

      1. 4

        I don’t think it’s that much of an issue in this case, though: jsdom only accepts HTML- and XML-formatted responses. Worst case scenario, it returns all the Open Graph and Twitter tags from an internal document.

        • even if internal documents shouldn’t have sensitive information in meta tags, you have no guarantee of that. Someone who pulls in this library would have no idea that’s a requirement; it’s not called out anywhere, and even if it were, they might miss it. And why should this library force that restriction on an organization? It may be perfectly reasonable for them to put sensitive content into a tag when they consider the document to be on the internal network.
        • returning page content isn’t the only bad thing that can happen, so it’s moot that jsdom expects an HTML or XML response. There have been many cases of SSRF being escalated to RCE by hitting non-HTTP services (for instance http://blog.orange.tw/2017/07/how-i-chained-4-vulnerabilities-on.html, http://www.kernelpicnic.net/2017/05/29/Pivoting-from-blind-SSRF-to-RCE-with-Hashicorp-Consul.html, and others). Right now it happens to be the case that there’s no user input in the request body, but there’s no guarantee that won’t change in the future either. Vulnerabilities are introduced like this all the time: something looks safe, then a change is made down the line because no one understands or remembers the assumptions being relied on.
        • even without RCE risk, internal services can have mutating effects from GET requests. They’re not supposed to (it’s against the RFC), but developers do it all the time. Services built internally at Google and Twitter have special GET endpoints, /quitquitquit and /abortabortabort, that kill services; if your package were deployed internally there, someone could use it to take down production services.

        Not trying to be harsh - I think it’s a useful library if you make it secure by default. There are existing libraries that do SSRF filtering for you - you could swap in one of those to fetch the page contents and pass that to JSDOM.

        Libraries really have to be secure by default; otherwise developers will mess it up, 100% guaranteed.
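        The core of such a filter is just an address check. This is a minimal sketch (the function name and structure are mine, not from any particular library; a real version would also handle IPv6):

        ```javascript
        // Hypothetical sketch of an SSRF address filter: refuse to fetch
        // if a resolved address is private, loopback, or link-local.
        const net = require('net');

        function isPrivateIPv4(ip) {
          if (!net.isIPv4(ip)) return false;
          const [a, b] = ip.split('.').map(Number);
          return (
            a === 10 ||                           // 10.0.0.0/8
            a === 127 ||                          // loopback
            (a === 172 && b >= 16 && b <= 31) ||  // 172.16.0.0/12
            (a === 192 && b === 168) ||           // 192.168.0.0/16
            (a === 169 && b === 254)              // link-local / cloud metadata
          );
        }
        ```

        Before fetching, you’d resolve the hostname (e.g. dns.lookup with { all: true }), reject the request if any resolved address fails the check, and then connect to the vetted IP rather than re-resolving, to avoid DNS-rebinding tricks.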

        1. 3

          Not trying to be harsh

          That’s really not harsh at all. I’m a student; I learn through conversations like this.

          I see what you mean. It would make sense to have an SSRF option, and I’ll work on adding that. I’m pretty sure it’s only ever going to send GET requests, though, and GET endpoints that kill services should be heavily protected. I get that “should” doesn’t imply “would”, but having unauthenticated GET endpoints that kill internal services has to be seen as careless.

    1. 1

      I guess this still has the problem where another thread attempts to concurrently update the document in question. Consider the following sequence, where thread 1 uses Fawn and thread 2 uses anything else:

      • T1 saves the document
      • T1 updates the document
      • T2 issues a separate update to the document
      • T1 tries an update, but fails
      • T1 “rolls back” to the save point

      In this case, we’ve lost the change made by T2. Transactions generally protect against this by locking the document (or row) in question until T1 has completed a rollback or a commit.
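      The lost update is easy to re-enact without a database. Treating a plain object as the “collection” (purely illustrative; no real MongoDB or Fawn involved):

      ```javascript
      // In-memory re-enactment of the interleaving above.
      const store = { balance: 100 };

      // T1 takes a save point before its update (what a rollback restores)
      const savePoint = { ...store };

      // T1 applies its update
      store.balance = 150;

      // T2 concurrently applies a separate update
      store.balance += 5; // T2 intends the final balance to be 5 higher

      // T1's next step fails, so it "rolls back" to the save point...
      Object.assign(store, savePoint);

      // ...and T2's +5 is silently gone: balance is 100 again, not 105.
      ```

      A real transaction would have blocked T2 (or T1) until the other finished. Without that, an optimistic check, such as filtering the update on a version field, at least turns the silent data loss into a detectable conflict.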

      1. 1

        You’re right. It’s also mentioned in the MongoDB docs:

        Because only single-document operations are atomic with MongoDB, two-phase commits can only offer transaction-like semantics. It is possible for applications to return intermediate data at intermediate points during the two-phase commit or rollback.

        So it really depends on the use case. Most of the time in an application, you only have one update point per collection. If the app has multiple update points for a single collection, using two-phase commits does become sketchy.

        1. 1

          I’ve seen this happen in a variety of ways, most notably when an error in a client leads to it sending the same request multiple times in quick succession, or the client has retry logic, or a user hits a button twice…

      1. 4

        Why not just use one of the other databases that were designed with transactions from the start, instead of bolting them on later?

        1. 1

          There’s also nothing wrong with that.

          1. 1

            …but…shiny. SQUIRREL!

          1. 4

            My concern is that as we add these sorts of features to Mongo, we keep pushing it outside of what it’s good at. One thing I’ve seen in a similar vein, for example, is almost immediately adding some kind of schema mechanism on top of Mongo, even though schemalessness is nominally one of its strong points.

            If you need transactions, use a real database.

            Edit: Must. Be. Positive.

            1. 6

              My concern is that as we add these sorts of features to Mongo, we keep pushing it outside of what it’s good at.

              You know, in years of experience cleaning up the use of Mongo at various employers, I’m still not clear on what that is, other than marketing to people who don’t know better.

              To a first approximation, none of the shops who chose to use it seem to have actually wanted a non-transactional document store with data so heterogeneous and unreliable that they should have written (but didn’t) extremely granular key-existence checks every time they needed to deal with a record.

              All they really wanted was “DB but fast”, which they mistook Mongo for. But yeah, I agree, don’t staple a two-phase commit protocol to Mongo – just use something less ill-suited for anything anyone actually wants in real life.

              1. 2

                Well, if you like using a tool that doesn’t quite have all the features you want, there’s really nothing wrong with adding those features yourself; that’s why the MongoDB docs include a description of two-phase commits as a way to handle transactions. Also, most of these added features (schemas, for instance) are implemented via third-party libraries, so if they fit your use case, great; otherwise you can avoid them entirely and use the DB as you see fit.