1. 7
  1.  

    1. 3

      The link points specifically to the part about FixProxy, a stand-alone proxy with an (initially limited, but hopefully growing) set of scripts that convert JS-dependent websites into plain HTML pages with the content.

      1. 3

        I was thinking about this the other day: a “youtube-dl” for news or articles that would fetch the actual content of the page but remove JavaScript and various unnecessary(?) parts of the HTML. I wrote a custom piece of code for The Old New Thing which archived the content and comments but removed all JavaScript, but I never extended it to any other site. Maybe FixProxy is what I need. There was also a Firefox extension called GreaseMonkey, but it appears to be less popular now.

        1. 1

          I don’t even remember if GreaseMonkey-injected JavaScript works with normal JavaScript disabled…

          In my scraping code I have a few script-circumvention fixes, and also some curl-impersonate delegation, as there are sites that will provide reasonable HTML but only if the client is similar enough to a recognised browser. I don’t really clean up the HTML, but I convert it to a text dump and read that.

      2. 2

        Yup, I saw this on HN some days ago. Something like it has been in my plans for a while, but hopefully this works well enough for my purposes.

        If you use NoScript for browsing, it’s infuriating how many websites that absolutely do not need JS render as blank pages. And if you disable NoScript, well, they require a beefy CPU to run.

        One of the offenders that bothers me most is Mastodon: because of its federated nature, adding NoScript exceptions is a pain. I kinda pondered contributing to some Mastodon client to be able to navigate through threads. (Tusky on Android kinda does this.) I don’t know if it’s because my account is on a Takahe server, or if this happens to Mastodon users too, but I tend to get partial threads locally, which means that if I want to reply, I’d better check the upstream thread to see whether I’m repeating what someone already said.

        1. 3

          > I don’t know if it’s because my account is on a Takahe server, or if this happens to Mastodon users too, but I tend to get partial threads locally, which means if I want to reply, I’d better check the upstream thread to see if I’m repeating what someone already said.

          This is a known issue across most/all fedi implementations - the only messages you get in threads are messages that your instance holds for some other reason, for example because someone on your instance follows the author. There’s an outstanding PR to Mastodon upstream to explicitly fetch all replies, and I imagine in time the same approach will be duplicated by other implementations.

          1. 3

            One more thing to know is: an impressive number of pages render as blank if you drop JS… and render just fine if you drop both JS and CSS.

            1. 2

              Interesting. I’ve not validated this, but sometimes Firefox offers me Reader Mode on what looks like a blank page… and it works. It might be a symptom of this phenomenon.
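
For anyone who wants to try the “drop both JS and CSS” idea from the thread, here is a minimal sketch using only Python’s standard `html.parser`. It is not how FixProxy or anyone’s scraping code in this thread works; the names `StripJsCss` and `strip_js_and_css` are made up for illustration, and the rule of thumb “drop every attribute starting with `on`” is an assumption that catches inline event handlers. Comments and the doctype are dropped too, simply because the sketch does not re-emit them.

```python
from html import escape
from html.parser import HTMLParser

# Elements whose contents are discarded along with the tags themselves.
DROPPED_ELEMENTS = {"script", "style"}


class StripJsCss(HTMLParser):
    """Re-emit HTML without scripts, stylesheets, or inline styles/handlers."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if tag in DROPPED_ELEMENTS:
            self.skip_depth += 1
            return
        if self.skip_depth:
            return
        rel = (dict(attrs).get("rel") or "").lower().split()
        if tag == "link" and "stylesheet" in rel:
            return  # external stylesheet
        kept = "".join(
            f" {name}" if value is None
            else f' {name}="{escape(value, quote=True)}"'
            for name, value in attrs
            # Assumption: "style" and anything starting with "on" is unwanted.
            if name != "style" and not name.startswith("on")
        )
        self.out.append(f"<{tag}{kept}>")

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> have no separate end tag.
        if tag not in DROPPED_ELEMENTS:
            self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag in DROPPED_ELEMENTS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def strip_js_and_css(html_text):
    parser = StripJsCss()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

Feeding a saved page through `strip_js_and_css` and opening the result in a browser is a quick way to check whether a “blank” page was only being hidden by its own stylesheet until a script ran.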