1. 17
  1.  

  2. 5

    Fun fact: some websites (IGN, off the top of my head) only show ads in readability mode. And just like cookie popup websites, it’s a good reminder that you should go someplace else.

    1. 3

      “for instance, any node with class skyscraper is considered hidden (maybe the logic is that those elements are shrouded by clouds)”

      Is “skyscraper” some kind of design lingo that, as a non-designer, I wouldn’t necessarily know? Like the term hero image?

      1. 3

        It’s a term used to describe a certain ad banner size back in the late ’90s and early ’00s.

        I guess the name stuck, even now that ads are practically allowed to be any size and do whatever they want with the content they are embedded in.

        1. 1

          Ah yes, the tall sidebar ads. I remember them fondly, because they were at least out of the damn way.

      2. 1

        “The whole process is a bit heuristical in approach as can be seen in for example hardcoding of site names and attributes, calculation of title similarities using the length difference, and node removal thresholds, but in a crude sense the codebase feels pragmatic. Almost industrial.”

        I can attest to this. The trick is to find the place where most publishers are least likely to mess with the metadata. For example, you can see the effect of (albeit basic) support for JSON-LD in the test cases.

        That should also answer the question: What was ever wrong with just <title>?

        Thankfully nobody has decided to call the if statements with arbitrary numeric thresholds machine learning yet.

        Might be worth keeping an eye on Fathom.
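
To illustrate the metadata point above, here is a minimal sketch of preferring a JSON-LD `headline` over `<title>`. It is not Readability's actual code: the names `extract_title` and `_MetaCollector` and the sample page are made up for the example, and it assumes the JSON-LD block is a single top-level object.

```python
import json
from html.parser import HTMLParser


class _MetaCollector(HTMLParser):
    """Collects the <title> text and the raw body of each JSON-LD script."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.jsonld_blocks = []
        self._in_title = False
        self._in_jsonld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
            self._buffer = []
        elif tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_title or self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self.title = "".join(self._buffer).strip()
            self._in_title = False
        elif tag == "script" and self._in_jsonld:
            self.jsonld_blocks.append("".join(self._buffer))
            self._in_jsonld = False


def extract_title(html):
    """Return the JSON-LD headline if one is present, else the <title> text."""
    parser = _MetaCollector()
    parser.feed(html)
    parser.close()
    for block in parser.jsonld_blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # malformed metadata: ignore it and keep looking
        if isinstance(data, dict):
            headline = data.get("headline")
            if isinstance(headline, str) and headline.strip():
                return headline.strip()
    return parser.title  # assumption: <title> is the last resort


page = """<html><head>
<title>Some Article | Example Site | Section</title>
<script type="application/ld+json">{"@type": "NewsArticle", "headline": "Some Article"}</script>
</head><body></body></html>"""

print(extract_title(page))  # Some Article
```

The `<title>` here carries site and section names that a reader view would have to strip with yet more heuristics, which is why a structured field is the nicer first choice when it exists.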