1. 7

  2. 4

    Heck, I might even start using an RSS reader again. Is Google’s still the best one out there?

    Too soon, man :(

    1. 3

      How did you handle duplicates in this analysis? Many posts get submitted multiple times and it’s not clear if you counted those as different or combined them.

      1. 3

        I didn’t do any sort of deduplication, so articles that were submitted multiple times were considered distinct in the analysis. I think that makes sense for the average/mean/median scores, but perhaps the duplicates should have been subtracted from the total article count for each blog. I just did a quick pass over the data and it looks like 92.4% of the URLs submitted to Hacker News are unique. When limiting that to the submissions that were identified as blog articles, the fraction is only slightly higher at 93.5%. I don’t think that should make that much of a difference, but you still raise a good point!

        1. 2

          I would expect a really high percentage of URLs to be unique because of spam (and articles that are so low-quality as to be almost indistinguishable from spam). I don’t mean to make work for you, but what does it look like if you only consider articles that get at least 5 upvotes? Or reach the front page?

          1. 3

            97.1% of blog submissions that get at least 5 upvotes are unique, and that rises to 98.2% for submissions that get at least 10 (which is the number that I like to use as an approximation for making the front page). This is probably partly because even really great articles have a good chance of never getting upvoted (in addition to spammy submissions being removed).
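
            For anyone who wants to run the same check on their own data, here is a minimal sketch of one way such a uniqueness figure could be computed. It is not the code behind the numbers above: the `unique_fraction` helper, the `(url, score)` tuple layout, and the sample rows are all hypothetical, and it assumes "unique" means distinct URLs divided by submissions, counted only among submissions that meet the score threshold.

            ```python
            def unique_fraction(submissions, min_score=0):
                """Return distinct URLs / submissions among rows with score >= min_score.

                `submissions` is an iterable of (url, score) pairs (an assumed layout).
                Returns None when no submission meets the threshold.
                """
                urls = [url for url, score in submissions if score >= min_score]
                if not urls:
                    return None
                return len(set(urls)) / len(urls)


            # Hypothetical sample data, not taken from the Hacker News dataset.
            sample = [
                ("https://a.example/post", 12),
                ("https://a.example/post", 1),   # resubmission that was barely upvoted
                ("https://b.example/x", 7),
                ("https://c.example/y", 0),
            ]

            overall = unique_fraction(sample)                   # all submissions
            at_least_5 = unique_fraction(sample, min_score=5)   # upvoted submissions only
            ```

            In this toy sample the low-scoring resubmission drops out once the threshold is applied, which is the same direction of change reported above.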

      2. 3

        A very Pareto optimal post :)

        I would question whether or not the approach taken is suitable for finding “good” blog posts. Hacker News gets gamed by plenty of people. There are also cults of personality that seem to take hold around certain authors, whose blog posts get submitted regularly because they are guaranteed to be upvoted. Being the first to submit a Gabriel Weinberg or Daring Fireball link guarantees votes for the submitter, regardless of quality.

        Still, the Pareto approach is really well explained here, and it’s a shining example of the difference between an HN-optimal post and a good post IMHO :)

        1. 3

          Thanks! I completely agree about the cults of personality. That was my motivation for the second list of posts where I restricted the maximum number of distinct submitters for a blog. It significantly limited the number of candidate blogs, but it did effectively eliminate the blogs that people race to submit.

        2. 2

          Nice, clear explanation of Pareto optimality.
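
As a rough illustration of the Pareto optimality mentioned in the comments above, the sketch below picks out the entries that no other entry beats on every metric at once. It is not the article's implementation: the `pareto_front` helper, the blog names, and the two metrics (article count and median score) are hypothetical and chosen only for illustration, with higher assumed to be better on both.

```python
def pareto_front(points):
    """Return the names whose metric tuples are not dominated by any other entry.

    `points` maps a name to a tuple of metrics where higher is better.
    """
    def dominates(a, b):
        # a dominates b if it is at least as good everywhere and strictly better somewhere.
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

    return [
        name
        for name, metrics in points.items()
        if not any(
            dominates(other, metrics)
            for other_name, other in points.items()
            if other_name != name
        )
    ]


# Hypothetical blogs: (article count, median score).
blogs = {
    "blog_a": (10, 50.0),
    "blog_b": (40, 20.0),
    "blog_c": (8, 18.0),   # beaten by blog_a on both metrics
}

front = pareto_front(blogs)
```

Here `blog_a` and `blog_b` each win on one metric, so neither dominates the other and both stay on the front, while `blog_c` is dropped.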