1. 17

  2. 5

    Instagram’s “Explore” feed looks like an example of this. Two or three years ago it was really interesting and showed me relevant and weird pics. Then it suddenly started showing only things I especially hate: non-thematic video bloggers, cars, dogs, hunting, rap and rappers’ fashion. Now it consists of “average dull” content, like the ubiquitous video bloggers who dye their hair every day, heavily promoted “funny cats” and reviews of decorative cosmetics. This might just be because they removed machine learning altogether and replaced it with a simpler “what’s popular” algorithm, but the same thing also happened to ads in the FB ad network.

    1. 7

      The client is the advertiser; they care about how much inventory they can fill.

      1. 1

        AFAIK, there’s no option to buy impressions on Explore, at least not in the public interface for advertisers the last time I looked. Most of the content I see there isn’t strictly commercial; it’s just on topics that are the opposite of what I’m interested in.

        1. 1

          But those topics might be the most valuable ones for them, right? Like, if no one buys ads targeted at niche X, then Instagram doesn’t have any incentive to promote niche X.

      2. 4

        I feel like often when a model goes off the rails, it’s because the loss function being optimized isn’t appropriate for the domain. What’s popular tends to be what’s recommended if the algorithm doesn’t think it has enough data on you. But that’s a pretty weird situation if you’ve been on the platform for years and the recommendations used to be good.
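
        To make the “fall back to what’s popular” behaviour concrete, here is a minimal sketch. It is not Instagram’s actual system (which isn’t public); the `MIN_INTERACTIONS` threshold, the `recommend` function and the tag-overlap scoring are all made up for illustration. With too little history the user gets globally popular items; with enough history, items are ranked by overlap with tags the user already engaged with.

```python
from collections import Counter

# Hypothetical threshold: below this many interactions we assume the
# recommender "doesn't have enough data on you".
MIN_INTERACTIONS = 5


def recommend(user_history, all_interactions, item_tags, k=3):
    """Return up to k unseen items for one user.

    user_history:     list of item ids the user interacted with
    all_interactions: list of (user_id, item_id) pairs across the platform
    item_tags:        dict mapping item id -> list of tags
    """
    popularity = Counter(item for _, item in all_interactions)
    seen = set(user_history)
    candidates = [item for item in popularity if item not in seen]

    if len(user_history) < MIN_INTERACTIONS:
        # Cold-start fallback: rank purely by global popularity.
        return sorted(candidates, key=lambda i: (-popularity[i], i))[:k]

    # Personalised path: score each candidate by how often its tags
    # appear in the user's own history; popularity only breaks ties.
    user_tags = Counter(t for item in user_history for t in item_tags[item])

    def score(item):
        return sum(user_tags[t] for t in item_tags[item])

    return sorted(candidates, key=lambda i: (-score(i), -popularity[i], i))[:k]
```

        A long-time user landing in the popularity branch would see exactly the “average dull” feed described above, which is why that outcome looks odd for an account with years of history.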

        1. 2

          Just speculating here, but are you privacy-conscious when using the Internet? If so, there might not be enough “hooks” in your behavior that Instagram can latch onto.

          1. 1

            I’m using an adblocker, and it cuts out Facebook buttons, but I’ve been using it for the last 5-7 years (though not on all sites), and recommendations worked before. I don’t use anything fancy like separating cookies between tabs. But as I understand it, website visits aren’t the most important data; I leave more information on Instagram itself and in other places where I’m logged in. Also, mostly only mainstream news websites have Facebook trackers; what’s linked here on Lobsters usually has Google Analytics at most, and I doubt Facebook can buy data from Google.

            1. 1

              Thanks for the clarification and expansion.

              Instagram in its entirety for me represents “digital sharecropping” in its most extreme form so far. There’s barely any pretense to serve the needs of its users - the people producing the content. The only entities who matter are the advertisers.

              1. 1

                How is it different from, say, good old Flickr? Users generate content, and the platform shows advertisements too. The only ethical advantage of Flickr is that it has a field to mark your photos as CC-licensed, so others can use them. For non-CC photos, it even has anti-download measures, almost like Instagram. Instagram has a far worse community, being mainstream and mobile-oriented, but that’s not related to the advertising model.

                Almost every text/photo/video publishing platform, except maybe Medium, works that way. But Medium is much worse and shady as hell.

                1. 2

                  Flickr vs IG is a difference in degree, not in kind, I agree.

                  I concede that IG probably has an order of magnitude more users than Flickr at its peak.

                  Below is a list of stuff that Flickr has that Facebook lacks:

                  User control
                  • Users can organize their photos in groups and sets, and post to other groups. Group owners can set rules.
                  • Relevant tags - tags can be just for the user, rather than signaling intent as on IG
                  • Rich API - I have Lightroom integration, have a plethora of apps to choose from, and I’ve written my own script to download my images
                  • Original size images - with access control
                  • Follower access control - friends/family can have more rights than the general public

                  I wish I had numbers on this, but I’ve heard it reported that Flickr is/was self-sufficient from paid users (I have a paid account). Advertising is extra, but not the driving force.

                  Flickr is (now) owned by SmugMug, another image hosting site with paid tiers. Their business model is to host images, facilitate print sales, give photographers a platform etc.


                  IG now is a social/news network with images.

                  Flickr is an image hosting platform with social features.

                  As a photographer, I much prefer the level of control Flickr gives me. However, I am not a seller of photos. For those who are, IG is absolutely essential, and its constant changes to timeline sorting, visibility algorithms etc. screw those photographers over. But that’s fine, because those photographers are not the customers of IG.

                  (Edit spelling)

        2. 5

          I wrote an article two days ago trying to get some thoughts down on this. Even without stark changes to the data like the ones suggested (stylishly!) here, model performance often degrades over time.
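
          One simple way to notice this kind of slow degradation is to track accuracy over a sliding window of recently labelled predictions and compare it with the accuracy measured at deployment. The sketch below is only an illustration, not something taken from the article: the `RollingAccuracy` class, its `baseline`, `window` and `tolerance` parameters, and the default values are all hypothetical.

```python
from collections import deque


class RollingAccuracy:
    """Track accuracy over the most recent `window` labelled predictions."""

    def __init__(self, baseline, window=100, tolerance=0.05):
        self.baseline = baseline      # accuracy measured at deployment time
        self.tolerance = tolerance    # allowed drop before we raise a flag
        self.window = window
        self.hits = deque(maxlen=window)  # oldest results fall off automatically

    def update(self, predicted, actual):
        """Record whether one prediction matched its (later obtained) label."""
        self.hits.append(predicted == actual)

    def accuracy(self):
        """Accuracy over the current window, or None if nothing is recorded."""
        if not self.hits:
            return None
        return sum(self.hits) / len(self.hits)

    def degraded(self):
        """True once a full window sits below baseline minus tolerance."""
        if len(self.hits) < self.window:
            return False  # not enough evidence yet
        return self.accuracy() < self.baseline - self.tolerance
```

          This only works if fresh ground-truth labels keep arriving, which is often the expensive part.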


          1. 1

            You should totally submit that article.

          2. 2

            I had the (mis)fortune of deploying a DL model to production in 2017. It was an image classifier, and we relied almost entirely on a home-built benchmark dataset for quality checks. Maintaining the benchmark took significant effort.

            1. 1

              This reminds me of a thing I saw explained a while ago (can’t remember by whom, sorry). There’s a similar problem that can happen with code implementing generic optimisation algorithms like hill-climbing or Newton-Raphson: it’s possible to accidentally break your code in a way that causes it to approach the goal much more slowly than it ought to, without actually breaking it completely. E.g. if you accidentally stop mating candidate solutions with one another in a genetic algorithm, it’ll still tend to climb towards the goal really slowly, acting like a bad, slow simulated annealing implementation.
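
              The genetic algorithm case is easy to try out. Below is a minimal sketch on the toy OneMax problem (maximise the number of 1 bits); the `evolve` function, its parameters and the `crossover` flag are invented for this illustration. Setting `crossover=False` simulates the bug: children become mutated copies of a single parent, yet the best fitness still never goes down, so nothing looks obviously broken.

```python
import random


def evolve(length=40, pop_size=30, generations=60, crossover=True, seed=0):
    """Run a tiny genetic algorithm on OneMax.

    Returns the best fitness seen at the start of each generation.
    """
    rng = random.Random(seed)
    fitness = sum  # OneMax: fitness is simply the count of 1 bits
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    history = []

    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))

        parents = pop[: pop_size // 2]  # truncation selection: keep the top half
        children = [pop[0][:]]          # elitism: the best individual survives as-is

        while len(children) < pop_size:
            a = rng.choice(parents)
            if crossover:
                # One-point crossover between two selected parents.
                b = rng.choice(parents)
                cut = rng.randrange(1, length)
                child = a[:cut] + b[cut:]
            else:
                # The "bug": no mating, just copy one parent.
                child = a[:]
            # Flip each bit independently with probability 1/length.
            child = [bit ^ (rng.random() < 1 / length) for bit in child]
            children.append(child)

        pop = children

    return history
```

              Comparing `evolve(crossover=True)` with `evolve(crossover=False)` over several seeds shows how the two curves differ; a single run of either one just looks like an optimiser making progress.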

              1. 2

                There is definitely a related thing where it’s hard to debug optimization algorithms, since if you mostly got the details right it still finds the local/global optimum. It’s just way slower than it should be.
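
                A small sketch of why convergence alone is a weak test, and of one common way to catch such bugs anyway: comparing the analytic gradient against a finite-difference estimate. Everything here is a toy assumption: the quadratic `loss`, the deliberately wrong `grad_buggy` (missing a factor of 2) and the helper names are made up for the example.

```python
def loss(x, target):
    """Squared distance from x to target."""
    return sum((xi - ti) ** 2 for xi, ti in zip(x, target))


def grad_correct(x, target):
    return [2 * (xi - ti) for xi, ti in zip(x, target)]


def grad_buggy(x, target):
    # Bug: the factor of 2 is missing. The direction is still right,
    # so gradient descent converges, only with a smaller effective step.
    return [(xi - ti) for xi, ti in zip(x, target)]


def descend(grad, x, target, lr=0.1, tol=1e-6, max_steps=10_000):
    """Plain gradient descent; returns the final point and the step count."""
    steps = 0
    while loss(x, target) > tol and steps < max_steps:
        g = grad(x, target)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
        steps += 1
    return x, steps


def gradient_matches(grad, x, target, h=1e-6, tol=1e-4):
    """Compare grad() against central finite differences of loss()."""
    analytic = grad(x, target)
    for i in range(len(x)):
        up = x[:i] + [x[i] + h] + x[i + 1:]
        down = x[:i] + [x[i] - h] + x[i + 1:]
        numeric = (loss(up, target) - loss(down, target)) / (2 * h)
        if abs(numeric - analytic[i]) > tol * max(1.0, abs(numeric)):
            return False
    return True
```

                Both gradients pass a “does it reach the optimum?” test, with the buggy one needing more steps; only the finite-difference check tells them apart.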