1. 6

  2. 6

    Twitter seems to have ~211M daily active users and this says 400 billion events per day.

    I also found a figure close to 10k tweets per second, which works out to roughly 812M tweets per day, averaging about 4 tweets per user.

    That’s 2000 events per daily active user or 500 events per tweet. Maybe stop doing so many analytics things? Oh wait, if the goal isn’t actually letting people communicate with each other…
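
For what it’s worth, the arithmetic above checks out in a few lines (the inputs are the rounded figures from this thread, not official numbers; the tweets-per-second value is chosen to match the 812M/day figure):

```python
# Back-of-envelope check of the figures in this thread (all approximate).
daily_active_users = 211e6   # ~211M DAU
events_per_day = 400e9       # 400 billion events/day, per the article
tweets_per_second = 9.4e3    # "close to 10k"; picked to match ~812M/day

tweets_per_day = tweets_per_second * 86_400

print(f"tweets/day:      {tweets_per_day:.3g}")
print(f"tweets/user/day: {tweets_per_day / daily_active_users:.1f}")
print(f"events/user/day: {events_per_day / daily_active_users:.0f}")
print(f"events/tweet:    {events_per_day / tweets_per_day:.0f}")
```

The ~2000 and ~500 figures in the parent comment fall straight out of the two ratios.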

    1. 1

      While I liked your summary, does this include retweets as tweets? How about likes?

      If 500 people on average like a tweet, or if hundreds of thousands like some tweets and drive the average up, then you can get closer to that number before counting any analytics.

      1. 1

        I know my reply was tongue-in-cheek and half-hearted, but I actually thought about this for a moment.

        Let’s assume you have a typical pub-sub infrastructure for the events: how do you count? If one tweet is sent to 100 followers, you can now either say “I have one event that is sent to 100 consumers” or “I have 100 events”. I’m not saying either way of counting is wrong, but without further details you can easily tune your numbers to say you do “10 billion events” when in reality you send 100 million events to 100 subscribers each. It’s complicated ;)
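
To make that ambiguity concrete, here is a toy pub-sub topic (my sketch, not anyone’s real architecture) that tracks both counting conventions at once:

```python
class Topic:
    """Minimal pub-sub topic tracking both ways of counting 'events'."""

    def __init__(self):
        self.subscribers = []
        self.published = 0   # events counted at the producer
        self.delivered = 0   # events counted at the consumers

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, event):
        self.published += 1              # one tweet produced...
        for callback in self.subscribers:
            callback(event)
            self.delivered += 1          # ...but one delivery per follower

topic = Topic()
for _ in range(100):                     # a tweet with 100 followers
    topic.subscribe(lambda event: None)
topic.publish("one tweet")

print(topic.published, topic.delivered)  # 1 100
```

Both counters are “events” under some definition; which one gets quoted changes the headline number by the fan-out factor.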

        1. 1

          I think this article is from the data team, so they’re more interested in being able to run queries later. (I assume an “event” is anything they’re interested in knowing, which can extend to non-tweet events like following.) For the actual infrastructure, it was generally measured in tweets (posts) per second and fan-out rate (how much downstream work each post generates).

          1. 1

            Fair, but (and I fully agree that this is an edge case) many people, including me, use it as a messenger/microblog with public posts, and we are still kinda dumbfounded by their scale and bespoke architecture, when everything we do could indeed be done with the thing they had when we signed up 12 years ago: a simple Rails app ;)

    2. 3

      > generate petabyte (PB) scale data every day

      And then somebody asks me why I laugh when the IT industry talks about sustainability.

      Looks like Twitter gave up and went the “throw stuff at Google” route.

      1. 2

        I’d always assumed that Twitter had some magical in-house system running for full-text search. But if the data is backed by BigTable, then I wonder how that works? They have a bunch of operators that they support both in the Search API and in their GNIP/PowerTrack rules, such as “[term 1] within [n words] of [term 2]”.
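
The operator itself is easy to state; here’s a naive sketch of “[term 1] within [n words] of [term 2]” over a single piece of text (my illustration, not Twitter’s implementation — the hard part they have to solve is evaluating this against an index of billions of documents, which this ignores):

```python
def within_n_words(text: str, term1: str, term2: str, n: int) -> bool:
    """Naive proximity match: are term1 and term2 within n words of each other?"""
    words = text.lower().split()
    pos1 = [i for i, w in enumerate(words) if w == term1.lower()]
    pos2 = [i for i, w in enumerate(words) if w == term2.lower()]
    return any(abs(i - j) <= n for i in pos1 for j in pos2)

print(within_n_words("big data at twitter scale", "big", "scale", 4))  # True
print(within_n_words("big data at twitter scale", "big", "scale", 2))  # False
```

A real search engine would answer this from positional postings lists in an inverted index rather than rescanning the text.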

        1. 1

          Interesting that they’re starting to offload some of this data work to gcloud now.