1. 16

  2. 3

    Interesting article, thank you for posting!

    > I then estimate the latency between any two points using the following formula

    What is the basis for having confidence in this formula? Rather than using all of the WonderProxy data, maybe you could use 60% of WonderProxy locations to determine the “training” accuracy of this model, then the other 40% to evaluate the “testing” accuracy. That way you can try other models using only the training data, then evaluate their accuracy on the held-out set. Splitting by location will show you whether the model successfully generalizes across locations.
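    The split suggested above could be sketched like this. This is a minimal, hypothetical sketch (the function and data names are mine, not from the article): locations, not individual measurements, are split, and only pairs whose both endpoints fall inside one split are kept, so no test location leaks into training.

```python
import random

def split_by_location(locations, train_frac=0.6, seed=0):
    """Split *locations* (not measurements) into train/test sets, so the
    model is evaluated on location pairs it has never seen."""
    locs = sorted(locations)
    random.Random(seed).shuffle(locs)
    cut = int(len(locs) * train_frac)
    return set(locs[:cut]), set(locs[cut:])

def filter_pairs(measurements, locs):
    """Keep only (src, dst, latency_ms) rows whose BOTH endpoints are in
    *locs*, preventing leakage between the splits."""
    return [(a, b, ms) for (a, b, ms) in measurements if a in locs and b in locs]
```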

    > With the above formula, we can pick a specific location in the world, estimate its latency to the rest of the world and plot the results as a map. I will call this a “latency map” for the rest of the article.

    By “rest of the world” do you mean all the WonderProxy endpoints? Or all the other estimated locations of AWS regions? Probably the former!

    > With this ability in hand, we are only left with a search problem of finding the best set of regions according to worldwide median latency.

    Earlier you were using the 95th percentile; why switch to the 50th percentile here? Maybe you could do all your calculations on both p50 and p95; might be interesting?

    Thanks for the pointer to the WonderProxy dataset, I’ll have fun thinking up use cases for it!

    1. 2

      Thank you for your comments!

      > What is the basis for having confidence in this formula? Rather than using all of the WonderProxy data maybe you could use 60% of WonderProxy locations to determine the “training” accuracy of this model, then the other 40% to evaluate the “testing” accuracy.

      You are right, I do not have a way to measure the reliability of this approach. Your idea of splitting it into training/testing datasets is great, thank you for that!

      I also believe that the reliability of the latency data is the weakest part of this experiment. It’s old, does not have perfect coverage, and even then it only samples a single day, so transient network issues are likely present in the dataset.

      > By rest of the world do you mean all the WonderProxy endpoints? Or all the other estimated locations of AWS regions? Probably the former!

      By rest of the world, I do mean every single point in the world. This can be a WonderProxy endpoint, an AWS region, or any random location. The estimation formula works on any pair of locations: we first find the closest WonderProxy endpoints, then extrapolate the latency between those endpoints using the distance difference.
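      The extrapolation described above might look roughly like this. This is a hedged sketch, not the article’s actual code: the function names are mine, and the kilometers-per-millisecond constant is an assumed propagation speed, not a value from the article.

```python
import math

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 6371 * 2 * math.asin(math.sqrt(a))

def estimate_latency(src, dst, endpoints, measured_ms):
    """Snap src/dst to their nearest measured endpoints, then adjust the
    measured endpoint-to-endpoint latency by the leftover distance.
    KM_PER_MS is an assumed rough signal speed in fiber."""
    KM_PER_MS = 100  # assumption for illustration only
    near_src = min(endpoints, key=lambda e: haversine_km(src, endpoints[e]))
    near_dst = min(endpoints, key=lambda e: haversine_km(dst, endpoints[e]))
    base = measured_ms[(near_src, near_dst)]
    extra = haversine_km(src, endpoints[near_src]) + haversine_km(dst, endpoints[near_dst])
    return base + extra / KM_PER_MS
```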

      > Earlier you were using the 95th percentile; why switch to the 50th percentile here? Maybe you could do all your calculations on both p50 and p95; might be interesting?

      This was a mistake; the entirety of this article uses the 95th percentile. I fixed the typo.

      But you are right that the choice of p50 vs p95 matters. Initially I used p50, but the differences between AWS regions were much less pronounced. I believe this is because p50 neglects quite a bit of the world population that lives away from population centers (South America, Australia/New Zealand, etc.), so I felt p95 would be “fairer”. You are right that this would be an interesting thing to experiment with.
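      Reporting both percentiles side by side is cheap once the latency samples exist. A minimal sketch (names are mine; nearest-rank percentile is one common definition, not necessarily the one used in the article):

```python
import math

def percentile(values, p):
    """Nearest-rank percentile of a list of samples."""
    s = sorted(values)
    k = max(0, math.ceil(p / 100 * len(s)) - 1)
    return s[k]

def score_region_set(latencies_ms):
    """Return (p50, p95) of the latency samples for a candidate region set,
    so both metrics can be compared in one pass."""
    return percentile(latencies_ms, 50), percentile(latencies_ms, 95)
```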

      1. 2

        You could sample latency data via RIPE Atlas, though that would require running your own probe to gain the credits if you wish to gather data directly on AWS zones. But since all measurements are public, you can extract data in a similar manner to the WonderProxy results, with enough data to train a fairly sophisticated model.

    2. 1

      Hey, author here! I’m happy to answer any questions, or hear your suggestions/fixes or requests.