1. 25
  2. 4

    It’d be interesting to compare httpx here: https://www.python-httpx.org/

    1. 1

      I’ll add this tonight! Looks great, hadn’t heard of it.

    2. 2

      Great article! I might try async requests sometime.

      When using Requests, I throttled to avoid overwhelming the remote server. Like in the BASIC days, I just used “sleep” to guarantee a minimum amount of time between requests. This has the advantage of being less likely to be buggy or behave unpredictably than custom throttling code.

      1. 4

        Adding sleep is good, especially if you’re trying to remain undetected.

        However, people typically gravely underestimate what a server can handle. Even home servers can handle 10k connections/s if configured properly.

        There’s something to be said about being nice, but in general I say that you can hit things as hard as you want and the server won’t stutter.

        1. 2

          > There’s something to be said about being nice, but in general I say that you can hit things as hard as you want and the server won’t stutter.

          That works fine when there is only simple rate-limiting and tracking on the server end. When you are dealing with larger APIs, or services that might be sensitive to request rates (e.g. LinkedIn), you need to be aware of how they may take action later. Your client may appear to work the first time around, and then you get blocked later. It is worth understanding the service you are making requests to and taking a cautious approach, because its response may be more sophisticated than you are prepared to deal with.

          1. 1

            > you get blocked later

            Yes, please always check this first; you don’t want to run into CAPTCHA requests (yt-dl..)

      2. 2

        It’s worth looking at using a forward proxy (e.g. trafficserver, which is widely packaged in distros) to limit external connections and do other things. Because proxies are optimised specifically for this, they are usually faster and have lower overhead, and with some configuration they can even do SSL MITM.

        It’s also worth looking at uvloop. I would use it in preference to the built-in loop under all circumstances.

        1. 1

          If you use an async library such as gevent or eventlet, you don’t need to write any code beyond your second version. They provide their own threading module with a compatible API.

          1. 1

            http://pycurl.io works REALLY WELL at this too. ;)

            1. 1

              I really like Trio. There is a good HTTP client implementation for it: https://github.com/python-trio/hip
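
The sleep-based throttling mentioned earlier in the thread could look roughly like the sketch below. It is only an illustration, not the commenter's actual code: the class name `MinIntervalThrottle`, its parameters, and the placeholder URLs are all assumptions made for this example. The idea is to remember when the previous request went out and sleep for whatever is left of the minimum interval before sending the next one.

```python
import time


class MinIntervalThrottle:
    """Hypothetical helper: keeps consecutive calls at least
    `min_interval` seconds apart by sleeping when needed."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the logic can be tested without waiting
        self._sleep = sleep
        self._last = None      # time of the previous call, None before the first one

    def wait(self):
        """Block until the minimum interval has passed; return seconds slept."""
        slept = 0.0
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
                now = self._clock()
        self._last = now
        return slept


if __name__ == "__main__":
    # Placeholder URLs; a real client would issue the request where print() is.
    urls = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
    throttle = MinIntervalThrottle(min_interval=0.2)
    for url in urls:
        throttle.wait()
        print("would fetch", url)
```

Using a monotonic clock rather than wall-clock time keeps the interval stable if the system clock is adjusted. As the replies point out, a fixed delay only bounds the request rate; it does not guarantee that a rate-sensitive service will not block the client later.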