1. 33
  1.  

  2. 7

    After years of resolving hostnames with gethostbyname(3), it turns out I actually have to call res_init(3) first in order for it to work properly?

    Since when is that a thing?

    1. 13

      I’m wondering the same. Given the number of other applications with the same kind of issue (at least Pidgin/GAIM and Thunderbird are mentioned explicitly in the article), it seems like quite a lot of programmers also don’t know that this is a thing.

      Notably:

      • Beej’s Guide to Network Programming doesn’t mention res_init or resolver. So if, like many people (hi), you picked up Berkeley Sockets by reading Beej… you quite plausibly won’t have heard of these functions.
      • The man pages for gethostbyname, getaddrinfo and so on, on at least Linux, FreeBSD, MirBSD, OpenBSD and OS X, all mention resolver(3) only in the “See also:” section, amidst a sea of other barely related junk. There is no other indication given that it might be worth looking at.
      • None of the man pages on any of the OSes have anything anywhere in the prose that hints that you might want to sometimes explicitly initialise the resolver(3).
      • None of the example code in any of the documentation hints anywhere at a possibility that one might be expected to initialise resolver(3).
      • I guess this is rather hard to notice by normal testing; who tests network-using programs by starting them while the network is not configured yet?
      • edit: The various OSes’ man pages for resolver(3) do not mention that it has an effect on the behaviour of getaddrinfo() and friends. You might be able to infer that it will, but there’s nothing saying so explicitly.
      • edit: The API in resolver(3) is very low level, rather discouraging anyone from actually reading it.
      • edit: The glibc man page for resolver(3) even documents res_init() as a deprecated function that you should not call.
      • edit: This is documentation that actively discourages reading any further. I mean that seriously; look at this for a first paragraph (on Linux):

      Note: This page is incomplete (various resolver functions provided by glibc are not described) and likely out of date.

      So overall, this seems like a mild but widespread annoyance that will probably keep appearing in applications for the foreseeable future, with users just restarting their apps now and then in response.

      edit: Nice one, Ulrich. Yeah that’s an awesome response. Obviously that bug isn’t a problem at all because of the existence of an undocumented workaround.

      1. 7

        Ah yes, the belief that tribal knowledge is sufficient. “The solution exists somewhere, and I know of N people who know this, therefore it is your fault you don’t know it, and we do not need to change anything.” Tribal knowledge is evil. Document everything. Especially when someone comes to you and says “I have this issue right now, there’s no documentation even acknowledging the issue, please help.”

        1. 3

          This is especially true for those N people, because now they will be pestered every time someone has an issue like this or any other unrelated issue that might be solved by tribal knowledge. Documenting common issues makes life easier for everyone.

        2. 5

          Nice one, Ulrich. Yeah that’s an awesome response. Obviously that bug isn’t a problem at all because of the existence of an undocumented workaround.

          There seems to be more traction for fixing this over here.

          1. 3

            Following links from the bug you linked, it seems that Debian patches glibc to automatically deal with this stuff, so it’s not a problem on Debian-based distros.

          2. 8

            Usually no. The fact that getaddrinfo bubbles down into res_init is mostly an implementation detail (artifact). Code that talks directly to res usually ends up being more brittle, in much the same way that “I know sizeof(int) == sizeof(long)” code is brittle. Of course, sometimes there are quirks.

            To wind back the clock a bit, mostly this is the result of certain assumptions, like the assumption that the upstream resolver is fixed. resolv.conf was a static file. Then along came DHCP and Wi-Fi and so forth. Ideally the system would just adapt, but the problem is that people had already written code that fiddled with res_init and res directly. So instead of a clean break, the new dynamic system rewrites resolv.conf from time to time so that legacy code can read it.
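
            For the quirk case, here is a minimal sketch of the workaround this thread is about: if a lookup fails, ask the resolver to re-read resolv.conf with res_init() and try once more. The helper name lookup_with_reinit is made up for illustration, and retrying on any getaddrinfo error is my assumption, not something the article prescribes. As far as I know, newer glibc releases (2.26 and later) notice resolv.conf changes on their own, so treat this as a fallback for older or unpatched systems.

            ```c
            #define _DEFAULT_SOURCE   /* glibc: expose getaddrinfo/res_init under strict -std modes */
            #include <assert.h>
            #include <string.h>
            #include <sys/types.h>
            #include <sys/socket.h>
            #include <netinet/in.h>
            #include <arpa/nameser.h>
            #include <resolv.h>
            #include <netdb.h>

            /*
             * Hypothetical helper (not from the article): resolve host/service,
             * and if the first attempt fails, re-read resolv.conf via res_init()
             * and retry exactly once.
             *
             * Returns 0 on success (caller must freeaddrinfo(*out)), otherwise
             * the getaddrinfo() error code from the last attempt.
             */
            int lookup_with_reinit(const char *host, const char *service,
                                   struct addrinfo **out)
            {
                struct addrinfo hints;
                int rc;

                assert(out != NULL);
                *out = NULL;

                memset(&hints, 0, sizeof hints);
                hints.ai_family = AF_UNSPEC;      /* IPv4 or IPv6 */
                hints.ai_socktype = SOCK_STREAM;

                rc = getaddrinfo(host, service, &hints, out);
                if (rc != 0) {
                    /* Assumption: any failure might be due to a stale resolver
                     * configuration, so reload it and try one more time. */
                    res_init();
                    rc = getaddrinfo(host, service, &hints, out);
                }
                return rc;
            }
            ```

            You would call it like lookup_with_reinit("example.org", "80", &res) and pass a nonzero result to gai_strerror(). On some platforms (OS X, for one) you may need to link with -lresolv. Note this still touches the resolver directly, so the brittleness caveat above applies.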

            1. 6

              Neat! This is a great example of how a “dumb” legacy behavior grew out of requirements, and only seems dumb if you don’t know all the context.

            2. 2

              It’s not just you. I’ve been writing socket code since the last millennium and this… is not a thing. The custom logic should probably be in gethostbyname with the other OS-specific logic, but the glibc maintainer is a notoriously bad actor.