1. 7

  2. 12

    A silly exercise, IMO. The restrictions on valid forms (e.g. IP addresses are OK, but only if they’re publicly routable) are unsurprisingly difficult to express as a regular expression. The version listed as “@stephenhay” is quite reasonable; if you need to restrict it further, then parse it and test the components individually.

    1. 7

      In my opinion, this is sort of like email addresses: do the minimum necessary to break it into components and then just use them. Don’t waste a lot of time on especially strict validation, as that happens later on anyway. So a “valid” domain is anything you can cram into a DNS query; you’ll never actually encode “does this resolve” into a regex (especially because the answer is constantly changing), so you find out whether it’s “valid” when your DNS server replies with an A record or NXDOMAIN, respectively.

      1. 3

        The complicated ones are also wrong now. They reject http://tedu.ninja/ which is totally legit.

      2. 3

        This seems futile. Even if the URL is “legal” that doesn’t say anything as to whether it actually exists. And even if it does currently exist, it can go away at any time.

        And it seems inevitable that such complicated regexes will have bugs that will either accept too much (so what’s the point?) or reject valid corner-case URLs, which will upset users.
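
For what it’s worth, the “parse it and test the components individually” approach suggested upthread might look roughly like the sketch below. This is only an illustration using Python’s standard library, not anything from the original comparison: the function name `check_url` and the `ALLOWED_SCHEMES` set are made up for the example, and “publicly routable” is approximated by `ipaddress`’s `is_global` flag.

```python
import ipaddress
from urllib.parse import urlsplit

# Assumption for this sketch: only these schemes are of interest.
ALLOWED_SCHEMES = {"http", "https", "ftp"}


def check_url(url: str) -> bool:
    """Loosely accept a URL by splitting it and testing the pieces."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
        parts.port  # accessing .port raises ValueError for a bad port
    except ValueError:
        return False

    if parts.scheme not in ALLOWED_SCHEMES or not host:
        return False

    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal: treat it as a name and let DNS decide later.
        return True

    # IP literal: accept only addresses outside private/reserved ranges.
    return addr.is_global


if __name__ == "__main__":
    for candidate in ("http://tedu.ninja/", "http://10.0.0.1/", "http://8.8.8.8/"):
        print(candidate, check_url(candidate))
```

Note that hostnames are deliberately not checked against any list of TLDs, so newer ones such as `.ninja` pass; whether the name actually resolves is left to the DNS lookup that happens when the URL is used.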