Sure, martin@com is technically valid. Who uses that? No one. Things like multiple @s for routing or UUCP addresses have long since been deprecated. How many people have tried to register on your service with such an email address and failed? That number is most probably exactly zero.
Case sensitivity was certainly a mistake. “Oops I emailed my email to glenn@ instead of Glenn@”. What a horrible idea, and everyone ignoring this part of the RFC is a good thing. Maybe Postfix can add RFCLY_CORRECT or RFC_ME_HARDER.
These RFCs aren’t stone tablets from the mountain; many of them are old, full of legacy cruft, and sometimes contain bad or even outright stupid ideas.
However, at the end of the day, it’s good to remember that your initial regex along the lines of [a-z0-9.-]+@[a-z0-9.-]+.[a-z0-9]+ is quite simply… wrong.
“Wrong” in what way? If the RFC is all you care about and you don’t look at anything else, sure. But the world isn’t that simple. Someone registering an account on your service with an invalid email is annoying because you have no way to contact them, and you also don’t want to bother them with a mandatory email verification step (these sorts of things tend to lower subscription rates, and thus cost real money). Guaranteeing that your email can actually be delivered is impossible, but we can at least make an attempt.
Things like not allowing IDN or UTF-8 is annoying and I wish we’d finally move to an UTF-8 world, but this too has their reasons; not everything accepts this (yet), and there are some security concerns with homoglyphs as well; martin@ and mаrtin@ are not the same address, and neither are märtin@ and märtin@, or martin@ and martin@. You can probably do all sorts of creative things with various control characters as well.
So that regexp only accepts emails that are guaranteed to be correct, can be accepted by the entire pipeline now and in the future, and doesn’t suffer from some tricky potential security issues. There is some sense in that, and it’s not necessarily “wrong”.
Anyway, what’s really needed is a simplification of this entire circus. The complexities are actually far worse than what’s outlined in this article if you want to parse things like From: and To: headers.
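To make the homoglyph point concrete, here is a small Python sketch using only the standard `unicodedata` module. The sample strings are my own reconstruction of the kind of look-alikes mentioned above (a Cyrillic "а", and a precomposed versus decomposed "ä"), not the exact bytes from the comment.

```python
import unicodedata

def describe(text):
    """Return the Unicode name of every non-ASCII character in text."""
    return [unicodedata.name(ch, "UNNAMED") for ch in text if ord(ch) > 127]

latin = "martin"
cyrillic = "m\u0430rtin"      # second letter is CYRILLIC SMALL LETTER A
composed = "m\u00e4rtin"      # precomposed a-umlaut
decomposed = "ma\u0308rtin"   # plain "a" followed by COMBINING DIAERESIS

# Visually identical, but different code points, so different local parts.
print(latin == cyrillic)
print(describe(cyrillic))

# Composed and decomposed forms differ until they are normalized.
print(composed == decomposed)
nfc = lambda s: unicodedata.normalize("NFC", s)
print(nfc(composed) == nfc(decomposed))

# Normalization does not fold Cyrillic into Latin; that needs a separate policy.
print(unicodedata.normalize("NFKC", cyrillic) == latin)
```

Normalizing to NFC handles the second pair, but nothing in the standard library maps the Cyrillic letter back to Latin, which is roughly why an ASCII-only rule is the lazy but safe choice.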
So that regexp only accepts emails that are guaranteed to be correct, can be accepted by the entire pipeline now and in the future, and doesn’t suffer from some tricky potential security issues. There is some sense in that, and it’s not necessarily “wrong”.
I get what you are saying, but that regexp also rejects things such as +, which makes as much sense to me as rejecting - or ., or q for that matter.
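For anyone who wants to see it, a quick Python check of the regex quoted from the article. The pattern is copied as it appears in this thread; anchoring it with `fullmatch` and the helper name `accepts` are my assumptions.

```python
import re

# Pattern as quoted in the thread. Note the "." before the last group is
# unescaped there, so it matches any single character, not only a dot.
INITIAL = re.compile(r"[a-z0-9.-]+@[a-z0-9.-]+.[a-z0-9]+")

def accepts(address):
    """True if the whole address matches the quoted pattern."""
    return INITIAL.fullmatch(address) is not None

for candidate in ("martin@example.com",
                  "martin-news@example.com",
                  "martin+news@example.com"):
    print(candidate, accepts(candidate))
```

The `-` variant passes and the `+` variant does not, purely because of which characters happen to be in the first character class.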
i agree with most of the other comments, this seems a bit excessive. the more concerning cases of email validation being wrong that i’ve seen in the wild tend to be whitelists:
blocked because my address used the .pink tld.
blocked because my address used the .io, .co, and .sh tlds. .co and .io in particular seem common enough these days.
blocked because my university .ac.uk address wasn’t on a “whitelist” of “legitimate institutions”. my uni is pretty highly regarded, so i don’t know what their criteria are. i played around with this one, and it accepted only about ten uk university domains.
blocked because i used plus addressing.
blocked because “it looks like you’re using a custom domain. to continue, please use your actual gmail or outlook address”. paraphrased, but it made the assumption that i would be using a mainstream email provider and wanted me to use that address.
blocked because i wasn’t using a gmail address. perhaps it accepted something else too, but the only error i got was “your email is not valid” so i can’t say.
i care way more about this kind of email centralisation and gatekeeping than i do about whether an old rfc says emails should accept multiple @s for routing purposes.
I use + signs in my addresses. I come across so many websites that don’t allow it, it’s insane. There are also websites that don’t allow their name in an address, so something+aliexpress@domain won’t work.
I think it’s actually the RFCs and the old email software that are wrong. What actual benefit is there in accepting email addresses that aren’t in the form word@hostname?
We don’t need relay servers any more and trying to do anything with them except error is a bad idea.
Of all the “your e-mail validation is wrong” rants this is the most “well ackshually”.
Yes, it’s true - the author found many (most?) edge cases that even people working with mail servers for 20 years haven’t ever seen in person. Maybe you need 30-40 years of experience for that.
There’s blatantly wrong validation and there’s 100% correct validation. And then there’s a sensible middle path that will make you reject one out of a million VALID IN-USE ones. Nobody cares about RFC-valid ones no one has used since the 90s.
Yes, email is more complex than many think, but I don’t agree with all these objections.
/\A[^@\s]+@[^@\s]+\.[^@\s]+[a-z]\z/ is good enough for my use case.
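A rough Python translation of that pattern, in case it helps to see what it lets through. The only change is `\Z` in place of Ruby's `\z` (both mean the absolute end of the string); the helper name `good_enough` is made up for this sketch.

```python
import re

# Python spelling of the Ruby pattern above: something without "@" or
# whitespace, an "@", then a domain with at least one dot that ends in a
# lowercase ASCII letter.
PATTERN = re.compile(r"\A[^@\s]+@[^@\s]+\.[^@\s]+[a-z]\Z")

def good_enough(address):
    """True if the address matches the permissive pattern."""
    return PATTERN.match(address) is not None

samples = [
    "user+tag@example.co.uk",  # plus addressing and a multi-label domain
    "martin@com",              # dotless domain
    "two@@example.com",        # more than one "@"
    "user@example.COM",        # upper-case TLD, there is no /i flag
    "user@example.com\n",      # trailing newline
]
for sample in samples:
    print(repr(sample), good_enough(sample))
```

It accepts plus addressing and any TLD, and rejects dotless domains, multiple `@`s, and trailing whitespace. The upper-case TLD case is rejected too, which may or may not be intended.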
From the point of view of a service which asks for users’ email addresses to send them messages (which is… probably almost every service which asks for an email address), the point of validation is to catch user error as early as possible: in the registration process. It’s much better to tell a user that their email address is wrong up front than it is to let them make an account which they later need to go through a recovery mechanism to retrieve because they signed up with the wrong email address.
I have implemented more restrictive email validation rules in a service in response to users getting things wrong, and opening a support ticket to recover their account. Guess how many people have opened a support ticket and said “hey, I can’t register because your service only lets me use an email address with a valid TLD”: none.
The domain name does not need to resolve.
Ok, but I can’t do anything with an email address which doesn’t resolve to anything. If my use case is “I want to email users”, then an address at a nonexistent domain is certainly invalid. Why would I even ask for an email address if I don’t need to send messages to it?
If a user gives a domain which doesn’t exist, it was almost certainly a typo.
But you can also use special domains, such as e.g., the .onion domain, and then even configure your mail server to send mail as a hidden service.
My service isn’t set up to email hidden services so, again, this is an invalid address as far as I’m concerned.
You can have dotless domain names.
But if ICANN now forbids such domains, then there’s a very small number of such email addresses. It seems far more likely that, by requiring a dot, I would catch someone’s unintentional typo than I would forbid a legitimate user.
Your “domain” can be an IP address.
Ok, an IP address isn’t likely to be a typo. It’s suspicious as hell though. Normal people don’t use IP addresses directly, so you’re probably trying something nefarious.
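The cases discussed above (dotless, .onion, IP address) can all be told apart syntactically before deciding what to do with them. A minimal Python sketch, assuming the bracketed address-literal forms from RFC 5321 (`[192.0.2.1]` and `[IPv6:...]`); the function name and the labels it returns are invented for illustration, and what a service does with each label is a policy choice.

```python
import ipaddress

def classify_domain(address):
    """Label the domain part of an address; purely syntactic, no DNS lookups."""
    _local, sep, domain = address.rpartition("@")
    if not sep or not domain:
        return "no-domain"
    if domain.startswith("[") and domain.endswith("]"):
        literal = domain[1:-1]
        # RFC 5321 writes IPv6 literals with an "IPv6:" tag inside the brackets.
        if literal.lower().startswith("ipv6:"):
            literal = literal[5:]
        try:
            ipaddress.ip_address(literal)
        except ValueError:
            return "malformed-literal"
        return "ip-literal"
    if domain.lower().endswith(".onion"):
        return "onion"
    if "." not in domain:
        return "dotless"
    return "ordinary"

for sample in ("user@example.com", "martin@com", "user@[192.0.2.1]",
               "user@[IPv6:2001:db8::1]", "user@abcdefg.onion", "root"):
    print(sample, classify_domain(sample))
```

A signup form could then show "did you mean ...?" for "dotless" and flag "ip-literal" for review, rather than lumping everything into "your email is not valid".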
I would like to use ""@jornane.no as my contact address, especially since most clients will add the quotes if the local part would be illegal otherwise (so my mail address can be written as @jornane.no). But sadly any mail client I know about refuses an empty local part between the quotes, even though it is valid according to RFC 5321.
This is an excellent example of when look-before-you-leap designs are inadequate; there’s no way to validate an email address without sending a test email.
there’s no way to validate an email address without sending a test email.
Since the post says that an email address doesn’t even need to use a resolvable domain, it seems that even sending a test email wouldn’t satisfy the author.
I’m pretty sure the comment you replied to considers that the email contains an action that the user must execute, a link to click, etc. It’s not the validity of the send operation that is considered, but that of its side effect.
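A bare-bones sketch of that "link to click" idea in Python, using only the standard library. Everything here is hypothetical (the key handling, the `service.example` URL, the function names), and a real service would at least add an expiry and store the key somewhere persistent.

```python
import hashlib
import hmac
import secrets
from urllib.parse import urlencode

# Hypothetical key for this sketch; it is regenerated on every run.
SECRET_KEY = secrets.token_bytes(32)

def make_token(email, key=SECRET_KEY):
    """Derive a token that only the holder of the key can produce."""
    return hmac.new(key, email.encode("utf-8"), hashlib.sha256).hexdigest()

def confirmation_link(email, base_url="https://service.example/confirm"):
    """Build the link that goes into the test email (URL is made up)."""
    query = urlencode({"email": email, "token": make_token(email)})
    return f"{base_url}?{query}"

def confirm(email, token, key=SECRET_KEY):
    """True only if the token was issued for this exact address."""
    return hmac.compare_digest(make_token(email, key), token)

token = make_token("glenn@example.com")
print(confirmation_link("glenn@example.com"))
print(confirm("glenn@example.com", token))
print(confirm("Glenn@example.com", token))
```

The address counts as valid once someone clicks the link, whatever any regex or RFC says about its syntax.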
Yeah, rejecting + is too strict.
Use the HTML5 input email address validation regex: https://html.spec.whatwg.org/multipage/input.html#email-state-(type=email)
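For reference, this is the pattern from that section of the spec as best I can transcribe it, wrapped in Python; please check it against the linked page before relying on it. The helper name `html5_valid` is made up.

```python
import re

# Transcription of the "valid e-mail address" pattern from the WHATWG HTML
# spec (verify against the spec itself). It is deliberately not RFC 5322.
HTML5_EMAIL = re.compile(
    r"^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"
    r"@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$"
)

def html5_valid(address):
    """True if the whole address matches the transcribed pattern."""
    return HTML5_EMAIL.fullmatch(address) is not None

for sample in ("user+tag@example.com", "martin@com",
               '""@jornane.no', "two@@example.com", "root"):
    print(sample, html5_valid(sample))
```

Worth noting for this thread: it accepts plus addressing and dotless domains like martin@com, but rejects quoted local parts, multiple `@`s, and a bare "root".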
This might be a weird use case but for internal mails you don’t even need the @domain part.
So “user1” or “root” is a valid email address. Verify that!?