Having been burned by many of these over the course of 15 years of professional work, going through the various RFCs, implementations, and widely-accepted “facts”, the only certainty I’m willing to accept about email addresses is that they must contain an @ symbol. And even then, I’m suspicious.
According to the RFC, a correct e-mail address must contain at least one @ that is not directly surrounded by @ signs. So what I use is [^@]@[^@], and it should match 100% of valid email addresses.
Regarding the syntax of email addresses: it’s an unholy mess. And I strongly urge people to simply not bother trying to support everything that’s allowed by the RFCs, and instead to follow the lead of HTML5, which introduced a simplified and sensible validation rule, including a reasonable sample regex, for its input type="email".
It’s also worth reading the explanatory note for why HTML5 did this, which I quote here:
This requirement is a willful violation of RFC 5322, which defines a syntax for email addresses that is simultaneously too strict (before the “@” character), too vague (after the “@” character), and too lax (allowing comments, whitespace characters, and quoted strings in manners unfamiliar to most users) to be of practical use here.
This validation rule is already applied automatically by web browsers when you set type="email" on a form input, so you can just do that and hand off the blame to the browser and the spec if someone tries to complain “but it’s technically legal!” about their pet email address that the HTML5 rule rejects.
And one of the great advancements from adopting this validation rule is that anything which passes is guaranteed to have exactly one @ character occurring in it and that @ will cleanly separate the local-part and the domain (while a “but it’s technically legal!” RFC-based address may have multiple such characters cleverly smuggled in and requires complex and specialized parsing to separate out the local-part and the domain).
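A rough sketch of what that guarantee buys you. The pattern is the sample regex given in the HTML specification for valid email addresses; splitAddress is a hypothetical helper name of my own, not part of any spec or library.

```typescript
// Sample pattern from the HTML specification for input type="email".
const html5Email =
  /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/;

// Hypothetical helper: returns the two halves, or null if the rule rejects the input.
function splitAddress(address: string): { localPart: string; domain: string } | null {
  if (!html5Email.test(address)) return null;
  // Safe because neither side of the pattern can contain "@",
  // so a match always has exactly one.
  const at = address.indexOf("@");
  return { localPart: address.slice(0, at), domain: address.slice(at + 1) };
}

console.log(splitAddress("user.name+tag@example.com"));
// A quoted local-part hiding a second "@" is legal per the RFCs but rejected here.
console.log(splitAddress('"a@b"@example.com'));
```

No quoted-string parser is needed: once the pattern matches, a plain indexOf is enough.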
You should be careful not to use the Perl regex that’s linked in the HTML5 reference because it excludes Unicode characters in email addresses. Fortunately, browsers don’t follow that part of the spec and are more relaxed about what they accept, so input type=email is still good.
As for why you would use Unicode in an email address: for example, because the address contains your name and your name has characters that are not in ASCII. People are now used to transliterating their names into ASCII, but this is not a great thing.
This regex works okay: /^[^@]+@[^@]+$/ and is easily understood.
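For comparison, here is a small sketch that runs this pattern next to the unanchored [^@]@[^@] mentioned earlier in the thread. The sample strings are made up for illustration.

```typescript
// The two patterns quoted in this thread.
const unanchored = /[^@]@[^@]/;   // at least one "@" with a non-"@" character on each side
const anchored = /^[^@]+@[^@]+$/; // exactly one "@", with something on each side

// Illustrative inputs only.
const samples: string[] = ["user@example.com", "a@b@c", "@example.com", "user@", "no-at-sign"];

for (const s of samples) {
  console.log(s, unanchored.test(s), anchored.test(s));
}
```

Only a@b@c is treated differently: the unanchored pattern accepts it and the anchored one rejects it. Which behaviour you want depends on whether you care about quoted local-parts that contain an @.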
Here’s a less permissive function I wrote for validating email once. It just verbosely assembles a regex:
export const isValidEmail = (email: string) => {
  // This overly complicated RegExp was written by Colin Caine as a bit of a joke.
  // It validates that a given email address is not obviously wrong while permitting
  // non-ascii mailbox addresses and domains (most regex for this task don't do this,
  // including the one in the HTML Spec).
  //
  // Valid emails using very rarely used features of email, like quoted mailbox
  // addresses or comments will not be validated and that is intentional.
  const atext_ascii = "[A-Za-z0-9.!#$%&'*+/=?^_`{|}~-]"
  const nonascii = "[^\u0000-\u009f]" // Excludes ascii and C1 control codes
  const atext = `(?:${atext_ascii}|${nonascii})`
  // IDNA domains would be impossible to validate properly with regex.
  // This is designed to exclude some invalid ASCII domains but does not attempt
  // to validate non-ascii characters in domains
  const let_dig = `(?:[a-zA-Z0-9]|${nonascii})`
  const ldh_str = `(?:[a-zA-Z0-9-]|${nonascii})`
  const label = `${let_dig}(?:${ldh_str}{0,61}${let_dig})?`
  const domain = `${label}(?:\\.${label})*`
  const email_re = RegExp(`^${atext}+@${domain}$`, 'u')
  return email_re.test(email)
}
I don’t make any particular guarantees of correctness, but it does pass these tests:
If you send a URL in an email, the user will be the first to click it.
Tread carefully with magic links - quite a few email clients (Outlook Online, Gmail) visit any included URLs. So a link that is valid only once will be invalidated as soon as the email hits the Gmail server.
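One way to cope, sketched below, is to make the link itself harmless and consume the token only on an explicit user action. This assumes the scanners only issue GET requests, which I have not verified for every client. The in-memory store and the issueToken / handleGet / handlePost names are hypothetical.

```typescript
// Hypothetical in-memory store of unused magic-link tokens; a real system would persist these.
const unusedTokens = new Set<string>();

function issueToken(token: string): void {
  unusedTokens.add(token);
}

// GET handler: safe for scanners to hit. It only reports whether the token is
// still usable (so the page can show a "Sign in" button) and never consumes it.
function handleGet(token: string): "show-confirm-button" | "expired" {
  return unusedTokens.has(token) ? "show-confirm-button" : "expired";
}

// POST handler: triggered by the user pressing the button. This is the only
// place where the token is consumed.
function handlePost(token: string): "signed-in" | "expired" {
  return unusedTokens.delete(token) ? "signed-in" : "expired";
}

issueToken("example-token");
handleGet("example-token");               // a mail scanner prefetches the link
console.log(handlePost("example-token")); // the user's click still works
console.log(handlePost("example-token")); // a second use is refused
```

The cost is one extra click for the user, in exchange for links that survive being prefetched.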
I have one to add: email addresses have at least 2 characters to the left of the @. Meaning, e@example.com is invalid.
The first culprit I saw was phpBB, which refused my email address because I included a single letter on the left (the right being my own domain name, so I didn’t want to be redundant). I have since given up and now use full words (generally my first name).
I’ll add: that the user will access e-mails on the same machine they use for web browsing.
I have multiple computers, and not all of them have e-mail set up (intentionally due to security, since an e-mail inbox is a key to the whole digital kingdom). It’s annoying when sites send me links via e-mail and expect me to continue my browsing session from there. That link will end up on another machine! And it will probably be doubly inconvenient due to the need to send passwords over to log in again.
Perhaps some day you might be asked to support bang paths? 😅
(Fortunately they don’t count because they aren’t usable on the modern internet, but it’s still amusing / worrying.)
Why would you use unicode in an email address? Do you have the “fixed” regex available somewhere?
Yes, I know it would be more efficient to construct the regex in isValidEmail only once.
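For what it's worth, a build-once variant might look like the sketch below. It reuses the same regex pieces as isValidEmail earlier in the thread. The camelCase names, EMAIL_RE, and isValidEmailOnce are my own renamings, not the original author's.

```typescript
// Same pieces as isValidEmail, assembled once at module load instead of on every call.
const atextAscii = "[A-Za-z0-9.!#$%&'*+/=?^_`{|}~-]";
const nonAscii = "[^\u0000-\u009f]"; // excludes ASCII and C1 control codes
const atext = `(?:${atextAscii}|${nonAscii})`;
const letDig = `(?:[a-zA-Z0-9]|${nonAscii})`;
const ldhStr = `(?:[a-zA-Z0-9-]|${nonAscii})`;
const label = `${letDig}(?:${ldhStr}{0,61}${letDig})?`;
const domain = `${label}(?:\\.${label})*`;
const EMAIL_RE = new RegExp(`^${atext}+@${domain}$`, "u");

// Hypothetical name; behaves like isValidEmail but reuses the compiled regex.
const isValidEmailOnce = (email: string): boolean => EMAIL_RE.test(email);

console.log(isValidEmailOnce("user@example.com"));
console.log(isValidEmailOnce("josé@example.com"));
console.log(isValidEmailOnce("a@b@c"));
```

The regex has no g or y flag, so sharing one instance across calls does not carry lastIndex state between them.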
Apparently having a dot is another. Apparently ai (the TLD itself) has an MX record, so the email could be a@ai …
Compared to the other “falsehoods” lists I’ve read, this one is pretty tame. Nothing on the list that isn’t fairly obvious.
That’s why it’s so depressing: we can guess that each of those items in this list was actually assumed true by enough systems to be noticed.