Why You Should Stop Overworking Your Email Regex

Front-end validation for entry fields can be a tedious task. Email address forms always seem to prompt a wall of discussion around all the use cases that should be covered. Depending on how many different organizations or geographic regions use the product, validation can get complicated quickly. If you find yourself writing long and complex email regex patterns for one entry form, you may want to consider whether the value you are providing is worth the time and effort.

A First Attempt

You might start with something simple: one or more letters and digits, an @ symbol, more letters and digits, a dot, and a final run of letters and digits.

^[a-zA-Z\d]+@[a-zA-Z\d]+\.[a-zA-Z\d]+$

In my case, manual testing of a user registration flow quickly revealed problems with this simplified structure. I wasn’t able to use the + symbol to alias my email. I would have to create a new email address every time I wanted to test the workflow. Or I could adjust the regex pattern.
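To make the failure concrete, here is a minimal sketch that runs the pattern above against a plain address and a + alias. The sample addresses and the loosened `withAlias` pattern are assumptions for illustration, not the pattern I ended up shipping.

```typescript
// The first-attempt pattern: letters and digits only on each side of the @.
const firstAttempt: RegExp = /^[a-zA-Z\d]+@[a-zA-Z\d]+\.[a-zA-Z\d]+$/;

// Hypothetical tweak: also accept ".", "_", "+", and "-" before the @ symbol.
const withAlias: RegExp = /^[a-zA-Z\d._+-]+@[a-zA-Z\d]+\.[a-zA-Z\d]+$/;

// Made-up test addresses.
const plainAddress = "tester@example.com";
const aliasedAddress = "tester+signup1@example.com";

const aliasResults = {
  firstAttemptPlain: firstAttempt.test(plainAddress),     // true
  firstAttemptAliased: firstAttempt.test(aliasedAddress), // false: "+" is not in the character class
  withAliasAliased: withAlias.test(aliasedAddress),       // true
};
```

Each tweak like this fixes one case, which is exactly how the pattern starts to grow.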

But What About…

No matter how many iterations of adjusting you complete, there will probably be more exceptions that you have not worked out yet. Do you enforce a character limit? And what do you do about other non-numeric non-alphabetic characters common in email addresses?

!#$%&'*+-/=?^_`{|}~

If you are developing for a larger corporation, international emails might include non-ASCII characters. Departmental email structures may have different standards regarding subdomains. In that case, it isn’t safe to assume that only one dot should be allowed to follow the @ symbol. Even spaces are allowed in certain contexts, such as inside a quoted local part.
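As a rough illustration, the sketch below takes a pattern that has already been loosened to accept + aliases and runs it against one address for each of the cases just mentioned. The pattern and every address are made up for the example.

```typescript
// A hypothetical second iteration: a few extra symbols allowed before the @.
const secondAttempt: RegExp = /^[a-zA-Z\d._+-]+@[a-zA-Z\d]+\.[a-zA-Z\d]+$/;

// Illustrative addresses only; the names and domains are invented.
const edgeCases: string[] = [
  "jane.doe@mail.engineering.example.com", // more than one dot after the @
  "o'brien@example.com",                   // apostrophe in the local part
  "josé@example.com",                      // non-ASCII letter
  '"jane doe"@example.com',                // quoted local part containing a space
];

// Collect every address the pattern turns away.
const rejectedCases: string[] = edgeCases.filter(
  (address) => !secondAttempt.test(address)
);
// rejectedCases contains all four addresses.
```

Every one of these would need its own adjustment, and each adjustment makes the pattern harder to read.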

These caveats can keep piling up until you end up with something like the widely circulated regex that approximates the RFC 5322 address syntax (the RFC itself defines a grammar, not a regex), which still doesn’t cover all the bases.

\A(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*
|  "(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]
|  \\[\x01-\x09\x0b\x0c\x0e-\x7f])*")
@ (?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?
|  \[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:
(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]
|  \\[\x01-\x09\x0b\x0c\x0e-\x7f])+)
\])\z

Just Keep It Simple

An email entry field should not be the bottleneck for a day’s worth of development work. Even if you spend time working out every case that could apply to the product’s users, they are still able to enter a nonexistent email address.

So stick with simplicity in the frontend, and let backend verification do the heavy lifting. Check for one or more characters before the @ symbol, the @ symbol itself, and then characters that include at least one dot.

^[^@]+@[^@]+\.[^@]+$
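A minimal sketch of how that pattern might sit in front-end code. The helper name `looksLikeEmail` and the sample addresses are hypothetical; how it gets wired into a form depends on the framework in use.

```typescript
// Permissive check: something, "@", something, ".", something, and only one "@".
const SIMPLE_EMAIL: RegExp = /^[^@]+@[^@]+\.[^@]+$/;

// Hypothetical helper; it only screens for obvious typos, not for deliverability.
function looksLikeEmail(value: string): boolean {
  return SIMPLE_EMAIL.test(value);
}

const simpleChecks = {
  alias: looksLikeEmail("tester+signup1@example.com"),                // true
  subdomain: looksLikeEmail("jane.doe@mail.engineering.example.com"), // true
  nonAscii: looksLikeEmail("josé@example.com"),                       // true
  missingAt: looksLikeEmail("tester.example.com"),                    // false
  doubleAt: looksLikeEmail("tester@@example.com"),                    // false
  missingDot: looksLikeEmail("tester@example"),                       // false
};
```

The pattern is permissive by design: it will accept plenty of strings that are not deliverable addresses, which is the part left to the backend.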

The same argument applies to phone numbers or any other field whose valid format varies by context.

Overworking Your Email Regex

Some front-end validation is helpful and important. However, if the cases you need to cover keep piling up, eating time that would be better spent elsewhere and making the code harder to read, just keep it simple.

Conversation
  • OFK says:

    I’ll go one step further with the logic and say that entering a VALID address is the responsibility of the user, if they wish to get the “verify your account” link in the mail, and/or future notification.

    We, developers, need to build robust software, sure, but users DO have some responsibility about using our software the correct way!
    (Something they sometimes like to forget, how convenient for them).

    And email FORM validation is hardly a security risk, so… if user can’t be bothered to enter a valid email address? Tough luck!
