In all projects with online forms where lots of people have to enter their data, a surprisingly large number of users enter email addresses that pass regex validation but contain typos in the domain names of well-known email providers.

We would love to show a hint like "You wrote …, but didn't you mean …?"

Of course we could create a huge collection of likely spelling errors.

Is there a more elegant way? A webservice that does just that? A code snippet? A super cool regex?

There's a very nice script that does what you need: mailcheck.js

"The Javascript library and jQuery plugin that suggests a right domain when your users misspell it in an email address."

You can try a working demo of mailcheck here.

  1. Prepare a list of well-known domain names.
  2. Extract the domain name from the email address.
  3. Compute the Damerau–Levenshtein distance between the email's domain and each of the well-known domains (Hamming distance only works for strings of equal length, so it misses typos that add or drop a character).
  4. Sort the distances.
  5. If the smallest is within a threshold, suggest that domain.
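The steps above can be sketched in Python as follows. This is a minimal sketch, not mailcheck's implementation: it uses the optimal string alignment variant of Damerau–Levenshtein distance (insertions, deletions, substitutions and adjacent transpositions), and the domain list `KNOWN_DOMAINS`, the helper names and the threshold of 2 are hypothetical choices for illustration.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
            # Transposition of two adjacent characters, e.g. "gmial" -> "gmail".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


# Step 1: hypothetical list of well-known domains.
KNOWN_DOMAINS = ["gmail.com", "yahoo.com", "hotmail.com", "outlook.com"]


def suggest_domain(email: str, known=KNOWN_DOMAINS, threshold: int = 2):
    """Return a corrected address, or None if the domain is known or nothing is close."""
    # Step 2: extract the domain (rpartition splits on the last '@').
    local, sep, domain = email.rpartition("@")
    if not sep:
        return None
    domain = domain.lower()
    if domain in known:
        return None
    # Steps 3-4: distance to each known domain, smallest first.
    ranked = sorted((osa_distance(domain, k), k) for k in known)
    best_dist, best = ranked[0]
    # Step 5: only suggest if the closest domain is within the threshold.
    if best_dist <= threshold:
        return f"{local}@{best}"
    return None


print(suggest_domain("john@gmial.com"))  # john@gmail.com
print(suggest_domain("john@yaho.com"))   # john@yahoo.com
```

Raising the threshold catches more typos but also flags legitimate domains that happen to be close to a popular one, so it is worth tuning against real data.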

It is also useful to use Guava (Google's core Java library) to find valid public suffixes: it can check whether a domain name is valid and return its public suffix (com, net, ...).
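In Guava this is handled by its `InternetDomainName` class, which is backed by the Public Suffix List. Purely to illustrate the idea in the same language as the other sketches, here is a simplified longest-suffix match against a tiny hypothetical suffix set. It ignores the list's wildcard and exception rules, so it is not a substitute for a real Public Suffix List implementation.

```python
# Hypothetical, tiny subset of suffix rules. The real Public Suffix List has
# thousands of entries plus wildcard (*.) and exception (!) rules, ignored here.
SAMPLE_SUFFIXES = {"com", "net", "org", "uk", "co.uk"}


def split_public_suffix(domain: str, suffixes=SAMPLE_SUFFIXES):
    """Return (registrable_domain, public_suffix), or None if no suffix matches."""
    labels = domain.lower().rstrip(".").split(".")
    # labels[1:] is the longest proper suffix, so the first hit is the longest match.
    for i in range(1, len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in suffixes:
            # The registrable domain is the suffix plus one label to its left.
            return ".".join(labels[i - 1:]), candidate
    return None


print(split_public_suffix("mail.example.co.uk"))  # ('example.co.uk', 'co.uk')
print(split_public_suffix("gmail.con"))           # None
```

A domain like `gmail.con` has no recognized suffix, which is itself a strong hint that something was mistyped.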

Was there a Norse god called Olaf? I created a database of word frequencies that took me a very, very long time to build. It was created by downloading the content of a massive website about all things technology.

My suggestion is to use this database of 80,000 words to check whether the word exists.

For example:

 gmail appears 128 times in my database; gnail does not.
 yahoo appears 368 times; yaho does not.

So you can run a query to see whether the word exists. If it doesn't, run some LIKE queries to find the next best candidates, ranked by the frequency column, which gives a very good indication of which word is the better choice.
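The lookup-then-LIKE approach could look roughly like this with Python's built-in `sqlite3` module. The `words(word, frequency)` schema and the `snail` row are hypothetical (the actual database layout isn't described); the `gmail` and `yahoo` counts are the ones quoted above. The sketch tries an exact match first, then `LIKE` patterns where `_` stands for one substituted or one inserted character, and picks the most frequent hit.

```python
import sqlite3


def build_sample_db() -> sqlite3.Connection:
    """In-memory stand-in for the word-frequency database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE words (word TEXT PRIMARY KEY, frequency INTEGER)")
    conn.executemany(
        "INSERT INTO words VALUES (?, ?)",
        [("gmail", 128), ("yahoo", 368), ("snail", 50)],
    )
    return conn


def best_match(conn: sqlite3.Connection, word: str):
    """Return word if it exists, else the most frequent one-edit LIKE match, else None."""
    if conn.execute("SELECT 1 FROM words WHERE word = ?", (word,)).fetchone():
        return word
    patterns = set()
    for i in range(len(word)):
        patterns.add(word[:i] + "_" + word[i + 1:])  # one substituted character
    for i in range(len(word) + 1):
        patterns.add(word[:i] + "_" + word[i:])      # one inserted character
    # '_' is LIKE's single-character wildcard; the input itself is assumed not to
    # contain '_' or '%' (otherwise they would need escaping).
    where = " OR ".join("word LIKE ?" for _ in patterns)
    row = conn.execute(
        f"SELECT word FROM words WHERE {where} ORDER BY frequency DESC LIMIT 1",
        sorted(patterns),
    ).fetchone()
    return row[0] if row else None


conn = build_sample_db()
print(best_match(conn, "gnail"))  # gmail (beats 'snail' on frequency)
print(best_match(conn, "yaho"))   # yahoo
```

The insertion patterns are what turn 'yaho' into 'yahoo'. Typos that add a character (such as 'gmaill') would need deletion candidates as well, which can be checked with plain equality instead of LIKE. SQLite's LIKE is case-insensitive for ASCII letters by default.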

If you are interested in the database (an SQLite file), email me. John

  • I don't know of a god named Olaf, but it's the name of a few kings, and it means "ancestor's relic" (I've been asked that often). Cool idea, using a huge database. I was rather thinking of a small one with specific domain names and their misspelled variants. Your approach would also recognize exotic names, but it would not offer suggestions: I would not get 'yahoo' from 'yaho'. However, if the domain name is not in that database, one could show a hint saying 'Please double-check'. Thanks, especially for your offer. I will definitely think about it, though I'm unsure how to return the favor.