Tuesday, August 12, 2008

Email validation using PHP regular expressions

Regular expressions are a powerful tool for matching and manipulating strings in PHP. Some practical uses of regular expressions are:

* Email validation
* Domain name validation
* Postcode format validation

PHP provides two families of regular expression functions:

1. ereg functions, which implement POSIX extended regular expressions and are PHP's standard regular expression functions
2. preg functions, which implement Perl-Compatible Regular Expressions (PCRE)
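To give a feel for the second family, here is a minimal sketch using preg_match(). Unlike ereg(), a preg pattern must be wrapped in delimiters (slashes here), and preg_match() returns 1 on a match, 0 on no match and false on error. Note that ereg() was deprecated in PHP 5.3.0 and removed in PHP 7.0.0, so on current PHP versions only the preg functions are available. The helper name matches() is hypothetical, chosen for illustration.

```php
<?php
// Hypothetical helper: true when $pattern (a PCRE pattern with
// delimiters) matches somewhere inside $subject.
function matches(string $pattern, string $subject): bool
{
    // preg_match() returns 1 (match), 0 (no match) or false (error)
    return preg_match($pattern, $subject) === 1;
}

// A POSIX-style pattern such as (cat|mat), wrapped in "/" delimiters
var_dump(matches('/(cat|mat)/', 'the cat sat')); // bool(true)
var_dump(matches('/(cat|mat)/', 'the dog sat')); // bool(false)
```

Comparing the result with === 1 keeps a pattern error (false) from being mistaken for a match.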

For this email validation program, we are going to use PHP's standard regular expression function, ereg().

Before looking at the email validation program, it helps to review some basic regular expression syntax.



\s -> any whitespace character (space, tab, newline, etc.)
^ -> the start of the string
$ -> the end of the string
. -> any single character (in preg patterns, any character except a newline by default)
(cat|mat) -> either cat or mat
[0-9] -> any single digit from 0 to 9 inclusive
[a-z] -> any single lowercase letter from a to z inclusive
[A-Z] -> any single uppercase letter from A to Z inclusive
[^a-z] -> any single character that is not a lowercase letter; a caret (^) at the start of a set means "not"
? -> zero or one of the preceding character or group
* -> zero or more of the preceding character or group
+ -> one or more of the preceding character or group
{3} -> exactly three of the preceding character or group
{3,} -> three or more of the preceding character or group
{3,6} -> between three and six of the preceding character or group (3, 4, 5 or 6)
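The syntax list above can be checked with a small table of test strings. This is a sketch only: it uses preg_match() with "/" delimiters as a stand-in for ereg() (which no longer exists in PHP 7 and later), and the sample subjects are made up for illustration. Every pattern is anchored with ^ and $ so the whole subject has to match.

```php
<?php
// Each entry: [pattern, subject, expected result]
$cases = [
    ['/^\s$/',        ' ',       true],   // \s: one whitespace character
    ['/^.$/',         'x',       true],   // .: any single character
    ['/^(cat|mat)$/', 'mat',     true],   // alternation
    ['/^[0-9]$/',     '7',       true],   // digit range
    ['/^[^a-z]$/',    'a',       false],  // negated set rejects a lowercase letter
    ['/^colou?r$/',   'color',   true],   // ?: zero or one "u"
    ['/^ab*$/',       'a',       true],   // *: zero or more "b"
    ['/^ab+$/',       'a',       false],  // +: needs at least one "b"
    ['/^a{3}$/',      'aaa',     true],   // exactly three
    ['/^a{3,}$/',     'aaaaa',   true],   // three or more
    ['/^a{3,6}$/',    'aaaaaaa', false],  // seven is more than six
];

foreach ($cases as [$pattern, $subject, $expected]) {
    $actual = preg_match($pattern, $subject) === 1;
    printf("%-16s %-10s %s\n", $pattern, var_export($subject, true),
        $actual === $expected ? 'ok' : 'UNEXPECTED');
}
```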

Here is the PHP source code for email validation using the ereg() function:


<?php
$email = "myemail@mydomain.com";

// ereg() returns a true value when the pattern matches $email
if (ereg('^[a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.([a-zA-Z]{2,4})$', $email)) {
    echo 'Valid email id';
} else {
    echo 'Invalid email id';
}
?>
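The program above relies on ereg(), which was deprecated in PHP 5.3.0 and removed in PHP 7.0.0, so it only runs on older PHP versions. A minimal sketch of the same check with preg_match() follows. The pattern is unchanged apart from the "/" delimiters and the D modifier, which stops $ from also matching just before a trailing newline (something PCRE allows by default). The function name is_valid_email() is hypothetical.

```php
<?php
// Hypothetical helper: the ereg() pattern from above, ported to preg_match()
function is_valid_email(string $email): bool
{
    // D modifier: $ matches only at the very end of the string
    $pattern = '/^[a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.([a-zA-Z]{2,4})$/D';
    return preg_match($pattern, $email) === 1;
}

$email = "myemail@mydomain.com";

if (is_valid_email($email)) {
    echo 'Valid email id';
} else {
    echo 'Invalid email id';
}
```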

Now let's split the pattern into three parts:

1] ^[a-zA-Z0-9._-]+@
2] [a-zA-Z0-9._-]+\.
3] ([a-zA-Z]{2,4})$
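To confirm that the three parts really add up to the full pattern, a short sketch can join them back together. The variable names are illustrative only, and preg_match() with "/" delimiters stands in for ereg().

```php
<?php
// The three parts, kept as plain strings (single quotes keep "\." intact)
$parts = [
    '^[a-zA-Z0-9._-]+@',   // 1] username and the @ sign
    '[a-zA-Z0-9._-]+\.',   // 2] domain name and the dot before the TLD
    '([a-zA-Z]{2,4})$',    // 3] TLD, anchored to the end
];

// Joining them gives back the full pattern used in the program
$full = implode('', $parts);
echo $full, "\n";

// Wrap in "/" delimiters so preg_match() can use it
var_dump(preg_match('/' . $full . '/', 'myemail@mydomain.com')); // int(1)
```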

1] ^[a-zA-Z0-9._-]+@

The ^ symbol anchors the match to the start of the string, so the username must begin the email address.
[a-zA-Z0-9._-] is the set of characters allowed in the username: letters, digits, dot, underscore and hyphen (a hyphen placed last in a set is treated literally).
The + symbol requires one or more characters from the preceding set.
The @ symbol matches the literal @ that separates the username from the domain name.
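Part 1 can be tried on its own to see which usernames it lets through. This sketch uses preg_match() as a stand-in for ereg(), and the sample strings are made up for illustration. Because this piece is not anchored at the end, anything after the @ is ignored here.

```php
<?php
// Part 1 on its own: username characters followed by @
$username = '/^[a-zA-Z0-9._-]+@/';

$samples = [
    'john.doe_99@',   // letters, digits, dot, underscore: matches
    'mary-ann@',      // hyphen is allowed (it is last in the set)
    '@example',       // fails: + needs at least one character before @
    'john+tag@',      // fails: + is not in the set
];

foreach ($samples as $s) {
    printf("%-14s %s\n", $s, preg_match($username, $s) === 1 ? 'match' : 'no match');
}
```

The last sample shows a limitation of this pattern: addresses that use a plus sign in the username are rejected.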

2] [a-zA-Z0-9._-]+\.

[a-zA-Z0-9._-] is the set of characters allowed in the domain name, not counting the TLD. Because the set includes the dot, subdomains such as mail.example are accepted as well.
The + symbol requires one or more characters from the preceding set.
\. matches the literal dot before the TLD; the backslash escapes the dot, which would otherwise match any character.
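To see how part 2 handles subdomains, the sketch below wraps it in a capture group and runs it together with part 3. Because + is greedy, the domain set first swallows every character including dots, then backtracks until the last dot is left for \. to match. This uses preg_match() as a stand-in for ereg(); the host names are examples only.

```php
<?php
// Parts 2 and 3 together; part 2 is captured so we can see what it consumed
$domain = '/^([a-zA-Z0-9._-]+)\.([a-zA-Z]{2,4})$/';

foreach (['mydomain.com', 'mail.example.co.uk', 'localhost'] as $host) {
    if (preg_match($domain, $host, $m) === 1) {
        // $m[1] is the domain without the TLD, $m[2] is the TLD
        echo "$host -> domain '{$m[1]}', tld '{$m[2]}'\n";
    } else {
        echo "$host -> no match\n";
    }
}
```

For mail.example.co.uk the domain part ends up as mail.example.co and the TLD as uk, while localhost is rejected because it has no dot.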

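3] ([a-zA-Z]{2,4})$

[a-zA-Z] allows only letters in the top-level domain (TLD). The parentheses simply group it; they do not change what is matched.
{2,4} limits the TLD to between 2 and 4 letters, such as uk, com or info.
$ anchors the match to the end of the string, so nothing may follow the TLD.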
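The length limit on the TLD only works because \. comes before it and $ comes after it. On its own, ([a-zA-Z]{2,4})$ would simply match the last four letters of a longer TLD. The sketch below keeps the dot and tests a few TLDs (the sample list is illustrative). It uses preg_match() as a stand-in for ereg().

```php
<?php
// Part 3 with its leading dot: 2 to 4 letters at the very end of the string
$tld = '/\.([a-zA-Z]{2,4})$/';

foreach (['uk', 'com', 'info', 'c', 'museum', 'c0m'] as $t) {
    $result = preg_match($tld, 'example.' . $t) === 1 ? 'accepted' : 'rejected';
    echo ".$t: $result\n";
}
```

Note that real TLDs longer than four letters, such as .museum, are rejected by this rule.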
