Lesson 17 - Regular Expressions
Regular expressions are one of the trickiest things to learn. They have a lot of components, but they are also very powerful. A regular expression is a pattern that lets you match an arbitrary string, dissect it, and check it for validity. A regular expression uses a set of special characters to describe the strings it should match. For those of you who have used DOS or the Unix shell, "dir *.txt" (for DOS) or "ls *.txt" use wildcard patterns, a simpler relative of regular expressions: they ask the dir/ls commands to return only names that end with ".txt" and have any other characters before that.
Why would you want to use regular expressions in your scripts? The biggest reason would be to validate what a user inputs into fields in an HTML form and submits to your PHP script. I won't go into the negatives, but for example, if you had the HTML field "age", you would only expect the user to input a number. If the user inputs anything other than numbers, you don't want that information to go into your database. You can use regular expressions to validate what the user inputs in the "age" field, and if they type in something bad, you can warn them.
The six basic patterns used in regular expressions are:
Pattern: a* Matches: '', 'a', 'aa', ... Explanation: match "a" zero or more times
Pattern: b+ Matches: 'b', 'bb', ... Explanation: match "b" one or more times
Pattern: ab?c Matches: 'ac', 'abc' Explanation: match "a" followed by "b" optionally and then "c"
Pattern: [abc] Matches: 'a' or 'b' or 'c' Explanation: match "a" or "b" or "c" once
Pattern: [a-c] Matches: 'a' or 'b' or 'c' Explanation: Abbreviation for the above
Pattern: [abc]* Matches: '', 'accb', ... Explanation: Combination of "one from a set" and "zero or more"; match "a" or "b" or "c" zero or more times from the set
The "^" character checks whether something starts at the beginning of the string. The "$" character checks whether something finishes at the end of the string. The "|" character is the "or" separator. It is not like the square brackets, because "|" separates whole regular expressions, NOT single characters. Parentheses are used to group regular expressions. Curly brackets are used to match a regular expression a certain number of times (or a minimum/maximum number of times). I know this is a lot to take in, but soon there will be a massive number of examples to explain all of these regular expression characters.
There are also a few escape sequences that stand for common characters or sets of characters. Those are:
\t -> Tab
\n -> Newline
\r -> Carriage Return
\* -> Asterisk
\\ -> Backslash
\d -> Digits [0-9]
\w -> Word [a-zA-Z0-9_] (letters, numbers, and the underscore)
\s -> Whitespace [ \t\r\n\f] (a space, a tab, a carriage return, a newline, a form feed)
. -> Anything except end-of-line [^\n] (literally any character that isn't a newline)
The function used in PHP to match a string using regular expressions is the preg_match() function. This function uses Perl-compatible regular expressions (PCRE) to match a string. The function takes the following [simple] parameters:
int preg_match (string pattern, string subject)
The "pattern" must be wrapped in delimiters, and by convention that is the "/" character at the start and at the end (any character that is not alphanumeric, a backslash, or whitespace works too). The convention comes from Perl, where regular expressions are not function calls but are written with the m// operator. The function returns 1 if the "pattern" matched something in the "subject", 0 if it did not, and false if an error occurred.
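As a minimal sketch of those return values, the helper below (the name describe_match is made up for this lesson) compares the result strictly, so that 0 (no match) can be told apart from false (the pattern itself was invalid). It also assumes a delimiter other than "/", here "#", which PCRE accepts and which saves escaping when the pattern contains slashes.

```php
<?php
// Hypothetical helper: turns preg_match()'s return value into a readable label.
function describe_match(string $pattern, string $subject): string
{
    // The @ suppresses the warning PHP emits for a malformed pattern,
    // so the function can report the problem itself.
    $result = @preg_match($pattern, $subject);

    if ($result === false) {
        return "error"; // the pattern could not be compiled
    }
    return $result === 1 ? "match" : "no match";
}

echo describe_match('/a+/', 'aaa'), "\n";            // match
echo describe_match('/a+/', 'bbb'), "\n";            // no match
echo describe_match('#^/home/#', '/home/joe'), "\n"; // match, "#" is the delimiter
echo describe_match('/a+', 'aaa'), "\n";             // error, closing delimiter missing
```

The patterns are written in single quotes so that PHP does not try to interpret backslashes or "$" before the regular expression engine sees them.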
Here are a few simple examples:
echo preg_match("/a/", "a"); //matches "a"
echo preg_match("/b/", "a"); //doesn't match, needs a "b"
echo preg_match("/a+/",""); //doesn't match, needs to have at least 1 "a"
echo preg_match("/a+/","a"); //matches, at least one "a"
echo preg_match("/a+/","aaaaaa"); //matches, at least one "a"
echo preg_match("/a*/",""); //matches, 0 or more "a"'s
echo preg_match("/a*/","aaaaaaaaaa"); //matches, 0 or more "a"'s
echo preg_match("/[xyz]/","x"); //matches, there is an "x"
echo preg_match("/[xyz]/","y"); //matches, there is a "y"
echo preg_match("/[xyz]/","z"); //matches, there is a "z"
echo preg_match("/[xyz]/","a"); //doesn't match, there is no "x", "y", or "z"
echo preg_match("/[a-z]/","q"); //matches, "q" is in the range from "a" to "z"
echo preg_match("/[0-9]/","5"); //matches, "5" is in the range from "0" to "9"
echo preg_match("/[0-9]/","s"); //doesn't match, "s" is not in the range from "0" to "9"
examples of the "|" character:
//note that the | does not match only the character before or after,
//the | character matches everything either before or after unless you group
//them

//not grouped
echo preg_match("/ab|cd/","ab"); //matches
echo preg_match("/ab|cd/","cd"); //matches

//grouped
echo preg_match("/a(b|c)d/","abd"); //matches
echo preg_match("/a(b|c)d/","acd"); //matches
echo preg_match("/a(b|c)d/","ad"); //doesn't match
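Grouping has a second use that ties back to "dissecting" a string: preg_match() accepts an optional third argument that receives the text each group matched. A brief sketch, using the grouped pattern from above (the function name middle_letter is invented for this example):

```php
<?php
// Hypothetical helper: returns the letter matched by the (b|c) group, or null.
function middle_letter(string $subject): ?string
{
    // $matches[0] holds the whole match, $matches[1] the first group.
    if (preg_match('/a(b|c)d/', $subject, $matches) === 1) {
        return $matches[1];
    }
    return null;
}

var_dump(middle_letter('abd')); // string(1) "b"
var_dump(middle_letter('acd')); // string(1) "c"
var_dump(middle_letter('ad'));  // NULL
```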
examples of the "*" character:
echo preg_match("/ab*/","abbbb"); //matches
echo preg_match("/ab*/","bbbbb"); //fails
examples of the "+" character:
echo preg_match("/a+b/","aaaab"); //matches
echo preg_match("/a+b/","b"); //fails
examples with "\w" character:
echo preg_match("/\w+/","abc"); //matches
echo preg_match("/\w+/","a_b_c"); //matches
echo preg_match("/\w+/","0123456789"); //matches
echo preg_match("/\w+/","-"); //fails, "-" is not a part of \w
echo preg_match("/\w+/"," "); //fails, space is not a part of \w
echo preg_match("/\w+/",""); //fails, have to have at least one \w
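The other shorthand characters from the list earlier in this lesson can be tried the same way. The lines below are a small sketch for "\d", "\s" and "."; the sample strings are made up for illustration.

```php
<?php
// \d matches a single digit.
echo preg_match('/\d+/', '2024');       // 1
echo preg_match('/\d+/', 'abc');        // 0, no digits

// \s matches whitespace such as a space or a tab.
echo preg_match('/\s/', 'hello there'); // 1, there is a space
echo preg_match('/\s/', 'hello');       // 0, no whitespace

// . matches any single character except a newline.
echo preg_match('/^.$/', 'x');          // 1
echo preg_match('/^.$/', "\n");         // 0, a newline is not matched by "."
```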
examples with "?" character:
echo preg_match("/a?b?c?/","a"); //matches
echo preg_match("/a?b?c?/","b"); //matches
echo preg_match("/a?b?c?/","c"); //matches
echo preg_match("/a?b?c?/","abc"); //matches
echo preg_match("/a?b?c?/","ab"); //matches
echo preg_match("/a?b?c?/","bc"); //matches
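Because every part of /a?b?c?/ is optional, the pattern is allowed to match zero characters, so it succeeds on any subject at all. The sketch below shows this, and how the "^" and "$" characters restrict the match to the whole string; the sample strings are only illustrative.

```php
<?php
echo preg_match('/a?b?c?/', '');      // 1, an empty match is allowed
echo preg_match('/a?b?c?/', 'xyz');   // 1, empty match at the start of "xyz"

// With ^ and $, the whole string must fit the pattern.
echo preg_match('/^a?b?c?$/', 'ac');  // 1
echo preg_match('/^a?b?c?$/', 'xyz'); // 0
echo preg_match('/^a?b?c?$/', 'cba'); // 0, the letters are in the wrong order
```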
examples with "^" and "$" characters:
echo preg_match("/^im/","image"); //matches
echo preg_match("/^im/","imagine"); //matches
echo preg_match("/^im/","embrace"); //doesn't match
echo preg_match("/er$/","programmer"); //matches
echo preg_match("/er$/","designer"); //matches
echo preg_match("/er$/","designing"); //doesn't match
echo preg_match("/^(ab|cd)$/","ab"); //matches
echo preg_match("/^(ab|cd)$/","cd"); //matches
echo preg_match("/^(ab|cd)$/","abcd"); //doesn't match
echo preg_match("/^(ab|cd)$/","xy"); //doesn't match
examples with curly brackets character:
echo preg_match("/a{2}/","aaa"); //matches, found "aa" somewhere
echo preg_match("/^a{2}$/","aa"); //matches, entire string is "aa"
echo preg_match("/^a{2}$/","aaa"); //doesn't match, entire string isn't "aa"
echo preg_match("/a{2,4}/","aaa"); //matches, minimum "aa", maximum "aaaa"
echo preg_match("/a{2,4}/","aaaa"); //matches
echo preg_match("/a{2,4}/","a"); //doesn't match
echo preg_match("/^a{2,4}$/","aabaa"); //doesn't match
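Two more forms are worth a quick sketch: leaving out the maximum, as in {2,}, asks for "at least this many", and curly brackets placed after a group repeat the whole group. The sample strings are made up for illustration.

```php
<?php
echo preg_match('/^a{2,}$/', 'a');       // 0, fewer than two
echo preg_match('/^a{2,}$/', 'aa');      // 1
echo preg_match('/^a{2,}$/', 'aaaaaaa'); // 1, no upper limit
echo preg_match('/^(ab){2}$/', 'abab');  // 1, the group "ab" repeated twice
echo preg_match('/^(ab){2}$/', 'abb');   // 0
```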
a few common regular expressions (these are by no means secure... JUST simple):
echo preg_match("/^[-.\w]+\@[-.\w]+$/", $email); //email addresses ($email holds the address to test)
echo preg_match("/^\d{2}$/","24"); //ages
echo preg_match("/^(19|20)\d\d$/","1983"); //years
echo preg_match("/^([\w\s]+)$/","hello there"); //a simple string
echo preg_match("/^(http:\/\/www\.|http:\/\/|www\.)([\w\.\/\=\?\&\-]+)$/","http://www.google.com"); //urls
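To tie this back to the "age" form field from the start of the lesson, here is one possible sketch of validating it before it goes anywhere near a database. The function name is_valid_age, the one-to-three-digit limit, and the messages are assumptions made for this example, not part of the lesson's original code.

```php
<?php
// Hypothetical helper: true when $input consists of one to three digits.
function is_valid_age(string $input): bool
{
    // \z anchors at the very end of the string; unlike $, it does not
    // allow a trailing newline to slip through.
    return preg_match('/^\d{1,3}\z/', $input) === 1;
}

// "age" is the form field name used earlier in this lesson.
// Anything that is missing or not a plain string is treated as empty.
$age = (isset($_POST['age']) && is_string($_POST['age'])) ? $_POST['age'] : '';

if (is_valid_age($age)) {
    echo "Age accepted: " . (int) $age;
} else {
    echo "Please enter your age using digits only.";
}
```

Comparing the result with === 1 means a broken pattern (which returns false) is treated the same as bad input, so the value is rejected rather than let through.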