Regular Expressions Overview






A regular expression is a compact way to describe a text pattern and check whether a piece of text matches it. It’s a very powerful tool, but patterns can become complex quickly, which makes your code harder for others to understand. In .NET you use regular expressions through the System.Text.RegularExpressions namespace. The framework can not only tell you whether the text matches the pattern but also pick out the matching text or replace it with some other text. I’m going to give just a few brief examples of regex matching here.

Match ranges
A character on its own matches exactly once. Combined with a range character (also called a quantifier) it can match many times.
* matches 0 or more times. The pattern ga* matches gaa as well as g
+ matches 1 or more times. The pattern ga+ matches gaa but not g
? matches 0 or 1 time. The pattern ga? matches ga as well as g. When a ? character is put right after a range character, it makes that range nongreedy. The pattern ga* is greedy and matches all of gaaaa even though shorter matches are possible; putting a ? character directly after the range character makes it select the shortest available choice. The pattern ga*? matches only g in gaaaa, since zero a characters is the shortest choice, while ga+? matches ga.
{2} matches exactly 2 times. The pattern ga{2} matches gaa but not ga
{2,} matches 2 or more times. The pattern ga{2,} matches gaa and gaaaa but not ga
{2,5} matches 2 to 5 times. The pattern ga{2,5} matches gaa and gaaaaa but not ga

Example:
46*3+2  matches e.g. 46663332 and 432 but not 462 (since you need at least one 3)
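
To try the range characters out, here is a minimal C# sketch. The variable names are my own, and I have added ^ and $ around the example pattern so that the whole input has to match rather than just a part of it.

```csharp
using System;
using System.Text.RegularExpressions;

// ^ and $ force the whole input to match the pattern.
bool manySixes = Regex.IsMatch("46663332", "^46*3+2$"); // true
bool noSixes = Regex.IsMatch("432", "^46*3+2$");        // true: zero 6s is allowed
bool noThrees = Regex.IsMatch("462", "^46*3+2$");       // false: at least one 3 is required

// Greedy versus nongreedy on the same input.
string greedy = Regex.Match("gaaaa", "ga*").Value;    // "gaaaa"
string lazyStar = Regex.Match("gaaaa", "ga*?").Value; // "g"
string lazyPlus = Regex.Match("gaaaa", "ga+?").Value; // "ga"

Console.WriteLine($"{manySixes} {noSixes} {noThrees}");
Console.WriteLine($"{greedy} {lazyStar} {lazyPlus}");
```

Without the anchors, IsMatch() returns true as soon as the pattern is found anywhere in the input.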

Match one of several with []
Each character within [] represents one possible match. The pattern [abc] matches one a or one b or one c, not abc altogether. You can create ranges by using the - character. The pattern [a-h] matches any single letter from a to h. The pattern [a-zA-Z0-9] matches any uppercase or lowercase letter or any digit. The characters are read one by one, so [10-20] means 1, the range 0-2, and 0; it matches a single 0, 1 or 2, not the numbers 10 to 20. By putting the ^ character first within [] you create a pattern that matches the opposite of the set within [], e.g. [^a-zA-Z] matches any character except uppercase and lowercase letters.

Example:
[0-10]  matches 0 and 1 but not 8 or 9!
[a-z]{3,5}  matches all of gje and kdeff but not kd. In Geds it only matches the eds part, since G is uppercase
[a-z ]*  matches it is late now
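
The same examples can be checked from C#. This is only a sketch with made-up variable names; the ^ and $ anchors are my addition, to show the difference between matching the whole string and finding a match somewhere inside it.

```csharp
using System;
using System.Text.RegularExpressions;

// [0-10] is the range 0-1 plus a literal 0, so only 0 and 1 qualify.
bool zero = Regex.IsMatch("0", "^[0-10]$");  // true
bool eight = Regex.IsMatch("8", "^[0-10]$"); // false

// Unanchored, the pattern finds the lowercase run inside a longer string.
string inside = Regex.Match("Geds", "[a-z]{3,5}").Value; // "eds"
// Anchored, the leading uppercase G makes the whole string fail.
bool whole = Regex.IsMatch("Geds", "^[a-z]{3,5}$"); // false

// A negated set matches any character that is not listed.
string nonLetter = Regex.Match("abc7def", "[^a-zA-Z]").Value; // "7"

// A space inside the set lets the pattern span several words.
bool sentence = Regex.IsMatch("it is late now", "^[a-z ]*$"); // true

Console.WriteLine($"{zero} {eight} {inside} {whole} {nonLetter} {sentence}");
```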

Match groups
If you want to match a sequence of characters as a unit, one or several times, you use (). While [] only matches a single character at a time, () matches the whole sequence inside it. The pattern (of) matches twice in cup of coffee. By using the | character you create an OR pattern. The pattern (cof|of) matches either cof or of.

Examples:
(so)+   matches every occurrence of at least one so, like so and soso and sososo
pi(zza)?  matches pi or pizza
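
A short C# sketch of the group examples, using Regex.Matches() to count occurrences. The variable names and the extra test strings are my own.

```csharp
using System;
using System.Text.RegularExpressions;

// (of) occurs twice: once as the word "of" and once inside "coffee".
int ofCount = Regex.Matches("cup of coffee", "(of)").Count; // 2

// With |, each alternative is tried at every position in the input.
MatchCollection either = Regex.Matches("cup of coffee", "(cof|of)");
string firstHit = either[0].Value;  // "of"
string secondHit = either[1].Value; // "cof"

// A range character after a group repeats the whole group.
string repeated = Regex.Match("sososo good", "(so)+").Value; // "sososo"

// An optional group: pi and pizza pass, piz does not.
bool pi = Regex.IsMatch("pi", "^pi(zza)?$");       // true
bool pizza = Regex.IsMatch("pizza", "^pi(zza)?$"); // true
bool piz = Regex.IsMatch("piz", "^pi(zza)?$");     // false

Console.WriteLine($"{ofCount} {firstHit} {secondHit} {repeated} {pi} {pizza} {piz}");
```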

Symbols as shortcuts
There are some specific symbols that you can use as shortcuts when writing your patterns.
\d  matches a digit; equivalent to [0-9] for ASCII text (by default .NET also accepts other Unicode digits)
\D  matches a non-digit character; equivalent to [^0-9] for ASCII text
\s  matches white-space characters; for ASCII text equivalent to [ \f\n\r\t\v] (note the space)
\S  matches non-white-space characters
\w  matches word characters including underscore; equivalent to [A-Za-z0-9_] for ASCII text
\W  matches non-word characters
.  matches any character except a newline by default, and any character including a newline when the RegexOptions.Singleline option is set. The Multiline option does not affect it.

Example:
^\d{5}$  matches a string with five digits and nothing else.
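
Here is a small C# sketch of the shortcuts in use. The variable names and sample strings are hypothetical; note the @ in front of the pattern strings, which stops C# from treating the backslash as an escape character.

```csharp
using System;
using System.Text.RegularExpressions;

string fiveDigits = @"^\d{5}$";
bool ok = Regex.IsMatch("12345", fiveDigits);       // true
bool tooLong = Regex.IsMatch("123456", fiveDigits); // false
bool hasLetter = Regex.IsMatch("12a45", fiveDigits); // false

// \s+ splits on any run of white space (spaces, tabs, ...).
string[] words = Regex.Split("one  two\tthree", @"\s+"); // one, two, three

// \w+ picks out a run of word characters, underscore included.
string firstWord = Regex.Match("hello_world 42", @"\w+").Value; // "hello_world"

// The dot skips newlines unless RegexOptions.Singleline is set.
bool dotDefault = Regex.IsMatch("a\nb", "a.b");                            // false
bool dotSingleline = Regex.IsMatch("a\nb", "a.b", RegexOptions.Singleline); // true

Console.WriteLine($"{ok} {tooLong} {hasLetter} {string.Join(",", words)} {firstWord}");
Console.WriteLine($"{dotDefault} {dotSingleline}");
```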






Matching location in string
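You can either match all data as one unit or let each line of data count on its own. It depends on what you want to do. The distinction is made when you create the Regex object by setting the RegexOptions. Default behaviour is to treat all data as one unit; set RegexOptions.Multiline to treat each row as its own data. Note that RegexOptions.Singleline, despite its name, is not the opposite of Multiline: it only changes what the . character matches.

  • Default (no RegexOptions.Multiline)
    ^ match beginning of all data
    $ match end of all data (or just before a final newline)
  • RegexOptions.Multiline
    ^ match beginning of row
    $ match end of row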
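
A quick way to see what RegexOptions.Multiline does to ^ and $ is to count matches with and without it. This is just a sketch; the three-line sample string and the variable names are my own.

```csharp
using System;
using System.Text.RegularExpressions;

string data = "first\nsecond\nthird"; // hypothetical sample with three rows

// Without Multiline, ^ only matches at the very start of the data.
int defaultCount = Regex.Matches(data, @"^\w+").Count; // 1

// With Multiline, ^ also matches right after each newline.
int multilineCount = Regex.Matches(data, @"^\w+", RegexOptions.Multiline).Count; // 3

// $ follows the same rule for the end of the data or the end of a row.
string endOfData = Regex.Match(data, @"\w+$").Value;                         // "third"
string endOfRow = Regex.Match(data, @"\w+$", RegexOptions.Multiline).Value; // "first"

Console.WriteLine($"{defaultCount} {multilineCount} {endOfData} {endOfRow}");
```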

There are a number of other ways to match a location in a string or special characters, but they won't be listed here.

Using regular expressions in C#

Finally, a little example of how to use regular expressions when programming. The Regex class has a static IsMatch() method that can be called without creating a Regex object.

A simple match in C# might look like this

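using System;
using System.Text.RegularExpressions;

// "template" ends with "te", so the pattern matches at the end of the string
if (Regex.IsMatch("my test template", "(te){1,}$"))
  Console.WriteLine("It's a match");
else
  Console.WriteLine("No match");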

You can also create an instance of the Regex class and call its non-static Match() method.

// Options are combined with |
// Singleline has no effect here since the pattern contains no . character
Regex r = new Regex("(te){1,}$", RegexOptions.IgnoreCase | RegexOptions.Compiled | RegexOptions.Singleline);
if (r.Match("my test template").Success)
  Console.WriteLine("It's a match");
else
  Console.WriteLine("No match");

The Compiled option makes matching faster at the cost of a longer startup time.
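
Besides testing for a match, you can pick out the matching text or replace it. Here is a minimal sketch using the static Match(), Matches() and Replace() methods; the sample sentence and variable names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "Order 66 shipped, order 1138 pending"; // hypothetical sample

// Match() returns the first match, with its text and its position.
Match first = Regex.Match(input, @"\d+");
string firstNumber = first.Value; // "66"
int firstIndex = first.Index;     // 6

// Matches() returns every match in the input.
MatchCollection all = Regex.Matches(input, @"\d+");
foreach (Match m in all)
  Console.WriteLine(m.Value);

// Replace() swaps every match for the replacement text.
string masked = Regex.Replace(input, @"\d+", "#");
Console.WriteLine(masked);
```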

If you are into more complex usage of regular expressions then check out my other posts on the subject.