Regular expressions are a concise and flexible notation for finding and replacing patterns of text with the Edit - Find, Edit - Find Next and Edit - Replace dialogs. Several different syntax styles are used for regular expressions in computing. Manifold uses the regular expression syntax of the Microsoft .NET Framework, the successor to the Microsoft regular expression syntax familiar to many from the Microsoft ActiveX scripting languages. Note that .NET introduces slight changes from the regular expression syntax used in both JScript and VBScript. Microsoft regular expression syntax is broadly similar to that used by the UNIX/Linux grep command ("grep" = "global regular expression print").
Special characters and sequences are used in writing patterns for regular expressions. The following table describes these characters and includes short examples showing how the characters are used.
| Character | Description |
|---|---|
| \ | Marks the next character as either a special character or a literal. For example, "n" matches the character "n", while "\n" matches a newline character. The sequence "\\" matches "\" and "\(" matches "(". |
| ^ | Matches the beginning of input. |
| $ | Matches the end of input. |
| * | Matches the preceding character zero or more times. For example, "zo*" matches either "z" or "zoo". |
| + | Matches the preceding character one or more times. For example, "zo+" matches "zoo" but not "z". |
| ? | Matches the preceding character zero or one time. For example, "a?ve?" matches the "ve" in "never". |
| . | Matches any single character except a newline character. |
| (pattern) | Matches pattern and captures the match as a numbered group. The captured text can be referred to later in the same pattern as \1, \2, and so on, or in a Replace string as $1, $2, and so on. To match parentheses characters, use "\(" or "\)". |
| x\|y | Matches either x or y. For example, "z\|food" matches "z" or "food", while "(z\|f)ood" matches "zood" or "food". |
| {n} | n is a nonnegative integer. Matches exactly n times. For example, "o{2}" does not match the "o" in "Bob" but matches the first two o's in "foooood". |
| {n,} | n is a nonnegative integer. Matches at least n times. For example, "o{2,}" does not match the "o" in "Bob" but matches all the o's in "foooood". "o{1,}" is equivalent to "o+", and "o{0,}" is equivalent to "o*". |
| {n,m} | m and n are nonnegative integers, with n less than or equal to m. Matches at least n and at most m times. For example, "o{1,3}" matches the first three o's in "fooooood". "o{0,1}" is equivalent to "o?". |
| [xyz] | A character set. Matches any one of the enclosed characters. For example, "[abc]" matches the "a" in "plain". |
| [^xyz] | A negative character set. Matches any character not enclosed. For example, "[^abc]" matches the "p" in "plain". |
| [a-z] | A range of characters. Matches any character in the specified range. For example, "[a-z]" matches any lowercase alphabetic character in the range "a" through "z". |
| [^m-z] | A negative range of characters. Matches any character not in the specified range. For example, "[^m-z]" matches any character not in the range "m" through "z". |
| \b | Matches a word boundary, that is, the position between a word character and a nonword character such as a space, or the start or end of the input next to a word character. For example, "er\b" matches the "er" in "never" but not the "er" in "verb". |
| \B | Matches a nonword boundary. For example, "ea*r\B" matches the "ear" in "never early". |
| \d | Matches a digit character. For ASCII text, equivalent to [0-9]. |
| \D | Matches a nondigit character. For ASCII text, equivalent to [^0-9]. |
| \f | Matches a form-feed character. |
| \n | Matches a newline character. |
| \r | Matches a carriage return character. |
| \s | Matches any white-space character, including space, tab, and form feed. For ASCII text, equivalent to "[ \f\n\r\t\v]". |
| \S | Matches any non-white-space character. For ASCII text, equivalent to "[^ \f\n\r\t\v]". |
| \t | Matches a tab character. |
| \v | Matches a vertical tab character. |
| \w | Matches any word character, including underscore. For ASCII text, equivalent to "[A-Za-z0-9_]". |
| \W | Matches any nonword character. For ASCII text, equivalent to "[^A-Za-z0-9_]". |
| \num | A backreference: matches the text most recently captured by group number num, where num is a positive integer. For example, "(.)\1" matches two consecutive identical characters. |
| \nnn | Matches the character whose octal code is nnn. Octal escapes must be two or three digits long, since a single digit from 1 to 9 after a backslash is read as a backreference. For example, "\11" and "\011" both match a tab character. "\0011" is equivalent to "\001" followed by "1". Octal escape values must not exceed octal 377 (decimal 255); if they do, only the first two digits form the escape. Allows ASCII codes to be used in regular expressions. |
| \xnn | Matches the character whose hexadecimal code is nn. Hexadecimal escapes must be exactly two digits long. For example, "\x41" matches "A", and "\x041" is equivalent to "\x04" followed by "1". Allows ASCII codes to be used in regular expressions. |

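To see a few rows of the table in action, here is a small sketch that uses Python's re module as a stand-in for the .NET engine Manifold uses. The constructs shown behave the same way in both engines, although the two differ in other details such as replacement syntax. first_match is a helper written only for this illustration.

```python
import re

# Rows from the table above paired with sample text.
# Assumption: Python's re stands in for .NET; these constructs behave alike.
ROWS = [
    (r"zo*", "z"),               # * : zero or more
    (r"zo+", "z"),               # + : one or more, so no match here
    (r"a?ve?", "never"),         # ? : zero or one
    (r"(z|f)ood", "zood"),       # alternation inside a group
    (r"o{1,3}", "fooooood"),     # {n,m} : at most three o's
    (r"er\b", "verb"),           # \b : no boundary after "er" in "verb"
    (r"er\b", "never"),          # \b : boundary at the end of "never"
    (r"ea*r\B", "never early"),  # \B : "ear" in "early" is followed by a word character
    (r"(.)\1", "abccd"),         # \1 : backreference to group 1
    (r"\x41", "CAB"),            # \xnn : hexadecimal 41 is "A"
]

def first_match(pattern, text):
    """Return the first substring of text matched by pattern, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

for pattern, text in ROWS:
    print(f"{pattern:12} {text:12} -> {first_match(pattern, text)!r}")
```

For the ten rows, in order, it prints 'z', None, 've', 'zood', 'ooo', None, 'er', 'ear', 'cc' and 'A'.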
For details on options in the Find and Replace dialogs, see the Edit - Find / Find Next and Edit - Replace topics.
Regular Expressions in Find and Replace
Regular expressions can be used in the Find box to find items and also in the Replace box to specify how they should be replaced. For example, many European locales write decimal numbers with a comma where the US uses a period, and a Replace command using regular expressions can easily convert one notation to the other.
Strings like "101.999" can be transformed to "101,999" (comma instead of period) by supplying a Find string of ([0-9]{3})\.([0-9]{3}) and a Replace string of $1,$2. The two parenthesized groups capture the digits on either side of the period, and $1 and $2 insert them back on either side of the new comma. The period is escaped as \. so that it matches only a literal period rather than any character.

Note that the above is a search and replace on a string (that is, text) value. The period or comma notation used for numeric values is controlled by the Windows regional (localization) settings.
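The same Find and Replace pair can be tried outside Manifold. This sketch uses Python's re.sub, where the replacement refers to groups as \g<1> rather than .NET's $1; FIND, REPLACE and to_decimal_comma are illustrative names, not part of Manifold. The last line shows what an unescaped . would do.

```python
import re

FIND = r"([0-9]{3})\.([0-9]{3})"   # \. matches a literal period only
REPLACE = r"\g<1>,\g<2>"           # Python's form of the Replace string $1,$2

def to_decimal_comma(text):
    """Swap the period between two three-digit runs for a comma."""
    return re.sub(FIND, REPLACE, text)

print(to_decimal_comma("101.999"))    # 101,999
print(to_decimal_comma("101x999"))    # 101x999 (unchanged)
# With an unescaped ".", any character between the digit runs is replaced:
print(re.sub(r"([0-9]{3}).([0-9]{3})", REPLACE, "101x999"))   # 101,999
```

Because each group needs exactly three digits next to the period, a value such as 12.5 is left unchanged.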
Regular Expression Examples
Each of the following examples gives a regular expression followed by sample strings. A sample is marked OK if the whole string matches the regular expression and Fail if it does not. A search that looks for the pattern anywhere inside a string, rather than requiring the whole string to match, may still find a match within some of the samples marked Fail.
These examples show how regular expressions can be used to find certain patterns. They are not meant to be definitive filters for the kinds of data shown. For example, a rigorous filter for URLs would need a more complex regular expression than the one provided below, since it would have to reject characters such as the underscore, which \w accepts but host names do not allow. The last example, for email addresses, shows a more robust regular expression that comes closer to a practical validation filter for email addresses, although it does not enforce every rule of email address syntax.
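The difference between testing a whole string and searching inside it is easy to see in code. In this Python sketch (Python standing in for .NET), re.fullmatch requires the entire sample to match, which is how the OK/Fail results read, while re.search accepts any matching substring. It uses the credit card pattern from the examples; in .NET a similar whole-string test can be written by anchoring the pattern as ^(?:...)$.

```python
import re

CARD = r"\d{4}( \d{4}){3,4}"   # credit card pattern from the examples
sample = "2235 5656 4578 7890 00"

whole = re.fullmatch(CARD, sample)   # whole string must match: "Fail"
part = re.search(CARD, sample)       # any substring may match

print(whole)            # None
print(part.group(0))    # 2235 5656 4578 7890
```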
First name: (?:Carlos|Mario)\s.*

| Sample | Result |
|---|---|
| Carlos Cramer | OK |
| Carlos Hernandez | OK |
| Carlos Gonzalez | OK |
| Mario Hernandez | OK |
| Paolo Accorti | Fail |

Last name: \S+\s+Hernandez

| Sample | Result |
|---|---|
| Carlos Cramer | Fail |
| Carlos Hernandez | OK |
| Carlos Gonzalez | Fail |
| Mario Hernandez | OK |
| Paolo Accorti | Fail |

First and last name: Carlos\s(?:Hernandez|Cramer)

| Sample | Result |
|---|---|
| Carlos Cramer | OK |
| Carlos Hernandez | OK |
| Carlos Gonzalez | Fail |
| Mario Hernandez | Fail |
| Paolo Accorti | Fail |

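The three name patterns can be checked against the sample names with a short Python script (Python's re standing in for .NET). verdicts is a helper for this sketch; it marks a name OK only when the whole string matches. The last line shows that a (?: ) group does not create a numbered capture.

```python
import re

NAMES = ["Carlos Cramer", "Carlos Hernandez", "Carlos Gonzalez",
         "Mario Hernandez", "Paolo Accorti"]
PATTERNS = {
    "first name": r"(?:Carlos|Mario)\s.*",
    "last name": r"\S+\s+Hernandez",
    "first and last name": r"Carlos\s(?:Hernandez|Cramer)",
}

def verdicts(pattern):
    """Mark each sample OK if the whole string matches, else Fail."""
    return ["OK" if re.fullmatch(pattern, name) else "Fail" for name in NAMES]

for label, pattern in PATTERNS.items():
    print(f"{label}: {verdicts(pattern)}")

# (?:...) groups without capturing, so no group numbers are used up.
print(re.fullmatch(PATTERNS["first name"], "Mario Hernandez").groups())  # ()
```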
US style phone number: (\+\d)?\s*(\(\d+\))?\s*\d[\s\d-]*

| Sample | Result |
|---|---|
| +7(514)555-9931 | OK |
| (514) 333-9931 | OK |
| (617) 555-3267 | OK |
| 555-8787 | OK |
| (1) 03.83.00.68 | Fail |

Phone number containing area code 514: (\+\d)?\s*\(514\)\s*\d[\s\d-]*

| Sample | Result |
|---|---|
| +7(514)555-9931 | OK |
| (514) 333-9931 | OK |
| (617) 555-3267 | Fail |
| 555-8787 | Fail |
| (1) 03.83.00.68 | Fail |

Phone number starting with 555: (\+\d)?\s*(\(\d+\))?\s*555[\s\d-]*

| Sample | Result |
|---|---|
| +7(514)555-9931 | OK |
| (514) 333-9931 | Fail |
| (617) 555-3267 | OK |
| 555-8787 | OK |
| (1) 03.83.00.68 | Fail |

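Because the optional area code in the US style phone pattern sits in its own parentheses, it is captured as group 2 and can be read back after a match. This Python sketch relies on that numbering (both Python and .NET number unnamed groups by the position of their opening parenthesis); area_code is a hypothetical helper.

```python
import re

# US style phone pattern from the examples; group 2 is the optional
# parenthesized area code.
US_PHONE = r"(\+\d)?\s*(\(\d+\))?\s*\d[\s\d-]*"

def area_code(number):
    """Return the area code digits, or None if absent or no match."""
    m = re.fullmatch(US_PHONE, number)
    if m is None or m.group(2) is None:
        return None
    return m.group(2).strip("()")

for number in ["+7(514)555-9931", "(617) 555-3267", "555-8787", "(1) 03.83.00.68"]:
    print(number, "->", area_code(number))
```

The four numbers give 514, 617, None (no area code) and None (no match).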
URL: (?:ftp\:\/\/|http\:\/\/|mailto\:)?(\w+\@)?(www\.)?\w+(\.\w+)+(\:\d+)?

| Sample | Result |
|---|---|
| http://www.manifold.net:8080 | OK |
| http://manifold | Fail |
| www.manifold.net | OK |
| ftp://microsoft.com | OK |
| mailto:john@manifold.com | OK |

Manifold URL: (?:ftp\:\/\/|http\:\/\/|mailto\:)?(\w+\@)?(www\.)?manifold(\.\w+)+(\:\d+)?

| Sample | Result |
|---|---|
| http://www.manifold.net:8080 | OK |
| http://manifold | Fail |
| www.manifold.net | OK |
| ftp://microsoft.com | Fail |
| mailto:john@manifold.com | OK |

FTP URL: ftp\:\/\/(www\.)?\w+(\.\w+)+(\:\d+)?

| Sample | Result |
|---|---|
| http://www.manifold.net:8080 | Fail |
| http://manifold | Fail |
| www.manifold.net | Fail |
| ftp://microsoft.com | OK |
| mailto:john@manifold.com | Fail |

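The optional groups in the URL pattern can also be read back after a match, for example to get the port. This Python sketch writes the pattern with forward slashes (//), the form real URLs use, and mailto: without slashes; that spelling and the port helper are assumptions made for illustration. The port is group 4, the (\:\d+) group.

```python
import re

# Assumption: URL pattern written with "//" as real URLs use, and "mailto:".
URL = r"(?:ftp\:\/\/|http\:\/\/|mailto\:)?(\w+\@)?(www\.)?\w+(\.\w+)+(\:\d+)?"

def port(url):
    """Return the port number of a URL matching the pattern, or None."""
    m = re.fullmatch(URL, url)
    if m is None or m.group(4) is None:
        return None
    return int(m.group(4)[1:])   # drop the leading ":"

for url in ["http://www.manifold.net:8080", "www.manifold.net",
            "mailto:john@manifold.com", "http://manifold"]:
    print(url, bool(re.fullmatch(URL, url)), port(url))
```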
Latitude: \d+\°\d+\'\d+(\.\d+)?\"\s*(N|S)?

| Sample | Result |
|---|---|
| 0°00'00.00" | OK |
| 83°02'50.82" N | OK |
| 0°05'43.14" S | OK |
| 9' | Fail |

South latitude: \d+\°\d+\'\d+(\.\d+)?\"\s*S

| Sample | Result |
|---|---|
| 0°00'00.00" | Fail |
| 83°02'50.82" N | Fail |
| 0°05'43.14" S | OK |
| 9' | Fail |

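Adding capture groups to the latitude pattern turns it into a small parser. The Python sketch below uses a hypothetical variant: it wraps degrees, minutes, seconds and hemisphere in groups and drops the backslashes before °, ' and ", which these characters do not need. Treating south as negative is a common convention assumed here, not something the pattern itself implies.

```python
import re

# Hypothetical variant of the latitude pattern with capture groups added.
LAT = r"(\d+)°(\d+)'(\d+(?:\.\d+)?)\"\s*(N|S)?"

def to_decimal_degrees(text):
    """Convert a D°M'S" latitude to signed decimal degrees (south negative)."""
    m = re.fullmatch(LAT, text)
    if m is None:
        return None
    deg, minutes, seconds, hemisphere = m.groups()
    value = int(deg) + int(minutes) / 60 + float(seconds) / 3600
    return -value if hemisphere == "S" else value

print(to_decimal_degrees("83°02'50.82\" N"))   # about 83.04745
print(to_decimal_degrees("0°05'43.14\" S"))    # about -0.0953167
print(to_decimal_degrees("9'"))                # None
```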
Date: \d+(\/|\-)\d+(\/|\-)\d+

| Sample | Result |
|---|---|
| 5/24/1985 | OK |
| 5/24/85 | OK |
| 5-24-1985 | OK |
| 5.24.1985 | Fail |

Date in the 1990s (199x): \d+(\/|\-)\d+(\/|\-)199\d

| Sample | Result |
|---|---|
| 5/24/1985 | Fail |
| 5-24-1995 | OK |
| 5.24.1995 | Fail |

Date on the 24th (month/day/year order): \d+(\/|\-)24(\/|\-)\d+

| Sample | Result |
|---|---|
| 5/24/1985 | OK |
| 5-29-1985 | Fail |
| 5.24.1985 | Fail |

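The date patterns check the shape of a date, not its values, and their two separator groups are independent of each other. This Python sketch shows both points; SAME_SEP is a hypothetical variant that uses the backreference \1 so the second separator must repeat the first.

```python
import re

DATE = r"\d+(\/|\-)\d+(\/|\-)\d+"     # date pattern from the examples
SAME_SEP = r"\d+(\/|\-)\d+\1\d+"      # variant: \1 repeats the first separator

def verdict(pattern, text):
    """OK if the whole string matches the pattern, else Fail."""
    return "OK" if re.fullmatch(pattern, text) else "Fail"

print(verdict(DATE, "99/99/9999"))    # OK: shape only, no calendar check
print(verdict(DATE, "5/24-1985"))     # OK: separators may differ
print(verdict(SAME_SEP, "5/24-1985")) # Fail
print(verdict(SAME_SEP, "5-24-1985")) # OK
```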
Dollar currency: (\$\s*\d+(\.\d+)?)|(\d+(\.\d+)?\s*\$)

| Sample | Result |
|---|---|
| $5 | OK |
| 4.6 $ | OK |
| -7.3 $ | Fail |
| 7.3 | Fail |

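Because the dollar pattern is an alternation of two groups, the group that took part in the match shows whether the $ came before or after the number. parse_dollars is a hypothetical Python helper built on that (group 1 for a leading $, group 3 for a trailing one); like the pattern, it does not accept negative amounts.

```python
import re

# Dollar pattern from the examples: "$" before (group 1) or after (group 3).
DOLLAR = r"(\$\s*\d+(\.\d+)?)|(\d+(\.\d+)?\s*\$)"

def parse_dollars(text):
    """Return (amount, symbol position) for a whole-string match, else None."""
    m = re.fullmatch(DOLLAR, text)
    if m is None:
        return None
    side = "before" if m.group(1) else "after"
    amount = float(text.replace("$", "").strip())
    return amount, side

for s in ["$5", "4.6 $", "$ 12.50", "-7.3 $", "7.3"]:
    print(repr(s), parse_dollars(s))
```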
Number in exponential notation: \-?\d+(\.\d+)?([ED]\-?\d+)?

| Sample | Result |
|---|---|
| -3.5E2 | OK |
| 3D-56 | OK |
| -7 | OK |
| .8 | Fail |

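Written as [E|D], the exponent class would match E, D, or a literal |, because | has no special meaning inside square brackets; [ED] matches only the two letters. The Python sketch compares the two classes and then converts matching strings to numbers, treating D as an exponent marker the way Fortran double-precision literals do. to_float is a hypothetical helper.

```python
import re

AS_WRITTEN = r"\-?\d+(\.\d+)?([E|D]\-?\d+)?"   # [E|D] also accepts "|"
FIXED = r"\-?\d+(\.\d+)?([ED]\-?\d+)?"

print(bool(re.fullmatch(AS_WRITTEN, "3|5")))   # True
print(bool(re.fullmatch(FIXED, "3|5")))        # False

def to_float(text):
    """Convert a matching string to float; D is read as E."""
    if re.fullmatch(FIXED, text) is None:
        return None
    return float(text.replace("D", "E"))

print(to_float("-3.5E2"), to_float("3D-56"), to_float(".8"))
```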
Rational fraction: \-?(?:((\d+\s*)?\d+\/\d+)|(\d+(\s*\d+\/\d+)?))

| Sample | Result |
|---|---|
| -2 | OK |
| 2/5 | OK |
| -1 2/5 | OK |
| 2/-5 | Fail |

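A matched rational fraction can be evaluated exactly with Python's fractions module. to_fraction is a hypothetical helper: it relies on the pattern to reject malformed input and then splits a mixed number such as -1 2/5 into its whole and fractional parts.

```python
import re
from fractions import Fraction

RATIONAL = r"\-?(?:((\d+\s*)?\d+\/\d+)|(\d+(\s*\d+\/\d+)?))"

def to_fraction(text):
    """Evaluate a string matching RATIONAL as an exact Fraction, else None."""
    if re.fullmatch(RATIONAL, text) is None:
        return None
    sign = -1 if text.startswith("-") else 1
    parts = text.lstrip("-").split()      # e.g. ["1", "2/5"]
    if len(parts) == 2:                   # mixed number
        value = int(parts[0]) + Fraction(parts[1])
    else:                                 # plain integer or simple fraction
        value = Fraction(parts[0])
    return sign * value

print(to_fraction("-1 2/5"))   # -7/5
print(to_fraction("2/-5"))     # None
```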
HTML tag: <(.*)>.*<\/\1>

| Sample | Result |
|---|---|
| <a>abc</a> | OK |
| <a>abc</b> | Fail |
| <a>abc | Fail |

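The HTML tag pattern works through the backreference \1, which must repeat exactly what group 1 captured inside the opening <...>. The Python sketch shows the tag name read from group 1, shows that attributes in the opening tag therefore prevent a match, and tries TAG_NAME, a hypothetical variant that captures only the tag name.

```python
import re

TAG = r"<(.*)>.*<\/\1>"   # pattern from the example
m = re.fullmatch(TAG, "<a>abc</a>")
print(m.group(1))          # a  (group 1 holds the tag name)

# \1 must repeat everything inside the opening <...>, so attributes break it:
print(re.fullmatch(TAG, '<div id="x">hi</div>'))   # None

# Hypothetical variant: capture only the tag name, skip attributes.
TAG_NAME = r"<(\w+)[^>]*>.*<\/\1>"
print(re.fullmatch(TAG_NAME, '<div id="x">hi</div>').group(1))  # div
```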
US Social Security number (SSN): \d{3}-\d{2}-\d{4}

| Sample | Result |
|---|---|
| 223-20-9898 | OK |
| 22-20-9898 | Fail |
| 223-209898 | Fail |
| 223 20 9898 | Fail |

Credit card style number patterns: \d{4}( \d{4}){3,4}

| Sample | Result |
|---|---|
| 2235 5656 4578 7890 | OK |
| 2235 5656 4578 7890 0010 | OK |
| 2235 5656 4578 7890 00 | Fail |
| 2235-5656-4578-7890 | Fail |

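Capture groups are also useful on the Replace side. This Python sketch adds a group around the last four digits of the SSN pattern (a variant, not the pattern above) and masks the rest; \1 in the Python replacement plays the role of $1 in a Manifold Replace string, so the same idea would presumably be written there as Find \d{3}-\d{2}-(\d{4}) and Replace ***-**-$1.

```python
import re

SSN = r"\d{3}-\d{2}-(\d{4})"   # SSN pattern with the last four digits captured
text = "Employee 223-20-9898, backup 22-20-9898"

# Mask all but the last four digits; "22-20-9898" does not match and is kept.
masked = re.sub(SSN, r"***-**-\1", text)
print(masked)   # Employee ***-**-9898, backup 22-20-9898
```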
Email address validation: ([\w\.!#\$%\-+.]+@[A-Za-z0-9\-]+(\.[A-Za-z0-9\-]+)+)

| Sample | Result |
|---|---|
| john_smith@domain.com | OK |
| john.smith@domain.com.au | OK |
| john_smith.domain.com.au | Fail |
| john smith@domain.com.au | Fail |
| john_smith@domain | Fail |
| john_smith@ | Fail |

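Used for searching rather than for validating a whole field, the email pattern can pick addresses out of running text, with some caveats. This Python sketch uses re.finditer and group(0), since the pattern contains capturing groups and re.findall would return group tuples instead of whole matches. It shows that a trailing period is left out, and that the sample marked Fail, john smith@domain.com.au, still yields smith@domain.com.au when searched, so a validation filter should require the whole field to match.

```python
import re

EMAIL = r"([\w\.!#\$%\-+.]+@[A-Za-z0-9\-]+(\.[A-Za-z0-9\-]+)+)"
text = "Contact john_smith@domain.com. Or john smith@domain.com.au"

# finditer plus group(0) returns whole matches, not group tuples.
found = [m.group(0) for m in re.finditer(EMAIL, text)]
print(found)   # ['john_smith@domain.com', 'smith@domain.com.au']

# Whole-field validation rejects the sample with a space:
print(re.fullmatch(EMAIL, "john smith@domain.com.au"))   # None
```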