Regular Expression Quick Guide
TextSoap supports regular expressions. A regular expression is a pattern that describes a set of possible strings. Here is a quick guide to the basic syntax to get you started.
- a* - zero or more instances of a (longest match possible, greedy)
- a*? - zero or more instances of a (shortest match possible, non-greedy)
- a+ - one or more instances of a (longest match possible, greedy)
- a+? - one or more instances of a (shortest match possible, non-greedy)
- a? - zero or one instance of a
- ^ - beginning of a line
- $ - end of a line
- . - any single character
- [a-z] - any single character in the range a to z
- 123(abc)efg - matches 123abcefg and stores abc as a group
- \1 - text matched in first group (when used in the find expression)
- a|b - a or b
- \n - newline
- \t - tab character
- \d - digit
- \D - non-digit
- \w - word character
- \W - non-word character
- \s - whitespace
- \S - non-whitespace
- \ - escape the next character (see below)
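The difference between greedy and non-greedy matching, and the way \1 refers back to a group, is easiest to see on a concrete string. The following is a minimal sketch using Python's `re` module rather than TextSoap itself; the sample strings and variable names are made up for illustration, and other regex engines may differ in small details.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: .* takes as much as it can, so the match runs to the last '>'.
greedy = re.search(r"<.*>", text).group(0)

# Non-greedy: .*? stops at the first '>' that lets the match succeed.
lazy = re.search(r"<.*?>", text).group(0)

# Backreference: \1 must repeat exactly what the first group captured,
# which makes it a simple way to find a doubled word.
doubled = re.search(r"(\w+) \1", "it was was fine").group(0)

print(greedy)   # the whole string
print(lazy)     # <b>
print(doubled)  # was was
```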
Escaped Literal Characters
Letters and numbers are treated as literals, but a number of symbols have special meaning within a regular expression. To match one of these special characters literally, it needs to be escaped. The following characters have special meaning within a regular expression:
+ ? . * ^ $ ( ) [ ] { } | \
To search for these specific characters, precede them with a backslash (\), which is known as escaping the character.
To match a period, use \. To match a backslash, use \\
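As a small illustration of why escaping matters, the sketch below compares an unescaped period with an escaped one. It uses Python's `re` module as a stand-in for TextSoap; the sample strings are invented, and `re.escape` is a Python convenience, not a TextSoap feature.

```python
import re

price = "Total: $3.50 (approx.)"

# Unescaped, . matches any character, so the pattern also accepts "3x50".
loose = re.search(r"3.50", "3x50")

# Escaped, \. matches only a literal period, so "3x50" is rejected.
strict = re.search(r"3\.50", "3x50")

# $ is special too, so it needs a backslash to match a dollar sign.
amount = re.search(r"\$3\.50", price).group(0)

# A backslash is matched by writing it twice.
slash = re.search(r"\\", r"C:\temp").group(0)

# Python can add the backslashes for you when the text is not known in advance.
pattern = re.escape("(approx.)")
found = re.search(pattern, price).group(0)
```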
Replace Template
To reference a captured group in the replacement text, use the syntax $n, where n is the group number.
- $0 - the entire match
- $1 - the text matched by the first capture group
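To show what a replace template does, here is a minimal sketch in Python. Note the assumption: Python's `re` module writes group references as `\1` and `\g<0>` instead of `$1` and `$0`, so the template syntax below is Python's, not TextSoap's. The idea is the same. The sample strings are invented for illustration.

```python
import re

# Swap "Last, First" into "First Last" using two capture groups.
# \1 and \2 play the role of $1 and $2.
swapped = re.sub(r"(\w+), (\w+)", r"\2 \1", "Smith, John")

# \g<0> plays the role of $0: the entire match.
# Here every word is wrapped in square brackets.
wrapped = re.sub(r"\w+", r"[\g<0>]", "a bc")

print(swapped)  # John Smith
print(wrapped)  # [a] [bc]
```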
Sample Expressions
- \s*$ - matches whitespace at the end of a line
- \S+@\S+ - very basic email address match (stuff@domain.com)
- (19|20)\d\d-\d\d?-\d\d - basic date match for years 1900 to 2099 (2009-3-15)
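The sample expressions above can be tried out with a short script. This is a sketch in Python's `re` module, assumed here only as a convenient test bed; the input strings are invented, and TextSoap's own engine may treat edge cases differently.

```python
import re

# \s*$ : strip whitespace at the end of a line.
trimmed = re.sub(r"\s*$", "", "hello   ")

# \S+@\S+ : very basic email match. It accepts anything without spaces
# around an @, so it is a rough filter, not a validator.
emails = re.findall(r"\S+@\S+", "Contact stuff@domain.com or admin@example.org today")

# (19|20)\d\d-\d\d?-\d\d : year 1900-2099, one- or two-digit month, two-digit day.
date_pattern = r"(19|20)\d\d-\d\d?-\d\d"
found = re.search(date_pattern, "Released 2009-3-15.").group(0)
ok = re.fullmatch(date_pattern, "1999-12-31") is not None
too_old = re.fullmatch(date_pattern, "1850-01-01") is not None

print(trimmed)  # hello
print(emails)   # ['stuff@domain.com', 'admin@example.org']
print(found)    # 2009-3-15
```

The date pattern checks only the shape of the text, so a string such as 2009-99-99 would also match.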
For a more definitive look at regular expressions, see the full reference on regular expression syntax.