LexiScript now features a full-fledged regular expression (regex) pattern matcher and parser. A regex parser is useful for two things: matching strings against patterns, and extracting portions (capture groups) of strings.
This is an example of using a simple regex to determine whether the text is a monetary amount (in US dollars, with a $ sign):
| \$[0-9]+(\.[0-9][0-9]?)? |
This matches "$50", "$50.32", and "$50.3", but not "$two.fifty" or plain "50".
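The behaviour described above can be checked outside LexiScript. The following is a minimal sketch in Python, assuming that LexiScript's matcher behaves like Python's `re.search` (a match anywhere in the text); the helper name `is_money` is hypothetical and not part of LexiScript.

```python
import re

# The pattern from the table above, written as a Python raw string.
MONEY = re.compile(r"\$[0-9]+(\.[0-9][0-9]?)?")

def is_money(text):
    # Assumption: search semantics (the pattern may match anywhere in the text).
    return MONEY.search(text) is not None

for sample in ["$50", "$50.32", "$50.3", "$two.fifty", "50"]:
    print(sample, is_money(sample))
```

The first three samples report True and the last two report False, in line with the description.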
More complex pattern matching is also possible, such as the following case:
| \$([1-9][0-9]*\.[0-9][0-9]?|0?\.[0-9]+|[1-9][0-9]*[^.]) |
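This is a more discriminating match: it matches "$50", "$50.32", "$0.32", and "$.32", but rejects "$050.32", "50.32", "50", and "$two.fifty". Note that the last alternative, [1-9][0-9]*[^.], needs one character after the leading digit, so a single-digit amount such as "$5" on its own is also rejected.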
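To see which alternative of the stricter pattern accepts or rejects each input, here is a small Python sketch. It assumes search semantics as in Python's `re.search`, which may differ from LexiScript's engine; the helper name `is_strict_money` is hypothetical.

```python
import re

# The stricter pattern from the table above, as a Python raw string.
STRICT_MONEY = re.compile(
    r"\$([1-9][0-9]*\.[0-9][0-9]?|0?\.[0-9]+|[1-9][0-9]*[^.])"
)

def is_strict_money(text):
    # Assumption: search semantics (the pattern may match anywhere in the text).
    return STRICT_MONEY.search(text) is not None

accepted = ["$50", "$50.32", "$0.32", "$.32"]
rejected = ["$050.32", "50.32", "50", "$two.fifty"]

for sample in accepted + rejected:
    print(sample, is_strict_money(sample))

# The trailing [^.] must consume one character, so a lone "$5" does not match.
print("$5", is_strict_money("$5"))
```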
Substrings can be extracted by enclosing the portion of the pattern you want extracted in ( and ). If you don't want a group to be extracted, begin the group with ?: as shown:
| name(?: is|s|'s)? ([^ ]*) |
This will extract the next word typed after "name is", "name's", "names", or "name".
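A quick way to try this pattern is the Python sketch below. It assumes Python's `re.search` is a fair stand-in for the LexiScript matcher; `extract_name` and the sample sentences are hypothetical.

```python
import re

# (?: ...) groups the alternatives without capturing them,
# so the only captured group is the word after the space.
NAME = re.compile(r"name(?: is|s|'s)? ([^ ]*)")

def extract_name(text):
    m = NAME.search(text)
    return m.group(1) if m else None

for sample in ["my name is Bob", "name's Alice", "names Carol", "name Dave"]:
    print(repr(sample), "->", extract_name(sample))
```

Input that does not contain "name" followed by a space yields `None` in this sketch.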
To perform a match on a given string, use:
%string.regex(pattern)%
The result is a record with two fields: 'success', which is either 0 or 1, and 'matches'. The 'success' field is always present, so you can always check whether the match succeeded:
if (%string.regex(pattern).success%)
or
set match %string.regex(pattern)%
if (%match.success%)
The 'matches' field is a parameter list holding the extracted groups, numbered from 1.
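The shape of the result record can be mimicked in Python. This is only an illustration of the 'success' and 'matches' fields described above, not LexiScript's implementation: `regex_record` is a hypothetical helper, the sample pattern is made up, and a Python list is indexed from 0 where param() counts from 1.

```python
import re

def regex_record(text, pattern):
    """Hypothetical Python stand-in for %string.regex(pattern)%."""
    m = re.search(pattern, text)
    if m is None:
        # 'success' is always present, even when nothing matched.
        return {"success": 0, "matches": []}
    # Optional groups that did not take part in the match become "".
    return {"success": 1, "matches": list(m.groups(default=""))}

match = regex_record("$50.32", r"\$([0-9]+)(\.[0-9][0-9]?)?")
if match["success"]:
    print(match["matches"])  # matches[0] plays the role of param(1)
```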
For example:
set match %string.regex("name(?: is|s|'s)? ([^ ]*)")%
In this case, %match.matches.param(1)% will be the first group, which is the name. If the first group "(?: is|s|'s)" did not have the "?:", then %match.matches.param(1)% would be " is", "s", "'s", or blank, and %match.matches.param(2)% would contain the name.
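The effect of "?:" on group numbering can be compared side by side. The sketch below uses Python's `re` module as a stand-in for the LexiScript matcher, which is an assumption; the sample text is hypothetical.

```python
import re

TEXT = "my name is Bob"

# With ?: the first parenthesis is not captured, so the name is group 1.
non_capturing = re.search(r"name(?: is|s|'s)? ([^ ]*)", TEXT)

# Without ?: the first parenthesis becomes group 1 and the name moves to group 2.
capturing = re.search(r"name( is|s|'s)? ([^ ]*)", TEXT)

print(non_capturing.groups())
print(capturing.groups())

# When the optional group matches nothing, default="" reports it as blank.
blank_first = re.search(r"name( is|s|'s)? ([^ ]*)", "name Bob").groups(default="")
print(blank_first)
```

Note that the captured text of the first group keeps its leading space (" is"), because the space is part of the alternative.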
-- FearItself - 29 Sep 2005
More examples
- Separating the actual zone name from the namespace (if any) and the asset number, in a VID.
set result %string.regex("([\w]*):")%
# param(1) will be the zone name.
- Separating the asset number from a VID.
set result %string.regex("([\w:]*):([\d]+)")%
# param(1) will contain the zone + namespace(s), param(2) will contain the asset number.
- Separating a leading number specification from the text after the dot.
set result %string.regex("([\d]*)\.(.*)")%
# param(1) will contain the number specification before the dot, param(2) will contain everything after the dot.
- Getting contents inside HTML style tags
set result %string.regex("([\w]*)>(.*)<([\w]*)").matches.param(2)%
# param(1) will be the name of the HTML tag, param(2) will be the contents inside the tag.
# This assumes the format <subject>My contents here.</subject>
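The examples in this list can be tried against sample input with the Python sketch below. It rests on several assumptions: Python's `re.search` stands in for the LexiScript matcher, the VID strings and "2.sword" are made-up samples, and `groups_of` is a hypothetical helper. The asset-number pattern is written with `[\d]+` so that multi-digit numbers are captured whole (a bare `[\d]` captures a single digit), and the dot is escaped as `\.` so that it only matches a literal dot.

```python
import re

def groups_of(pattern, text):
    # Returns the captured groups as a tuple, or None when there is no match.
    m = re.search(pattern, text)
    return m.groups() if m else None

# Zone name: the word characters before the first colon.
print(groups_of(r"([\w]*):", "midgaard:42"))

# Everything before the last colon that precedes a number, then the number.
print(groups_of(r"([\w:]*):([\d]+)", "midgaard:shops:42"))

# Number specification before the dot, and everything after the dot.
print(groups_of(r"([\d]*)\.(.*)", "2.sword"))

# Tag name, tag contents, and an empty third group (the closing tag starts with "/").
print(groups_of(r"([\w]*)>(.*)<([\w]*)", "<subject>My contents here.</subject>"))
```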
-- RommelAvP - 28 Jul 2008