LexiScript Regular Expressions

LexiScript now features a full-fledged regular expression (regex) pattern matcher and parser. A regex parser is useful for two things: matching strings against patterns, and extracting portions of strings (capture groups).

This is an example of a simple regex that determines whether the text is a monetary amount (in US dollars, with a $ sign):

\$[0-9]+(\.[0-9][0-9]?)?

This matches "$50", "$50.32", and "$50.3", but not "$two.fifty" or just plain "50".
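The sample strings above can be checked outside LexiScript. The following is a minimal sketch in Python, assuming that LexiScript's regex dialect treats this pattern the same way Python's re module does and that the pattern has to cover the whole string. Neither assumption is confirmed by this page, and is_money is a hypothetical helper name.

```python
import re

# Pattern copied verbatim from the text above.
MONEY = re.compile(r"\$[0-9]+(\.[0-9][0-9]?)?")


def is_money(text):
    """Hypothetical helper: True if the whole text is a dollar amount."""
    # fullmatch is an assumption; LexiScript may instead search inside the string.
    return MONEY.fullmatch(text) is not None


for sample in ["$50", "$50.32", "$50.3", "$two.fifty", "50"]:
    print(sample, is_money(sample))
```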

More complex pattern matching is also possible, such as the following case:

\$([1-9][0-9]*\.[0-9][0-9]?|0?\.[0-9]+|[1-9][0-9]*[^.])

This is a more discriminating match: it accepts "$50", "$50.32", "$0.32", and "$.32", but rejects "$050.32", "50.32", "50", and "$two.fifty".
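The accepted and rejected strings for the stricter pattern can be checked the same way. This is a sketch in Python under the assumption that LexiScript handles the pattern as Python's re module does with a whole-string match; is_strict_money is a hypothetical name.

```python
import re

# Pattern copied verbatim from the text above.
STRICT_MONEY = re.compile(
    r"\$([1-9][0-9]*\.[0-9][0-9]?|0?\.[0-9]+|[1-9][0-9]*[^.])"
)


def is_strict_money(text):
    """Hypothetical helper: True if the whole text matches the stricter pattern."""
    # Whole-string matching is an assumption about how LexiScript applies patterns.
    return STRICT_MONEY.fullmatch(text) is not None


accepted = ["$50", "$50.32", "$0.32", "$.32"]
rejected = ["$050.32", "50.32", "50", "$two.fifty"]

for sample in accepted + rejected:
    print(sample, is_strict_money(sample))
```

Under these assumptions the third alternative has two side effects, because [^.] must consume exactly one character that is not a period: a single-digit amount such as "$5" is rejected, and a string such as "$50x" is accepted.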

Substrings can be extracted by enclosing the portion of the pattern you want in ( and ). If you don't want a group to be extracted, begin the group with ?: as shown:

name(?: is|s|'s)? ([^ ]*)

This extracts the next word typed after "name is", "name's", "names", or "name".
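To illustrate how the non-capturing group leaves only the name in the extracted results, here is a small sketch in Python. It assumes LexiScript searches for the pattern anywhere in the string, as Python's re.search does; extract_name and the sample sentences are hypothetical.

```python
import re

# Pattern copied verbatim from the text above.
NAME = re.compile(r"name(?: is|s|'s)? ([^ ]*)")


def extract_name(text):
    """Hypothetical helper: return the word after the 'name' phrase, or None."""
    found = NAME.search(text)
    # Group 1 is the only capture group, because (?: ...) does not capture.
    return found.group(1) if found else None


for sample in ["my name is Bob", "name's Alice", "names Carol", "name Dave", "hello"]:
    print(repr(sample), "->", extract_name(sample))
```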

To perform a match against a string, use:

%string.regex(pattern)%

The result is a record with two fields: 'success', which is either 0 or 1, and 'matches'. The 'success' field is always present, so you can always test whether the match succeeded:

if (%string.regex(pattern).success%)

or

set match %string.regex(pattern)%
if (%match.success%)

The 'matches' field is a parameter list of results.

For example:

set match %string.regex("name(?: is|s|'s)? ([^ ]*)")%

In this case, %match.matches.param(1)% will be the first captured group, which is the name. If the group "(?: is|s|'s)" did not begin with "?:", it would be captured as well: %match.matches.param(1)% would then be " is", "s", "'s", or blank, and %match.matches.param(2)% would contain the name.
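The shape of the result record and the effect of "?:" on parameter numbering can be modelled as follows. This is only a sketch in Python: ParamList, MatchRecord, and regex are hypothetical stand-ins for the LexiScript record, and the search semantics and the empty string for a missing group are assumptions.

```python
import re


class ParamList:
    """Hypothetical stand-in for a LexiScript parameter list (1-based access)."""

    def __init__(self, items):
        self.items = list(items)

    def param(self, n):
        # Returning "" for a missing entry is an assumption, not documented behaviour.
        return self.items[n - 1] if 1 <= n <= len(self.items) else ""


class MatchRecord:
    """Hypothetical record with the two fields described above."""

    def __init__(self, success, groups=()):
        self.success = success            # always present: 0 or 1
        self.matches = ParamList(groups)  # one entry per capture group


def regex(string, pattern):
    """Sketch of string.regex(pattern), built on Python's re.search."""
    found = re.search(pattern, string)
    if found is None:
        return MatchRecord(0)
    # Groups that did not take part in the match become "" (blank).
    return MatchRecord(1, [g if g is not None else "" for g in found.groups()])


text = "my name is Bob"
without_capture = regex(text, r"name(?: is|s|'s)? ([^ ]*)")
with_capture = regex(text, r"name( is|s|'s)? ([^ ]*)")

print(without_capture.success, repr(without_capture.matches.param(1)))
print(repr(with_capture.matches.param(1)), repr(with_capture.matches.param(2)))
```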

-- FearItself - 29 Sep 2005

More examples

  • Separating the actual zone name from the namespace (if any) and the asset number, in a VID.
set result %string.regex("([\w]*):")%
# param(1) will be the zone name.

  • Separating the asset number from a VID.
set result %string.regex("([\w:]*):([\d]+)")%
# param(1) will contain the zone + namespace(s), param(2) will contain the asset number.

  • Separating num.item
set result %string.regex("([\d]*)\.(.*)")%
# param(1) will contain the number specification before the dot, param(2) will contain everything after the dot.

  • Getting contents inside HTML style tags
set result %string.regex("([\w]*)>(.*)<([\w]*)").matches.param(2)%
# param(1) will be the name of the HTML tag, param(2) will be the contents inside the tag.
# Using the format <subject>My contents here.</subject>
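The HTML-style tag example can be traced with the sample input given above. This is a sketch in Python, assuming LexiScript searches within the string and handles \w and greedy .* the way Python's re module does; split_tag is a hypothetical name.

```python
import re

# Pattern copied verbatim from the HTML-style tag example above.
TAG = re.compile(r"([\w]*)>(.*)<([\w]*)")


def split_tag(text):
    """Hypothetical helper: return the captured groups as a tuple, or None."""
    found = TAG.search(text)
    return found.groups() if found else None


groups = split_tag("<subject>My contents here.</subject>")
print(groups)
```

Under these assumptions the first group is "subject" and the second is "My contents here.". The third group comes back empty, because the "/" that follows the final "<" is not a word character.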

-- RommelAvP? - 28 Jul 2008

Topic revision: r6 - 10 Dec 2008 - 20:13:55 - RommelAvP?
Lexiscript.LexiScriptRegex moved from Core.LexiScriptRegex on 24 Sep 2006 - 01:38 by FearItself
 