
regex - regular expression recognizer

include "regex.m";
regex := load Regex Regex->PATH;

compile: fn(exp: string, flag: int): Re;
execute: fn(x: Re, s: string): array of (int,int);

Description

compile

compile: fn(exp: string, flag: int): Re;	# returns nil on error
The compile function returns a compiled form of the regular expression given in string exp, or nil if exp is not a valid regular expression.

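If flag is 1, the parenthesized subexpressions of exp are recorded, and execute reports the position matched by each of them. If flag is 0, parentheses still group elements of the regular expression, but execute reports only the position of the overall match.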
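
The effect of the flag can be seen with a small module. This is a minimal sketch, assuming the compile and execute signatures given above and the standard Sys and Draw modules; the module name FlagDemo is hypothetical. Following the description of execute and the transcript under Examples, (fer)(no) matched against inferno should yield one element with flag 0 and three with flag 1. The sketch also tests the result of compile for nil before passing it to execute.

```limbo
implement FlagDemo;

# FlagDemo is a hypothetical module name used only for this sketch.
include "sys.m";
	sys: Sys;
include "draw.m";
include "regex.m";
	regex: Regex;

FlagDemo: module
{
	init: fn(ctxt: ref Draw->Context, argv: list of string);
};

init(nil: ref Draw->Context, nil: list of string)
{
	sys = load Sys Sys->PATH;
	regex = load Regex Regex->PATH;
	if (regex == nil) {
		sys->print("cannot load %s: %r\n", Regex->PATH);
		return;
	}
	for (flag := 0; flag <= 1; flag++) {
		# compile returns nil for an invalid pattern; never pass nil to execute
		re := regex->compile("(fer)(no)", flag);
		if (re == nil) {
			sys->print("invalid pattern\n");
			return;
		}
		pos := regex->execute(re, "inferno");
		if (pos == nil)
			sys->print("flag %d: no match\n", flag);
		else
			sys->print("flag %d: %d element(s)\n", flag, len pos);
	}
}
```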

execute

execute: fn(x: Re, s: string): array of (int,int);	# returns array of (beg, end); nil if no match
The execute function matches the compiled regular expression x against string s. It returns an array of (beg, end) pairs, where beg is the index of the first character of a match and end is the index of the character just beyond it, or nil if no match exists.

The zeroth element of the array holds the positions of the first character of the leftmost longest match and of the character just beyond that match. If the compilation flag was 0, there are no further elements. If the compilation flag was 1, there is one further element for each pair of parentheses in the regular expression, numbered from 1 by counting left parentheses from left to right. The nth element contains the positions of the last match to the nth parenthesized subexpression, or (-1,-1) if that subexpression does not participate in the overall match.
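
Because an element can be (-1,-1), code that extracts subexpression text should test for it before slicing s. The sketch below assumes only the compile and execute interface described above; the module name SubDemo and the helper submatch are hypothetical. With flag 1, the pattern (a)|(b) matched against b should report that the first subexpression did not participate.

```limbo
implement SubDemo;

# SubDemo and submatch are hypothetical names used only for this sketch.
include "sys.m";
	sys: Sys;
include "draw.m";
include "regex.m";
	regex: Regex;

SubDemo: module
{
	init: fn(ctxt: ref Draw->Context, argv: list of string);
};

# submatch returns the text matched by element n of an execute result
# and 1, or "" and 0 if element n is absent or is (-1,-1).
submatch(s: string, pos: array of (int, int), n: int): (string, int)
{
	if (pos == nil || n < 0 || n >= len pos)
		return ("", 0);
	(beg, end) := pos[n];
	if (beg == -1 || end == -1)
		return ("", 0);
	return (s[beg:end], 1);
}

init(nil: ref Draw->Context, nil: list of string)
{
	sys = load Sys Sys->PATH;
	regex = load Regex Regex->PATH;
	if (regex == nil) {
		sys->print("cannot load %s: %r\n", Regex->PATH);
		return;
	}
	s := "b";
	# flag 1: pos[0] is the whole match, pos[1] and pos[2] the subexpressions
	re := regex->compile("(a)|(b)", 1);
	if (re == nil)
		return;
	pos := regex->execute(re, s);
	for (n := 0; n < len pos; n++) {
		(text, ok) := submatch(s, pos, n);
		if (ok)
			sys->print("pos[%d] = %s\n", n, text);
		else
			sys->print("pos[%d] did not participate\n", n);
	}
}
```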

Regular Expression Syntax

The primitives in regular expressions are:
.          matches any character other than newline
\c         matches character c, except that \n matches newline
c          matches character c, where c is not one of: \ . ^ $ ( ) [ ] ? * + |
(e)        matches what regular expression e matches
()         matches an empty substring
^          matches an empty substring at the beginning of a string
$          matches an empty substring at the end of a string
[set]      matches any character in set
[^set]     matches any character not in set (its complement)

A set is a sequence of zero or more items, each either a single character or a range of characters. An item starts with either a literal character other than \ or ], or a character escaped with \. If that character is followed by a literal -, it is the lower limit of an inclusive range of Unicode characters; the upper limit is a similarly expressed character after the -.

Note: The pattern [a-] is treated as an illegal pattern. If you want to match "a" or "-", use the pattern [a\-].
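
A Limbo string literal needs \\ to place a single backslash in the pattern, so the set [a\-] is written "[a\\-]" in source. This sketch, assuming the compile and execute interface described above, prints the overall match of a few set patterns; the module name SetDemo and the helper show are hypothetical. The expected positions follow from the leftmost longest rule, and [a-] should be rejected as the note says.

```limbo
implement SetDemo;

# SetDemo and show are hypothetical names used only for this sketch.
include "sys.m";
	sys: Sys;
include "draw.m";
include "regex.m";
	regex: Regex;

SetDemo: module
{
	init: fn(ctxt: ref Draw->Context, argv: list of string);
};

# show prints the (beg, end) pair of the overall match of patt in s.
show(patt, s: string)
{
	re := regex->compile(patt, 0);
	if (re == nil) {
		sys->print("%s: invalid pattern\n", patt);
		return;
	}
	pos := regex->execute(re, s);
	if (pos == nil) {
		sys->print("%s: no match in %s\n", patt, s);
		return;
	}
	(beg, end) := pos[0];
	sys->print("%s in %s: (%d,%d)\n", patt, s, beg, end);
}

init(nil: ref Draw->Context, nil: list of string)
{
	sys = load Sys Sys->PATH;
	regex = load Regex Regex->PATH;
	if (regex == nil) {
		sys->print("cannot load %s: %r\n", Regex->PATH);
		return;
	}
	# "[a\\-]" in Limbo source is the regular expression [a\-]
	show("[a\\-]", "x-y");
	show("[^0-9]+", "123abc");
	show("[a-]", "x-y");
}
```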

Repetitions are built from primitives, p, in the following ways:

p          one match to p
p?         zero or one matches to p
p*         zero or more matches to p
p+         one or more matches to p
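
Under the leftmost longest rule described for execute, a repetition should match as much as it can at the leftmost position where any match starts. This sketch prints the overall match for each repetition operator, assuming the compile and execute interface described above; the module name RepDemo and the helper show are hypothetical, and the expected pairs are derived from that rule rather than from a recorded run.

```limbo
implement RepDemo;

# RepDemo and show are hypothetical names used only for this sketch.
include "sys.m";
	sys: Sys;
include "draw.m";
include "regex.m";
	regex: Regex;

RepDemo: module
{
	init: fn(ctxt: ref Draw->Context, argv: list of string);
};

# show prints the (beg, end) pair of the overall match of patt in s.
show(patt, s: string)
{
	re := regex->compile(patt, 0);
	if (re == nil) {
		sys->print("%s: invalid pattern\n", patt);
		return;
	}
	pos := regex->execute(re, s);
	if (pos == nil) {
		sys->print("%s: no match in %s\n", patt, s);
		return;
	}
	(beg, end) := pos[0];
	sys->print("%s in %s: (%d,%d)\n", patt, s, beg, end);
}

init(nil: ref Draw->Context, nil: list of string)
{
	sys = load Sys Sys->PATH;
	regex = load Regex Regex->PATH;
	if (regex == nil) {
		sys->print("cannot load %s: %r\n", Regex->PATH);
		return;
	}
	show("ab?c", "xac");	# b? matches zero b's
	show("ab*", "abbbc");	# b* takes all three b's
	show("a+", "baaa");	# leftmost match starts at index 1
	show("a+", "bbb");	# a+ needs at least one a
}
```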

Regular expressions are built from repetitions, r, and other regular expressions, e1, e2, in the following ways:

r          a repetition
re1        concatenation: a match to r followed by a match to e1
e1|e2      alternation: a match to either e1 or e2

Concatenation takes precedence over alternation.
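
Because concatenation binds tighter than alternation, ab|cd means "ab or cd", not a(b|c)d. The sketch below contrasts the two on the string acd, assuming the compile and execute interface described above; the module name AltDemo and the helper show are hypothetical. It compiles with flag 1 so that the parentheses in a(b|c)d are recorded, and prints only the zeroth element.

```limbo
implement AltDemo;

# AltDemo and show are hypothetical names used only for this sketch.
include "sys.m";
	sys: Sys;
include "draw.m";
include "regex.m";
	regex: Regex;

AltDemo: module
{
	init: fn(ctxt: ref Draw->Context, argv: list of string);
};

# show prints the (beg, end) pair of the overall match of patt in s.
show(patt, s: string)
{
	re := regex->compile(patt, 1);
	if (re == nil) {
		sys->print("%s: invalid pattern\n", patt);
		return;
	}
	pos := regex->execute(re, s);
	if (pos == nil) {
		sys->print("%s: no match in %s\n", patt, s);
		return;
	}
	(beg, end) := pos[0];
	sys->print("%s in %s: (%d,%d)\n", patt, s, beg, end);
}

init(nil: ref Draw->Context, nil: list of string)
{
	sys = load Sys Sys->PATH;
	regex = load Regex Regex->PATH;
	if (regex == nil) {
		sys->print("cannot load %s: %r\n", Regex->PATH);
		return;
	}
	show("ab|cd", "acd");	# parsed as (ab)|(cd): only "cd" matches
	show("a(b|c)d", "acd");	# explicit grouping: the whole string matches
}
```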

Examples

The following program illustrates the use of the regex module for simple expressions. A short transcript of its operation follows the program listing.

implement Regx;
# regx : This program demonstrates the regex module
#        for simple expressions.
include "sys.m";
	sys: Sys;
	print: import sys;
include "draw.m";
include "regex.m";
	regex: Regex;
	compile, execute: import regex;

Regx: module
{
	init: fn(ctxt: ref Draw->Context, argv: list of string);
};

init(ctxt: ref Draw->Context, argv: list of string)
{
	sys = load Sys Sys->PATH;
	regex = load Regex Regex->PATH;
	if (regex == nil) {
		print("regx: cannot load %s: %r\n", Regex->PATH);
		return;
	}

	pname := hd argv;
	argv = tl argv;
	if (len argv != 3) {
		usage(pname);
		return;
	}

	# get pattern, string, and flag
	patt := hd argv; argv = tl argv;
	s := hd argv; argv = tl argv;
	flag := int hd argv;

	# compile first: compile returns nil for an invalid pattern,
	# and nil must not be passed to execute
	re := compile(patt, flag);
	if (re == nil) {
		print("Invalid pattern: %s\n", patt);
		return;
	}
	pos := execute(re, s);
	if (pos == nil) {
		print("No match\n");
		return;
	}

	# print number of elements if flag set
	if (flag == 1)
		print("pos has %d elements\n", len pos);

	# print the overall match, then each subexpression match
	print("patt = %s, s = %s\n", patt, s);
	for (i := 0; i < len pos; i++) {
		print("pos[%d]: ", i);
		(beg, end) := pos[i];
		if (beg != -1 && end != -1)
			print("match = %s\n", s[beg:end]);
		else
			print("did not participate in match\n");
	}
}

usage(pname: string)
{
	sys->print("Usage: %s RegEx string flag\n", pname);
}


A sample transcript of operation follows:

mymachine$ regx '(fer)(no)' inferno 0
patt = (fer)(no), s = inferno
pos[0]: match = ferno

mymachine$ regx '(fer)(no)' inferno 1
pos has 3 elements
patt = (fer)(no), s = inferno
pos[0]: match = ferno
pos[1]: match = fer
pos[2]: match = no

mymachine$ regx '(fer|foo)(no)' inferfoono 1
pos has 3 elements
patt = (fer|foo)(no), s = inferfoono
pos[0]: match = foono
pos[1]: match = foo
pos[2]: match = no




infernosupport@lucent.com
Copyright © 1997, Lucent Technologies, Inc. All rights reserved.