(gawk.info) GNU Regexp Operators
Additional Regexp Operators Only in `gawk'
GNU software that deals with regular expressions provides a number of
additional regexp operators. These operators are described in this
section, and are specific to `gawk'; they are not available in other
`awk' implementations.
Most of the additional operators are for dealing with word matching.
For our purposes, a "word" is a sequence of one or more letters, digits,
or underscores (`_').
`\w'
     This operator matches any word-constituent character, i.e., any
     letter, digit, or underscore. Think of it as a shorthand for
     `[[:alnum:]_]'.

`\W'
     This operator matches any character that is not word-constituent.
     Think of it as a shorthand for `[^[:alnum:]_]'.

`\<'
     This operator matches the empty string at the beginning of a word.
     For example, `/\<away/' matches `away', but not `stowaway'.

`\>'
     This operator matches the empty string at the end of a word. For
     example, `/stow\>/' matches `stow', but not `stowaway'.

`\y'
     This operator matches the empty string at either the beginning or
     the end of a word (the word boundar*y*). For example, `\yballs?\y'
     matches either `ball' or `balls' as a separate word.

`\B'
     This operator matches the empty string within a word. In other
     words, `\B' matches the empty string that occurs between two
     word-constituent characters. For example, `/\Brat\B/' matches
     `crate', but it does not match `dirty rat'. `\B' is essentially
     the opposite of `\y'.
There are two other operators that work on buffers. In Emacs, a
"buffer" is, naturally, an Emacs buffer. For other programs, the
regexp library routines that `gawk' uses consider the entire string to
be matched as the buffer.
For `awk', since `^' and `$' always work in terms of the beginning
and end of strings, these operators don't add any new capabilities.
They are provided for compatibility with other GNU software.
`\`'
     This operator matches the empty string at the beginning of the
     buffer.

`\''
     This operator matches the empty string at the end of the buffer.
In other GNU software, the word boundary operator is `\b'. However,
that conflicts with the `awk' language's definition of `\b' as
backspace, so `gawk' uses a different letter.
An alternative method would have been to require two backslashes in
the GNU operators, but this was deemed to be too confusing, and the
current method of using `\y' for the GNU `\b' appears to be the lesser
of two evils.
The various command-line options (see Command Line Options) control
how `gawk' interprets characters in regexps.
No options
     In the default case, `gawk' provides all the facilities of POSIX
     regexps and the GNU regexp operators described in the Regular
     Expression Operators section. However, interval expressions are
     not supported.

`--posix'
     Only POSIX regexps are supported; the GNU operators are not special
     (e.g., `\w' matches a literal `w'). Interval expressions are
     allowed.

`--traditional'
     Traditional Unix `awk' regexps are matched. The GNU operators are
     not special, interval expressions are not available, and neither
     are the POSIX character classes (`[[:alnum:]]' and so on).
     Characters described by octal and hexadecimal escape sequences are
     treated literally, even if they represent regexp metacharacters.

`--re-interval'
     Allow interval expressions in regexps, even if `--traditional' has
     been provided.