
(gawk.info) GNU Regexp Operators

 
 Additional Regexp Operators Only in `gawk'
 ==========================================
 
    GNU software that deals with regular expressions provides a number of
 additional regexp operators.  These operators are described in this
 section and are specific to `gawk'; they are not available in other
 `awk' implementations.
 
    Most of the additional operators are for dealing with word matching.
 For our purposes, a "word" is a sequence of one or more letters, digits,
 or underscores (`_').
 
 `\w'
      This operator matches any word-constituent character, i.e., any
      letter, digit, or underscore.  Think of it as shorthand for
      `[[:alnum:]_]'.
 
 `\W'
      This operator matches any character that is not word-constituent.
      Think of it as shorthand for `[^[:alnum:]_]'.
 
 `\<'
      This operator matches the empty string at the beginning of a word.
      For example, `/\<away/' matches `away', but not `stowaway'.
 
 `\>'
      This operator matches the empty string at the end of a word.  For
      example, `/stow\>/' matches `stow', but not `stowaway'.
 
 `\y'
      This operator matches the empty string at either the beginning or
      the end of a word (the word boundar*y*).  For example, `\yballs?\y'
      matches either `ball' or `balls' as a separate word.
 
 `\B'
      This operator matches the empty string within a word. In other
      words, `\B' matches the empty string that occurs between two
      word-constituent characters. For example, `/\Brat\B/' matches
      `crate', but it does not match `dirty rat'.  `\B' is essentially
      the opposite of `\y'.
 
    There are two other operators that work on buffers.  In Emacs, a
 "buffer" is, naturally, an Emacs buffer.  For other programs, the
 regexp library routines that `gawk' uses consider the entire string to
 be matched as the buffer.
 
    For `awk', since `^' and `$' always work in terms of the beginning
 and end of strings, these operators don't add any new capabilities.
 They are provided for compatibility with other GNU software.
 
 `\`'
      This operator matches the empty string at the beginning of the
      buffer.
 
 `\''
      This operator matches the empty string at the end of the buffer.
 
    In other GNU software, the word boundary operator is `\b'. However,
 that conflicts with the `awk' language's definition of `\b' as
 backspace, so `gawk' uses a different letter.
 
    An alternative method would have been to require two backslashes in
 the GNU operators, but this was deemed to be too confusing, and the
 current method of using `\y' for the GNU `\b' appears to be the lesser
 of two evils.
 
    The various command-line options (see Command Line Options)
 control how `gawk' interprets characters in regexps.
 
 No options
      In the default case, `gawk' provides all the facilities of POSIX
      regexps (see Regular Expression Operators) and the GNU regexp
      operators described in this section.  However, interval
      expressions are not supported.
 
 `--posix'
      Only POSIX regexps are supported; the GNU operators are not special
      (e.g., `\w' matches a literal `w').  Interval expressions are
      allowed.
 
 `--traditional'
      Traditional Unix `awk' regexps are matched. The GNU operators are
      not special, interval expressions are not available, and neither
      are the POSIX character classes (`[[:alnum:]]' and so on).
      Characters described by octal and hexadecimal escape sequences are
      treated literally, even if they represent regexp metacharacters.
 
 `--re-interval'
      Allow interval expressions in regexps, even if `--traditional' has
      been provided.
 