(gawk.info) Case-sensitivity

 
 Case-sensitivity in Matching
 ============================
 
    Case is normally significant in regular expressions, both when
 matching ordinary characters (i.e., not metacharacters) and inside
 character lists.  Thus a `w' in a regular expression matches only a
 lower-case `w' and not an upper-case `W'.
 
    The simplest way to do a case-independent match is to use a character
 list: `[Ww]'.  However, this becomes cumbersome if you need it often,
 and it makes regular expressions harder to read.  There are two
 alternatives that you might prefer.
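
    As a sketch of the character-list approach, the shell function
 below (its name, `count_awk_any_case', is made up for illustration)
 counts input lines that contain `awk' in any mix of case.  It uses only
 bracket expressions, so it should run under any POSIX `awk'; note how
 hard `[Aa][Ww][Kk]' is to read even for a three-letter word.

```shell
# Hypothetical helper: count lines that contain "awk" in any case,
# spelling out both cases of every letter with a character list.
count_awk_any_case() {
  awk '/[Aa][Ww][Kk]/ { n++ } END { print n + 0 }'
}

printf 'gawk\nAWK\nperl\nAwK rules\n' | count_awk_any_case   # prints 3
```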
 
    One way to do a case-insensitive match at a particular point in the
 program is to convert the data to a single case, using the `tolower' or
 `toupper' built-in string functions (which we haven't discussed yet;
 see Built-in Functions for String Manipulation).  For example:
 
      tolower($1) ~ /foo/  { ... }
 
 matches a lower-case copy of the first field against the regexp; `$1'
 itself is not modified.  This works in any POSIX-compliant
 implementation of `awk'.
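
    A complete, runnable version of the `tolower' idea might look like
 the following sketch (the function name `first_field_has_foo' is
 hypothetical).  Because `tolower' returns a converted copy, the record
 that `print' writes out keeps its original case.

```shell
# Hypothetical helper: print records whose first field contains "foo"
# in any case. tolower() returns a lower-cased copy; $1 is unchanged.
first_field_has_foo() {
  awk 'tolower($1) ~ /foo/ { print }'
}

printf 'FOO 1\nbar 2\nFoobar 3\nx foo\n' | first_field_has_foo
# prints:
# FOO 1
# Foobar 3
```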
 
    Another method, specific to `gawk', is to set the variable
 `IGNORECASE' to a non-zero value (see Built-in Variables).  When
 `IGNORECASE' is not zero, _all_ regexp and string operations ignore
 case.  Changing the value of `IGNORECASE' dynamically controls the case
 sensitivity of your program as it runs.  Case is significant by default
 because `IGNORECASE' (like most variables) is initialized to zero.
 
      x = "aB"
      if (x ~ /ab/) ...   # this test will fail
      
      IGNORECASE = 1
      if (x ~ /ab/) ...   # now it will succeed
 
    In general, you cannot use `IGNORECASE' to make certain rules
 case-insensitive and other rules case-sensitive, because there is no way
 to set `IGNORECASE' just for the pattern of a particular rule.  To do
 this, you must use character lists or `tolower'.  However, one thing
 you can do only with `IGNORECASE' is turn case-sensitivity on or off
 dynamically for all the rules at once.
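
    For example, the following sketch (with a made-up function name,
 `classify') has one case-insensitive rule and one case-sensitive rule
 in the same program.  It relies only on `tolower', so it should work in
 any POSIX `awk'.

```shell
# Hypothetical helper mixing case-insensitive and case-sensitive rules.
classify() {
  awk '
    tolower($0) ~ /error/ { print "error:", $0 }   # any case
    /WARN/                { print "warn:", $0 }    # upper case only
  '
}

printf 'Error: disk\nWARN: low\nwarn: quiet\n' | classify
# prints:
# error: Error: disk
# warn: WARN: low
```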
 
    `IGNORECASE' can be set on the command line, or in a `BEGIN' rule
 (see Other Command Line Arguments; also see Startup and Cleanup
 Actions).  Setting `IGNORECASE' from the command line is a way to make
 a program case-insensitive without having to edit it.
 
    Prior to version 3.0 of `gawk', the value of `IGNORECASE' affected
 only regexp operations; it did not affect string comparisons with
 `==', `!=', and so on.  Beginning with version 3.0, both regexp and
 string comparison operations are affected by `IGNORECASE'.
 
    Beginning with version 3.0 of `gawk', the equivalences between
 upper-case and lower-case characters are based on the ISO-8859-1 (ISO
 Latin-1) character set.  This character set is a superset of the
 traditional 128 ASCII characters and also provides a number of
 characters suitable for use with European languages.
 
    The value of `IGNORECASE' has no effect if `gawk' is in
 compatibility mode (see Command Line Options).  Case is always
 significant in compatibility mode.
 