(gawk.info) Case-sensitivity
Case-sensitivity in Matching
============================
Case is normally significant in regular expressions, both when
matching ordinary characters (i.e., not metacharacters) and inside
character lists. Thus a `w' in a regular expression matches only a
lower-case `w' and not an upper-case `W'.
The simplest way to do a case-independent match is to use a character
list: `[Ww]'. However, this can be cumbersome if you need to use it
often, and it can make the regular expressions harder to read. There
are two alternatives that you might prefer.
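As a minimal sketch of the character-list approach, the shell snippet
below matches `foo' in any mixture of cases using plain `awk'. The
function name `match_foo_any_case' and the sample input are made up for
illustration.

```shell
# Hypothetical helper: print input lines that contain "foo" in any case,
# using one character list per letter.
match_foo_any_case() {
    awk '/[Ff][Oo][Oo]/ { print }'
}

# Sample input (made up): the first two lines match, the third does not.
printf 'Foo bar\nFOOD\nbar\n' | match_foo_any_case
```

Needing three character lists for a three-letter word shows how quickly
this notation grows.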
One way to do a case-insensitive match at a particular point in the
program is to convert the data to a single case, using the `tolower' or
`toupper' built-in string functions (which we haven't discussed yet;
see Built-in Functions for String Manipulation). For example:

     tolower($1) ~ /foo/  { ... }

converts the first field to lower-case before matching against it.
This works in any POSIX-compliant implementation of `awk'.
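A runnable sketch of the `tolower' approach follows. The function name
`first_field_has_foo' and the sample records are assumptions made for
illustration; only `tolower' itself comes from the text above.

```shell
# Hypothetical helper: print records whose first field contains "foo",
# ignoring case. tolower() returns a lower-cased copy of $1; the record
# itself is left untouched, so it is printed in its original case.
first_field_has_foo() {
    awk 'tolower($1) ~ /foo/ { print $0 }'
}

# Sample input (made up). The second record has "Foo" only in its
# second field, so it is not selected.
printf 'FOO one\nbar Foo\nfood two\n' | first_field_has_foo
```

Because only a copy is converted, this form is safe to use when the
original capitalization must appear in the output.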
Another method, specific to `gawk', is to set the variable
`IGNORECASE' to a non-zero value (see Built-in Variables). When
`IGNORECASE' is not zero, _all_ regexp and string operations ignore
case. Changing the value of `IGNORECASE' dynamically controls the case
sensitivity of your program as it runs. Case is significant by default
because `IGNORECASE' (like most variables) is initialized to zero.

     x = "aB"
     if (x ~ /ab/) ...   # this test will fail

     IGNORECASE = 1
     if (x ~ /ab/) ...   # now it will succeed

In general, you cannot use `IGNORECASE' to make certain rules
case-insensitive and other rules case-sensitive, because there is no way
to set `IGNORECASE' just for the pattern of a particular rule. To do
this, you must use character lists or `tolower'. However, one thing
you can do only with `IGNORECASE' is turn case-sensitivity on or off
dynamically for all the rules at once.
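To illustrate mixing case-insensitive and case-sensitive rules in one
program without `IGNORECASE', here is a small sketch using `tolower'
for one rule and an ordinary regexp for the other. The function name
`mixed_rules', the output labels, and the sample input are hypothetical.

```shell
# Hypothetical helper: the first rule ignores case by matching against a
# lower-cased copy of the record; the second rule stays case-sensitive.
mixed_rules() {
    awk '
        tolower($0) ~ /error/ { print "any-case: " $0 }
        /ERROR/               { print "exact: " $0 }
    '
}

# Sample input (made up). "ERROR high" satisfies both rules;
# "error low" satisfies only the first; "fine" satisfies neither.
printf 'error low\nERROR high\nfine\n' | mixed_rules
```

This runs in any POSIX-compliant `awk', since it does not rely on any
`gawk' extension.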
`IGNORECASE' can be set on the command line, or in a `BEGIN' rule
(see Other Command Line Arguments; also see Startup and Cleanup
Actions). Setting `IGNORECASE' from the command line is a way to make
a program case-insensitive without having to edit it.
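The following sketch shows two ways of setting `IGNORECASE' from the
command line. It assumes that `gawk' is installed under that name; the
function names and sample input are hypothetical, and the demonstration
is skipped when `gawk' is not found.

```shell
# Hypothetical helper: set IGNORECASE with the -v option, which takes
# effect before the program starts running.
ci_foo_option() {
    gawk -v IGNORECASE=1 '/foo/'
}

# Hypothetical helper: set IGNORECASE with a variable assignment placed
# among the file arguments; "-" names standard input.
ci_foo_argument() {
    gawk '/foo/' IGNORECASE=1 -
}

# IGNORECASE is a gawk extension, so only run the demo if gawk exists.
if command -v gawk >/dev/null 2>&1; then
    printf 'Foo\nbar\nFOO\n' | ci_foo_option
    printf 'Foo\nbar\nFOO\n' | ci_foo_argument
fi
```

In both forms the program text `/foo/' is unchanged; only the invocation
differs.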
Prior to version 3.0 of `gawk', the value of `IGNORECASE' only
affected regexp operations. It did not affect string comparison with
`==', `!=', and so on. Beginning with version 3.0, both regexp and
string comparison operations are affected by `IGNORECASE'.
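As a small sketch of the effect on string comparison, the snippet below
compares two strings that differ only in case, before and after setting
`IGNORECASE'. It assumes `gawk' version 3.0 or later is installed; the
function name `compare_case' and the variable names are hypothetical.

```shell
# Hypothetical helper: compare the same two strings with IGNORECASE
# unset and then set. Each print outputs 1 if the strings compare
# equal and 0 otherwise.
compare_case() {
    gawk 'BEGIN {
        a = "ABC"; b = "abc"
        print (a == b)      # case-sensitive comparison
        IGNORECASE = 1
        print (a == b)      # comparison now ignores case
    }'
}

# IGNORECASE is a gawk extension, so only run the demo if gawk exists.
if command -v gawk >/dev/null 2>&1; then
    compare_case
fi
```

With such a `gawk', the first comparison should print 0 and the second
should print 1.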
Beginning with version 3.0 of `gawk', the equivalences between
upper-case and lower-case characters are based on the ISO-8859-1 (ISO
Latin-1) character set. This character set is a superset of the
traditional 128 ASCII characters, which also provides a number of
characters suitable for use with European languages.
The value of `IGNORECASE' has no effect if `gawk' is in
compatibility mode (see Command Line Options). Case is
always significant in compatibility mode.