
egrep [-abcGhiLlnqrsuVvwx] [-e pattern] ... [-f patternfile] ... [-U[b|B|l|L]] [-A num] [-B num] [-C [num]] [-d action] [--binary] [--byte-offset] [pattern] [file ...]
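As a rough illustration of what egrep does with a pattern and its input, the following Python sketch yields the lines that contain a match. It assumes the usual grep meanings for -i (ignore case), -v (invert the match) and -n (prefix line numbers). The function egrep_lines and its keyword arguments are hypothetical and are not part of egrep. Python's re module is used because its syntax is close to the one described in this section.

```python
import re
from typing import Iterable, Iterator


def egrep_lines(pattern: str, lines: Iterable[str], ignore_case: bool = False,
                invert: bool = False, number: bool = False) -> Iterator[str]:
    """Yield each line that contains a match for pattern (hypothetical helper).

    ignore_case, invert and number loosely mirror grep's -i, -v and -n.
    """
    regex = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        # search() finds a match anywhere in the line, as grep does
        matched = regex.search(line) is not None
        # With invert=True, keep only the lines that do NOT match
        if matched != invert:
            yield f"{lineno}:{line}" if number else line
```

For example, passing the lines of a file with ignore_case=True returns every line containing cat, Cat or CAT when the pattern is cat.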
The primary strength of egrep is its ability to use regular expressions.
Without going into excruciating detail, a regular expression is a string,
written in a special syntax, that describes a pattern of text. The pattern can
be an exact literal string, or it can contain wildcard (special) characters.
Any user of DOS is familiar with the concept of wildcard characters. For
example, dir *.* will list all files in the current directory, and dir
?at.* will list all files whose name (before the extension) is three
characters long and ends in at, such as bat.txt or cat.doc. Regular
expressions expand on this idea.
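The wildcard idea maps onto regular expressions fairly directly: ? corresponds to . (any single character), * corresponds to .* (zero or more of any character), and a literal dot must be escaped as \. so that it is not treated as a wildcard. The sketch below shows that translation with a hypothetical helper, wildcard_to_regex. It handles only ? and * and does not attempt to reproduce DOS's own matching rules exactly.

```python
import re


def wildcard_to_regex(wildcard: str) -> str:
    """Translate a simple DOS-style wildcard into an anchored regex (hypothetical helper).

    Only '?' and '*' are treated specially; every other character is
    matched literally. DOS matching quirks are not modeled.
    """
    parts = []
    for ch in wildcard:
        if ch == "?":
            parts.append(".")            # exactly one character
        elif ch == "*":
            parts.append(".*")           # zero or more characters
        else:
            parts.append(re.escape(ch))  # e.g. '.' becomes '\.'
    # ^ and $ anchor the pattern so the whole name must match
    return "^" + "".join(parts) + "$"
```

With this helper, ?at.* becomes ^.at\..*$, which matches bat.txt and cat.doc but not boat.txt.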
| Character | Meaning |
| . | Matches any single character except a newline. |
| [exp] | Defines an expression that is treated as a set of characters, any one of which may be matched. For example, [abc] would match either a, b or c. |
| [m-n] | Defines a range of characters, any of which may be matched. For example, [0-9] specifies all of the digits: 0, 1, 2, 3, 4, 5, 6, 7, 8 or 9 would be a match. |
| [^exp] | Defines an expression that is treated as a set of characters, none of which may be matched. For example, [^0-9] would match any character that is not a digit. |
| () | Encloses an expression so that the entire expression can be treated as a single group. |
| ^ | (Caret) Matches the start of a string. |
| | | (Vertical bar) Separates two alternative expressions; a match can occur using either of them. For example, bat|cat would match either bat or cat, whereas b|cat would match either b or cat. |
| $ | (Dollar sign) Matches the end of a string. For example, abc$ would match only if abc occurred at the end of the line. |
| * | Matches zero or more instances of the preceding expression. |
| + | Matches one or more instances of the preceding expression. |
| ? | Matches zero or one instance of the preceding expression. |
| *?, +?, ?? | Normally, *, + and ? are 'greedy' operations: they try to match as many characters as they can. Adding a trailing ? reverses that behavior so that the match uses as few characters as possible. |
| {m} | Specifies that there must be exactly m copies of the preceding expression. For example, a{3} matches exactly 3 a's. |
| {m,n} | Specifies that there must be between m and n copies of the preceding expression. For example, a{2,5} matches aa, aaa, aaaa or aaaaa. |
| \ | Has a dual meaning. It either escapes one of the special characters, turning it into a literal, or it introduces a special sequence. For example, c:\\ matches c:\. |
| \A | Matches only at the start of a string. |
| \b | Matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of alphanumeric or underscore characters, so the end of a word is indicated by whitespace or a non-alphanumeric, non-underscore character. Note that \b is defined as the boundary between \w and \W, so the precise set of characters deemed to be alphanumeric depends on the values of the UNICODE and LOCALE flags. Inside a character range, \b represents the backspace character, for compatibility with Python's string literals. |
| \B | Matches the empty string, but only when it is not at the beginning or end of a word. This is the opposite of \b, so it is also subject to the settings of LOCALE and UNICODE. |
| \d | Matches any decimal digit; the same as [0-9]. |
| \D | Matches anything but a decimal digit; the same as [^0-9]. |
| \s | Matches any whitespace character; the same as [ \f\n\r\t\v]. |
| \S | The opposite of \s: matches any non-whitespace character; the same as [^ \f\n\r\t\v]. |
| \w | When the LOCALE and UNICODE flags are not specified, matches any alphanumeric character and the underscore; this is equivalent to the set [a-zA-Z0-9_]. With LOCALE, it will match the set [0-9_] plus whatever characters are defined as alphanumeric for the current locale. If UNICODE is set, this will match the characters [0-9_] plus whatever is classified as alphanumeric in the Unicode character properties database. |
| \W | When the LOCALE and UNICODE flags are not specified, matches any non-alphanumeric character; this is equivalent to the set [^a-zA-Z0-9_]. With LOCALE, it will match any character not in the set [0-9_] and not defined as alphanumeric for the current locale. If UNICODE is set, this will match anything other than [0-9_] and characters marked as alphanumeric in the Unicode character properties database. |
| \z | Matches only at the end of a string. |
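Several rows of the table can be tried out directly. The table's notes about the UNICODE and LOCALE flags and Python's string literals suggest it follows Python's re module, so this sketch uses re; egrep's own engine may differ in details. The sample strings are made up for illustration.

```python
import re

# \d+ : one or more decimal digits
digits = re.findall(r"\d+", "Order 66 shipped 3 items")

# [^0-9] : any character that is not a digit (here: deleted)
phone = re.sub(r"[^0-9]", "", "tel: 555-0199")

# {m,n} : between 2 and 5 copies of the preceding 'a'
runs = re.findall(r"a{2,5}", "a aa aaaaaaa")

# Alternation has low precedence: b|cat means "b" OR "cat"
loose = re.findall(r"b|cat", "bat cat")
tight = re.findall(r"bat|cat", "bat cat")

# Greedy .* takes as much as it can; non-greedy .*? as little as it can
greedy = re.findall(r"<.*>", "<b>bold</b>")
lazy = re.findall(r"<.*?>", "<b>bold</b>")

# \b : word boundary, so "cat" inside "concatenate" or "bobcat" is skipped
whole_words = re.findall(r"\bcat\b", "cat concatenate bobcat cat.")

# \w+ : runs of letters, digits and underscores
words = re.findall(r"\w+", "snake_case, x2!")
```

Printing the variables shows, for example, that the greedy pattern returns the whole <b>bold</b> string while the non-greedy one returns the two tags separately, and that b|cat finds a bare b inside bat.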
Examples:
bat - the string bat.
.at - at preceded by any character as in bat, cat, fat, hat and so on.
ca*t - a c followed by zero or more a's followed by t.
^cat - a cat that occurs at the start of a string.
cat$ - a cat that occurs at the end of a string.
[Cc]at - either Cat or cat.
[1-9][0-9]* - a number that does not start with a zero; it is at least one digit
long and can have any number of digits.
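The examples above can be checked mechanically. This Python sketch pairs each pattern with a few strings it should and should not match (the test strings are illustrative assumptions, not part of the egrep documentation) and uses re.search, which, like egrep, looks for a match anywhere in the string. The number pattern is wrapped in ^...$ here so that the whole string must be a number; without the anchors it would also find the 7 inside 007.

```python
import re
from typing import Dict, Sequence, Tuple

# pattern -> (strings that should match, strings that should not match)
EXAMPLES: Dict[str, Tuple[Sequence[str], Sequence[str]]] = {
    r"bat": (["bat", "acrobat"], ["bad"]),
    r".at": (["cat", "hat"], ["at"]),
    r"ca*t": (["ct", "cat", "caaat"], ["cbt"]),
    r"^cat": (["catalog"], ["bobcat"]),
    r"cat$": (["bobcat"], ["catalog"]),
    r"[Cc]at": (["Cat", "cat"], ["CAT"]),
    # Anchored with ^...$ so the whole string must be the number
    r"^[1-9][0-9]*$": (["7", "42", "1000"], ["0", "007", ""]),
}


def check(pattern: str, good: Sequence[str], bad: Sequence[str]) -> bool:
    """True if pattern is found in every string of good and in none of bad."""
    regex = re.compile(pattern)
    return (all(regex.search(s) for s in good)
            and not any(regex.search(s) for s in bad))


results = {p: check(p, good, bad) for p, (good, bad) in EXAMPLES.items()}

# Unanchored, the number pattern finds the "7" inside "007"
unanchored = re.search(r"[1-9][0-9]*", "007").group()
```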

