Command Syntax 

egrep [-abcGhiLlnqrsuVvwx] [-e pattern] ... [-f patternfile] ... [-U[b|B|l|L]] [-A num] [-B num] [-C [num]] [-d action] [--binary] [--byte-offset] [pattern] [file ...]

The primary strength of egrep is its ability to use regular expressions. Without going into excruciating detail, a regular expression is a syntax that defines a pattern. The pattern can be an exact literal string, or it can contain wildcard characters.
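As a rough illustration of the idea, the following Python sketch filters lines the way a grep-style tool does. It is not egrep's actual implementation: the function name grep_lines is hypothetical, and Python's re module is assumed as a stand-in for the regular expression engine.

```python
import re

def grep_lines(pattern, lines):
    """Return (line_number, line) pairs for every line containing a match.

    Hypothetical helper for illustration only; not part of egrep.
    """
    regex = re.compile(pattern)
    return [(n, line) for n, line in enumerate(lines, start=1)
            if regex.search(line)]

sample = ["the cat sat", "a dog barked", "concatenate"]

# A literal pattern matches anywhere in the line.
for number, line in grep_lines("cat", sample):
    print(f"{number}:{line}")
```

The search is unanchored, so the literal pattern cat is also found inside concatenate.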

Any user of DOS is familiar with the concept of wildcard characters. For example, dir *.* will list all files in the current directory. dir ?at.* will list all files whose name is three characters long and whose last two characters are at. Regular expressions expand on this idea.
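To make the comparison concrete, this sketch checks a few file names against the wildcard ?at.* and against a regular expression written to mean the same thing. Python's fnmatch module is assumed here as an approximation of DOS wildcard matching (real DOS rules differ in some corner cases), and the file names are made up.

```python
import fnmatch
import re

# Made-up file names for illustration.
names = ["cat.txt", "bat.exe", "boat.txt", "at.doc"]

# Wildcard form: ? is any one character, * is any run of characters.
wildcard_hits = [n for n in names if fnmatch.fnmatchcase(n, "?at.*")]

# Regular expression form: . is any one character, \. is a literal dot,
# and .* is any run of characters. ^ and $ anchor the whole name.
regex_hits = [n for n in names if re.search(r"^.at\..*$", n)]

print(wildcard_hits)
print(regex_hits)
```

Both lists contain the same names. The regular expression has to escape the dot, because an unescaped dot is itself a wildcard.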

Character   Meaning
.   Matches any single character except a newline.
[exp] Defines an expression which is treated as a set of characters, any of which may be matched. For example, [abc] would match on either a, b or c.
[m-n] Defines a range of characters, any of which may be matched. For example, [0-9] specifies all of the digits. 0, 1, 2, 3, 4, 5, 6, 7, 8 or 9 would be a match.
[^exp] Defines an expression which is treated as a set of characters, none of which may be treated as a match. For example, [^0-9] would match any character not a digit.
() Used to enclose an expression so that the entire expression can be treated as a standalone group.
^   (Caret) Matches the start of a string.
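| Separator between two expressions. A match can occur using either of the expressions. For example, bat|cat would match either bat or cat. The same pattern can be written with a group as (b|c)at; without the parentheses, b|cat would match either b or cat.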
$   (Dollar sign) Matches the end of a string. For example, abc$ would match only if it occurred at the end of the line.
*   Matches zero or more instances of the preceding expression.
+   Matches one or more instances of the preceding expression.
?   Matches zero or one instance of the preceding expression.
*?, +?, ??   Normally, *, + and ? are 'greedy' matching operations in that they try to match the greatest number of characters that they can. Adding the modifying '?' reverses that behavior so that the match is the fewest number of characters possible.
{m} Specifies that there must be exactly m copies of the preceding pattern. For example, a{3} specifies exactly 3 a's.
{m,n} Specifies that there must be between m and n copies of the preceding pattern. For example, a{2,5} matches aa, aaa, aaaa or aaaaa.
\ Dual meaning. It either escapes one of the special characters, thus turning it into a literal, or it acts as an introducer for a special meaning string. For example, c:\\ matches c:\.
\A Matches only at the start of a string.
\b Matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of alphanumeric or underscore characters, so the end of a word is indicated by whitespace or a non-alphanumeric, non-underscore character. Note that \b is defined as the boundary between \w and \W, so the precise set of characters deemed to be alphanumeric depends on the values of the UNICODE and LOCALE flags. Inside a character range, \b represents the backspace character, for compatibility with Python's string literals.
\B Matches the empty string, but only when it is not at the beginning or end of a word. This is just the opposite of \b, so is also subject to the settings of LOCALE and UNICODE.
\d Matches any decimal digit and is the same as specifying [0-9].
\D Matches anything but a decimal digit. This is the same as [^0-9].
\s Matches any whitespace. Same as [ \f\n\r\t\v]
\S Opposite of \s. Matches any non-whitespace character. Same as [^ \f\n\r\t\v].
\w When the LOCALE and UNICODE flags are not specified, matches any alphanumeric character and the underscore; this is equivalent to the set [a-zA-Z0-9_]. With LOCALE, it will match the set [0-9_] plus whatever characters are defined as alphanumeric for the current locale. If UNICODE is set, this will match the characters [0-9_] plus whatever is classified as alphanumeric in the Unicode character properties database.
\W When the LOCALE and UNICODE flags are not specified, matches any non-alphanumeric character; this is equivalent to the set [^a-zA-Z0-9_]. With LOCALE, it will match any character not in the set [0-9_], and not defined as alphanumeric for the current locale. If UNICODE is set, this will match anything other than [0-9_] and characters marked as alphanumeric in the Unicode character properties database.
\z Matches a pattern only when the pattern is at the end of a string.
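Several of the operators in the table are easier to follow with a short experiment. The sketch below uses Python's re module, on the assumption that it behaves like the engine described here; the sample strings are made up, and the egrep engine may differ in details.

```python
import re

# Grouping with alternation: (b|c)at matches bat or cat, but not hat.
alternation = [m.group(0) for m in re.finditer(r"(b|c)at", "bat cat hat")]

# Counted repetition: a{3} is exactly three a's, a{2,5} is two to five.
exactly_three = [s for s in ["aa", "aaa", "aaaa"] if re.fullmatch(r"a{3}", s)]
two_to_five = [s for s in ["a", "aa", "aaaaa", "aaaaaa"]
               if re.fullmatch(r"a{2,5}", s)]

# Greedy versus non-greedy: .* takes as much as it can, .*? as little.
markup = "<b>bold</b>"
greedy = re.search(r"<.*>", markup).group(0)
lazy = re.search(r"<.*?>", markup).group(0)

# \b restricts the match to whole words.
whole_words = re.findall(r"\bcat\b", "cat concat cat.")

# \d+ finds runs of decimal digits.
digits = re.findall(r"\d+", "room 101, floor 3")

# A doubled backslash in the pattern matches one literal backslash.
escaped = re.search(r"c:\\", r"c:\temp").group(0)

print(alternation, exactly_three, two_to_five)
print(greedy, lazy)
print(whole_words, digits, escaped)
```

The greedy pattern runs to the last > in the string, while the non-greedy one stops at the first.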

Examples:

bat - the string bat.

.at - at preceded by any character as in bat, cat, fat, hat and so on.

ca*t - a c followed by zero or more a's followed by t.

^cat - a cat that occurs at the start of a string.

cat$ - a cat that occurs at the end of a string.

[Cc]at - either Cat or cat.

[1-9][0-9]* - a number that does not start with a zero and is one or more digits long.
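The example patterns above can be checked with a small script. This is a sketch using Python's re module as an assumed stand-in for the egrep engine; the sample strings and the helper name is_number_without_leading_zero are invented for illustration.

```python
import re

# (pattern, sample text, whether a match is expected)
cases = [
    ("bat", "combat", True),
    (".at", "hat", True),
    (".at", "at", False),      # . needs one character before "at"
    ("ca*t", "ct", True),      # zero a's is allowed
    ("ca*t", "caaat", True),
    ("^cat", "catalog", True),
    ("^cat", "concat", False),
    ("cat$", "concat", True),
    ("cat$", "catalog", False),
    ("[Cc]at", "Cat", True),
    ("[Cc]at", "CAT", False),
]

# re.search looks for the pattern anywhere in the text, as egrep does in a line.
results = [(p, t, bool(re.search(p, t))) for p, t, _ in cases]

def is_number_without_leading_zero(text):
    """Hypothetical helper: True if the whole text matches [1-9][0-9]*."""
    return re.fullmatch("[1-9][0-9]*", text) is not None

for pattern, text, matched in results:
    print(f"{pattern!r} against {text!r}: {matched}")
```

The last pattern is tested against the whole string because an unanchored search would still find the 7 inside 007.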
