grep

The grep utility, which searches files for strings of text, uses a syntax similar to the regular expression syntax of the vi, ex, ed, and sed editors. grep comes in three flavors, grep, fgrep, and egrep, all of which I’ll cover in this article.

The name grep is derived from the editor command g/re/p, which literally translates to “globally search for a regular expression and print what you find.” Regular expressions are at the core of grep, and I’ll cover them after a brief description of some of the utility’s command options.

The simplest grep command is grep (search pattern) (files list), as in:

grep hello *

The output of this command might be something like this:

$ grep hello *
story.txt: so I said hello and she smiled back
intro.txt: use the hello.c program as an example of C programming
$

grep is case sensitive, so to widen the search to include “hello,” “Hello,” or “HELLO,” use the -y or -i option. Earlier versions of grep used -y, and later versions use -i; -y is now considered obsolete, although some versions of grep do support both. In the following example, more hellos show up because the search is case independent.

$ grep -i hello *
story.txt: so I said hello and she smiled back
story.txt: I could hear my echo, “HELLO.”
intro.txt: use the hello.c program as an example of C programming
hello.c:      printf("Hello, world.\n");
$

This command searches all files in the current directory and prints the file name and the line containing the string “hello” for any files that contain that string.

The output of grep varies depending on whether you’re searching one or several files. If only one file is named on the command line, the output doesn’t include the file name, as in the following example:

$ grep -i hello hello.c
printf("Hello, world.\n");
$

The one-file rule applies whether you use a wild card in your file list or not. If hello.c were the only file in the current directory, using a wild card to locate the file would still produce an unnamed file output. In the following example, the user is searching for any C files containing “hello.” There is only one C file in the directory, so the output is identical to the previous example.

$ grep -i hello *.c
printf("Hello, world.\n");
$

I don’t know of a grep that has a work-around for this behavior, but you could use the -l option instead, which prints the file name only and not the line containing the string. At least you would know the name of the file that contained the string.

$ grep -il hello *.c
hello.c
$
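
If the matching line is wanted along with the file name, a common trick is to name /dev/null as a second file: grep then sees two files and prefixes each match with its file name. The sketch below assumes a POSIX-style shell, a system that provides /dev/null and mktemp, and a made-up hello.c. (GNU grep also offers -H to force the file name, but that option is not part of every grep.)

```shell
# Hypothetical sample file, created in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' '#include <stdio.h>' 'int main(void) {' \
    '    printf("Hello, world.\n");' '    return 0;' '}' > hello.c

# With only one file named, grep omits the file name.
single=$(grep -i hello hello.c)
printf '%s\n' "$single"

# Naming /dev/null as a second file makes grep print the file name.
named=$(grep -i hello hello.c /dev/null)
printf '%s\n' "$named"
```

Because /dev/null is always empty, it can never contribute a match of its own.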

The -l option can be used to extract a list of files containing the string. The file name is printed only once, even though the string may appear in multiple lines within that file. In the following example, story.txt appears only once, even though it contains more than one “hello.”

$ grep -il hello *
hello.c
intro.txt
story.txt
$

The -l option suppresses most of the other output options from grep. On the other hand, the -n option will print a line number as well as the text, as in the following example:

$ grep -in hello *
hello.c:7: printf("Hello, world.\n");
intro.txt:44: use the hello.c program as an example of C programming
story.txt:110: so I said hello and she smiled back
story.txt:187: I could hear my echo, “HELLO.”
$

The -v option outputs the complement of the search, i.e., all lines not containing the requested search pattern.

$ grep -iv hello intro.txt
You will be able to get more practice if you
at its simplest
$

The -c option prints only a count of lines matched. It also has the interesting and useful side effect of listing all the files it searches, not just the successful hits.

$ grep -ic hello *
data.txt:0
hello.c:1
intro.txt:1
intro2.txt:0
story.txt:2
$
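
Because -c lists every file it searches, a second grep can be chained on to drop the zero-count lines when only the hits are of interest. This is a minimal sketch; the file names and contents are invented for the example.

```shell
# Scratch directory with three hypothetical files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'so I said hello' 'HELLO again' > story.txt
printf '%s\n' 'nothing to see here' > data.txt
printf '%s\n' 'use the hello.c program' > intro.txt

# Count matching lines per file, then discard files whose count is 0.
# The pattern ':0$' matches only a count of exactly zero, not 10 or 20.
hits=$(grep -ic hello data.txt intro.txt story.txt | grep -v ':0$')
printf '%s\n' "$hits"
```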

Some versions of grep come with -r as an option, which prompts grep to search recursively through subdirectories. The default behavior is to search only one directory, so the -r option, as provided in GNU and other implementations of grep, is the exception rather than the rule.
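
Where -r is not available, find can supply the file list instead. The sketch below assumes a find that supports the POSIX -exec ... {} + form; the directory and file names are made up for illustration.

```shell
# Scratch tree with one match at the top level and one in a subdirectory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p docs/drafts
printf '%s\n' 'hello from the top level' > top.txt
printf '%s\n' 'Hello from a subdirectory' > docs/drafts/note.txt
printf '%s\n' 'no greeting here' > docs/other.txt

# find walks the tree and hands the regular files to grep.
# -l prints only the names of files that contain a match.
found=$(find . -type f -exec grep -il hello {} + | sort)
printf '%s\n' "$found"
```

The sort is there only to make the order of the output predictable, since find does not guarantee one.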

Going wild with grep
So far I’ve covered some of the input and output options, but the real power of grep is in its search pattern, which uses regular expressions. grep can match simple strings, as we saw in the “hello” example we played with above; but it can also use a variety of wild cards and special symbols to create a regular expression to search for more complex strings.

I will begin with some of the simpler characters in a regular expression. A ^ (caret) character means the start of a line and a $ (dollar) character means the end of one.

The wild cards used by grep frequently clash with the special symbols that the shell uses, so the usual practice is to enclose complex search strings within single quotes. The two following examples would match “hello” at the start and end of a line, respectively. Because the -i option is not used, only the lowercase form is matched.

$ grep '^hello' *

$ grep 'hello$' *
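
To see the anchors at work, the following sketch runs both patterns against a small made-up file, then combines them to match lines that consist of “hello” and nothing else. The file name greetings.txt is an assumption for the example.

```shell
# Hypothetical sample data in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'hello there' 'she said hello' 'hello' 'say hello to her' > greetings.txt

starts=$(grep '^hello' greetings.txt)   # lines that begin with hello
ends=$(grep 'hello$' greetings.txt)     # lines that end with hello
whole=$(grep '^hello$' greetings.txt)   # lines that are exactly hello

printf 'start of line:\n%s\n' "$starts"
printf 'end of line:\n%s\n' "$ends"
printf 'whole line:\n%s\n' "$whole"
```

The line “say hello to her” appears in none of the three results, because its “hello” sits in the middle of the line.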

The dot or period character (.) will match any single character. For example, the following would match any character followed by “ello,” as in “aello,” “bello,” “cello,” and so on all the way through “zello.” Odd combinations, like “1ello” and “?ello,” would also be included; any combination of one initial character followed by “ello” is valid. The dot does not match the beginning or end of a line; therefore, “ello” at the start of a line would not be matched.

$ grep '.ello' *
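
The point about the start of a line is easy to check with a small experiment. In this sketch, which uses an invented file called words.txt, the line beginning with “ello” is the only one that fails to match.

```shell
# Hypothetical sample data in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'cello' 'ello there' 'a mellow tune' '1ello' > words.txt

# The dot must consume one real character before "ello".
matches=$(grep '.ello' words.txt)
printf '%s\n' "$matches"
```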

Optional characters can be enclosed in square brackets ([ ]), causing any one of the enclosed characters to be matched. The following search string would match “hello,” “cello,” or “jello.”

$ grep '[hcj]ello' *

Optional characters can also be specified by using a range consisting of two characters separated by a hyphen. The following example would match “bay,” “cay,” or “day.”

$ grep '[b-d]ay' *

An optional character or range of characters can be preceded by a caret (^) to invert the sense of the match. The following would match any character followed by “ay” except the combinations “bay,” “cay,” and “day.”

$ grep '[^b-d]ay' *

Note that options and ranges represent a match of a single character.
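
A short experiment makes the single-character rule concrete. The sketch below uses a made-up file, rhymes.txt, containing only lowercase words; a bare “ay” line is included to show that a bracket expression, inverted or not, must still consume one character.

```shell
# Hypothetical sample data in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' bay cay day hay say ay > rhymes.txt

in_range=$(grep '[b-d]ay' rhymes.txt)        # b, c, or d followed by ay
out_of_range=$(grep '[^b-d]ay' rhymes.txt)   # any other character followed by ay

printf 'in range:\n%s\n' "$in_range"
printf 'out of range:\n%s\n' "$out_of_range"
```

The “ay” line shows up in neither result, since there is no character in front of it for the brackets to match.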

Any single character match (including a single character matched by an option/range specification) can be repeated by using the asterisk character (*). An asterisk following a single character means “zero or more occurrences” of the preceding match. The following search requests any line containing “hello” followed by “dolly” where the words are separated by zero or more spaces. Note that the asterisk follows the space after “hello” and therefore applies to the space character.

$ grep 'hello *dolly' *

This search would match any of the following, without regard to the number of spaces between the words.

hellodolly
hello dolly
hello                       dolly

The asterisk can be applied to an option or range. The following search matches “c” and “t” with any number of vowels (or no vowels) in between.

$ grep 'c[aeiou]*t' somewords.txt
cat
coat
coot
cot
cout
cut
ct
$

Extending grep
At this point grep and egrep depart from one another. egrep stands for extended grep. The POSIX 1003.2 standard defined a set of regular expression characters, called modern, extended, or full regular expressions. The regular expressions I cited earlier are frequently called older or basic regular expressions. There is some overlap between the two, and recent versions of grep can be made to behave like egrep by using the -E option.

The egrep utility uses extended regular expressions, with a useful one being the plus (+) character, which works like the asterisk (*) but means “one or more” rather than “zero or more.” Using egrep in the above example with a + instead of an * would cause the search to exclude “ct” because it doesn’t contain one or more vowels.

$ egrep 'c[aeiou]+t' somewords.txt
cat
coat
coot
cot
cout
cut
$
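
On systems where egrep is missing, the same search can be written with grep -E, the option mentioned earlier. This is a minimal sketch that uses a short made-up word list in place of somewords.txt.

```shell
# Hypothetical word list in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' cat coat ct cut > words.txt

basic=$(grep 'c[aeiou]*t' words.txt)         # zero or more vowels: keeps ct
extended=$(grep -E 'c[aeiou]+t' words.txt)   # one or more vowels: drops ct

printf 'basic:\n%s\n' "$basic"
printf 'extended:\n%s\n' "$extended"
```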

If you use grep to achieve the same results, the search pattern becomes clumsier. The next example asks for “c,” followed by any vowel, followed by zero or more occurrences of any vowel, followed by “t.”

$ grep 'c[aeiou][aeiou]*t' somewords.txt
cat
coat
coot
cot
cout
cut
$

The egrep utility also adds a question mark (?), meaning zero or one occurrence, as another version of multiple occurrence matching.

* = zero or more occurrences
+ = one or more occurrences
? = zero or one occurrence
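
As a quick illustration of the question mark, the sketch below looks for two spellings of the same word. It uses grep -E in place of egrep and an invented file, spellings.txt.

```shell
# Hypothetical sample data in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' color colour colouur colr > spellings.txt

# u? allows zero or one "u" between "colo" and "r".
matches=$(grep -E 'colou?r' spellings.txt)
printf '%s\n' "$matches"
```

Only “color” and “colour” are printed: “colouur” has one “u” too many, and “colr” is missing the “o” that the pattern requires.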

The vertical bar (|) creates an “or” condition between two possible search patterns. In the following example, egrep searches for “c,” followed by one or more vowels, followed by “t,” or for “p” followed by one or more vowels, followed by “l.” Because the search string doesn’t specify that the word must end after the closing “t” or “l,” this example has matched “paula” and “paella,” as well as words that end in “l.”

$ egrep 'c[aeiou]+t|p[aeiou]+l' somewords.txt
cat
coat
coot
cot
cut
cet
cit
pal
paella
paul
paula
peal
peel
pool
$
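
If only whole-line matches are wanted, one option is to wrap the alternation in parentheses and anchor it with ^ and $. The sketch below assumes one word per line in a made-up list and uses grep -E in place of egrep.

```shell
# Hypothetical word list in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' cat coat pal paella paul paula pool > words.txt

# Unanchored: the pattern may match anywhere inside a word.
loose=$(grep -E 'c[aeiou]+t|p[aeiou]+l' words.txt)

# Anchored: the whole line must be one of the two alternatives.
strict=$(grep -E '^(c[aeiou]+t|p[aeiou]+l)$' words.txt)

printf 'loose:\n%s\n' "$loose"
printf 'strict:\n%s\n' "$strict"
```

The parentheses matter here: without them, ^ would apply only to the first alternative and $ only to the second.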

You can fudge this with grep by entering multiple search patterns and inserting newlines in between the patterns. This can be used with egrep and fgrep as well, but I’m introducing it here simply to highlight the difficulty of imitating egrep with grep when it would be simpler to use egrep.

In the following example, the first part of the command is entered on one line, and then Enter is pressed while the single quotes are still open. The shell prompts for additional input and continues to accept lines until the closing quote appears. Each individual line represents a separate search string to grep. This trick is useful with any version of grep.

$ grep 'c[aeiou][aeiou]*t
> p[aeiou][aeiou]*l' somewords.txt
cat
coat
coot
cot
cut
cet
cit
pal
paella
paul
paula
peal
peel
pool
$
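
Another way to supply several patterns, assuming a grep that follows POSIX, is to repeat the -e option once per pattern, which keeps the command on a single line. The word list here is made up for the example.

```shell
# Hypothetical word list in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' cat ct pal pl peel dog > words.txt

# Each -e introduces one pattern; a line matching either one is printed.
matches=$(grep -e 'c[aeiou][aeiou]*t' -e 'p[aeiou][aeiou]*l' words.txt)
printf '%s\n' "$matches"
```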

With egrep, simple parentheses can be used to group sections of a search pattern together. In the following example, the search pattern will match any of the words shown in the result list. The parentheses group “[Ss]ome” and “[Aa]ny” as alternative strings, either of which must be followed by “one.”

$ egrep '([Ss]ome|[Aa]ny)one' somewords.txt
someone
Someone
anyone
Anyone
$

A single character can be modified by a bound, which consists of one or two comma-separated numbers, with the first number specifying the minimum number of occurrences and the second specifying the maximum. egrep uses curly braces ({}) to specify a bound, while grep uses backslashed curly braces (\{\}). These examples of matching strings of characters should clarify what I mean: