Grep Essentials

grep

The grep utility, which allows files to be searched for strings of words, uses a syntax similar to the regular expression syntax of the vi, ex, ed, and sed editors. grep comes in three flavors, grep, fgrep, and egrep, all of which I'll cover in this article.

The name grep is derived from the editor command g/re/p, which literally translates to "globally search for a regular expression and print what you find." Regular expressions are at the core of grep, and I'll cover them after a brief description of some of the utility's command options.

The simplest grep command is grep (search pattern) (files list), as in:

grep hello *

The output of this command might be something like this:

$ grep hello *

story.txt: so I said hello and she smiled back

intro.txt: use the hello.c program as an example of C programming

grep is case sensitive, so in order to change the search to include "hello," "Hello," or "HELLO," use the -y or -i option. Earlier versions of grep used -y, and later versions use -i. -y is now considered obsolete, although some versions of grep do support both. In the following example, more hellos show up because the search is case independent.

$ grep -i hello *

story.txt: so I said hello and she smiled back

story.txt: I could hear my echo, "HELLO."

intro.txt: use the hello.c program as an example of C programming

hello.c: printf("Hello, world.\n");

This command searches all files in the current directory and prints the file name and the line containing the string "hello" for any files that contain that string.
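
The difference between the two searches can be reproduced with a minimal sketch. The scratch directory and the two sample files below are hypothetical stand-ins for the files in the examples above, not the author's actual data.

```shell
# Hypothetical sample files, created in a scratch directory for illustration.
dir=$(mktemp -d)
printf 'so I said hello and she smiled back\nI could hear my echo, "HELLO."\n' > "$dir/story.txt"
printf 'use the hello.c program as an example of C programming\n' > "$dir/intro.txt"

# Case-sensitive search: only lowercase "hello" matches.
sensitive=$(cd "$dir" && grep hello *)
# Case-insensitive search: the "HELLO" line is reported as well.
insensitive=$(cd "$dir" && grep -i hello *)

printf '%s\n' "$sensitive"
echo '---'
printf '%s\n' "$insensitive"
rm -rf "$dir"
```

The second listing should contain one more line than the first, the one holding "HELLO."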

The output of grep varies depending on whether you're searching one or several files. If only one file is named on the command line, the output doesn't include the file name, as in the following example:

$ grep -i hello hello.c

printf("Hello, world.\n");

The one-file rule applies whether you use a wild card in your file list or not. If hello.c were the only file in the current directory, using a wild card to locate the file would still produce an unnamed file output. In the following example, the user is searching for any C files containing "hello." There is only one C file in the directory, so the output is identical to the previous example.

$ grep -i hello *.c

printf("Hello, world.\n");

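A portable work-around for this behavior is to name /dev/null as an additional file: because grep then sees more than one file, it prefixes each match with the file name, and /dev/null itself can never match. Alternatively, you could use the -l option, which prints the file name only and not the line containing the string. At least you would know the name of the file that contained the string.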

$ grep -il hello *.c

hello.c
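
As a rough sketch of the one-file rule, the following compares three invocations against a single C file. The file contents are a hypothetical reconstruction of hello.c, and the /dev/null trick is a commonly used idiom rather than something taken from this article.

```shell
dir=$(mktemp -d)
# Hypothetical stand-in for the hello.c used in the examples.
cat > "$dir/hello.c" <<'EOF'
#include <stdio.h>
int main(void) {
    printf("Hello, world.\n");
    return 0;
}
EOF

# One file named (the glob expands to a single file): no file name in the output.
plain=$(cd "$dir" && grep -i hello *.c)
# /dev/null as an extra, always-empty file makes grep label each match.
named=$(cd "$dir" && grep -i hello *.c /dev/null)
# -l prints only the name of each file that contains a match.
listed=$(cd "$dir" && grep -il hello *.c)

printf 'plain:  %s\nnamed:  %s\nlisted: %s\n' "$plain" "$named" "$listed"
rm -rf "$dir"
```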

The -l option can be used to extract a list of files containing the string. The file name is printed only once, even though the string may appear in multiple lines within that file. In the following example, story.txt appears only once, even though it contains more than one "hello."

$ grep -il hello *

hello.c

intro.txt

story.txt

The -l option suppresses most of the other output options from grep. On the other hand, the -n option will print a line number as well as the text, as in the following example:

$ grep -in hello *

hello.c:7: printf("Hello, world.\n");

intro.txt:44: use the hello.c program as an example of C programming

story.txt:110: so I said hello and she smiled back

story.txt:187: I could hear my echo, "HELLO."

The -v option outputs the complement of the search, i.e., all lines not containing the requested search pattern.

$ grep -iv hello intro.txt

You will be able to get more practice if you

at its simplest

The -c option prints only a count of lines matched. It also has the interesting and useful side effect of listing all the files it searches, not just the successful hits.

$ grep -ic hello *

data.txt:0

hello.c:1

intro.txt:1

intro2.txt:0

story.txt:2
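
A small sketch of how -c and -v relate, using two hypothetical files: -c reports a count for every file searched, and for any one file the -c and -vc counts should add up to the total number of lines.

```shell
dir=$(mktemp -d)
# Hypothetical sample files for illustration.
printf 'hello there\nnothing here\nHello again\n' > "$dir/a.txt"
printf 'no greeting in this file\n' > "$dir/b.txt"

# -c prints one count per file, including files with zero matches.
counts=$(cd "$dir" && grep -ic hello *)
# -v inverts the match, so these two counts cover every line of a.txt.
matching=$(grep -ic hello "$dir/a.txt")
nonmatching=$(grep -ivc hello "$dir/a.txt")

printf '%s\n' "$counts"
echo "a.txt: $matching matching + $nonmatching non-matching lines"
rm -rf "$dir"
```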

Some versions of grep come with -r as an option, which prompts grep to search recursively through subdirectories. The default behavior is to search only one directory, so the -r option, as provided in GNU and other implementations of grep, is the exception rather than the rule.
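
Where -r is not available, a similar result can usually be had by letting find walk the directory tree. The sketch below assumes a grep that supports -r (GNU and BSD grep do) and compares it with the find-based alternative; the directory layout is hypothetical.

```shell
dir=$(mktemp -d)
mkdir -p "$dir/docs/drafts"
printf 'hello from the top level\n' > "$dir/top.txt"
printf 'hello from a subdirectory\n' > "$dir/docs/drafts/deep.txt"

# Assumes a grep that supports -r; lists matching files below the current directory.
recursive=$(cd "$dir" && grep -rl hello . | sort)
# Alternative without -r: find collects the files and hands them to grep.
via_find=$(cd "$dir" && find . -type f -exec grep -l hello {} + | sort)

printf '%s\n' "$recursive"
rm -rf "$dir"
```

Both commands should list the same two files.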

Going wild with grep
So far I've covered some of the input and output options, but the real power of grep is in its search pattern, which uses regular expressions. grep can match simple strings, as we saw in the "hello" example we played with above; but it can also use a variety of wild cards and special symbols to create a regular expression to search for more complex strings.

I will begin with some of the simpler characters in a regular expression. A ^ (caret) character means the start of a line and a $ (dollar) character means the end of one.

The wild cards used by grep frequently clash with the special symbols that the shell uses, so the usual practice is to enclose complex search strings within single quotes. The following two examples match "hello" at the start and at the end of a line, respectively; add the -i option to match any case version.

$ grep '^hello' *

$ grep 'hello$' *
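
To see the anchors at work, here is a minimal sketch against a hypothetical four-line file. Note that without -i the capitalized line is not matched.

```shell
dir=$(mktemp -d)
cat > "$dir/lines.txt" <<'EOF'
hello at the start
ends with hello
say hello in the middle
Hello with a capital
EOF

start=$(grep '^hello' "$dir/lines.txt")            # line must begin with "hello"
end=$(grep 'hello$' "$dir/lines.txt")              # line must end with "hello"
start_any_case=$(grep -i '^hello' "$dir/lines.txt") # same, ignoring case

printf 'start: %s\nend: %s\n' "$start" "$end"
printf '%s\n' "$start_any_case"
rm -rf "$dir"
```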

The dot or period character (.) will match any single character. For example, the following would match any character followed by "ello," as in "aello," "bello," "cello," and so on all the way through "zello." Odd combinations, like "1ello" and "?ello," would also be included; any combination of one initial character followed by "ello" is valid. The dot does not match the beginning or end of a line; therefore, "ello" at the start of a line would not be matched.

$ grep '.ello' *
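
A quick way to check the dot's behavior is to pipe a few hypothetical lines through grep. The line that starts with "ello" should be the only one left out.

```shell
# Four sample lines on standard input; "ello there" has no character before "ello".
matches=$(printf 'cello\nello there\n1ello\nhello\n' | grep '.ello')
printf '%s\n' "$matches"
```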

Optional characters can be enclosed in square brackets ([ ]) causing any of the enclosed characters to be matched. The following search string would match "hello," "cello," or "jello."

$ grep '[hcj]ello' *

Optional characters can also be specified by using a range consisting of two characters separated by a hyphen. The following example would match "bay," "cay," or "day."

$ grep '[b-d]ay' *

An optional character or range of characters can be preceded by a caret (^) to invert the sense of the match. The following would match any character followed by "ay" except the combinations "bay," "cay," and "day."

$ grep '[^b-d]ay' *

Note that options and ranges represent a match of a single character.
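
The sketch below runs a range and its negation over the same hypothetical word list. The bare "ay" line matches neither pattern, because both forms still require exactly one character before "ay."

```shell
words='ay
bay
cay
day
hay
say'

in_range=$(printf '%s\n' "$words" | grep '[b-d]ay')   # b, c, or d before "ay"
negated=$(printf '%s\n' "$words" | grep '[^b-d]ay')   # any other single character

printf 'range:\n%s\nnegated:\n%s\n' "$in_range" "$negated"
```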

Any single character match (including a single character matched by an option/range specification) can be repeated by using the asterisk character (*). An asterisk following a single character means "zero or more occurrences" of the preceding match. The following search requests any line containing "hello" followed by "dolly" where the words are separated by zero or more spaces. Note that the asterisk follows the space after "hello" and therefore applies to the space character.

$ grep 'hello *dolly' *

This search would match any of the following, without regard to the number of spaces between the words.

hellodolly

hello dolly

The asterisk can be applied to an option or range. The following search matches "c" and "t" with any number of vowels (or no vowels) in between.

$ grep 'c[aeiou]*t' somewords.txt

cat

coat

coot

cot

cout

cut
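
Both asterisk examples can be tried on inline input. This is only a sketch with hypothetical sample lines; it shows that "zero or more" includes the zero case ("hellodolly" and "ct") and that any other character in between breaks the match.

```shell
# The asterisk applies to the space: zero, one, or many spaces are accepted.
spaced=$(printf 'hellodolly\nhello dolly\nhello     dolly\nhello, dolly\n' | grep 'hello *dolly')
# The asterisk applies to the vowel set: "ct" matches, "cart" does not.
vowels=$(printf 'ct\ncat\ncoat\ncart\n' | grep 'c[aeiou]*t')

printf '%s\n' "$spaced"
echo '---'
printf '%s\n' "$vowels"
```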

Extending grep
At this point grep and egrep depart from one another. egrep stands for extended grep. The POSIX 1003.2 standard defined a set of regular expression characters, called modern, extended, or full regular expressions. The regular expressions I cited earlier are frequently called older or basic regular expressions. There is some overlap between the two, and recent versions of grep can be made to behave like egrep by using the -E option.

The egrep utility uses extended regular expressions, with a useful one being the plus (+) character, which works like the asterisk (*) but means "one or more" rather than "zero or more." Using egrep in the above example with a + instead of an * would exclude a string such as "ct," which the asterisk version accepts, because it doesn't contain one or more vowels.

$ egrep 'c[aeiou]+t' somewords.txt

cat

coat

coot

cot

cout

cut

If you use grep to achieve the same results, the search pattern becomes clumsier. The next example asks for "c," followed by any vowel, followed by zero or more occurrences of any vowel, followed by "t."

$ grep 'c[aeiou][aeiou]*t' somewords.txt

cat

coat

coot

cot

cout

cut
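
The equivalence of the two forms can be checked directly. The sketch uses grep -E in place of egrep, since some current systems print a warning when egrep is invoked; the word list is hypothetical.

```shell
words='ct
cat
coat
cut'

# Extended syntax: one or more vowels between "c" and "t".
extended=$(printf '%s\n' "$words" | grep -E 'c[aeiou]+t')
# Basic syntax: one vowel, then zero or more further vowels.
basic=$(printf '%s\n' "$words" | grep 'c[aeiou][aeiou]*t')

printf '%s\n' "$extended"
[ "$extended" = "$basic" ] && echo 'both patterns select the same lines'
```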

The egrep utility also adds a question mark (?), meaning zero or one occurrence, as another version of multiple occurrence matching.

* = zero or more occurrences

+ = one or more occurrences

? = zero or one occurrence
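
As an illustration of the three repetition characters, the sketch below applies each one to the same letter in a hypothetical word list, again using grep -E as the equivalent of egrep.

```shell
words='color
colour
colouur'

zero_or_one=$(printf '%s\n' "$words" | grep -E 'colou?r')   # color, colour
zero_or_more=$(printf '%s\n' "$words" | grep -E 'colou*r')  # all three words
one_or_more=$(printf '%s\n' "$words" | grep -E 'colou+r')   # colour, colouur

printf '?: %s\n' "$(echo $zero_or_one)"
printf '*: %s\n' "$(echo $zero_or_more)"
printf '+: %s\n' "$(echo $one_or_more)"
```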

The vertical bar (|) creates an "or" condition between two possible search patterns. In the following example, egrep searches for "c," followed by one or more vowels, followed by "t," or for "p" followed by one or more vowels, followed by "l." Because the search string doesn't specify that the word must end after the closing "t" or "l," this example has matched "paula" and "paella," as well as words that end in "l."

$ egrep 'c[aeiou]+t|p[aeiou]+l' somewords.txt

cat

coat

coot

cot

cut

cet

cit

pal

paella

paul

paula

peal

peel

pool

You can fudge this with grep by entering multiple search patterns and inserting newlines in between the patterns. This can be used with egrep and fgrep as well, but I'm introducing it here simply to highlight the difficulty of imitating egrep with grep when it would be simpler to use egrep.

In the following example, the first part of the command is entered on one line, and then Enter is pressed while the single quotes are still open. The shell prompts for additional input and continues to accept lines until the closing quote appears. Each individual line represents a separate search string to grep. This trick is useful with any version of grep.

$ grep 'c[aeiou][aeiou]*t

> p[aeiou][aeiou]*l' somewords.txt

cat

coat

coot

cot

cut

cet

cit

pal

paella

paul

paula

peal

peel

pool
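
For scripts, where an interactive continuation prompt is not available, the same multi-pattern search can be written with an embedded newline or with repeated -e options. The following sketch, on a hypothetical word list, checks that both give the same result as the extended-syntax alternation.

```shell
words='cat
pal
paula
dog'

# Extended syntax with | (grep -E standing in for egrep).
alternation=$(printf '%s\n' "$words" | grep -E 'c[aeiou]+t|p[aeiou]+l')
# Basic syntax: two patterns separated by a newline inside the quotes.
newline_sep=$(printf '%s\n' "$words" | grep 'c[aeiou][aeiou]*t
p[aeiou][aeiou]*l')
# Basic syntax: one -e option per pattern.
dash_e=$(printf '%s\n' "$words" | grep -e 'c[aeiou][aeiou]*t' -e 'p[aeiou][aeiou]*l')

printf '%s\n' "$alternation"
```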

With egrep, simple parentheses can be used to group sections of a search pattern together. In the following example, the search pattern will match any of the words shown in the result list. The parentheses group "[Ss]ome" and "[Aa]ny" as alternative strings, either of which must be followed by "one."

$ egrep '([Ss]ome|[Aa]ny)one' somewords.txt

someone

Someone

anyone

Anyone
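
To show why the parentheses matter, this sketch runs the grouped pattern and an ungrouped variant over a hypothetical word list. Without the group, the vertical bar splits the whole pattern, so "one" is required only after "[Aa]ny."

```shell
words='someone
Anyone
everyone
some
one'

# Grouped: (some OR any) followed by "one".
grouped=$(printf '%s\n' "$words" | grep -E '([Ss]ome|[Aa]ny)one')
# Ungrouped: "some" alone OR "anyone", so the bare word "some" also matches.
ungrouped=$(printf '%s\n' "$words" | grep -E '[Ss]ome|[Aa]nyone')

printf 'grouped:\n%s\nungrouped:\n%s\n' "$grouped" "$ungrouped"
```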

A single character can be modified by a bound, which consists of one or two comma-separated numbers, with the first number specifying the minimum number of occurrences and the second specifying the maximum. egrep uses curly braces ({}) to specify a bound, while grep uses back-slashed curly braces (\{\}). For example, egrep 'a{2,4}' matches a run of two to four consecutive "a" characters, and the equivalent grep pattern is 'a\{2,4\}'.
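
Here is a minimal sketch of bounds in both syntaxes, using hypothetical lines made up only of the letter "a." The ^ and $ anchors are added so that the bound applies to the whole line; without them, a longer run would still match because it contains a shorter one.

```shell
lines='a
aa
aaaa
aaaaaa'

# Extended syntax (grep -E standing in for egrep): between two and four "a"s.
extended=$(printf '%s\n' "$lines" | grep -E '^a{2,4}$')
# Basic syntax: the same bound with back-slashed braces.
basic=$(printf '%s\n' "$lines" | grep '^a\{2,4\}$')
# A minimum with an open-ended maximum: three or more "a"s.
at_least_three=$(printf '%s\n' "$lines" | grep -E '^a{3,}$')

printf '%s\n' "$extended"
echo '---'
printf '%s\n' "$at_least_three"
```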

Posted: April 28, 2011

Darren Dalasta

