AWK as Grep

Structure of AWK programs

AWK reads the input a line at a time. A line is scanned for each pattern in the program, and for each pattern that matches, the associated action is executed.
— Alfred V. Aho[13]

An AWK program is a series of pattern-action pairs, written as:

BEGIN {
  # init code goes here
}

# "body" of the script follows:

condition 1 or /pattern-1/ {
  # action 1 - what to do with the line matching the pattern?
}

...

condition n or /pattern-n/ {
  # action n - what to do with the line matching the pattern?
}

END {
  # finalizing
}
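  • pattern is a regular expression, a numeric expression, a string expression, or a combination of these
  • action is executable code with C-like syntax
  • if the pattern is omitted, the action runs for every line; if the action is omitted, the matching line is printed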
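To make the structure above concrete, here is a minimal sketch that uses all three kinds of blocks. The sample input and the variable names (result, count) are made up for illustration.

```shell
# Hypothetical sample input: one "service status" pair per line.
result=$(printf '%s\n' \
  'sshd active' \
  'httpd failed' \
  'crond active' | awk '
BEGIN { count = 0 }                   # runs once, before any input is read
/active/ { count++ }                  # runs for every line matching the regex
$2 == "failed" { print "down: " $1 }  # runs when the second column is "failed"
END { print "active: " count }        # runs once, after the last line
')

printf '%s\n' "$result"
# down: httpd
# active: 2
```

Every line is tested against both body patterns, so a single line can trigger more than one action.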

Each line is split into columns (fields) by a separator, which by default is any run of consecutive whitespace characters. You can change the separator with the -F switch or by assigning the FS variable inside the BEGIN block.

The columns that each line is split into can be accessed via special variables:

$0 # the whole line
$1 # first column
$2 # second column
...
$n # nth column
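As a small illustration of both ways to change the separator, the sketch below splits colon-separated text. The two sample lines only imitate the /etc/passwd layout and are an assumption, as are the variable names.

```shell
# Hypothetical colon-separated input in the style of /etc/passwd.
sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin'

# Set the separator on the command line with -F ...
with_flag=$(printf '%s\n' "$sample" | awk -F: '{ print $1, $7 }')

# ... or assign FS in the BEGIN block; both give the same result.
with_fs=$(printf '%s\n' "$sample" | awk 'BEGIN { FS = ":" } { print $1, $7 }')

printf '%s\n' "$with_flag"
# root /bin/bash
# daemon /usr/sbin/nologin
```

FS has to be set in BEGIN (or with -F) so that it is already in effect when the first line is split.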

AWK as Linux grep

The table below is a basic cheat sheet for using awk like grep on Linux:

awk command                 Description
awk '{print $1}' file       Print the first field of each record in file
awk '/regex/' file          Print only lines that match regex in file
awk '!/regex/' file         Print only lines that do not match regex in file
awk '$2 == "foo"' file      Print lines where field 2 is equal to "foo" in file
awk '$2 != "foo"' file      Print lines where field 2 is NOT equal to "foo" in file
awk '$1 ~ /regex/' file     Print lines where field 1 matches regex in file
awk '$1 !~ /regex/' file    Print lines where field 1 does NOT match regex in file

A pattern can also be followed by one or more actions, separated by semicolons:

awk '/search_pattern/ { action_to_take_on_matches; another_action; }' file_to_parse
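As a sketch of what a pattern combined with an action can do, the commands below imitate grep -n (numbered matches) and grep -c (match count). The log lines and variable names are invented for this example; NR is awk's built-in record (line) counter.

```shell
# Hypothetical log lines used only for this illustration.
log='service sshd started
service httpd failed
service crond started'

# Like grep -n: prefix each matching line with its line number (NR).
numbered=$(printf '%s\n' "$log" | awk '/started/ { print NR ":" $0 }')

# Like grep -c: count matching lines; n + 0 prints 0 when nothing matched.
count=$(printf '%s\n' "$log" | awk '/started/ { n++ } END { print n + 0 }')

printf '%s\n' "$numbered"
# 1:service sshd started
# 3:service crond started
printf '%s\n' "$count"
# 2
```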

Basic search

For most straightforward use cases, grep is enough to match multiple strings or patterns, but for more complex use cases awk is a useful alternative. The basic syntax to match a single PATTERN with awk is:

awk '/PATTERN/' FILE

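Case-insensitive search

To search for a string or pattern without regard to case, set IGNORECASE in the BEGIN block. Note that IGNORECASE is a GNU awk (gawk) extension and is not part of POSIX awk, so it may have no effect in other implementations such as mawk: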

awk 'BEGIN{IGNORECASE=1} /PATTERN1|PATTERN2|PATTERN3/' FILE
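Where IGNORECASE is not available, one portable alternative is to lowercase the line with the standard tolower() function and match against a lowercase pattern. This is a sketch of that substitute technique, not the gawk mechanism itself; the sample lines and the variable name are assumptions.

```shell
# Hypothetical log lines with mixed capitalisation.
log='ERROR: disk full
Warning: low memory
info: all good'

# tolower() is part of POSIX awk; the pattern itself must be written in lowercase.
matches=$(printf '%s\n' "$log" | awk 'tolower($0) ~ /error|warning/')

printf '%s\n' "$matches"
# ERROR: disk full
# Warning: low memory
```

The line is printed in its original form, because only the copy used for matching is lowercased.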

Match multiple patterns with OR condition

To match multiple patterns:

awk '/PATTERN1|PATTERN2|PATTERN3/' FILE

In regular expressions, | means logical OR (alternation).

For example, to print all the lines containing Error or warning in /var/log/messages we can use:

awk '/Error|warning/' /var/log/messages

This match is case-sensitive, so lines containing ERROR or Warning are skipped. To match regardless of case, we add IGNORECASE:

awk 'BEGIN{IGNORECASE=1} /Error|warning/' /var/log/messages
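The sketch below shows two equivalent ways to express OR, alternation inside one regex and the || operator between two patterns, and also makes the case sensitivity visible. The sample lines and variable names are made up for illustration.

```shell
# Hypothetical log lines; note the capitalised "Warning" on line 2.
log='Error: disk full
Warning: low memory
warning: retrying
info: all good'

# Alternation inside a single regular expression.
with_alternation=$(printf '%s\n' "$log" | awk '/Error|warning/')

# The same condition written with the || operator.
with_or_operator=$(printf '%s\n' "$log" | awk '/Error/ || /warning/')

printf '%s\n' "$with_alternation"
# Error: disk full
# warning: retrying
```

The line starting with "Warning" is not printed, because the pattern only contains the lowercase spelling.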

Search for multiple patterns with AND condition

In the above example we searched with an OR condition: a line is printed if any one of the provided patterns is found. To print a line only when all the provided patterns match, we must use the AND operator (&&). The syntax would be:

awk '/PATTERN1/ && /PATTERN2/ && /PATTERN3/' FILE

Now we will use this syntax to search for lines containing both "Success" and "activated" in our /tmp/somefile:

~] awk '/Success/ && /activated/' /tmp/somefile
Successfully activated sshd service
Successfully activated httpd service

To perform a case-insensitive search, we will use the following syntax:

awk 'BEGIN{IGNORECASE=1} /PATTERN1/ && /PATTERN2/ && /PATTERN3/' FILE

Now we use this syntax in our example:

~] awk 'BEGIN{IGNORECASE=1} /success/ && /activated/' /tmp/somefile
Successfully activated sshd service
Successfully activated httpd service
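For comparison, the sketch below runs the AND search next to the usual grep equivalent, which needs two chained grep calls. The sample text is an assumption, reconstructed from the outputs shown in this article, and the variable names are hypothetical. The last command is a portable case-insensitive variant based on tolower() rather than IGNORECASE.

```shell
# Assumed contents of the sample file, reconstructed from the article's outputs.
sample='Successfully activated sshd service
Successfully activated httpd service
Successfully reloaded service
Successfully stopped service
Successfully enabled service'

# AND in a single awk call.
both=$(printf '%s\n' "$sample" | awk '/Success/ && /activated/')

# The same filter with grep requires a pipeline of two calls.
chained=$(printf '%s\n' "$sample" | grep 'Success' | grep 'activated')

# Portable case-insensitive AND: lowercase the line, use lowercase patterns.
nocase=$(printf '%s\n' "$sample" |
  awk 'tolower($0) ~ /success/ && tolower($0) ~ /activated/')

printf '%s\n' "$both"
# Successfully activated sshd service
# Successfully activated httpd service
```

Unlike a single regex such as /Success.*activated/, the && form does not care in which order the two strings appear on the line.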

Exclude multiple patterns with awk

We can also exclude certain pre-defined patterns from the search. The general syntax would be:

awk '!/PATTERN1/ && !/PATTERN2/ && !/PATTERN3/' FILE

This syntax prints only the lines that match none of the three patterns. You can add or remove patterns based on your requirement.

For example, to print all the lines except the ones containing activated:

~] awk '!/activated/' /tmp/somefile
Successfully reloaded service
Successfully stopped service
Successfully enabled service
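When several patterns are excluded, the chained form and a single negated alternation select the same lines. The sketch below checks this on assumed sample text (reconstructed from the outputs shown in this article); the variable names are hypothetical.

```shell
# Assumed contents of the sample file, reconstructed from the article's outputs.
sample='Successfully activated sshd service
Successfully activated httpd service
Successfully reloaded service
Successfully stopped service
Successfully enabled service'

# Exclude lines matching either pattern, one negation per pattern.
chained=$(printf '%s\n' "$sample" | awk '!/activated/ && !/stopped/')

# Equivalent: negate a single regex that uses alternation.
combined=$(printf '%s\n' "$sample" | awk '!/activated|stopped/')

printf '%s\n' "$chained"
# Successfully reloaded service
# Successfully enabled service
```

The second form is shorter, while the first is easier to extend with conditions that are not plain regex matches, such as a comparison on a field.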
