Awk: Difference between revisions
Latest revision as of 00:05, 15 June 2018
External
- http://en.wikipedia.org/wiki/Awk
- Awk by Example
Internal
- Linux
- sed
TODO
Continue with, and deplete: http://localhost:9627/personal/Wiki.jsp?page=Awk.
Overview
awk handles a stream of text as a sequence of records. The default record separator is the newline, so by default each line is one record. Each record is split into a sequence of fields; by default, the field separator is white space. An awk program consists of condition-action statements that are applied to each record as it is read. Each record is tested against every condition, which can be a pattern, among other things, and for each condition that matches, the associated action is executed.
awk '<program>' <file-to-process>
The program is a succession of:
condition { action }
Example:
awk '{print $1}' ./sample.txt
In the example above, the condition is omitted, so the action runs for every record and prints the first field. More details about the syntax are available in Program Structure.
The program can be specified in a separate text file, which is provided to awk by preceding the program file name with -f:
awk -f <program-file-name> <file-to-process>
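As a minimal sketch of the condition-action model described above, the following feeds two made-up records to awk and uses a regular expression as the condition. The sample data and the `out` variable are assumptions chosen for illustration.

```shell
# Made-up input: two records (lines), two whitespace-separated fields each.
# Condition: the record starts with "a". Action: print the second field.
out=$(printf 'alice 30\nbob 25\n' | awk '/^a/ { print $2 }')
echo "$out"   # prints: 30
```

Only the first record satisfies the condition, so the action runs once.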
Referring Fields
The fields are referred to with $<field-number> where field-number is 1-based: the first field in the record is $1.
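A short illustration of 1-based field references, using a made-up three-field record (the input and the `out` variable are assumptions):

```shell
# $1 is the first field and $3 the third.
# The comma in print separates the values with the output field separator (a space by default).
out=$(echo 'red green blue' | awk '{ print $3, $1 }')
echo "$out"   # prints: blue red
```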
Field Separator
The default field separator is white space. It can be changed either on the command line, with the -F option, or in the program, by setting the FS built-in variable.
Command line:
awk -F":" ....
In program:
awk 'BEGIN {FS=":"} {print $1}' ...
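The two forms above are expected to behave identically. The sketch below runs both against a made-up passwd-style record (the record and variable names are assumptions):

```shell
line='root:x:0:0:root:/root:/bin/sh'   # hypothetical colon-separated record

# Separator given on the command line.
from_flag=$(echo "$line" | awk -F":" '{ print $1, $7 }')

# Separator set inside the program, before the first record is read.
from_begin=$(echo "$line" | awk 'BEGIN { FS=":" } { print $1, $7 }')

echo "$from_flag"    # prints: root /bin/sh
echo "$from_begin"   # prints: root /bin/sh
```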
Program Structure
Comments
Everything from a '#' to the end of the line is a comment. The '#' does not have to be the first character on the line.
Built-in Variables
Field references are prefixed with '$': $0 is the entire record, $1, $2, ... are the individual fields, and $NF is the last field in the input record. All other built-in variables (FS, NR, NF and so on) must not be prefixed with '$'.
$1, $2, $3 ... | The corresponding fields in the record, 1-based. |
$0 | The entire record. |
FS | The field separator. |
RS | The record separator. The default is a newline. |
NR | The number of input records read so far. |
NF | The number of fields in the current input record. The last field in the record can be referenced as $NF. |
FILENAME | The name of the current input file. |
OFS | The output field separator, which separates the fields when awk prints them. The default is a space. |
ORS | The output record separator, which separates the records when awk prints them. The default is a newline. |
OFMT | The format for numeric output. The default is "%.6g". |
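A small sketch that exercises several of the variables listed above on made-up input (the data and the `out` variable are assumptions):

```shell
# For each record, print the record number, the field count and the last field,
# joined by the output field separator, which is set to "-" here.
out=$(printf 'a b c\nd e\n' | awk 'BEGIN { OFS="-" } { print NR, NF, $NF }')
echo "$out"
# prints:
# 1-3-c
# 2-2-e
```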
Program Content
[condition1] { action 1 }
[condition2] { action 2 }
[condition3] { action 3 }
Conditions
condition { ... }
Actions
... { action }
The same block allows for multiple statements, separated by semicolons:
... { print $1; print $2; print $3 }
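To make the structure concrete, here is a sketch of a program with three condition-action pairs, one of which omits the condition. The input and the messages are made up for illustration.

```shell
out=$(printf 'x 1\ny 2\n' | awk '
  $2 == 1 { print "first:", $1 }     # runs only when field 2 equals 1
  /^y/    { print "starts with y" }  # runs only when the record matches the regexp
          { print "seen:", $0 }      # no condition: runs for every record
')
echo "$out"
# prints:
# first: x
# seen: x 1
# starts with y
# seen: y 2
```

Every record is tested against each condition in program order, so a single record can trigger more than one action.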
Custom Variables
awk allows for custom variables. Variables are untyped: a variable holds a string or a number, and awk converts between the two automatically, so mathematical operations can be performed on a variable as long as it contains a valid numeric string. Variables need no declaration; they come into existence when first assigned in an action, and they are global, so a variable assigned in one action can be used in another:
{ a = "something" } { print a }
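Assigning a quoted string (a="true") differs from assigning an unquoted word (a=true). awk has no boolean literals, so the unquoted true is read as a variable named true; if that variable has never been assigned, its value is the empty string, which is 0 in a numeric context and false in a boolean one.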
Variables are referenced by name alone, without a '$' prefix. Writing $a instead refers to the field whose number is the value of a.
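The sketch below illustrates the automatic string-to-number conversion and one case of the quoted versus unquoted assignment mentioned above. All variable names are made up, and the program runs entirely in a BEGIN block so it needs no input.

```shell
out=$(awk 'BEGIN {
  a = "41"            # assigned as a string
  print a + 1         # numeric context: the string is converted to a number
  print a "1"         # string context: the two strings are concatenated
  quoted = "true"     # a non-empty string
  bare = true         # "true" here is an uninitialized variable, not a literal
  print (quoted ? "truthy" : "falsy")
  print (bare ? "truthy" : "falsy")
}')
echo "$out"
# prints:
# 42
# 411
# truthy
# falsy
```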
Boolean Variables
A variable set to 1 evaluates to true in a boolean expression:
BEGIN { cond=1 }
{
  if (cond) { print }
}
This program prints every record.
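A variable used this way can also serve directly as a condition. The following sketch switches printing on at one marker line and off at another. The START and END markers and the variable name `printing` are assumptions for illustration.

```shell
out=$(printf 'a\nSTART\nb\nEND\nc\n' | awk '
  /START/  { printing = 1; next }   # turn printing on and skip the marker itself
  /END/    { printing = 0 }         # turn printing off
  printing { print }                # print only while the flag is non-zero
')
echo "$out"   # prints: b
```

Before the first START line the flag is uninitialized, which counts as false.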
Passing Variables Assigned in the Surrounding Script
Use the -v command-line option:
awk -v var_a="blah" '{print var_a}'
To define multiple variables, repeat -v for each one:
awk -v var_a="blaha" -v var_b="blahb" '{print var_a var_b}'
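A sketch of passing a value from the surrounding shell script into awk. The names `greeting` and `prefix` are hypothetical.

```shell
greeting="hello"   # shell variable set by the surrounding script

# -v copies the shell value into the awk variable "prefix" before the program starts.
out=$(echo 'world' | awk -v prefix="$greeting" '{ print prefix, $1 }')
echo "$out"   # prints: hello world
```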
Using Variables in Expressions
To use a string variable in an expression, concatenate it with the neighboring string constants by writing them side by side:
BEGIN { bean_name="blah" }
{
  if ($2 == "name=\""bean_name"\"")
  {
    print
  }
}
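The same comparison can be tried on made-up input, written here as a condition instead of an if statement. The two sample records are assumptions.

```shell
out=$(printf 'bean name="blah"\nbean name="other"\n' | awk '
  BEGIN { bean_name = "blah" }
  # The three adjacent strings concatenate to: name="blah"
  $2 == "name=\"" bean_name "\"" { print }
')
echo "$out"   # prints: bean name="blah"
```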
Using Variables in Regular Expressions (Dynamic Regexps)
The right-hand side of a '~' or '!~' operator need not be a regexp constant (i.e. a string of characters between slashes). It may be any expression. The expression is evaluated and, if necessary, converted to a string; the contents of the string are used as the regexp. A regexp computed in this way is called a dynamic regexp:
BEGIN { identifier_regexp = "[A-Za-z_][A-Za-z_0-9]+" }
$0 ~ identifier_regexp { print }
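A runnable variation of the example above on made-up input. The '^' and '$' anchors are an addition not present in the original expression; they are assumed here so that the whole record must be an identifier.

```shell
out=$(printf 'foo_1\n42\n_x\n' | awk '
  BEGIN { identifier_regexp = "^[A-Za-z_][A-Za-z_0-9]+$" }
  # The string held in the variable is used as the regular expression.
  $0 ~ identifier_regexp { print }
')
echo "$out"
# prints:
# foo_1
# _x
```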
Another Way of Building Dynamic Regular Expressions Without Using Variables
Here the shell substitutes a bash variable into the program text before awk runs:
bash_var="something"
awk '/^'"${bash_var}"'.*/ { print }' ./some_file.txt
This command prints all lines that start with "something".
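For comparison, the sketch below runs the shell-splicing approach next to an equivalent that passes the value with -v and builds a dynamic regexp inside awk. The input text and the names `spliced`, `passed` and `prefix` are assumptions.

```shell
bash_var="some"
input='something here
nothing here'

# Approach from the text: the shell splices the value into the program source.
spliced=$(printf '%s\n' "$input" | awk '/^'"${bash_var}"'.*/ { print }')

# Alternative: pass the value with -v and concatenate it into a dynamic regexp.
passed=$(printf '%s\n' "$input" | awk -v prefix="$bash_var" '$0 ~ ("^" prefix) { print }')

echo "$spliced"   # prints: something here
echo "$passed"    # prints: something here
```

The -v form keeps the awk program text constant, which may be easier to quote correctly when the value contains characters such as '/'.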
Numeric Variables
Expressions such as:
a = a + 1
or
a += 1
are valid.
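An unassigned variable starts out as 0 in a numeric context, so it can be incremented in the main processing loop without any setup. To start from a different value (or to set up any other kind of variable before the first record is read), initialize the variable in the BEGIN section: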
BEGIN { a=1; }
{
  if (...something...)
  {
    a += 1;
  }
}
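A runnable sketch of the pattern above, with a concrete regexp condition standing in for the elided one and an END block added to print the final value. The input, the condition and the END block are all assumptions.

```shell
out=$(printf 'ok\nfail\nok\n' | awk '
  BEGIN { a = 1 }      # initial value, set before any record is read
  /ok/  { a += 1 }     # incremented once per matching record
  END   { print a }    # runs after the last record
')
echo "$out"   # prints: 3
```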