Events User Manual - Parse Command: Difference between revisions

From NovaOrdis Knowledge Base
(14 intermediate revisions by the same user not shown)
Line 15:
=Input Format Specification=

Input Formats: -----------------------------------------------------------------------------------

The input stream format is required. It can be specified on the command line with:

    -i|--input-format="..."

where the input format is specified in-line, or with:

    --input-format-file=<file-name-that-contains-input-format>

where the input format is defined in the file named by the command line option.

Common input formats are httpd log, CSV header definitions or garbage collection log formats.
Examples:

  httpd log format:  "%h %u [%t] \"%r\" \"%q\" %{c,JSESSIONID} %{i,Some-Request-Header} %s %b %D"
  CSV format:        "timestamp, count, status-code"

'events' will try to apply heuristics and figure out what type of format was specified. For more
details about input formats, see the format sections below.

httpd Input Format: ------------------------------------------------------------------------------

  When specified on the command line, quotes must be escaped: \"%r\". When specified in a format
  file, quotes do NOT need to be escaped; the following specification is legal and will be parsed
  correctly: "%r". HTML character entities can also be used (&quot;).

  %A - The local IP address.
  %b - Response entity body size. Stored as long.
  %D - The time taken to serve the request. WildFly logs the time in milliseconds for %D, while
      Apache httpd logs the time in microseconds for the same %D.
  %h - Remote host name or IP address. Will log the IP address if HostnameLookups is set to Off,
      which is the default.
  %I - The name of the thread processing the request. Note that this is actually the WildFly
      convention, not the Apache httpd convention (Apache httpd logs "bytes received, including
      request and headers" for %I).
  %l - Remote logname from identd (if supplied).
  %u - Remote user if the request was authenticated. May be irrelevant if return status (%s) is
      401 (unauthorized).
  %P - The PID of the process that serviced the request.
  %q - The query string, excluding the '?' character. Usually enclosed in quotes.
  %r - First line of request. Because the first line is enclosed in quotes, you must explicitly
      specify the \" (double quote) or ' (single quote) format elements.
  %s - The status code of the original request (whether it was internally redirected or not).
      Stored as integer.
  %>s - The status code of the final request (whether it was internally redirected or not).
      Stored as integer.
  %S - Bytes transferred (received and sent), including request and headers.
  %t - Time the request was received, in the format [18/Sep/2015:19:18:28 -0400]. The last number
      indicates the timezone offset from GMT. Usually the timestamp format specification is
      explicitly declared between brackets, but if the brackets are not present in the log format
      specification, they are implied: WildFly will enclose the generated timestamp between
      brackets.
  %T - Time taken to process the request, in seconds.
  %v - The local server name.

  The parser can be instructed to drop the log values for certain fields, by using the %?
  format string:

  %? - Instructs the parser to ignore (drop) the corresponding value.

CSV Input Format: --------------------------------------------------------------------------------

The CSV input format consists of comma-separated header names, with optional type and format
information. If no type or format information is specified, the input fields are handled as
strings. The general syntax is:

  <field-name>(<type>[:format])

If a value contains commas, it must be enclosed in double quotes to be parsed correctly, otherwise
the parser will interpret the comma as a field separator.

Examples: ----------------------------------------------------------------------------------------

  path - the content of the field will be read verbatim and handled as a string.

  timestamp(time:yy/MM/dd HH:mm:ss,SSS) - the content of the field will be parsed as time
  information according to the format specified between (time:...). The format follows Java
  SimpleDateFormat conventions. If the content cannot be converted into a timestamp, a FaultEvent
  will be generated and sent down the pipeline. For more details see:
  http://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html

  status-code(int) - the content of the field will be interpreted as an integer and parsed
  accordingly. If the content cannot be converted to an integer, a FaultEvent will be generated
  and sent down the pipeline.

  body-size(long) - the content of the field will be interpreted as a long and parsed accordingly.
  If the content cannot be converted to a long, a FaultEvent will be generated and sent down the
  pipeline.
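The CSV header syntax <field-name>(<type>[:format]) can be illustrated with a small sketch. This is not the 'events' implementation; the names split_header and parse_field are hypothetical, and the sketch assumes that a comma inside parentheses (as in the time format above) belongs to the format rather than separating fields.

```python
import re

# Hypothetical helper, not part of 'events': matches one header entry of the
# form <field-name>(<type>[:format]), where the parenthesized part is optional.
FIELD_PATTERN = re.compile(r"^\s*([^()]+?)\s*(?:\(\s*(\w+)\s*(?::(.*))?\))?\s*$")


def parse_field(spec):
    """Return (name, type, format) for a single header entry."""
    match = FIELD_PATTERN.match(spec)
    if match is None:
        raise ValueError("malformed field specification: %r" % spec)
    name, type_name, fmt = match.groups()
    # Fields without type information are handled as strings.
    return name, type_name or "string", fmt


def split_header(header):
    """Split a header on commas that are outside parentheses (an assumption)."""
    fields, depth, current = [], 0, []
    for ch in header:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return [parse_field(field) for field in fields]


if __name__ == "__main__":
    header = "timestamp(time:yy/MM/dd HH:mm:ss,SSS), status-code(int), body-size(long), path"
    for entry in split_header(header):
        print(entry)
```

Untyped entries such as path come back with the type "string", mirroring the rule that fields without type information are handled as strings.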
Latest revision as of 23:15, 14 February 2017

Overview

The command configures an events pipeline to process text data arriving at stdin and convert it into a time series written to stdout, according to the specified format.

Syntax

events < input-file <input-format-specification>

Input Format Specification

The format of the input stream can be specified in one of three ways: verbatim, in-line on the command line; in a file whose name is given on the command line; or by a logical name.

In-Line Input Formats

To specify a verbatim format in line, use:

-i format-specification

or

--input-format=format-specification

If the format specification is stored in a file on an accessible filesystem, it can be specified as follows:

--input-format-file=<file-name-that-contains-input-format>

Formats that can be specified in-line usually apply to line-based logs, such as Apache httpd logs, or CSV files. An httpd log format can be specified in-line with the short option -i (or with the long-form equivalent --input-format="...") as follows:

-i "%h %u [%t] \"%r\" \"%q\" %{c,JSESSIONID} %{i,Some-Request-Header} %s %b %D"
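To make the structure of such a specification concrete, the following sketch splits an httpd format string into its individual format elements. It is an illustration only, not the parser used by 'events'; the name tokenize and the regular expression are assumptions made for this example.

```python
import re

# Assumed token shapes, tried in order:
#   %{...}    parameterized elements such as %{c,JSESSIONID}
#   %x / %>s  single-letter elements, optionally with '>' (also matches %?)
#   anything else up to the next '%' is literal text (spaces, brackets, quotes)
TOKEN_PATTERN = re.compile(r"%\{[^}]*\}|%>?[A-Za-z?]|[^%]+")


def tokenize(format_spec):
    """Return the format elements of the specification, skipping literal text."""
    tokens = TOKEN_PATTERN.findall(format_spec)
    return [token for token in tokens if token.startswith("%")]


if __name__ == "__main__":
    # The same specification as above; inside a Python string the shell
    # escaping of the double quotes is not needed.
    spec = '%h %u [%t] "%r" "%q" %{c,JSESSIONID} %{i,Some-Request-Header} %s %b %D'
    for element in tokenize(spec):
        print(element)
```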

More details about the httpd log format support can be found in the "Apache httpd log format" section. A CSV file format can be specified in-line as follows:

-i "timestamp, count, status-code"

More details about the CSV format support can be found in the "CSV format" section.

When an in-line format is used, the runtime applies heuristics to determine which type of format was specified.
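The actual heuristics are not described on this page. As a purely hypothetical sketch of what such a check could look like, the function below treats any specification containing a '%' format element as an httpd format and everything else as a CSV header; both the function name guess_format_type and the rule itself are assumptions, not the behavior of 'events'.

```python
def guess_format_type(spec):
    """Guess the type of an in-line format specification.

    Assumption for illustration only: httpd log formats contain '%' format
    elements (for example %h or %{c,JSESSIONID}), while CSV headers do not.
    """
    if "%" in spec:
        return "httpd"
    return "csv"


if __name__ == "__main__":
    print(guess_format_type('%h %u [%t] "%r" %s %b %D'))
    print(guess_format_type("timestamp, count, status-code"))
```

A real implementation would also need to distinguish named formats, which are passed through the same -i option.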

Named Formats

A format specified by its logical name can be provided as follows:

-i format-name

or

--input-format=format-name

Supported Formats

Apache httpd log format
CSV format
G1 Garbage Collection Log