Events-csv Concepts: Difference between revisions
(23 intermediate revisions by the same user not shown) | |||
Line 2: | Line 2: | ||
* [[events-csv User Manual]] | * [[events-csv User Manual]] | ||
=Tokenization= | |||
The empty strings found between commas are interpreted as "[[#Missing_Value|missing value]]". For example: | |||
a, , b | |||
generates a data line with two values: <tt>"a"</tt> and <tt>"b"</tt>, separated by a [[#Missing_Value|missing value]]. | |||
The quoted empty strings found between commas are interpreted as empty strings. For example: | |||
a," ", b | |||
generates a data line with three values: <tt>"a"</tt>, <tt>" "</tt> and <tt>"b"</tt>. | |||
A line that ends in a comma generates a data line that has a [[#Missing_Value|missing value]] on the last position in line. | |||
Then a comma-separated value line is turned into a CSVEvent, the [[#Missing_Value|missing values]] as defined above are represented with null-valued properties. If the type of the value is known, then the missing value is represented as a property of the corresponding type with a null value. For example: | |||
# timestamp, count(int) | |||
12/21/16 14:00:00, | |||
will return a CSVEvent with a IntegerProperty "field_1". The value of the property will be null, which will carry [[#Missing_Value|missing value]] semantics. | |||
On the other hand, when the header is missing, so we don't have a way of knowing the missing value's type, the missing value is represented with a null-valued UndefinedTypeProperty. In the following case: | |||
a, , b | |||
the corresponding CSVEvent carries a "field_1" UndefinedTypeProperty. which carries a null value. | |||
=Missing Value= | |||
=Headers= | |||
The CSV parsers understand in-line header lines. A header line must start with # and must contain [[#CSV_Field_Specification|CSV field specifications]]. | |||
The headers can be extracted from a CSV stream with the [[Events-csv_User_Manual#headers|'headers' command]]. | |||
If a valid timed line ''immediately'' follows the header, the header will contain the timestamp for that line, expressed as [[Time#Millisecond_POSIX_Time|millisecond POSIX Time]], accessible as a "next-timed-event-timestamp" property. If a non-timed line follows, then the header event won't be expected to maintain a timestamp, even if subsequent lines are timed. | |||
=CSV Format= | =CSV Format= | ||
As mentioned above, [[#Headers|headers]] can be specified in-line. A header is prefixed with '#' and specifies the fields: | |||
# timestamp(MM/dd/yy HH:mm:ss), collection-type(string), heap-occupancy(long) | |||
Multiple headers are supported in the CSV line stream, and the parser adjust upon receiving a header, by parsing the data lines according to the latest header seen on the stream. | |||
Comment lines are not allowed. | |||
==CSV Field== | ==CSV Field== | ||
===CSV Field Specification=== | ===CSV Field Specification=== | ||
====Timestamp==== | |||
"timestamp", "timestamp(yy/MM/dd HH:mm:ss)", "timestamp(time:yy/MM/dd HH:mm:ss)" | "timestamp", "timestamp(yy/MM/dd HH:mm:ss)", "timestamp(time:yy/MM/dd HH:mm:ss)" | ||
The timestamp can be specified as long representing [[Events-api_Concepts#Timestamp_Value|UTC milliseconds]], or as a string formatted using common date/time formats. | |||
"something(string)" | If the UTC milliseconds is used, the CSV format/header should be specified as follows: | ||
# timestamp(long), ... | |||
====String Fields==== | |||
"something", "something(string)" | |||
====Integer Fields==== | |||
"something(int)" | "something(int)" | ||
====Long Fields==== | |||
"something(long)" | "something(long)" | ||
====Float Fields==== | |||
"something(float)" | "something(float)" | ||
====Double Fields==== | |||
"something(double)" | "something(double)" | ||
"something(time)" | "something(time)" |
Latest revision as of 19:45, 22 September 2017
Internal
Tokenization
The empty strings found between commas are interpreted as "missing value". For example:
a, , b
generates a data line with two values: "a" and "b", separated by a missing value.
The quoted empty strings found between commas are interpreted as empty strings. For example:
a," ", b
generates a data line with three values: "a", " " and "b".
A line that ends in a comma generates a data line that has a missing value on the last position in line.
Then a comma-separated value line is turned into a CSVEvent, the missing values as defined above are represented with null-valued properties. If the type of the value is known, then the missing value is represented as a property of the corresponding type with a null value. For example:
# timestamp, count(int) 12/21/16 14:00:00,
will return a CSVEvent with a IntegerProperty "field_1". The value of the property will be null, which will carry missing value semantics.
On the other hand, when the header is missing, so we don't have a way of knowing the missing value's type, the missing value is represented with a null-valued UndefinedTypeProperty. In the following case:
a, , b
the corresponding CSVEvent carries a "field_1" UndefinedTypeProperty. which carries a null value.
Missing Value
Headers
The CSV parsers understand in-line header lines. A header line must start with # and must contain CSV field specifications.
The headers can be extracted from a CSV stream with the 'headers' command.
If a valid timed line immediately follows the header, the header will contain the timestamp for that line, expressed as millisecond POSIX Time, accessible as a "next-timed-event-timestamp" property. If a non-timed line follows, then the header event won't be expected to maintain a timestamp, even if subsequent lines are timed.
CSV Format
As mentioned above, headers can be specified in-line. A header is prefixed with '#' and specifies the fields:
# timestamp(MM/dd/yy HH:mm:ss), collection-type(string), heap-occupancy(long)
Multiple headers are supported in the CSV line stream, and the parser adjust upon receiving a header, by parsing the data lines according to the latest header seen on the stream.
Comment lines are not allowed.
CSV Field
CSV Field Specification
Timestamp
"timestamp", "timestamp(yy/MM/dd HH:mm:ss)", "timestamp(time:yy/MM/dd HH:mm:ss)"
The timestamp can be specified as long representing UTC milliseconds, or as a string formatted using common date/time formats.
If the UTC milliseconds is used, the CSV format/header should be specified as follows:
# timestamp(long), ...
String Fields
"something", "something(string)"
Integer Fields
"something(int)"
Long Fields
"something(long)"
Float Fields
"something(float)"
Double Fields
"something(double)"
"something(time)"