Latest revision as of 19:45, 22 September 2017

Internal

events-csv User Manual

Tokenization

The empty strings found between commas are interpreted as "missing value". For example:

a, , b

generates a data line with two values: "a" and "b", separated by a missing value.

The quoted empty strings found between commas are interpreted as empty strings. For example:

a,"   ", b

generates a data line with three values: "a", " " and "b".

A line that ends in a comma generates a data line that has a missing value on the last position in line.

Then a comma-separated value line is turned into a CSVEvent, the missing values as defined above are represented with null-valued properties. If the type of the value is known, then the missing value is represented as a property of the corresponding type with a null value. For example:

# timestamp, count(int)
12/21/16 14:00:00,

will return a CSVEvent with a IntegerProperty "field_1". The value of the property will be null, which will carry missing value semantics.

On the other hand, when the header is missing, so we don't have a way of knowing the missing value's type, the missing value is represented with a null-valued UndefinedTypeProperty. In the following case:

a, , b

the corresponding CSVEvent carries a "field_1" UndefinedTypeProperty. which carries a null value.

Missing Value

Headers

The CSV parsers understand in-line header lines. A header line must start with # and must contain CSV field specifications.

The headers can be extracted from a CSV stream with the 'headers' command.

If a valid timed line immediately follows the header, the header will contain the timestamp for that line, expressed as millisecond POSIX Time, accessible as a "next-timed-event-timestamp" property. If a non-timed line follows, then the header event won't be expected to maintain a timestamp, even if subsequent lines are timed.

CSV Format

As mentioned above, headers can be specified in-line. A header is prefixed with '#' and specifies the fields:

# timestamp(MM/dd/yy HH:mm:ss), collection-type(string), heap-occupancy(long)

Multiple headers are supported in the CSV line stream, and the parser adjust upon receiving a header, by parsing the data lines according to the latest header seen on the stream.

Comment lines are not allowed.

CSV Field

CSV Field Specification

Timestamp

"timestamp", "timestamp(yy/MM/dd HH:mm:ss)", "timestamp(time:yy/MM/dd HH:mm:ss)"

The timestamp can be specified as long representing UTC milliseconds, or as a string formatted using common date/time formats.

If the UTC milliseconds is used, the CSV format/header should be specified as follows:

# timestamp(long), ...

String Fields

"something", "something(string)"

Integer Fields

"something(int)"

Long Fields

"something(long)"

Float Fields

"something(float)"

Double Fields

"something(double)"

"something(time)"

@@ Line 2: / Line 2: @@
 * [[events-csv User Manual]]
+=Tokenization=
+The empty strings found between commas are interpreted as "[[#Missing_Value|missing value]]". For example:
+ a, , b
+generates a data line with two values: <tt>"a"</tt> and <tt>"b"</tt>, separated by a [[#Missing_Value|missing value]].
+The quoted empty strings found between commas are interpreted as empty strings. For example:
+ a,"   ", b
+generates a data line with three values: <tt>"a"</tt>, <tt>"   "</tt> and <tt>"b"</tt>.
+A line that ends in a comma generates a data line that has a [[#Missing_Value|missing value]] on the last position in line.
+Then a comma-separated value line is turned into a CSVEvent, the [[#Missing_Value|missing values]] as defined above are represented with null-valued properties. If the type of the value is known, then the missing value is represented as a property of the corresponding type with a null value. For example:
+ # timestamp, count(int)
+/21/16 14:00:00,
+will return a CSVEvent with a IntegerProperty "field_1". The value of the property will be null, which will carry [[#Missing_Value|missing value]] semantics.
+On the other hand, when the header is missing, so we don't have a way of knowing the missing value's type, the missing value is represented with a null-valued UndefinedTypeProperty. In the following case:
+ a, , b
+the corresponding CSVEvent carries a "field_1" UndefinedTypeProperty. which carries a null value.
+=Missing Value=
+=Headers=
+The CSV parsers understand in-line header lines. A header line must start with # and must contain [[#CSV_Field_Specification|CSV field specifications]].
+The headers can be extracted from a CSV stream with the [[Events-csv_User_Manual#headers|'headers' command]].
+If a valid timed line ''immediately'' follows the header, the header will contain the timestamp for that line, expressed as [[Time#Millisecond_POSIX_Time|millisecond POSIX Time]], accessible as a "next-timed-event-timestamp" property. If a non-timed line follows, then the header event won't be expected to maintain a timestamp, even if subsequent lines are timed.
 =CSV Format=
+As mentioned above, [[#Headers|headers]] can be specified in-line. A header is prefixed with '#' and specifies the fields:
+ # timestamp(MM/dd/yy HH:mm:ss), collection-type(string), heap-occupancy(long)
+Multiple headers are supported in the CSV line stream, and the parser adjust upon receiving a header, by parsing the data lines according to the latest header seen on the stream.
+Comment lines are not allowed.
 ==CSV Field==
@@ Line 9: / Line 56: @@
 ===CSV Field Specification===
-"timestamp"
+====Timestamp====
-"timestamp(time:yy/MM/dd HH:mm:ss)"
+"timestamp", "timestamp(yy/MM/dd HH:mm:ss)", "timestamp(time:yy/MM/dd HH:mm:ss)"
-"something"
+The timestamp can be specified as long representing [[Events-api_Concepts#Timestamp_Value|UTC milliseconds]], or as a string formatted using common date/time formats.
-"something(string)"
+If the UTC milliseconds is used, the CSV format/header should be specified as follows:
+ # timestamp(long), ...
+====String Fields====
+"something", "something(string)"
+====Integer Fields====
 "something(int)"
+====Long Fields====
 "something(long)"
+====Float Fields====
 "something(float)"
+====Double Fields====
 "something(double)"
 "something(time)"

Events-csv Concepts: Difference between revisions

Latest revision as of 19:45, 22 September 2017

Contents

Internal

Tokenization

Missing Value

Headers

CSV Format

CSV Field

CSV Field Specification

Timestamp

String Fields

Integer Fields

Long Fields

Float Fields

Double Fields

Navigation menu

Events-csv Concepts: Difference between revisions

Latest revision as of 19:45, 22 September 2017

Internal

Tokenization

Missing Value

Headers

CSV Format

CSV Field

CSV Field Specification

Timestamp

String Fields

Integer Fields

Long Fields

Float Fields

Double Fields

Navigation menu

Search