Events-csv Concepts: Difference between revisions
Line 28: | Line 28: | ||
On the other hand, when the header is missing, so we don't have a way of knowing the missing value's type, the missing value is represented with a null-valued UndefinedTypeProperty. In the following case: | On the other hand, when the header is missing, so we don't have a way of knowing the missing value's type, the missing value is represented with a null-valued UndefinedTypeProperty. In the following case: | ||
a, ,b | a, , b | ||
the corresponding CSVEvent carries a "field_1" UndefinedTypeProperty, whose value is null. | the corresponding CSVEvent carries a "field_1" UndefinedTypeProperty, whose value is null. |
Revision as of 19:18, 28 August 2017
Internal
Tokenization
The empty strings found between commas are interpreted as "missing value". For example:
a, , b
generates a data line with two values: "a" and "b", separated by a "missing value".
The quoted empty strings found between commas are interpreted as empty strings. For example:
a," ", b
generates a data line with three values: "a", " " and "b".
A line that ends in a comma generates a data line that has a "missing value" on the last position in line.
Then a comma-separated value line is turned into a CSVEvent, the "missing values" as defined above are represented with null-valued properties. If the type of the value is known, then the missing value is represented as a property of the corresponding type with a null value. For example:
# timestamp, count(int) 12/21/16 14:00:00,
will return a CSVEvent with a IntegerProperty "field_1". The value of the property will be null, which will carry "missing value" semantics.
On the other hand, when the header is missing, so we don't have a way of knowing the missing value's type, the missing value is represented with a null-valued UndefinedTypeProperty. In the following case:
a, , b
the corresponding CSVEvent carries a "field_1" UndefinedTypeProperty, whose value is null.
CSV Format
Headers can be specified in-line. A header is prefixed with '#' and specifies the fields:
# timestamp(MM/dd/yy HH:mm:ss), collection-type(string), heap-occupancy(long)
Multiple headers are supported in the CSV line stream, and the parser adjust upon receiving a header, by parsing the data lines according to the latest header seen on the stream.
Comment lines are not allowed.
CSV Field
CSV Field Specification
"timestamp", "timestamp(yy/MM/dd HH:mm:ss)", "timestamp(time:yy/MM/dd HH:mm:ss)"
"something", "something(string)"
"something(int)"
"something(long)"
"something(float)"
"something(double)"
"something(time)"