Strings in YAML
Revision as of 21:14, 7 December 2022
External
- http://www.yaml.org/spec/1.2/spec.html#id2795688
- https://www.baeldung.com/yaml-multi-line
- http://blogs.perl.org/users/tinita/2018/03/strings-in-yaml---to-quote-or-not-to-quote.html
- https://yaml-multiline.info
- https://www.educative.io/answers/what-is-flow-style-in-yaml
Internal
Overview
There is a wide variety of choices when it comes to representing strings in YAML. Strings can be represented as flow scalars and block scalars. A flow scalar can be plain, single quoted or double quoted. A block scalar can be literal or folded.
Flow Scalars
"Flow" comes from "flow style", where successive YAML elements are placed on the same line, directly next to each other. This style is different from the "block style", where separate YAML elements are arranged into separate blocks defined by the same indentation. A string can be represented as a plain flow scalar, single quoted flow scalar and double quoted flow scalar.
Plain Flow Scalar
A plain flow scalar string starts immediately after ": " (colon-space) and does not require quotes.
a: this is a plain flow scalar string
The plain flow scalar string can be continued on the following lines of the YAML file. However, folding the string over multiple lines does not introduce newline characters into the string: the lines are joined with a single space, and differences in indentation between the continuation lines are ignored. The spacing between characters within a line is preserved.
a: this is a plain flow scalar string
  that folds
  over the
  line boundary
The string value is "this is a plain flow scalar string that folds over the line boundary".
long_key:
  the plain flow
  scalar string may
  have a smaller
  indentation than
  the key
The string value is "the plain flow scalar string may have a smaller indentation than the key".
New lines can be introduced in the string value by entering empty lines in the flow value. Each empty line introduces a new line character in the string value:
a: this string

  contains

  new lines
The string value is:
this string
contains
new lines
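The folding rules described above can be modeled roughly as follows. This is a simplified sketch in Python, not a YAML parser: the function name fold_plain_scalar is hypothetical, the input is assumed to be the raw text of the scalar (everything after the colon-space), and comments, quoting and indentation validation are ignored.

```python
def fold_plain_scalar(raw: str) -> str:
    """Approximate how a multi-line plain flow scalar folds into one string.

    Assumption: `raw` holds only the scalar text, possibly spread over
    several lines. A single line break becomes one space; each empty line
    between two text lines becomes one newline character.
    """
    result = ""
    empty_lines = 0  # empty lines seen since the last line with text
    for line in raw.splitlines():
        text = line.strip()  # indentation and trailing spaces are ignored
        if not text:
            empty_lines += 1
            continue
        if result:
            # Join with newlines if empty lines were seen, else with a space.
            result += "\n" * empty_lines if empty_lines else " "
        result += text
        empty_lines = 0
    return result


folded = fold_plain_scalar(
    "this is a plain flow scalar string\n  that folds\n  over the\n  line boundary"
)
with_breaks = fold_plain_scalar("this string\n\n  contains\n\n  new lines")
```

Here folded holds the single-line value from the first example, and with_breaks holds the three-line value shown above.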
Plain flow scalar strings can contain:
- Single and double quotes, as long as they are not the first character of the string.
- Tabs
- Backslashes
- Unicode characters
a: this ' is " also ' a \ plain flow scalar string
The string value will be: this ' is " also ' a \ plain flow scalar string
Plain Flow Scalar String Limitations
- Cannot contain escape sequences: \n and \t are not interpreted, the backslash is kept as a literal character.
- Cannot contain ": " (colon followed by space). Colons are allowed, but only if they are not followed by whitespace.
- Cannot contain " #" (space followed by #). This starts a comment.
- Cannot start with:
  - "- " (dash followed by space).
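The limitations listed above can be turned into a quick check that tells whether a string is safe to write as a plain flow scalar or should be quoted instead. The sketch below is a heuristic covering only the rules named in this section (plus the leading-quote rule mentioned earlier); the function name plain_scalar_problems is hypothetical, and the full YAML specification has further restrictions that are not modeled here.

```python
from typing import List


def plain_scalar_problems(value: str) -> List[str]:
    """Return the reasons why `value` should not be written unquoted.

    Assumption: only the limitations described in this section are
    checked, so an empty list does not guarantee the value is valid.
    """
    problems = []
    if ": " in value:
        problems.append("contains colon-space")
    if " #" in value:
        problems.append("contains space-hash")
    if value.startswith("- "):
        problems.append("starts with dash-space")
    if value[:1] in ("'", '"'):
        problems.append("starts with a quote")
    return problems


ok = plain_scalar_problems("time is 10:30")       # colon not followed by space
bad = plain_scalar_problems("- item # note")      # two problems
```

A non-empty result suggests switching to a single quoted or double quoted flow scalar.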
Comments and Plain Flow Scalars
A comment ends a plain flow scalar, so the scalar cannot continue on the line after the comment. The following example is invalid:
some_key:
  this is # comment
  an invalid example
Single Quoted Flow Scalar
Double Quoted Flow Scalar
Block Scalars
Literal Block Scalar
Folded Block Scalar
Multi-Line Strings
TODO
TODO:
- How can it be parsed and dumped with Python.
TODEPLETE
Strings do not require quotes, but quoting them is recommended: it states explicitly that the value is a string and that type inference should not be attempted.
The following representations are equivalent for a string:
s1: bare words string
s2: "a double-quoted string"
s3: 'a single-quoted string'
All forms shown above are named "inline", in that the strings are usually rendered on one line.
In the bare word format, characters cannot be escaped.
Double-quoted strings can have specific characters escaped with \. Double quotes are escaped as \" and line breaks are written as \n.
Single-quoted strings are "literal" strings: they do not use \ to escape characters. The only escape sequence is '' (two single quotes), which is decoded as a single '.
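The difference between the two quoting styles can be illustrated with a small decoder for each. This is a sketch under stated assumptions, not what a YAML library does internally: both function names are hypothetical, the input is the text between the quotes, and the double-quoted decoder handles only the escapes \", \\, \n and \t out of the larger set that YAML defines.

```python
# Subset of double-quoted escapes handled by this sketch (assumption:
# YAML defines more, such as \x41 or \u263A, which are not modeled here).
_ESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}


def decode_single_quoted(body: str) -> str:
    """Only '' is special inside single quotes; backslashes stay literal."""
    return body.replace("''", "'")


def decode_double_quoted(body: str) -> str:
    """Resolve backslash escapes inside a double-quoted scalar."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            if nxt not in _ESCAPES:
                raise ValueError("escape not handled by this sketch: \\" + nxt)
            out.append(_ESCAPES[nxt])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


single = decode_single_quoted("it''s a \\n literal")   # backslash-n is kept
double = decode_double_quoted('say \\"hi\\"\\nbye')    # escapes are resolved
```

The same two characters, backslash followed by n, survive unchanged in the single-quoted case and become a real line break in the double-quoted case.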
Literal Style
Multi-line strings can be written using the '|' character followed by a new line. To be considered multi-line content, the first line under the '|'-terminated line must be indented one level deeper than the line containing the '|', and the subsequent lines must be indented by at least the same offset as the first line. The newline characters between lines are preserved. By default, trailing blank lines are stripped and a single final newline is kept.
This is correct multi-line (the multi-lines must be indented under 'data'):
data: |
  This is a
  multi-line
  text section
The value of data is equivalent to "This is a\nmulti-line\ntext section\n".
This is NOT a correct multi-line string, because the lines under 'data' are not correctly indented:
data: |
This is not
a correct multi-line
The "|" multi-line operator implies that a single trailing newline is kept at the end of the string. In the correct example above, the data value is equivalent to "This is a\nmulti-line\ntext section\n" (note the trailing newline). If we want the YAML processor to strip off the trailing newline, we should use "|-" instead of "|":
dataWithoutTrailingNewLine: |-
  This is a
  multi-line
  text section
The dataWithoutTrailingNewLine value is equivalent to "This is a\nmulti-line\ntext section". Note the lack of trailing newline.
If we want the trailing blank lines to be preserved, we should use "|+" instead of "|":
dataWithTrailingWhitespacePreserved: |+
  This is a
  multi-line
  text section


another: value
The dataWithTrailingWhitespacePreserved value is equivalent to "This is a\nmulti-line\ntext section\n\n\n".
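The three variants above ("|", "|-" and "|+") differ only in how the end of the block is treated. A rough model in Python, assuming the content lines have already been extracted and de-indented (the function name apply_chomping and its parameters are hypothetical, not part of any YAML library):

```python
from typing import List


def apply_chomping(content_lines: List[str], indicator: str = "") -> str:
    """Model the end-of-block handling of a literal block scalar.

    Assumption: `content_lines` are the de-indented lines under the '|'
    line, including any trailing empty lines. `indicator` is "" for '|',
    "-" for '|-' and "+" for '|+'.
    """
    text = "".join(line + "\n" for line in content_lines)
    if indicator == "+":
        return text                      # keep every trailing newline
    stripped = text.rstrip("\n")
    if indicator == "-":
        return stripped                  # drop all trailing newlines
    return stripped + "\n" if stripped else ""   # keep exactly one


lines = ["This is a", "multi-line", "text section", "", ""]
clipped = apply_chomping(lines)
stripped = apply_chomping(lines, "-")
kept = apply_chomping(lines, "+")
```

With two trailing empty lines in the input, clipped ends in one newline, stripped ends in none, and kept ends in three.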
Folded Style
The '>' character followed by a new line introduces a folded block scalar. Line breaks between consecutive non-empty lines are converted to spaces. By default, trailing blank lines are removed and a single final newline is kept.
data: >
  This is another
  multi-line
  text section
  but in the final form
  it will be just one long string
  without new lines
  except the last one
The data value is equivalent to "This is another multi-line text section but in the final form it will be just one long string without new lines except the last one\n".
If we want to drop the trailing newline instead of preserving it, we should use ">-" instead of ">":
dataWithoutTrailingNewLine: >-
  Something
  else
The dataWithoutTrailingNewLine value is equivalent to "Something else". Note the lack of trailing newline.
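The folded style can be modeled in the same spirit. The sketch below is deliberately simplified: the function name fold_block_scalar and its strip_final_newline flag are hypothetical, the input is assumed to be the de-indented content lines, and blank lines or more-indented lines inside the block (which YAML treats specially) are not handled.

```python
from typing import List


def fold_block_scalar(content_lines: List[str],
                      strip_final_newline: bool = False) -> str:
    """Model '>' (default) and '>-' (strip_final_newline=True).

    Assumption: every content line is a plain text line at the same
    indentation, so each line break inside the block folds to one space.
    """
    folded = " ".join(line.strip() for line in content_lines if line.strip())
    return folded if strip_final_newline else folded + "\n"


with_newline = fold_block_scalar(["Something", "else"])
without_newline = fold_block_scalar(["Something", "else"], strip_final_newline=True)
```

The first call corresponds to ">" and the second to ">-".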