Strings in YAML

From NovaOrdis Knowledge Base

Latest revision as of 22:26, 7 December 2022

Overview

There is a wide variety of choices when it comes to representing strings in YAML. Strings can be represented as flow scalars or block scalars. A flow scalar can be plain, single quoted or double quoted. A block scalar can be literal or folded.

Flow Scalars

"Flow" comes from "flow style", where successive YAML elements are placed on the same line, directly next to each other. This style is different from the "block style", where separate YAML elements are arranged into separate blocks defined by the same indentation. A string can be represented as a plain flow scalar, single quoted flow scalar and double quoted flow scalar.

Plain Flow Scalar

A plain flow scalar string starts immediately after ": " (colon followed by a space) and does not require any quotes.

a: this is a plain flow scalar string

If a plain flow scalar looks like a number or a boolean, type inference is attempted and the value is not loaded as a string. To avoid this, use single quoted flow scalars or double quoted flow scalars.
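Here is a minimal sketch of this behavior. It assumes the third-party PyYAML package is installed (pip install pyyaml). PyYAML implements YAML 1.1, so it also turns unquoted words such as yes into booleans. A YAML 1.2 parser may load those as strings.

```python
import yaml  # third-party PyYAML package (assumed installed)

doc = yaml.safe_load("""\
count: 42
ratio: 0.5
enabled: true
legacy_flag: yes
count_str: '42'
enabled_str: "true"
""")

# Plain scalars that look like numbers or booleans are resolved to those types;
# quoting them keeps them as strings.
for key, value in doc.items():
    print(key, type(value).__name__, repr(value))
```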

The plain flow scalar string can be continued on the next lines of the YAML file. However, "folding" the string over multiple lines does not introduce new line characters in the string. The lines are concatenated and joined with a single space, and differences in indentation between lines are ignored. Runs of spaces inside a line are preserved.

a: this is a plain flow scalar string
 that folds
       over the
                 line boundary

The string value is this is a plain flow scalar string that folds over the line boundary.

long_key:
 the plain flow
    scalar string may
       have a smaller 
     indentation than
             the key

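The string value is the plain flow scalar string may have a smaller indentation than the key. The continuation lines only need to be indented more than the key itself. They do not have to line up with each other.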
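Both folding examples can be checked with a short sketch, assuming PyYAML is installed. The spaces key is an extra illustrative example, not part of the original text.

```python
import yaml  # third-party PyYAML package (assumed installed)

doc = yaml.safe_load("""\
a: this is a plain flow scalar string
 that folds
       over the
                 line boundary
long_key:
 the plain flow
    scalar string may
       have a smaller
     indentation than
             the key
spaces: runs   of   spaces   inside a line are kept
""")

# Line breaks between non-empty lines fold into single spaces;
# indentation of the continuation lines is discarded.
print(repr(doc["a"]))
print(repr(doc["long_key"]))
print(repr(doc["spaces"]))
```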

New lines can be introduced in the string value by entering empty lines in the flow value. Each empty line introduces a new line character in the string value:

a: this string
      
             contains


                         new lines

The string value is:

this string
contains

new lines
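Here is a sketch of the empty-line rule, assuming PyYAML. The input uses truly empty lines instead of whitespace-only lines; both count as empty.

```python
import yaml  # third-party PyYAML package (assumed installed)

# One empty line between "this string" and "contains", two before "new lines".
text = "a: this string\n\n contains\n\n\n new lines\n"
doc = yaml.safe_load(text)

# Each empty line becomes one "\n"; there is no extra space at the join.
print(repr(doc["a"]))
```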

Plain flow scalar strings can contain:

  • Single and double quotes, as long as they are not the first character of the string.
  • Tabs
  • Backslashes
  • Unicode characters
a: this ' is " also ' a \ plain flow scalar string

The string value will be: this ' is " also ' a \ plain flow scalar string.

Plain Flow Scalar String Limitations

  • Escape sequences such as \n or \t are not interpreted: the backslash is kept as a literal character.
  • Cannot contain ": " (colon followed by a space). Colons are allowed only if they are not followed by whitespace.
  • Cannot contain " #" (space followed by #), which starts a comment.
  • Cannot start with:
    • "- " (dash followed by a space)
    • ": " (colon followed by a space)
    • "? " (question mark followed by a space)
    • !
    • &
    • *
    • {, }, [, ] (flow mapping or sequence)
    • , (flow collection entry separator)
    • # (comment)
    • | or > (block scalar)
    • @
    • ` (back tick)
    • ', " (quotes)
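Some of these rules can be checked with PyYAML (assumed installed). The helper parses is hypothetical; it only reports whether loading raised a yaml.YAMLError. The url and bad keys are illustrative.

```python
import yaml  # third-party PyYAML package (assumed installed)

def parses(text):
    """Hypothetical helper: True if PyYAML loads the text, False on a YAML error."""
    try:
        yaml.safe_load(text)
        return True
    except yaml.YAMLError:
        return False

# Quotes that are not the first character, backslashes, and colons
# that are not followed by a space are all accepted in a plain scalar.
ok = yaml.safe_load(r"""
a: this ' is " also ' a \ plain flow scalar string
url: http://example.com:8080/path
""")
print(repr(ok["a"]))
print(repr(ok["url"]))

print(parses("bad: key: value"))      # ": " inside a plain scalar
print(parses("bad: @handle"))         # "@" cannot start a plain scalar
print(parses("good: 'key: value'"))   # quoting makes it valid
```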

Comments and Plain Flow Scalars

A comment ends a plain flow scalar, so the following example is invalid:

some_key:
   this is            # comment
   an invalid example
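The sketch below assumes PyYAML. It shows where a comment starts and confirms that the example above is rejected. The keys k1 to k3 are illustrative.

```python
import yaml  # third-party PyYAML package (assumed installed)

doc = yaml.safe_load("""\
k1: value # " #" starts a comment
k2: value#not-a-comment
k3: 'in quotes # is text'
""")
print(doc)

# The comment ends "this is", so the next line is a stray scalar.
invalid = "some_key:\n  this is            # comment\n  an invalid example\n"
try:
    yaml.safe_load(invalid)
    invalid_rejected = False
except yaml.YAMLError:
    invalid_rejected = True
print(invalid_rejected)
```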

Single Quoted Flow Scalar

Single quoted flow scalars work like plain scalars, but they accept additional characters that are not accepted by plain flow scalars.

some_key: 'escape seqs \n \t, : (colon space), - (dash space), ? (question mark space) ! & * { } [ ] | > @ `, '

The string value is escape seqs \n \t, : (colon space), - (dash space), ? (question mark space) ! & * { } [ ] | > @ `,

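Single quoted flow scalars follow similar folding rules as plain flow scalars. A backslash has no special meaning in a single quoted scalar. The only escape sequence is '' (two single quotes), which produces a single '.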
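Here is a sketch of single quoted scalars, assuming PyYAML. It shows the '' escape, a backslash kept literally, and characters that would not be allowed at the start of a plain scalar.

```python
import yaml  # third-party PyYAML package (assumed installed)

doc = yaml.safe_load(r"""
a: 'it''s a single quoted string'
b: 'backslash \n is kept literally'
c: '- [not] a: sequence # nor a comment'
""")

# '' decodes to ', while \n stays as a backslash followed by "n".
for key, value in doc.items():
    print(key, repr(value))
```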

Double Quoted Flow Scalar

Double quoted flow scalars have the same rules as single quoted scalars, plus some extra rules and escape sequences.

This is the only scalar style where escape sequences can be used.

some_key: "multi\nline"

The string value is:

multi
line

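Only the escape sequences defined by the YAML specification can be used. Common ones are:

  • \n (line feed)
  • \t (tab)
  • \" (double quote)
  • \\ (backslash)

A backslash followed by a character that is not a defined escape is invalid.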

some_key: "some\,thing"

will produce a parsing error.

There are also escape sequences for other special characters, and for expressing any character by its code point:

"a \x20 space"
"a vertical \v tab can also be written as \x0B or \x0b"
"an 'A' in 8-bit unicode: \x41"
"an 'A' in 16-bit unicode: \u0041"
"an 'A' in 32-bit unicode: \U00000041"

For more details:

https://yaml.org/spec/1.2-old/spec.html#id2776092
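The escapes above can be checked with PyYAML (assumed installed). PyYAML reports an unknown escape as a scanner error, which is a subclass of yaml.YAMLError.

```python
import yaml  # third-party PyYAML package (assumed installed)

doc = yaml.safe_load(r"""
a: "multi\nline"
b: "tab\there, quote \" and backslash \\"
c: "\x41 \u0041 \U00000041"
""")
for key, value in doc.items():
    print(key, repr(value))

# "\," is not a defined escape, so loading fails.
try:
    yaml.safe_load(r'some_key: "some\,thing"')
    bad_escape_rejected = False
except yaml.YAMLError:
    bad_escape_rejected = True
print(bad_escape_rejected)
```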

Backslash in Double Quotes

If a backslash is added at the end of a line, the line break is escaped: the next line is joined without a space, and its leading white space is dropped.

some_key: 
  "a\
   b\
   c"

The string value is abc.
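Here is a sketch comparing a plain line break and an escaped one inside double quotes, assuming PyYAML.

```python
import yaml  # third-party PyYAML package (assumed installed)

# Without a trailing backslash, a line break inside double quotes folds into a space.
folded = yaml.safe_load('some_key:\n  "a\n   b\n   c"\n')

# With a trailing backslash, the line break is escaped: the next line is joined
# directly, and its leading white space is dropped.
joined = yaml.safe_load('some_key:\n  "a\\\n   b\\\n   c"\n')

print(repr(folded["some_key"]), repr(joined["some_key"]))
```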

Block Scalars

There are literal and folded block scalars. They are introduced by | and > respectively. The content must start on the next line and must be indented.

Literal Block Scalar

The literal block scalar is introduced by | (pipe). The content must start on the next line and must be indented; this is where the "block" name comes from. The first content line must be indented deeper than the line containing the | character, and its indentation applies to the whole block. A line with less indentation ends the block scalar; if that line is not valid YAML at its position (for example, a sibling key), this is a syntax error. The string value ends with a single new line, and further trailing new lines are stripped. Literal block scalars preserve the multi-line format, so they are useful for storing code or shell scripts, for example.

some_key: |
  this 
  is
  a literal block
  scalar

The string is:

this 
is
a literal block
scalar
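The sketch below assumes PyYAML. It loads the example above plus a hypothetical script key. That key shows that extra indentation and correctly indented # lines are kept as content.

```python
import yaml  # third-party PyYAML package (assumed installed)

doc = yaml.safe_load("""\
some_key: |
  this
  is
  a literal block
  scalar
script: |
  # a correctly indented "#" line is content, not a comment
  if true; then
    echo "extra indentation is kept"
  fi
""")

# Line breaks are preserved and the value ends with a single new line.
print(doc["some_key"], end="")
print(doc["script"], end="")
```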

The indentation is detected from the first non-empty line of the block scalar, but the empty lines will become part of the string.

some_key: |


       a
       b

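The string is \n\na\nb\n: the two leading empty lines are kept, and the indentation detected from the a line is removed.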
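Here is the same example as a Python string, assuming PyYAML, so the leading empty lines are visible in the loaded value.

```python
import yaml  # third-party PyYAML package (assumed installed)

# Two empty lines, then content indented by five spaces.
doc = yaml.safe_load("some_key: |\n\n\n     a\n     b\n")

# The indentation (5) comes from the first non-empty line;
# the empty lines before it stay in the string.
print(repr(doc["some_key"]))
```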

Additional rules:

  • Trailing spaces are preserved
  • Cannot use escape sequences
  • A correctly indented line that starts with # is part of the content and is not interpreted as a comment

A comment can be added to a block scalar immediately after the header:

some_key: | # this is a comment
  a
  b

Dumping as Literal Block Scalar in Python
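One common approach with PyYAML (assumed installed) is to register a representer that requests the literal style whenever a string contains a line break. str_presenter is a hypothetical name. yaml.add_representer, represent_scalar and its style argument are part of PyYAML. Registering the representer on yaml.SafeDumper affects every later yaml.safe_dump call in the process.

```python
import yaml  # third-party PyYAML package (assumed installed)

def str_presenter(dumper, data):
    """Hypothetical helper: use the literal block style for multi-line strings."""
    if "\n" in data:
        return dumper.represent_scalar("tag:yaml.org,2002:str", data, style="|")
    return dumper.represent_scalar("tag:yaml.org,2002:str", data)

# Register the representer for str on the dumper used by yaml.safe_dump().
yaml.add_representer(str, str_presenter, Dumper=yaml.SafeDumper)

data = {"name": "build", "script": "echo one\necho two\n"}
text = yaml.safe_dump(data, default_flow_style=False, sort_keys=False)
print(text)
```

PyYAML may still fall back to a quoted style when a string cannot be written as a block scalar, for example when a line ends with a space.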

Folded Block Scalar

A folded block scalar folds its lines with spaces. It is introduced by >. The content must start on the next line and must be indented (broken indentation causes a parsing error). The string value ends with a new line, but further trailing new lines are stripped.

some_key: >
 this
 is
 a
 folded
 block
 scalar

The string is this is a folded block scalar, followed by a single new line.

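The folding rules are almost the same as for plain flow scalars. One difference is that lines indented more than the first content line keep their line breaks. As with plain flow scalars, an empty line introduces a new line in the string value: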

some_key: >
 a
 b
 
 c

The string is:

a b
c
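Here is a sketch of the folding rules, assuming PyYAML. The more_indented key illustrates the difference from plain scalars: lines indented more than the first content line keep their line breaks.

```python
import yaml  # third-party PyYAML package (assumed installed)

doc = yaml.safe_load("""\
one_paragraph: >
  this
  is
  a folded
  block scalar
two_paragraphs: >
  a
  b

  c
more_indented: >
  text
    this more-indented line is not folded
  end
""")

# Single line breaks fold into spaces, an empty line becomes "\n",
# and more-indented lines are kept on their own lines.
for key, value in doc.items():
    print(key, repr(value))
```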

Rules:

  • Trailing spaces are preserved
  • Cannot use escape sequences
  • A correctly indented line that starts with # is part of the content and is not interpreted as a comment

A comment can be added to a folded block scalar immediately after the header:

some_key: > # this is a comment
  a
  b

Block Scalar Chomping

This applies to both literal and folded block scalars. By default (clip chomping), a block scalar string ends with a single new line, and any additional trailing new lines are stripped. The chomping indicators below change this behavior.

Strip

To strip the final new line as well, use the "-" (strip) chomping indicator:

literal_block_scalar: |-
  a
  b
folded_block_scalar: >-
  c
  d

The strings are a\nb and c d. Neither ends with a new line.
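Here is a sketch comparing the default clip behavior with the strip indicator, assuming PyYAML. clip_default is an extra illustrative key.

```python
import yaml  # third-party PyYAML package (assumed installed)

doc = yaml.safe_load("""\
clip_default: |
  a
  b

literal_block_scalar: |-
  a
  b
folded_block_scalar: >-
  c
  d
""")

# Clip keeps one final "\n" and drops the trailing empty line;
# strip drops the final "\n" too.
for key, value in doc.items():
    print(key, repr(value))
```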

Keep

To preserve all trailing new lines in the block, use the "+" (keep) chomping indicator:

literal_block_scalar: |+
  a
  b
  

folded_block_scalar: >+
  c
  d

The strings are a\nb\n\n\n (the final new line plus one for each trailing empty line) and c d\n\n. All trailing new lines are preserved.
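Here is a sketch of keep chomping, assuming PyYAML. The input is written as a single Python string so the trailing empty lines are explicit: two after the literal block and one at the end of the document.

```python
import yaml  # third-party PyYAML package (assumed installed)

text = "literal_block_scalar: |+\n  a\n  b\n\n\nfolded_block_scalar: >+\n  c\n  d\n\n"
doc = yaml.safe_load(text)

# Keep chomping retains the final "\n" plus one "\n" per trailing empty line.
for key, value in doc.items():
    print(key, repr(value))
```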