EBNF Variant Used to Define the XML Grammar
Latest revision as of 00:06, 8 June 2018
Overview
The W3C uses a variant of the Extended Backus-Naur Form (EBNF) notation to express the formal grammar of XML. The variant combines EBNF with regular-expression syntax.
Notation
Each rule in the grammar defines one symbol, in the form:
symbol ::= expression
More details about Backus-Naur Form are available here:
Literal strings are quoted.
Within the expression on the right-hand side of a rule, the following expressions are used to match strings of one or more characters:
#xN where N is a hexadecimal integer. The expression matches the character whose number (code point) in ISO/IEC 10646 is N. The number of leading zeroes is insignificant. For example:
Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
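As an illustration, the Char production above can be read as a plain membership test on code points. The following is a minimal sketch in Python; the function name is_xml_char is a hypothetical helper chosen for this example, not part of any XML library.

```python
def is_xml_char(code_point: int) -> bool:
    """Return True if code_point matches the Char production quoted above.

    Hypothetical helper for illustration: each alternative of the rule
    becomes one clause of the boolean expression.
    """
    return (
        code_point in (0x9, 0xA, 0xD)          # #x9 | #xA | #xD
        or 0x20 <= code_point <= 0xD7FF        # [#x20-#xD7FF]
        or 0xE000 <= code_point <= 0xFFFD      # [#xE000-#xFFFD]
        or 0x10000 <= code_point <= 0x10FFFF   # [#x10000-#x10FFFF]
    )
```

Calling is_xml_char(ord("A")) checks an ordinary letter, while is_xml_char(0xD800) checks a surrogate code point, which falls in the gap between the second and third ranges.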
[a-zA-Z], [#xN-#xN] matches any Char with a value in the specified range or ranges (inclusive).
[abc], [#xN#xN#xN] matches any Char with a value among the characters enumerated. Enumerations and ranges can be mixed in one set of brackets.
[^a-z], [^#xN-#xN] matches any Char with a value outside the indicated range.
[^abc], [^#xN#xN#xN] matches any Char with a value not among the characters given. Enumerations and ranges of forbidden values can be mixed in one set of brackets.
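The four bracket forms above correspond closely to regular-expression character classes. The sketch below shows one possible translation into Python's re syntax. The function bracket_to_regex is a hypothetical helper, and it makes simplifying assumptions: literal characters inside the brackets are passed through unchanged, and a #xN reference must not be followed directly by a literal hexadecimal digit (for example, #x1F followed by a literal a would be misread as #x1Fa).

```python
import re


def bracket_to_regex(expr: str) -> str:
    """Translate an EBNF bracket expression such as [^#x9#xA] into a
    Python regex character class (hypothetical helper, simplified)."""
    if not (expr.startswith("[") and expr.endswith("]")):
        raise ValueError("expected a bracket expression")
    body = expr[1:-1]

    # A leading ^ negates the set, exactly as in regular expressions.
    negated = body.startswith("^")
    if negated:
        body = body[1:]

    # Replace every #xN reference with the (escaped) character it denotes.
    def code_point_to_char(match: re.Match) -> str:
        return re.escape(chr(int(match.group(1), 16)))

    body = re.sub(r"#x([0-9A-Fa-f]+)", code_point_to_char, body)
    return "[" + ("^" if negated else "") + body + "]"
```

The resulting string can be handed to re.fullmatch to test a single character, for example re.fullmatch(bracket_to_regex("[#x20-#xD7FF]"), "A").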
"string" matches the literal string specified inside the double-quotes.
'string' matches the literal string specified inside the single-quotes.
(expression) is treated as a unit and may be combined as described below:
A? matches A or nothing (optional A).
A B matches A followed by B. This operator has higher precedence than alternation (|), thus A B | C D is equivalent to (A B) | (C D).
A | B matches A or B (alternation).
A - B matches any string that matches A but does not match B.
A+ matches one or more occurrences of A. Concatenation has higher precedence than alternation (|), thus A+ | B+ is equivalent to (A+) | (B+).
A* matches zero or more occurrences of A. Concatenation has higher precedence than alternation (|), thus A* | B* is equivalent to (A*) | (B*).
/* ... */ comment.
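To show how the operators above combine, the sketch below hand-translates three productions from the XML 1.0 specification (S, Eq and PITarget) into Python regular expressions. Most operators map directly onto regex syntax; the difference operator A - B has no direct regex equivalent, so it is expressed here as two separate checks. The NAME pattern is a deliberately simplified, ASCII-only stand-in for the real Name production, and the helper names matches and is_pi_target are assumptions made for this example.

```python
import re

# S ::= (#x20 | #x9 | #xD | #xA)+        grouping, alternation, one-or-more
S = r"(?:\x20|\x09|\x0D|\x0A)+"

# Eq ::= S? '=' S?                        optional items and concatenation
EQ = rf"(?:{S})?=(?:{S})?"

# Simplified ASCII-only stand-in for Name (assumption: the real
# production admits many more characters).
NAME = r"[A-Za-z_:][A-Za-z0-9_:.\-]*"

# ('X' | 'x') ('M' | 'm') ('L' | 'l')     concatenation of alternations
XML_RESERVED = r"(?:X|x)(?:M|m)(?:L|l)"


def matches(pattern: str, text: str) -> bool:
    """Return True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None


def is_pi_target(text: str) -> bool:
    """PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))

    A - B: the string must match A (Name) and must not match B.
    """
    return matches(NAME, text) and not matches(XML_RESERVED, text)
```

With these definitions, is_pi_target("xml-stylesheet") is accepted because the string as a whole is a name other than "xml", while is_pi_target("XmL") is rejected by the difference operator.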