EBNF Variant Used to Define the XML Grammar

From NovaOrdis Knowledge Base

Overview

W3C uses a variant of the Extended Backus-Naur Form (EBNF) notation to express the formal grammar of XML. The notation combines EBNF with regular-expression-style character classes.

Notation

Each rule in the grammar defines one of the symbols, in the form:

symbol ::= expression

More details about Backus-Naur Form are available here:

Context-Free Grammars

Literal strings are quoted.

Within the expression on the right-hand side of a rule, the following expressions are used to match strings of one or more characters:

#xN where N is a hexadecimal integer. The expression matches the character whose number (code point) in ISO/IEC 10646 is N. The number of leading zeroes is insignificant.

Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

[a-zA-Z], [#xN-#xN] matches any Char with a value in the specified range(s) (inclusive).

[abc], [#xN#xN#xN] matches any Char with a value among the characters enumerated. Enumerations and ranges can be mixed in one set of brackets.

[^a-z], [^#xN-#xN] matches any Char with a value outside the indicated range.

[^abc], [^#xN#xN#xN] matches any Char with a value not among the characters given. Enumerations and ranges of forbidden values can be mixed in one set of brackets.

"string" matches the literal string specified inside the double-quotes.

'string' matches the literal string specified inside the single-quotes.
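
The character-matching expressions above map fairly directly onto code. The following is a minimal Python sketch, not part of the XML specification: is_xml_char tests a character against the ranges of the Char production, and to_regex_class rewrites #xN escapes in a bracketed class so that Python's re module can compile it. Both function names are hypothetical, and the translation assumes the class contains no characters that need extra escaping in a Python regular expression.

```python
import re

# Inclusive code point ranges copied from the Char production above.
XML_CHAR_RANGES = [
    (0x9, 0x9),
    (0xA, 0xA),
    (0xD, 0xD),
    (0x20, 0xD7FF),
    (0xE000, 0xFFFD),
    (0x10000, 0x10FFFF),
]


def is_xml_char(ch: str) -> bool:
    """Return True if the single character ch matches the Char production."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in XML_CHAR_RANGES)


def to_regex_class(xml_class: str) -> str:
    """Rewrite each #xN escape as a Python regex \\UXXXXXXXX escape.

    Leading zeroes in N are insignificant, so N is parsed as an integer
    and re-emitted with a fixed width of eight hexadecimal digits.
    """
    return re.sub(
        r"#x([0-9A-Fa-f]+)",
        lambda m: "\\U%08X" % int(m.group(1), 16),
        xml_class,
    )


# [^#x0041#x42] excludes 'A' and 'B'; every other character matches.
not_a_or_b = re.compile(to_regex_class("[^#x0041#x42]"))
print(is_xml_char("\t"), is_xml_char("\x00"))
print(bool(not_a_or_b.fullmatch("C")), bool(not_a_or_b.fullmatch("A")))
```

Plain classes such as [a-zA-Z] or [^abc] pass through to_regex_class unchanged, because the bracket syntax of the notation and of Python regular expressions coincide for those cases.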

(expression) is treated as a unit and may be combined as described below:

A? matches A or nothing (optional A).

A B matches A followed by B. This operator has higher precedence than alternation (|), thus A B | C D is equivalent to (A B) | (C D).

A | B matches A or B (alternation).

A - B matches any string that matches A but does not match B.

A+ matches one or more occurrences of A. Concatenation has higher precedence than alternation (|), thus A+ | B+ is equivalent to (A+) | (B+).

A* matches zero or more occurrences of A. Concatenation has higher precedence than alternation (|), thus A* | B* is equivalent to (A*) | (B*).

/* ... */ comment.
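
The operators above can be illustrated by translating a few productions into Python regular expressions. This is a hedged sketch rather than a conforming parser. The S, Eq and PITarget productions quoted in the comments come from the XML 1.0 Recommendation. NAME is a simplified, ASCII-only stand-in for the real Name production, and the helper names matches and matches_difference are hypothetical. Regular expressions have no direct difference operator, so A - B is modelled as "matches A and does not match B".

```python
import re

# Simplified ASCII-only stand-in for the XML Name production (an assumption).
NAME = r"[A-Za-z_:][A-Za-z0-9_:.\-]*"

# S ::= (#x20 | #x9 | #xD | #xA)+        grouping, alternation, A+
S = r"(?:\x20|\x09|\x0D|\x0A)+"

# Eq ::= S? '=' S?                       A? and concatenation
EQ = rf"(?:{S})?=(?:{S})?"

# PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))
RESERVED = r"(?:X|x)(?:M|m)(?:L|l)"


def matches(pattern: str, text: str) -> bool:
    """Return True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


def matches_difference(a: str, b: str, text: str) -> bool:
    """A - B: the text matches A but does not match B."""
    return matches(a, text) and not matches(b, text)


print(matches(EQ, " \t= "))                                   # optional whitespace around '='
print(matches_difference(NAME, RESERVED, "xml-stylesheet"))  # a Name, and not exactly "xml"
print(matches_difference(NAME, RESERVED, "XmL"))             # excluded by the right-hand side

# Concatenation binds tighter than alternation: "ab|cd" means (ab)|(cd).
print(matches(r"ab|cd", "cd"), matches(r"ab|cd", "acd"))
```

Python regular expressions apply the same precedence as the notation, which is why the last line accepts "cd" and rejects "acd".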