Parsing: Difference between revisions
Line 94: | Line 94: | ||
* [[ANTLR]] | * [[ANTLR]] | ||
* http://catalog.compilertools.net/java.html | |||
* http://dinosaur.compilertools.net | |||
=Organizatorium= | =Organizatorium= | ||
* [[Difference between Pull Parsing and Push Parsing#Overview|Difference between Pull Parsing and Push Parsing]] | * [[Difference between Pull Parsing and Push Parsing#Overview|Difference between Pull Parsing and Push Parsing]] |
Revision as of 17:24, 5 July 2018
External
Internal
Overview
Formal Grammar
A set of production rules that describe all possible strings in a given formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language syntax. The grammar does not describe the meaning of strings, or what can be done with them in whatever context, only their form.
Context-Free Grammar
A context-free grammar is a grammar in which the left-hand side of each production rule consists of only a single nonterminal symbol. A popular notation for context-free grammar is Backus-Naur (BNF).
Deterministic context-free grammars (DCFGs) are a proper subset of the context-free grammars. They can be derived from deterministic pushdown automata, and they generate deterministic context-free languages. DCFGs are unambiguous and are of practical interest as they can be parsed in linear time. Parsers can be automatically generated from the grammar by a parser generator. Restricted forms of DCFGs can be parsed by simpler parsers. These grammar classes are referred by the type of parser that parses them: LALR, SLR and LL.
Backus-Naur Form BNF
Backus-Naur form is notation technique for context-free grammars that is used to describe the syntax of languages used in computing.
A BNF specification is a set of derivation rules, written as:
<symbol> ::= expression
where <symbol> is a nonterminal, and expression consists of one or more sequence of symbols. The expression may contain vertical bars '|' indicating a choice. Symbols that. never appear on the left side are terminals. The ::= means that the symbol on the left must be replaced with the expression on the right.
Extended Backus-Naur form EBNF
EBNF Variant that Used to Define the XML Grammar
Parse Tree
Difference between parse tree and syntax tree.
Syntax Tree
Parser
Top-Down Parser
A top-down parser is a parser that employs a strategy where it first looks at the highest level of the parse tree and works down the parser tree by using the rewriting rules of the formal grammar. The top-down parser discovers the parse tree starting from the top and incrementally works its way first downwards and then rightwards. Unlike a bottom-up parser, the top-down parser eagerly decides what a construct is much earlier, when it has only scanned the leftmost symbol of that construct and has not yet parsed any of its parts.
If a language grammar has multiple rules that may start with the same leftmost symbols but have different endings, that grammar can be efficiently handled by a deterministic bottom-up parser, but cannot be handled top-down without guesswork and backtracking - so bottom-up parsers handle a somewhat larger range of computer language grammars than top-down parsers.
Recursive Descent Parser
A top-down parser build from a set of (possibly mutually recursive) procedures, where each such procedure usually implements one of the productions of the grammar. In consequence, the structure of the resulting program closely mirrors that of the grammar it recognizes. If the parser does not require backtracking, it is called a predictive parser.
Bottom-Up Parser
Bottom-up parsers recognize the text's lowest-level small details first, before its mid-level structures, and leaving the highest-level overall structure to last. The bottom-up parser discovers the parse tree starting from the bottom left end and incrementally works its way upwards and rightwards. The parser waits until it has scanned and parsed all parts of some constructs before committing to what the combined structure is. Bottom-up parsing is sometimes done by backtracking. Much more commonly, LALR parsers are used.