Parsing: Difference between revisions

From NovaOrdis Knowledge Base
Jump to navigation Jump to search
 
(32 intermediate revisions by the same user not shown)
Line 9: Line 9:
=Overview=
=Overview=


=Formal Grammars=
<font color=darkgray size='+4'><center>On Hold, Pending Review of Compilers: Principles, Techniques and Tools -> Deplete into [[Formal_Languages_and_Translators]]</center></font>
 
=<span id='Formal_Grammars'></span>Formal Grammar=


{{External|https://en.wikipedia.org/wiki/Formal_grammar}}
{{External|https://en.wikipedia.org/wiki/Formal_grammar}}
Line 17: Line 19:
==Context-Free Grammar==
==Context-Free Grammar==


A context-free grammar is a grammar in which the left-hand side of each production rule consists of only a single nonterminal symbol.  A popular notation for context-free grammar is Backus-Naur (BNF).
A context-free grammar is a grammar in which the left-hand side of each production rule consists of only a '''single''' nonterminal symbol.  A popular notation for context-free grammar is Backus-Naur (BNF).


Deterministic context-free grammars (DCFGs) are a proper subset of the context-free grammars. They can be derived from deterministic pushdown automata, and they generate deterministic context-free languages. DCFGs are unambiguous and are of practical interest as they can be parsed in linear time. Parsers can be automatically generated from the grammar by a [[#Parser_Generator|parser generator]]. Restricted forms of DCFGs can be parsed by simpler parsers. These grammar classes are referred by the type of parser that parses them: LALR, SLR and LL.
Deterministic context-free grammars (DCFGs) are a proper subset of the context-free grammars. They can be derived from deterministic pushdown automata, and they generate deterministic context-free languages. DCFGs are unambiguous and are of practical interest as they can be parsed in linear time. Parsers can be automatically generated from the grammar by a [[#Parser_Generator|parser generator]]. Restricted forms of DCFGs can be parsed by simpler parsers. These grammar classes are referred by the type of parser that parses them: LALR, SLR and LL.
Line 31: Line 33:
  <symbol> ::= expression
  <symbol> ::= expression


where <symbol> is a nonterminal, and expression consists of one or more sequence of symbols. The expression may contain vertical bars '|' indicating a choice. Symbols that. never appear on the left side are terminals. The ::= means that the symbol on the left must be replaced with the expression on the right.
where <symbol> is a nonterminal, and expression consists of one or more sequence of symbols. The expression may contain vertical bars '|' indicating a choice. Symbols that never appear on the left side are terminals. The ::= means that the symbol on the left must be replaced with the expression on the right.


===Extended Backus-Naur form EBNF===
===Extended Backus-Naur form EBNF===
Line 41: Line 43:
{{Internal|EBNF Variant that Used to Define the XML Grammar|EBNF Variant that Used to Define the XML Grammar}}
{{Internal|EBNF Variant that Used to Define the XML Grammar|EBNF Variant that Used to Define the XML Grammar}}


=Parser Generator=
=Parser=
 
==Bottom-Up Parser==
 
{{External|[https://en.wikipedia.org/wiki/Bottom-up_parsing Wikipedia Bottom-up Parsing]}}
 
Bottom-up parsers recognize the text's lowest-level small details first, before its mid-level structures, and leaving the highest-level overall structure to last. The bottom-up parser discovers the [[#Parse_Tree|parse tree]] starting from the bottom left end and incrementally works its way upwards and rightwards. The parser waits until it has scanned and parsed all parts of some constructs before committing to what the combined structure is. Bottom-up parsing is sometimes done by backtracking. Much more commonly, [[#Look-Ahead_LR_Parser_.28LARL.29|LALR parsers]] are used.
 
===Left-to-Right, Rightmost Derivation Parser (LR)===
 
{{External|https://en.wikipedia.org/wiki/LR_parser}}
 
====Simple LR Parser (SLR)====
 
====Look-Ahead LR Parser (LARL)====
 
{{External|https://en.wikipedia.org/wiki/LALR_parser}}
 
====Canonical LR(1) Parser====
 
====Minimal LR(1) Parser====


ANTLR
====GLR Parser====


=Parse Tree=
===Shift-Reduce Parser===


{{External|https://en.wikipedia.org/wiki/Parse_tree}}
{{External|https://en.wikipedia.org/wiki/Shift-reduce_parser}}


<font color=darkgray>Difference between parse tree and syntax tree.</font>
=Parser Generator=


=Syntax Tree=
* [[ANTLR]]
* http://catalog.compilertools.net/java.html
* http://dinosaur.compilertools.net


=Organizatorium=
=Organizatorium=


* [[Difference between Pull Parsing and Push Parsing#Overview|Difference between Pull Parsing and Push Parsing]]
* [[Difference between Pull Parsing and Push Parsing#Overview|Difference between Pull Parsing and Push Parsing]]

Latest revision as of 22:11, 13 July 2018

External

Internal

Overview

On Hold, Pending Review of Compilers: Principles, Techniques and Tools -> Deplete into Formal_Languages_and_Translators

Formal Grammar

https://en.wikipedia.org/wiki/Formal_grammar

A set of production rules that describe all possible strings in a given formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language syntax. The grammar does not describe the meaning of strings, or what can be done with them in whatever context, only their form.

Context-Free Grammar

A context-free grammar is a grammar in which the left-hand side of each production rule consists of only a single nonterminal symbol. A popular notation for context-free grammar is Backus-Naur (BNF).

Deterministic context-free grammars (DCFGs) are a proper subset of the context-free grammars. They can be derived from deterministic pushdown automata, and they generate deterministic context-free languages. DCFGs are unambiguous and are of practical interest as they can be parsed in linear time. Parsers can be automatically generated from the grammar by a parser generator. Restricted forms of DCFGs can be parsed by simpler parsers. These grammar classes are referred by the type of parser that parses them: LALR, SLR and LL.

Backus-Naur Form BNF

https://en.wikipedia.org/wiki/Backus–Naur_form

Backus-Naur form is notation technique for context-free grammars that is used to describe the syntax of languages used in computing.

A BNF specification is a set of derivation rules, written as:

<symbol> ::= expression

where <symbol> is a nonterminal, and expression consists of one or more sequence of symbols. The expression may contain vertical bars '|' indicating a choice. Symbols that never appear on the left side are terminals. The ::= means that the symbol on the left must be replaced with the expression on the right.

Extended Backus-Naur form EBNF

https://en.wikipedia.org/wiki/Extended_Backus–Naur_form

EBNF Variant that Used to Define the XML Grammar

EBNF Variant that Used to Define the XML Grammar

Parser

Bottom-Up Parser

Wikipedia Bottom-up Parsing

Bottom-up parsers recognize the text's lowest-level small details first, before its mid-level structures, and leaving the highest-level overall structure to last. The bottom-up parser discovers the parse tree starting from the bottom left end and incrementally works its way upwards and rightwards. The parser waits until it has scanned and parsed all parts of some constructs before committing to what the combined structure is. Bottom-up parsing is sometimes done by backtracking. Much more commonly, LALR parsers are used.

Left-to-Right, Rightmost Derivation Parser (LR)

https://en.wikipedia.org/wiki/LR_parser

Simple LR Parser (SLR)

Look-Ahead LR Parser (LARL)

https://en.wikipedia.org/wiki/LALR_parser

Canonical LR(1) Parser

Minimal LR(1) Parser

GLR Parser

Shift-Reduce Parser

https://en.wikipedia.org/wiki/Shift-reduce_parser

Parser Generator

Organizatorium