JAXP DOM
Latest revision as of 03:38, 11 November 2016
Internal
- JAXP
- XML DOM
- JAXP DOM Reference
- JDOM
- dom4j
Overview
The DOM API exposed by JAXP is defined by the W3C. Parsers implementing the DOM API translate an entire XML document into an in-memory tree structure, where each node contains one of the components of the XML structure. Once in memory, the DOM tree can be traversed and manipulated in any order. The nodes of the Document tree are low-level data structures. For higher-level object structures, use JDOM or dom4j instead.
Unlike an event-driven parser API such as SAX, the DOM API is memory and CPU intensive, because the whole document is held in memory.
Parsing begins by using DocumentBuilderFactory to create a DocumentBuilder instance. The implementation that is used can be selected with the javax.xml.parsers.DocumentBuilderFactory system property. The DocumentBuilder produces a Document object as the result of a parse() method invocation. DOM and SAX parsers handle errors in a similar manner: the same exceptions are generated, so the error handling code is virtually identical.
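The factory, builder and parse() steps can be sketched as follows. This is a minimal illustration, not the only way to wire things up: the class name DomParseSketch, the helper parse(String) and the sample XML are assumptions made for the example, and parsing from a string is chosen only to keep it self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical class name, used only for this sketch.
class DomParseSketch {

    // Parses an XML string into an in-memory DOM tree.
    static Document parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        // newInstance() locates the factory implementation; the
        // javax.xml.parsers.DocumentBuilderFactory system property is one way to choose it.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);

        // The builder does the actual parsing and returns the Document root.
        DocumentBuilder builder = factory.newDocumentBuilder();
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        return builder.parse(new ByteArrayInputStream(bytes));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<order id=\"7\"><item>book</item></order>");
        System.out.println(doc.getDocumentElement().getNodeName()); // prints "order"
    }
}
```

For a file on disk, DocumentBuilder also offers parse(File) and parse(InputSource) overloads.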
JAXP DOM and SAX use the same error handling mechanism: a JAXP-conformant document builder is required to report SAX exceptions when it has trouble parsing an XML document.
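As a rough sketch of that shared mechanism, the example below registers an org.xml.sax.ErrorHandler on a DocumentBuilder and catches the SAXParseException raised for a malformed document. The class name DomErrorHandlingSketch and the helper firstProblem are hypothetical, and turning recoverable errors into exceptions is a choice made for this example, not a requirement of the API.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical class name, used only for this sketch.
class DomErrorHandlingSketch {

    // Returns null if the document parses cleanly, otherwise a short problem description.
    static String firstProblem(String xml) throws ParserConfigurationException, IOException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();

        // The same ErrorHandler interface is used by SAX parsers.
        builder.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { /* ignored in this sketch */ }
            @Override public void error(SAXParseException e) throws SAXException { throw e; }
            @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });

        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            // SAXParseException carries the location of the problem.
            return "line " + e.getLineNumber() + ": " + e.getMessage();
        } catch (SAXException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(firstProblem("<a><b></a>")); // mismatched tags: a description is printed
        System.out.println(firstProblem("<a/>"));       // well-formed: prints "null"
    }
}
```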
DOM Reference
When to Use JAXP DOM, JDOM or dom4j?
DOM Tree Nodes as Objects
The tree produced by a DOM parser is made of low-level data structures, because DOM is intended to be language neutral rather than object oriented. The difference in what constitutes a "node" in the data hierarchy primarily accounts for the differences in programming with these APIs.
Also, because DOM needs to support a mixed content model, the DOM nodes are inherently very simple. For example, an Element node carries only its tag name: getNodeName() returns the element name and getNodeValue() returns null, while the text between the start and end tags lives in separate child Text nodes. The value of an element is not the same as its content.
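The distinction between an element's value and its content can be seen in a small sketch. The class name DomNodeValueSketch, the helper describe and the sample document are assumptions for illustration; the Node methods used are part of org.w3c.dom.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical class name, used only for this sketch.
class DomNodeValueSketch {

    // Summarizes name and value of the root element and of its first child node.
    static String describe(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();
        Node first = root.getFirstChild();

        return root.getNodeName() + "|"      // element name
                + root.getNodeValue() + "|"  // always null for an Element
                + first.getNodeName() + "|"  // "#text" for a Text node
                + first.getNodeValue() + "|" // the character data
                + root.getTextContent();     // convenience: concatenated descendant text
    }

    public static void main(String[] args) throws Exception {
        System.out.println(describe("<title>Dune</title>")); // prints "title|null|#text|Dune|Dune"
    }
}
```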
With JDOM or dom4j, each node in the hierarchy is an object. These APIs are not primarily designed to support a mixed content situation.
Validation
The JAXP DOM implementation supports XML Schema, so documents can be validated on parsing, which is not the case with JDOM and dom4j.
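One way to validate while parsing is to attach a compiled javax.xml.validation.Schema to the DocumentBuilderFactory. The sketch below assumes a tiny inline schema and documents; the class name DomSchemaValidationSketch and the helper isValid are hypothetical. An error handler that rethrows is installed because validation problems are reported as recoverable errors, which would otherwise not stop the parse.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical class name, used only for this sketch.
class DomSchemaValidationSketch {

    // Sample schema (an assumption for the example): a single <qty> element holding an int.
    static final String XSD =
            "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
          + "<xs:element name=\"qty\" type=\"xs:int\"/>"
          + "</xs:schema>";

    // Returns true if the document parses and is valid against the schema.
    static boolean isValid(String xsd, String xml)
            throws ParserConfigurationException, SAXException, IOException {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // XML Schema validation needs namespace awareness
        factory.setSchema(schema);       // validate against the schema during parse()

        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { /* ignored in this sketch */ }
            @Override public void error(SAXParseException e) throws SAXException { throw e; }
            @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });

        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed, or not valid against the schema
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid(XSD, "<qty>3</qty>"));     // prints "true"
        System.out.println(isValid(XSD, "<qty>three</qty>")); // prints "false"
    }
}
```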
Capability to Handle Mixed Content
The DOM API supports a mixed content model.
JDOM and dom4j can handle mixed content, but they are primarily designed for applications where the XML structure carries data, and where the content of an element is typically either text or other elements, but not both.
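To make mixed content concrete, the sketch below lists the direct children of an element that contains both text and a nested element. The class name DomMixedContentSketch, the helper children and the sample markup are assumptions for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical class name, used only for this sketch.
class DomMixedContentSketch {

    // Lists the direct children of the root element, in document order.
    static String children(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList kids = doc.getDocumentElement().getChildNodes();

        StringBuilder out = new StringBuilder();
        for (int i = 0; i < kids.getLength(); i++) {
            Node kid = kids.item(i);
            if (i > 0) {
                out.append(',');
            }
            if (kid.getNodeType() == Node.TEXT_NODE) {
                out.append("text[").append(kid.getNodeValue()).append(']');
            } else if (kid.getNodeType() == Node.ELEMENT_NODE) {
                out.append("element[").append(kid.getNodeName()).append(']');
            } else {
                out.append("other[").append(kid.getNodeName()).append(']');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Text and an element are siblings under <p>: this is mixed content.
        System.out.println(children("<p>Hello <b>big</b> world</p>"));
        // prints "text[Hello ],element[b],text[ world]"
    }
}
```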
DOM Examples
Component Packages
- javax.xml.parsers defines DocumentBuilderFactory and DocumentBuilder classes, and error types.
- org.w3c.dom defines the Document interface and other DOM components.