JAXP DOM

From NovaOrdis Knowledge Base

Internal

Overview

The DOM API exposed by JAXP is defined by the W3C. Parsers implementing the DOM API translate an entire XML document into an in-memory tree structure, where each node contains one of the components of the XML structure. Once in memory, the DOM tree can be traversed and manipulated arbitrarily. The Document tree elements are low-level data structures. For higher-level object structures, use JDOM or dom4j instead.

Unlike an event-driven parser API such as SAX, the DOM API is memory and CPU intensive, because the whole document is held in memory as a tree.

The parsing process begins by using DocumentBuilderFactory to create a DocumentBuilder instance. The factory implementation can be selected with the javax.xml.parsers.DocumentBuilderFactory system property; when the property is not set, JAXP falls back to its other lookup mechanisms and ultimately to the platform default. The DocumentBuilder produces a Document object as a result of a parse() method invocation. DOM and SAX parsers handle errors in a similar manner: the same exceptions are generated, so the error handling code is virtually identical.
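The factory, builder and parse() sequence described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the class name DomParseSketch, the helper method parse(String) and the sample order document are assumptions made for the example. The XML is read from a string so that the sketch needs no external file.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical class name, used only for this illustration.
class DomParseSketch {

    // Parses XML text into a DOM Document that is held entirely in memory.
    static Document parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        // The factory implementation is resolved by JAXP (system property, then fallbacks).
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Sample document invented for the example.
        Document doc = parse("<order id=\"7\"><item>book</item></order>");
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

To parse a file instead, DocumentBuilder also offers parse(File) and parse(InputStream) overloads.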

JAXP DOM and SAX use the same error handling mechanism: a JAXP-conformant document builder is required to report SAX exceptions when it has trouble parsing an XML document.
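As an illustration of the shared error handling, the sketch below catches the SAX exceptions raised by a DocumentBuilder when the input is not well-formed. The class name DomErrorHandlingSketch and the helper isWellFormed(String) are hypothetical. The explicit ErrorHandler is a choice made for this example: it keeps the parser's default handler from reporting the problem on its own and rethrows errors so that they reach the catch block.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical class name, used only for this illustration.
class DomErrorHandlingSketch {

    // Returns true if the text parses as well-formed XML, false otherwise.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // The same org.xml.sax.ErrorHandler interface is used by SAX parsers.
            builder.setErrorHandler(new ErrorHandler() {
                @Override
                public void warning(SAXParseException e) {
                    // Warnings are ignored in this sketch.
                }

                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e;
                }

                @Override
                public void fatalError(SAXParseException e) throws SAXException {
                    throw e;
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXParseException e) {
            // SAXParseException carries the location of the problem.
            System.err.println("line " + e.getLineNumber() + ": " + e.getMessage());
            return false;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<a><b/></a>"));
        System.out.println(isWellFormed("<a><b></a>"));
    }
}
```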

DOM Reference

DOM Reference

When to Use JAXP DOM, JDOM or dom4j?

DOM Tree Nodes as Objects

The data structures referred to from the tree produced by a DOM parser are low-level structures, as DOM is intended to be language neutral, and not oriented towards objects. It is the difference in what constitutes a "node" in the data hierarchy that primarily accounts for the differences in programming with these APIs.

Also, because DOM needs to support a mixed content model, DOM nodes are inherently very simple. For example, an element node does not carry its text as its value: its node name is the element's tag name, its node value is null, and whatever appears between the start and end tags is held in child nodes. The value of an element is therefore not the same as its content.
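The difference between the value of an element and its content can be seen in a short sketch. The class name DomNodeValueSketch, the helper parseRoot(String) and the sample title element are assumptions made for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical class name, used only for this illustration.
class DomNodeValueSketch {

    // Parses the XML text and returns its root element.
    static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element title = parseRoot("<title>DOM</title>");

        // The node name of an element is its tag name.
        System.out.println(title.getNodeName());                  // title
        // The node value of an element is always null.
        System.out.println(title.getNodeValue());                 // null
        // The text lives in a child Text node.
        System.out.println(title.getFirstChild().getNodeValue()); // DOM
        // getTextContent() concatenates the text of all descendants.
        System.out.println(title.getTextContent());               // DOM
    }
}
```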

With JDOM or dom4j, each node in the hierarchy is an object. These APIs are not primarily designed to support a mixed content situation.

Validation

The JAXP DOM implementation supports XML Schema, so documents can be validated on parsing, which is not the case with JDOM and dom4j.
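A minimal sketch of validation on parsing, using the javax.xml.validation API together with DocumentBuilderFactory.setSchema(). The class name DomSchemaValidationSketch, the helper isValid(String) and the one-element quantity schema are invented for the example. Validation errors are reported through the ErrorHandler, so the sketch installs one that rethrows them; without it the parse could complete despite an invalid document.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical class name, used only for this illustration.
class DomSchemaValidationSketch {

    // Sample schema invented for the example: a single positive integer element.
    static final String QUANTITY_XSD =
            "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
          + "<xs:element name=\"quantity\" type=\"xs:positiveInteger\"/>"
          + "</xs:schema>";

    // Returns true if the XML text is well-formed and valid against QUANTITY_XSD.
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory
                    .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(QUANTITY_XSD)));

            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // schema validation needs namespace awareness
            factory.setSchema(schema);

            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override
                public void warning(SAXParseException e) {
                    // Warnings are ignored in this sketch.
                }

                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e; // validation errors arrive here
                }

                @Override
                public void fatalError(SAXParseException e) throws SAXException {
                    throw e;
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Covers both malformed and invalid documents.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<quantity>3</quantity>"));
        System.out.println(isValid("<quantity>many</quantity>"));
    }
}
```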

Capability to Handle Mixed Content

The DOM API supports a mixed content model.

JDOM and dom4j allow handling of mixed content, but they are primarily designed for applications where the XML structure contains data, and the data typically is either text, or other elements, but not both.
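To make the mixed content model concrete, the sketch below lists the direct children of an element that contains both text and a child element. The class name DomMixedContentSketch, the helper describeChildren(String) and the sample paragraph are assumptions made for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class name, used only for this illustration.
class DomMixedContentSketch {

    // Describes each direct child of the root element as TEXT:... or ELEMENT:...
    static List<String> describeChildren(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();

        List<String> result = new ArrayList<>();
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                result.add("ELEMENT:" + child.getNodeName());
            } else if (child.getNodeType() == Node.TEXT_NODE) {
                result.add("TEXT:" + child.getNodeValue());
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Text and an element are siblings under the same parent.
        for (String line : describeChildren("<p>Use <b>DOM</b> with care</p>")) {
            System.out.println(line);
        }
    }
}
```

For the sample paragraph, the children are a text node, the b element, and a second text node, in document order.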

DOM Examples

DOM Examples

Component Packages

  • javax.xml.parsers defines DocumentBuilderFactory and DocumentBuilder classes, and error types.
  • org.w3c.dom defines the Document interface and other DOM components.