JAXP DOM Reference

From NovaOrdis Knowledge Base

External

Internal

Overview

All JAXP DOM interfaces are part of the org.w3c.dom package.

For an example of how to walk a DOM tree, see:

https://github.com/NovaOrdis/playground/tree/master/java/xml/dom-reading

Document

Node

The process of navigating to a node involves processing sub-elements, ignoring the uninteresting ones and inspecting the interesting ones, recursively. A robust DOM application must do these things:

  • When searching for an element:
    • ignore comments, attributes and processing instructions
    • allow for the possibility that sub-elements do not occur in the expected order
    • skip over TEXT nodes that contain ignorable white space. Warning: newlines in the file are returned as text nodes, so they have to be handled.
  • When extracting text for a node:
    • extract text from CDATA as well as text nodes
    • ignore comments, attributes and processing instructions when gathering text
    • if an entity reference node or another element node is encountered, recurse.
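The element search rules above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the class name ChildElementFinder and the helpers parse() and findChild() are hypothetical names introduced here, and the sample XML is made up. Attributes never appear in the child list, so skipping every non-element node covers comments, processing instructions and whitespace text in one check.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ChildElementFinder {

    // Hypothetical helper: parses an XML string, wrapping checked exceptions for brevity.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    // Returns the first child element with the given tag name, or null if there is none.
    // The whole child list is scanned, so the order of sub-elements does not matter.
    static Element findChild(Node parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) {
                continue; // whitespace text, comments, processing instructions
            }
            if (name.equals(n.getNodeName())) {
                return (Element) n;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String xml = "<config>\n  <!-- note -->\n  <port>8080</port>\n  <host>localhost</host>\n</config>";
        Element root = parse(xml).getDocumentElement();
        System.out.println(findChild(root, "host").getTextContent());
    }
}
```

Returning null for a missing element keeps the sketch short; a caller that treats the element as mandatory would probably throw instead.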

Node Types

The node type is obtained by calling getNodeType(). The call returns a short whose value is one of the following Node constants:

ELEMENT_NODE

Node type: 1 (Node.ELEMENT_NODE)

ATTRIBUTE_NODE

Node type: 2

TEXT_NODE

Node type: 3

CDATA_SECTION_NODE

Node type: 4

ENTITY_REFERENCE_NODE

Node type: 5

ENTITY_NODE

Node type: 6

PROCESSING_INSTRUCTION_NODE

Node type: 7

COMMENT_NODE

Node type: 8

DOCUMENT_NODE

Node type: 9

DOCUMENT_TYPE_NODE

Node type: 10

DOCUMENT_FRAGMENT_NODE

Node type: 11

NOTATION_NODE

Node type: 12
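As an illustration of how the constants listed above are typically used, the sketch below maps the value returned by getNodeType() back to its constant name. The class NodeTypeNames and the method typeName() are hypothetical names chosen for this example.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

class NodeTypeNames {

    // Maps a getNodeType() value to the name of the matching Node constant.
    static String typeName(short type) {
        switch (type) {
            case Node.ELEMENT_NODE:                return "ELEMENT_NODE";
            case Node.ATTRIBUTE_NODE:              return "ATTRIBUTE_NODE";
            case Node.TEXT_NODE:                   return "TEXT_NODE";
            case Node.CDATA_SECTION_NODE:          return "CDATA_SECTION_NODE";
            case Node.ENTITY_REFERENCE_NODE:       return "ENTITY_REFERENCE_NODE";
            case Node.ENTITY_NODE:                 return "ENTITY_NODE";
            case Node.PROCESSING_INSTRUCTION_NODE: return "PROCESSING_INSTRUCTION_NODE";
            case Node.COMMENT_NODE:                return "COMMENT_NODE";
            case Node.DOCUMENT_NODE:               return "DOCUMENT_NODE";
            case Node.DOCUMENT_TYPE_NODE:          return "DOCUMENT_TYPE_NODE";
            case Node.DOCUMENT_FRAGMENT_NODE:      return "DOCUMENT_FRAGMENT_NODE";
            case Node.NOTATION_NODE:               return "NOTATION_NODE";
            default:                               return "UNKNOWN(" + type + ")";
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        // A few nodes created in memory, only to have something to inspect.
        Node[] samples = {
            doc, doc.createElement("a"), doc.createTextNode("t"),
            doc.createComment("c"), doc.createCDATASection("d")
        };
        for (Node n : samples) {
            System.out.println(n.getNodeType() + " " + typeName(n.getNodeType()));
        }
    }
}
```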

Node Name, Value and Attributes

Interface | nodeName | nodeValue | attributes
Element | Element.tagName | null | NamedNodeMap
Text | "#text" | same as CharacterData.data, the content of the text node | null
Attr | same as Attr.name | same as Attr.value | null
CDATASection | "#cdata-section" | same as CharacterData.data, the content of the CDATA section | null
Comment | "#comment" | same as CharacterData.data, the content of the comment | null
Document | "#document" | null | null
DocumentFragment | "#document-fragment" | null | null
DocumentType | same as DocumentType.name | null | null
Entity | entity name | null | null
Notation | notation name | null | null
ProcessingInstruction | same as ProcessingInstruction.target | same as ProcessingInstruction.data | null
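A few rows of the table above can be checked with a short sketch. The class NodeProperties and its helpers newDocument() and describe() are hypothetical names used only for this illustration. The nodes are created in memory instead of being parsed from a file.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

class NodeProperties {

    // Hypothetical helper: creates an empty Document, wrapping the checked exception.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException("cannot create a DocumentBuilder", e);
        }
    }

    // Renders nodeName, nodeValue and attributes in the same column order as the table.
    static String describe(Node n) {
        NamedNodeMap attrs = n.getAttributes(); // non-null only for Element nodes
        String attrInfo = attrs == null ? "null" : attrs.getLength() + " attribute(s)";
        return n.getNodeName() + " | " + n.getNodeValue() + " | " + attrInfo;
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Element item = doc.createElement("item");
        item.setAttribute("id", "7");

        System.out.println(describe(item));                       // item | null | 1 attribute(s)
        System.out.println(describe(item.getAttributeNode("id"))); // id | 7 | null
        System.out.println(describe(doc.createTextNode("hello"))); // #text | hello | null
        System.out.println(describe(doc.createComment("note")));   // #comment | note | null
        System.out.println(describe(doc));                         // #document | null | null
    }
}
```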

Node's Text Content

To get the text a node contains, iterate over its list of child nodes, ignore the entries that are of no concern (comments, processing instructions) and accumulate the text found in text nodes, CDATA section nodes and entity reference nodes.
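One possible way to accumulate a node's text is sketched below. The class NodeText and its methods are hypothetical names. Recursing into child elements is an assumption taken from the robustness checklist earlier on this page, not something this paragraph requires.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeText {

    // Hypothetical helper: parses an XML string, wrapping checked exceptions for brevity.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    // Returns the concatenated text found below the given node.
    static String text(Node node) {
        StringBuilder sb = new StringBuilder();
        collect(node, sb);
        return sb.toString();
    }

    private static void collect(Node node, StringBuilder sb) {
        for (Node n = node.getFirstChild(); n != null; n = n.getNextSibling()) {
            switch (n.getNodeType()) {
                case Node.TEXT_NODE:
                case Node.CDATA_SECTION_NODE:
                    sb.append(n.getNodeValue()); // character data of the node
                    break;
                case Node.ENTITY_REFERENCE_NODE:
                case Node.ELEMENT_NODE:
                    collect(n, sb);              // text lives in the children
                    break;
                default:
                    break;                       // comments, processing instructions
            }
        }
    }

    public static void main(String[] args) {
        String xml = "<p>Hello <![CDATA[<b>]]><!-- skip --> world<i>!</i></p>";
        System.out.println(text(parse(xml).getDocumentElement())); // Hello <b> world!
    }
}
```

Whitespace is kept as found; trimming or normalizing it is left to the caller.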

Element

The Element interface extends Node. An Element has a node type of 1 (Node.ELEMENT_NODE).
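Because Element extends Node, a common pattern is to check the node type before casting. The sketch below shows this under stated assumptions: the class ElementAccess and its helpers are hypothetical names, and returning null for a missing attribute is a choice made for this example only.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class ElementAccess {

    // Hypothetical helper: creates an empty Document, wrapping the checked exception.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException("cannot create a DocumentBuilder", e);
        }
    }

    // Returns the attribute value if the node is an element that declares it, otherwise null.
    static String attribute(Node n, String name) {
        if (n.getNodeType() != Node.ELEMENT_NODE) {
            return null; // not an element, so the cast below would fail
        }
        Element e = (Element) n;
        // getAttribute() returns "" for a missing attribute, hence the hasAttribute() check.
        return e.hasAttribute(name) ? e.getAttribute(name) : null;
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Element server = doc.createElement("server");
        server.setAttribute("port", "8080");
        System.out.println(attribute(server, "port"));
        System.out.println(attribute(doc.createTextNode("x"), "port"));
    }
}
```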