JAXP DOM Reference
External
- Document Object Model (DOM) Level 3 Core Specification https://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/
Internal
Overview
All JAXP DOM interfaces are part of the org.w3c.dom package.
For an example of how to walk a DOM tree, see:
Document
Node
The process of navigating to a node involves processing sub-elements, ignoring the uninteresting ones and inspecting the interesting ones, recursively. A robust DOM application must do these things:
- When searching for an element:
  - ignore comments, attributes and processing instructions
  - allow for the possibility that sub-elements do not occur in the expected order
  - skip over TEXT nodes that contain ignorable white space. Warning: newlines in the file are returned as text nodes, so they have to be handled.
- When extracting text for a node:
  - extract text from CDATA as well as text nodes
  - ignore comments, attributes and processing instructions when gathering text
  - if an entity reference node or another element node is encountered, recurse.
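The search rules above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the class name FindChildElement and the helpers findChild and parse are hypothetical names introduced here. The loop looks only at element children, so comments, processing instructions and whitespace-only text nodes are skipped, and the order of the sub-elements does not matter.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class FindChildElement {

    /** Hypothetical helper: first child element with the given name, or null. */
    static Element findChild(Node parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            // Only element nodes are inspected; text (including newlines),
            // comments and processing instructions fall through untouched.
            if (n.getNodeType() == Node.ELEMENT_NODE && name.equals(n.getNodeName())) {
                return (Element) n;
            }
        }
        return null;
    }

    /** Hypothetical helper: parses an XML string with the default JAXP parser. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse XML", e);
        }
    }

    public static void main(String[] args) {
        // <a> is deliberately placed after a comment, a newline and <b>.
        Document doc = parse("<root>\n  <!-- note -->\n  <b/>\n  <a id=\"1\"/>\n</root>");
        Element a = findChild(doc.getDocumentElement(), "a");
        System.out.println(a.getAttribute("id"));
    }
}
```

Attributes never show up in this loop because they are not children of an element; they are reached through getAttributes() or getAttribute().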
Node Types
The node type is returned by getNodeType() and is one of the following:
ELEMENT_NODE
Node type: 1 (Node.ELEMENT_NODE)
ATTRIBUTE_NODE
Node type: 2
TEXT_NODE
Node type: 3
CDATA_SECTION_NODE
Node type: 4
ENTITY_REFERENCE_NODE
Node type: 5
ENTITY_NODE
Node type: 6
PROCESSING_INSTRUCTION_NODE
Node type: 7
COMMENT_NODE
Node type: 8
DOCUMENT_NODE
Node type: 9
DOCUMENT_TYPE_NODE
Node type: 10
DOCUMENT_FRAGMENT_NODE
Node type: 11
NOTATION_NODE
Node type: 12
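As an illustration of how these constants are typically used, the sketch below switches on the value returned by getNodeType(). The class NodeTypeNames and the method typeName are hypothetical names used only for this example; the constants themselves are the ones defined on org.w3c.dom.Node.

```java
import org.w3c.dom.Node;

public class NodeTypeNames {

    /** Hypothetical helper: maps a node type code to the name of its constant. */
    static String typeName(short type) {
        switch (type) {
            case Node.ELEMENT_NODE:                return "ELEMENT_NODE";
            case Node.ATTRIBUTE_NODE:              return "ATTRIBUTE_NODE";
            case Node.TEXT_NODE:                   return "TEXT_NODE";
            case Node.CDATA_SECTION_NODE:          return "CDATA_SECTION_NODE";
            case Node.ENTITY_REFERENCE_NODE:       return "ENTITY_REFERENCE_NODE";
            case Node.ENTITY_NODE:                 return "ENTITY_NODE";
            case Node.PROCESSING_INSTRUCTION_NODE: return "PROCESSING_INSTRUCTION_NODE";
            case Node.COMMENT_NODE:                return "COMMENT_NODE";
            case Node.DOCUMENT_NODE:               return "DOCUMENT_NODE";
            case Node.DOCUMENT_TYPE_NODE:          return "DOCUMENT_TYPE_NODE";
            case Node.DOCUMENT_FRAGMENT_NODE:      return "DOCUMENT_FRAGMENT_NODE";
            case Node.NOTATION_NODE:               return "NOTATION_NODE";
            default:                               return "UNKNOWN(" + type + ")";
        }
    }

    public static void main(String[] args) {
        // Print the numeric code next to each constant name.
        for (short t = 1; t <= 12; t++) {
            System.out.println(t + " = " + typeName(t));
        }
    }
}
```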
Node Name, Value and Attributes
Interface | nodeName | nodeValue | attributes |
Element | Element.tagName | null | NamedNodeMap |
Text | "#text" | same as CharacterData.data, the content of the text node | null |
Attr | same as Attr.name | same as Attr.value | null |
CDATASection | "#cdata-section" | same as CharacterData.data, the content of the CDATA Section | null |
Comment | "#comment" | same as CharacterData.data, the content of the comment | null |
Document | "#document" | null | null |
DocumentFragment | "#document-fragment" | null | null |
DocumentType | same as DocumentType.name | null | null |
Entity | entity name | null | null |
Notation | notation name | null | null |
ProcessingInstruction | same as ProcessingInstruction.target | same as ProcessingInstruction.data | null |
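A few rows of the table can be checked with a short program such as the one below. It is only a sketch: NodeNameValueDemo, describe and parse are hypothetical names, and the sample XML is made up for the example. It assumes the default, non-coalescing JAXP parser configuration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NodeNameValueDemo {

    /** Hypothetical helper: formats a node as "nodeName | nodeValue | attributes". */
    static String describe(Node n) {
        NamedNodeMap attrs = n.getAttributes(); // non-null only for Element nodes
        return n.getNodeName() + " | " + n.getNodeValue() + " | "
                + (attrs == null ? "null" : "NamedNodeMap(" + attrs.getLength() + ")");
    }

    /** Hypothetical helper: parses an XML string with the default JAXP parser. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<a x=\"1\">hi<!--c--></a>");
        Node a = doc.getDocumentElement();
        System.out.println(describe(doc));                       // Document row
        System.out.println(describe(a));                         // Element row
        System.out.println(describe(a.getAttributes().item(0))); // Attr row
        System.out.println(describe(a.getFirstChild()));         // Text row
        System.out.println(describe(a.getLastChild()));          // Comment row
    }
}
```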
Node's Text Content
To get the text a node contains, look through its list of child nodes, ignoring entries that are of no concern and accumulating the text found in TEXT nodes, CDATA nodes, and EntityRef nodes.
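One possible way to write this accumulation is sketched below. The names NodeText, text and parse are hypothetical, and the choice to recurse into child elements as well as entity references is an assumption taken from the robustness rules earlier on this page. DOM Level 3 also provides Node.getTextContent(), which may make a hand-written loop unnecessary when its behavior is acceptable.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NodeText {

    /** Hypothetical helper: concatenates the text found below the given node. */
    static String text(Node node) {
        StringBuilder sb = new StringBuilder();
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            switch (c.getNodeType()) {
                case Node.TEXT_NODE:
                case Node.CDATA_SECTION_NODE:
                    sb.append(c.getNodeValue()); // character data is kept as-is
                    break;
                case Node.ENTITY_REFERENCE_NODE:
                case Node.ELEMENT_NODE:
                    sb.append(text(c));          // recurse (assumption, see lead-in)
                    break;
                default:
                    break;                       // comments, processing instructions: ignored
            }
        }
        return sb.toString();
    }

    /** Hypothetical helper: parses an XML string with the default JAXP parser. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<p>Hello <!--x--><![CDATA[<b>]]> <i>world</i><?pi d?></p>");
        System.out.println(text(doc.getDocumentElement()));
    }
}
```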
Element
The Element interface extends Node and has a node type of 1 (Node.ELEMENT_NODE).