PHP DOM: explained and exemplified

by Adrian Mejia on August 06, 2011; last updated on August 12, 2011

This is a guide to get started with PHP DOM, or a quick reminder for those who have not used it in a while. The full documentation is more detailed, but it is quite long. Here you will find a quick reference to get started in no time.

Purpose of the DOM (Document Object Model): It is a convention used to represent and manipulate objects in XML, XHTML and HTML documents. Parsing XML and HTML files is very useful. It lets you manipulate RSS feeds, interact with APIs and web services through XML (e.g. Google Maps, Facebook and Twitter APIs, etc.), extract information from websites (web crawling) and more.

Getting Started

The DOM implementation in PHP has more than 15 classes! But don't be afraid: in most cases you will end up using just these: DOMNode, DOMDocument, DOMNodeList and DOMElement. The following UML class diagram of PHP's DOM shows how these classes relate to each other, followed by an explanation of each one.

Fig 1. PHP DOM: UML Class Diagram (lean)

Loading and Saving DOM Documents

DOMDocument: this class extends DOMNode. It holds the XML (or HTML) elements and the document's configuration. Its configuration properties include formatOutput, preserveWhiteSpace, version, etc.
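As a minimal sketch of how those configuration properties behave, the snippet below loads a small XML string and saves it back re-indented. The XML content is a made-up sample, not taken from the article; preserveWhiteSpace and formatOutput are standard DOMDocument properties.

```php
<?php
// Made-up sample data with whitespace between the tags.
$xml = "<library>\n  <book>\n    <title>PHP DOM</title>\n  </book>\n</library>";

$dom = new DOMDocument;
$dom->preserveWhiteSpace = false; // set before loading: drops whitespace-only text nodes
$dom->formatOutput = true;        // re-indent the markup when saving
$dom->loadXML($xml);

$output = $dom->saveXML();        // serialize the whole document to a string
echo $output;
```

Because preserveWhiteSpace is false, the root element should end up with a single child (the book element) instead of extra whitespace text nodes around it.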

DOMDocument must-know methods (part 1: load and save)

  • Load: loads XML (or HTML) documents. There are several self-explanatory variants: load(), loadXML(), loadHTML() and loadHTMLFile().
  • Save: outputs the whole DOM document, to the screen or to a file: save(), saveXML(), saveHTML() and saveHTMLFile().

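Example using DOMDocument for loading and showing HTML:

  $dom = new DOMDocument;
  $html = '<html><body><p>Hello DOM</p></body></html>';  // sample markup standing in for a page's HTML
  $dom->loadHTML($html);  // load HTML content into the DOM
  echo $dom->saveHTML();  // print to screen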
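The XML counterparts work the same way. The following sketch, which assumes a made-up feed-like string and a temporary file as the output location, parses XML from a string, writes it to disk with save() and reads it back with load().

```php
<?php
// Made-up sample data and a temporary file; neither comes from the article.
$dom = new DOMDocument;
$dom->loadXML('<feed><entry>First post</entry></feed>'); // parse XML from a string

$path = tempnam(sys_get_temp_dir(), 'dom'); // hypothetical output location
$bytes = $dom->save($path);                 // write to a file; returns bytes written or false

$copy = new DOMDocument;
$copy->load($path);                         // parse XML from a file
echo $copy->saveXML();                      // serialize the document to a string
unlink($path);                              // remove the temporary file
```

Note that save() takes a file name, while saveXML() and saveHTML() return the markup as a string, which is what you want when printing to the screen.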

Iterating through DOM Elements

The first thing you need to do after loading the XML that you want to process is to select the data that you are interested in. To search for your data you need to iterate through the DOM elements, and you need to know which methods and objects are used in this process.

DOMDocument must-know methods (part 2: get data)
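Two DOMDocument methods commonly used to get data are getElementById() and getElementsByTagName(). Here is a small sketch of both; the markup and the "menu" id are illustrative assumptions.

```php
<?php
// Illustrative markup only.
$html = '<html><body><ul id="menu"><li>Home</li><li>About</li></ul><p>Hello</p></body></html>';

$dom = new DOMDocument;
$dom->loadHTML($html);

$menu = $dom->getElementById('menu');      // a DOMElement, or null when the id is not found
$items = $dom->getElementsByTagName('li'); // a DOMNodeList with every <li> in the document

echo get_class($menu), "\n";  // DOMElement
echo get_class($items), "\n"; // DOMNodeList
echo $items->length, "\n";    // 2
```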

The DOMDocument lookup methods, such as getElementById() and getElementsByTagName(), return DOMElement and DOMNodeList objects respectively. Now we will explore the properties and methods that you need to know in order to get the data.

DOMNodeList: a class that contains a collection of DOMNode objects.

DOMNodeList must-know elements (part 3: get data from nodes collection)
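A DOMNodeList can be read through its length property, its item() method, or a plain foreach loop. The sketch below uses a made-up XML string to show all three.

```php
<?php
// Made-up sample data.
$dom = new DOMDocument;
$dom->loadXML('<tags><tag>php</tag><tag>dom</tag><tag>xml</tag></tags>');

$tags = $dom->getElementsByTagName('tag');

echo $tags->length, "\n";             // 3: number of nodes in the list
echo $tags->item(0)->nodeValue, "\n"; // php: item() is zero-based, null when out of range

$values = [];
foreach ($tags as $tag) {             // a DOMNodeList can be iterated directly
    $values[] = $tag->nodeValue;
}
echo implode(', ', $values), "\n";    // php, dom, xml
```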

DOMElement: a class that extends DOMNode and adds new methods, but we don't need those for iterating through nodes.

DOMNode: the pillar class. It is used by all the other classes, either directly or through one of its child classes.

DOMNode must-know properties (part 4: get node data)
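The DOMNode properties you will probably reach for most often are nodeName, nodeValue, nodeType, textContent, attributes, childNodes and parentNode. This is a sketch over a made-up XML string, not an exhaustive list.

```php
<?php
// Made-up sample data.
$dom = new DOMDocument;
$dom->loadXML('<post id="7"><title>PHP DOM</title><body>Quick guide</body></post>');

$post = $dom->documentElement; // root node of the document

echo $post->nodeName, "\n";                                  // post
echo $post->nodeType === XML_ELEMENT_NODE ? "element\n" : "other\n"; // element
echo $post->firstChild->nodeValue, "\n";                     // PHP DOM
echo $post->textContent, "\n";                               // PHP DOMQuick guide
echo $post->attributes->getNamedItem('id')->nodeValue, "\n"; // 7
echo $post->lastChild->parentNode->nodeName, "\n";           // post

$names = [];
foreach ($post->childNodes as $child) { // childNodes is a DOMNodeList
    $names[] = $child->nodeName;
}
echo implode(', ', $names), "\n";       // title, body
```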
Example using DOMDocument for loading and showing HTML
(status: not finished yet)

Tags: php, dom, xml
