Handling XML

From Cosmin's Wiki

What is XML?

XML is a text format for storing data. It doesn't define what data is stored or how that data is structured; it only defines the syntax of tags and their attributes. A well-formed XML tag looks like this:

<name>Jack Herrington</name>

This <name> tag contains some text: Jack Herrington.

An XML tag that contains no text looks like this:

<powerUp />

XML often allows more than one way to write the same thing. For instance, this tag is equivalent to the previous one:

<powerUp></powerUp>

You can also add attributes to an XML tag. For example, this <name> tag contains first and last attributes:

<name first="Jack" last="Herrington" />

An XML book list example

<books>
  <book>
    <author>Jack Herrington</author>
    <title>PHP Hacks</title>
    <publisher>O'Reilly</publisher>
  </book>
  <book>
    <author>Jack Herrington</author>
    <title>Podcasting Hacks</title>
    <publisher>O'Reilly</publisher>
  </book>
</books>

An XML document that contains tags and attributes formatted like the examples above is well formed, which means that the tags are balanced and the characters are encoded properly. The book list above is an example of well-formed XML.

The XML in the listing above contains a list of books. The parent <books> tag includes a set of <book> tags that each contain <author>, <title>, and <publisher> tags.
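Well-formedness can also be checked from PHP. The following is a minimal sketch, assuming PHP 7 or later with the DOM extension (introduced in the next section) available; isWellFormed() is a hypothetical helper name chosen for this example, not part of PHP.

```php
<?php
// isWellFormed() is a hypothetical helper for this sketch.
// It returns true when $xml parses as well-formed XML, false otherwise.
function isWellFormed(string $xml): bool
{
    // Collect parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml);

    // Discard the collected errors and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $ok;
}

var_dump(isWellFormed('<name>Jack Herrington</name>'));   // balanced tags
var_dump(isWellFormed('<name>Jack Herrington</title>'));  // mismatched closing tag
var_dump(isWellFormed('<powerUp />'));                    // empty element
```

The first and third calls should report true and the second false, because its closing tag does not match the opening tag.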

The DOM

Reading XML using the DOM library

The easiest way to read a well-formed XML file is to use the Document Object Model (DOM) library compiled into some installations of PHP. The DOM library reads the entire XML document into memory and represents it as a tree of nodes, as illustrated in the figure below:
Image:Figure1.gif
The books node at the top of the tree has two child book nodes. Within each book, there are author, publisher, and title nodes. The author, publisher, and title nodes each have a child text node that contains the text.

The code to read the books XML file and display the contents using the DOM is shown in the following listing:
Reading XML with the DOM

<?php
  // Load the whole document into memory as a tree of nodes
  $doc = new DOMDocument();
  $doc->load( 'books.xml' );

  // Visit every <book> element in the document
  $books = $doc->getElementsByTagName( "book" );
  foreach( $books as $book )
  {
    // For each field, take the text of the first matching child element
    $authors = $book->getElementsByTagName( "author" );
    $author = $authors->item(0)->nodeValue;

    $publishers = $book->getElementsByTagName( "publisher" );
    $publisher = $publishers->item(0)->nodeValue;

    $titles = $book->getElementsByTagName( "title" );
    $title = $titles->item(0)->nodeValue;

    echo "$title - $author - $publisher\n";
  }
?>

The script starts by creating a new DOMDocument object and loading the books XML into that object using the load method. After that, the script uses the getElementsByTagName method to get a list of all the book elements in the document.

Within the loop over the book nodes, the script uses the getElementsByTagName method to find the author, publisher, and title elements, and reads the nodeValue of the first match for each. The nodeValue is the text within the node. The script then displays those values.
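The listing above assumes that every book has an author, a publisher, and a title; item(0) returns null when no element matches, so reading nodeValue from it fails. The following is a minimal sketch of one way to guard against that, assuming PHP 7 or later. The childText() helper and the inline sample XML are hypothetical and exist only for this example.

```php
<?php
// childText() is a hypothetical helper: it returns the text of the first
// descendant element named $tag, or $default when no such element exists.
function childText(DOMElement $parent, string $tag, string $default = ''): string
{
    $node = $parent->getElementsByTagName($tag)->item(0);
    return $node === null ? $default : (string) $node->nodeValue;
}

// Sample data for the sketch: this book has no <publisher> element.
$doc = new DOMDocument();
$doc->loadXML(
    '<books>'
    . '<book><author>Jack Herrington</author><title>PHP Hacks</title></book>'
    . '</books>'
);

$lines = [];
foreach ($doc->getElementsByTagName('book') as $book) {
    $lines[] = childText($book, 'title') . ' - '
        . childText($book, 'author') . ' - '
        . childText($book, 'publisher', '(no publisher)');
}

echo implode("\n", $lines), "\n";
```

With this sample input the script prints "PHP Hacks - Jack Herrington - (no publisher)" instead of failing on the missing element.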


Writing XML with the DOM

Reading XML is only one part of the equation. What about writing it? A reliable way to write XML is also to use the DOM, because it keeps the tags balanced and escapes special characters in text nodes for you. The listing below shows how the DOM builds the books XML file.


<?php
  // Example data; this could come from the user or from a database
  $books = array();
  $books[] = array(
    'title' => 'PHP Hacks',
    'author' => 'Jack Herrington',
    'publisher' => "O'Reilly"
  );
  $books[] = array(
    'title' => 'Podcasting Hacks',
    'author' => 'Jack Herrington',
    'publisher' => "O'Reilly"
  );

  $doc = new DOMDocument();
  $doc->formatOutput = true; // indent the generated XML

  // Create the root <books> element
  $r = $doc->createElement( "books" );
  $doc->appendChild( $r );

  foreach( $books as $book )
  {
    $b = $doc->createElement( "book" );

    // Each field is an element that holds a single text node
    $author = $doc->createElement( "author" );
    $author->appendChild(
      $doc->createTextNode( $book['author'] )
    );
    $b->appendChild( $author );

    $title = $doc->createElement( "title" );
    $title->appendChild(
      $doc->createTextNode( $book['title'] )
    );
    $b->appendChild( $title );

    $publisher = $doc->createElement( "publisher" );
    $publisher->appendChild(
      $doc->createTextNode( $book['publisher'] )
    );
    $b->appendChild( $publisher );

    // Attach the finished <book> to the root element
    $r->appendChild( $b );
  }

  echo $doc->saveXML();
?>


At the top of the script, the books array is loaded with some example books. That data could come from the user or from a database.

After the example books are loaded, the script creates a new DOMDocument and adds the root books node to it. Then, for each book, the script creates a book element, creates author, title, and publisher elements, and adds a text node to each of them. The final step for each book node is to attach it to the root books node.

The end of the script prints the XML to standard output by echoing the result of the saveXML method. (You can also use the save method to write the XML to a file.)
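The save method mentioned above can be sketched as follows. This is a minimal example rather than a complete program: the temporary file path is an assumption made for illustration, and the document holds a single book to keep the listing short.

```php
<?php
$doc = new DOMDocument('1.0', 'UTF-8');
$doc->formatOutput = true;

// Build <books><book><title>PHP Hacks</title></book></books>.
$root = $doc->createElement('books');
$doc->appendChild($root);

$book = $doc->createElement('book');
$title = $doc->createElement('title');
$title->appendChild($doc->createTextNode('PHP Hacks'));
$book->appendChild($title);
$root->appendChild($book);

// The temporary path is an assumption for this sketch; any writable path works.
$path = tempnam(sys_get_temp_dir(), 'books');

// save() returns the number of bytes written, or false on failure.
$bytes = $doc->save($path);
if ($bytes === false) {
    exit("Could not write $path\n");
}

// Load the file again to confirm that the document round-trips.
$check = new DOMDocument();
$check->load($path);
$savedTitle = $check->getElementsByTagName('title')->item(0)->nodeValue;
echo $savedTitle, "\n";

// Remove the temporary file.
unlink($path);
```

Checking the return value of save() matters because a failed write would otherwise go unnoticed.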


SimpleXML

Introduction

The SimpleXML extension provides a very simple and easily usable toolset to convert XML to an object that can be processed with normal property selectors and array iterators.

Requirements

The SimpleXML extension requires PHP 5 or later.

Installation

The SimpleXML extension is enabled by default. To disable it, use the --disable-simplexml configure option.

Including an xml file

Many of the examples below require an XML string. Instead of repeating this string in every example, we put it into a file, example.php, which each example includes. The contents of that file are shown below. Alternatively, you could create an XML document and read it with simplexml_load_file().

 
<?php
$xmlstr = <<<XML
<?xml version='1.0' standalone='yes'?>
<movies>
 <movie>
  <title>PHP: Behind the Parser</title>
  <characters>
   <character>
    <name>Ms. Coder</name>
    <actor>Onlivia Actora</actor>
   </character>
   <character>
    <name>Mr. Coder</name>
    <actor>El Act&#211;r</actor>
   </character>
  </characters>
  <plot>
   So, this language. It's like, a programming language. Or is it a
   scripting language? All is revealed in this thrilling horror spoof
   of a documentary.
  </plot>
  <rating type="thumbs">7</rating>
  <rating type="stars">5</rating>
 </movie>
</movies>
XML;
?> 
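As an alternative to the include file, the XML can live in a file of its own and be read with simplexml_load_file(). The sketch below is only illustrative: the temporary file and its shortened XML content are assumptions, standing in for a real document on disk.

```php
<?php
// Write a small document to a temporary file. The path and the shortened
// XML content are assumptions made for this sketch.
$path = tempnam(sys_get_temp_dir(), 'movies');
file_put_contents(
    $path,
    '<movies><movie><title>PHP: Behind the Parser</title></movie></movies>'
);

// simplexml_load_file() returns a SimpleXMLElement, or false on failure.
$xml = simplexml_load_file($path);
if ($xml === false) {
    exit("Could not parse $path\n");
}

$firstTitle = (string) $xml->movie[0]->title;
echo $firstTitle, "\n";

// Remove the temporary file.
unlink($path);
```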
 

The simplicity of SimpleXML appears most clearly when one extracts a string or number from a basic XML document.

Getting <plot>

 
<?php
include 'example.php';
 
$xml = simplexml_load_string($xmlstr);
 
echo $xml->movie[0]->plot; // "So, this language. It's like..."
?> 
 

Accessing non-unique elements in SimpleXML

When multiple instances of an element exist as children of a single parent element, normal iteration techniques apply.

 
<?php
include 'example.php';
 
$xml = simplexml_load_string($xmlstr);
 
/* For each <movie> node, we echo a separate <plot>. */
foreach ($xml->movie as $movie) {
   echo $movie->plot, '<br />';
}
 
?> 
 

Using attributes

So far, we have only covered the work of reading element names and their values. SimpleXML can also access element attributes. Access attributes of an element just as you would elements of an array.

 
<?php
include 'example.php';
 
$xml = simplexml_load_string($xmlstr);
 
/* Access the <rating> nodes of the first movie.
 * Output the rating scale, too. */
foreach ($xml->movie[0]->rating as $rating) {
   switch((string) $rating['type']) { // Get attributes as element indices
   case 'thumbs':
       echo $rating, ' thumbs up';
       break;
   case 'stars':
       echo $rating, ' stars';
       break;
   }
}
?> 
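When the attribute names are not known in advance, the attributes() method from the function list at the end of this page can be used to iterate over them. The following is a minimal sketch; the inline XML, including the max attribute, is made up for this example.

```php
<?php
// Sample data for the sketch; the "max" attribute is hypothetical.
$xml = simplexml_load_string(
    '<movie><rating type="stars" max="5">4</rating></movie>'
);

// attributes() yields each attribute as name => SimpleXMLElement,
// so the value is cast to a string before it is stored.
$attrs = [];
foreach ($xml->rating->attributes() as $name => $value) {
    $attrs[$name] = (string) $value;
}
print_r($attrs);

// isset() reports whether a given attribute is present.
$hasType = isset($xml->rating['type']);
$hasScale = isset($xml->rating['scale']);
var_dump($hasType, $hasScale);
```

Here $attrs ends up holding the type and max attributes, and isset() distinguishes the attribute that exists from the one that does not.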
 


Comparing Elements and Attributes with Text

To compare an element or attribute with a string or pass it into a function that requires a string, you must cast it to a string using (string). Otherwise, PHP treats the element as an object.

 
<?php   
include 'example.php';
 
$xml = simplexml_load_string($xmlstr);
 
if ((string) $xml->movie->title == 'PHP: Behind the Parser') {
   print 'My favorite movie.';
}
 
echo htmlentities((string) $xml->movie->title);
?>
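The difference between the object and its string value shows up most clearly with strict comparison. The sketch below uses a shortened inline document that is an assumption for this example, and it also shows that the same casting approach works for numbers.

```php
<?php
// Shortened sample document, made up for this sketch.
$xml = simplexml_load_string(
    '<movie><title>PHP: Behind the Parser</title><rating>7</rating></movie>'
);

$title = $xml->title;

// The property is a SimpleXMLElement object, not a string.
$isObject = $title instanceof SimpleXMLElement;

// Strict comparison between an object and a string is never true.
$strictWithoutCast = ($title === 'PHP: Behind the Parser');

// After the cast, both operands are strings.
$strictWithCast = ((string) $title === 'PHP: Behind the Parser');

// Numeric content can be cast in the same way before doing arithmetic.
$nextRating = (int) $xml->rating + 1;

var_dump($isObject, $strictWithoutCast, $strictWithCast, $nextRating);
```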
 

Using XPath

SimpleXML includes built-in XPath support. To find all <character> elements:

 
<?php
include 'example.php';
$xml = simplexml_load_string($xmlstr);
 
foreach ($xml->xpath('//character') as $character) {
   echo $character->name, ' played by ', $character->actor, '<br />';
}
?>
 

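A leading '//' matches the named element at any depth in the document. To specify an absolute path from the document root instead, start the expression with a single slash, for example '/movies/movie/characters/character'.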
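The two path styles can be compared side by side. The following sketch uses a trimmed-down copy of the movies document so that it runs on its own; the attribute predicate in the last query is standard XPath syntax and goes slightly beyond what the text above describes.

```php
<?php
// A trimmed-down copy of the movies document, inlined for this sketch.
$xml = simplexml_load_string(
    '<movies><movie><title>PHP: Behind the Parser</title>'
    . '<characters>'
    . '<character><name>Ms. Coder</name></character>'
    . '<character><name>Mr. Coder</name></character>'
    . '</characters>'
    . '<rating type="thumbs">7</rating><rating type="stars">5</rating>'
    . '</movie></movies>'
);

// Descendant search: matches <character> elements at any depth.
$anywhere = $xml->xpath('//character');

// Absolute path: every step from the document root is spelled out.
$absolute = $xml->xpath('/movies/movie/characters/character');

// A predicate in square brackets filters on an attribute value.
$stars = $xml->xpath('/movies/movie/rating[@type="stars"]');

echo count($anywhere), ' ', count($absolute), "\n"; // prints "2 2"
echo (string) $stars[0], "\n";                      // prints "5"
```

Both queries find the same two elements here; they would differ if <character> elements also appeared elsewhere in the document.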

Setting values

Data in SimpleXML doesn't have to be constant. The object allows for manipulation of all of its elements.

 
<?php
include 'example.php';
$xml = simplexml_load_string($xmlstr);
 
$xml->movie[0]->characters->character[0]->name = 'Miss Coder';
 
echo $xml->asXML();
?>
 

The above code outputs a new XML document that is identical to the original except that Ms. Coder is changed to Miss Coder.
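Besides changing existing values, new elements and attributes can be added with the addChild() and addAttribute() methods from the function list at the end of this page. This is a minimal sketch using a shortened inline document, which is an assumption for this example.

```php
<?php
// Shortened sample document, made up for this sketch.
$xml = simplexml_load_string(
    '<movies><movie><title>PHP: Behind the Parser</title></movie></movies>'
);

// addChild() appends a new element and returns it as a SimpleXMLElement.
$rating = $xml->movie[0]->addChild('rating', '5');

// addAttribute() sets an attribute on that new element.
$rating->addAttribute('type', 'stars');

// Read the new values back through the original object.
$newValue = (string) $xml->movie[0]->rating;
$newType = (string) $xml->movie[0]->rating['type'];

echo $xml->asXML();
```

The output document contains the new <rating type="stars">5</rating> element after the title.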

DOM Interoperability

PHP has a mechanism to convert XML nodes between SimpleXML and DOM formats. This example shows how one might convert a DOM document to SimpleXML.

 
<?php
$dom = new DOMDocument();
// loadXML() returns false when the string cannot be parsed
if (!$dom->loadXML('<books><book><title>blah</title></book></books>')) {
     echo 'Error while parsing the document';
     exit;
}
 
$s = simplexml_import_dom($dom);
 
echo $s->book[0]->title;
?> 
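The conversion also works in the other direction with dom_import_simplexml(), which belongs to the DOM extension rather than to SimpleXML. The sketch below is illustrative only; it relies on the DOM element and the SimpleXML object referring to the same underlying tree, so a node added through the DOM becomes visible through SimpleXML.

```php
<?php
$sxe = simplexml_load_string('<books><book><title>blah</title></book></books>');

// dom_import_simplexml() returns the DOMElement for the SimpleXML node.
$root = dom_import_simplexml($sxe);

// The element belongs to a DOMDocument, reachable through ownerDocument.
$doc = $root->ownerDocument;

// Add a second <book> using the DOM API.
$book = $doc->createElement('book');
$title = $doc->createElement('title');
$title->appendChild($doc->createTextNode('second'));
$book->appendChild($title);
$root->appendChild($book);

// Read the result back through the SimpleXML object.
$bookCount = count($sxe->book);
$secondTitle = (string) $sxe->book[1]->title;

echo $bookCount, ' ', $secondTitle, "\n";
```

This can be useful when a task, such as building or removing nodes, is more convenient in the DOM API while the rest of the code uses SimpleXML.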
 


List of SimpleXML functions

  • SimpleXMLElement->addAttribute() — Adds an attribute to the SimpleXML element
  • SimpleXMLElement->addChild() — Adds a child element to the XML node
  • SimpleXMLElement->asXML() — Return a well-formed XML string based on SimpleXML element
  • SimpleXMLElement->attributes() — Identifies an element's attributes
  • SimpleXMLElement->children() — Finds children of given node
  • SimpleXMLElement->__construct() — Creates a new SimpleXMLElement object
  • SimpleXMLElement->getDocNamespaces() — Returns namespaces declared in document
  • SimpleXMLElement->getName() — Gets the name of the XML element
  • SimpleXMLElement->getNamespaces() — Returns namespaces used in document
  • SimpleXMLElement->registerXPathNamespace() — Creates a prefix/ns context for the next XPath query
  • SimpleXMLElement->xpath() — Runs XPath query on XML data
  • simplexml_import_dom — Get a SimpleXMLElement object from a DOM node.
  • simplexml_load_file — Interprets an XML file into an object
  • simplexml_load_string — Interprets a string of XML into an object