Xerces Tutorial: Learn how to create a DOM Document using Xerces for C++

Introduction

This tutorial shows you how to create an empty XML DOM Document with Xerces for C++ in Visual Studio 9. Although I don't cover the actual output for this application, if I did, it would produce an XML document that looked like this:

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<Hello_World/>

The main reason I've written this article is that the official documentation for Xerces is a little light on sample code. Also, I could not find anyone else using Xerces for C++ on Visual Studio 2008 (v9).

Subversion Source Control Repository

I write computer software for the Win32 platform, mostly using Microsoft's Visual Studio products. In the past I used Microsoft Visual SourceSafe ("VSS") as my source code repository, but I am now using an open source replacement called Subversion. I've looked at several high-end source control repositories, but in my humble opinion Subversion should suit the needs of almost any development team. It can be a bit scary at first (especially if you are used to VSS), but combined with the TortoiseSVN client it's really quite easy. Having said that, I would strongly suggest that you also revise the way you use source control: there's little point in upgrading to Subversion if you are just going to use it like VSS. I do not use Subversion from the command line (and I would never want to).

Setting up your Visual Studio 9 Environment (w/Subversion)

Every mention of Subversion source control is optional; just ignore it if you wish.

The following are my personal preferences for laying out files and folders on the hard disk, and how this layout integrates with Subversion source control. Have a look at the following screen snapshot of my Explorer (the icons have been modified by TortoiseSVN, but just ignore them for now).

Explorer with Subversion Externals
Figure 1 - My personal preference for laying out folders under [Subversion] source control. Special attributes on the externals folder copy files from xerces into my current project, Xerces_Lesson_01.

I split my code into three main areas:

  1. The project that I'm working on. In this case, Xerces_Lesson_01
  2. Code from other organizations such as Xerces (from the Apache project)
  3. ...and a special folder called externals

The main project (Xerces_Lesson_01) is split into branches, tags and trunk. This is the scheme recommended by Subversion source control which I've found very helpful. Code in the trunk always compiles and is never broken - It is considered my current working copy. Whenever I feel that I may 'break' code in order to refactor, fix bugs or experiment, I 'branch' off into a separate folder, make changes, then merge back into the trunk. Release versions are 'tagged' and moved appropriately.

If a project is composed of multiple projects or libraries (my own or 3rd party), I place these under source control too. In the example above the 3rd party library is xerces-c_2_7_0-windows_2000-msvc_60.

The externals folder has a special property attached to it (Subversion's svn:externals, set here through TortoiseSVN) that copies files from one location to another. In the diagram above, files that have been copied to a new location are shaded red. Understand that these files are not just copied to Xerces_Lesson_01; they are copied at the revision number that I specify. In this way, as I develop my project(s), I can step back in time to any previous revision of my main project and pull in all dependencies at that revision too. This is a very powerful technique for moving a project forwards with confidence, despite having multiple dependencies that may change over time.

Another reason for pulling in dependencies from other projects into an externals folder is to keep all compiler and linker options tightly under source control as well (see next section).

Setting up Visual Studio 9 with Xerces

There are three things to set up before you can compile, link and run an executable:

  1. Tell Visual Studio (and/or the Project) where to find the Include files (*.h)
  2. Tell Visual Studio (and/or the Project) where to find the Library files (*.lib)
  3. Copy the Dynamic Link Libraries (*.dll) to the same directory as the executable (and/or to the solution directory during debug.)

Assuming you've already downloaded the Xerces binaries (containing include files, libraries and DLLs), you can reference Xerces in multiple ways. Perhaps the most common method is to add a reference to the various files using the VC++ Directories Options (as shown below).

Visual Studio Options
Figure 2 - Tell Visual Studio where the various Xerces files are located. You will need to do this for include files (*.h) and libraries (*.lib). Although this method is very popular, I don't use it or recommend it for a business environment.

  • Advantages
    1. Quick, Simple and Easy.
  • Disadvantages
    1. The values are coupled to a particular instance of Visual Studio instead of the project you are working on. If you copy the project files to a different PC, you will have to re-establish these directories on the new PC.
    2. These values are not easily coupled with source control solutions (because they are not coupled to the project).
    3. If you have multiple developers working on the same projects, and one of them changes the values here, the change isn't automatically propagated to the other developers' PCs.

As you can see from the list above, there are more disadvantages than advantages, so I do not recommend this method in a business environment where you are working on code with other developers. Consider using the following instead:

Visual Studio Project Options
Figure 3 - Place references to your 3rd party, or external directories in the Project Properties instead. This way they are tied very closely with the project.

While it's certainly possible to add the full path here, I usually don't and provide relative paths instead. In general I try to avoid absolute paths. This makes it easier for me to copy the entire project and all dependencies onto a memory stick and continue development on a different PC without changing anything. It also makes backups easier.

Link with Xerces Libraries

Visual Studio Project Options
Figure 4 - The process of establishing references to library files (*.lib) is similar to include files. Set the Project Linker options.

Include files

Unlike some other APIs, Xerces has a lot of includes. This is because the library is separated into many manageable sections and each section has its own include.

Before using any feature of Xerces, you'll need to include the following file. It declares the utilities that must be implemented in a platform-specific way.

// Mandatory for using any feature of Xerces.
#include <xercesc/util/PlatformUtils.hpp>

The following header file allows us to use the C++ XML Document Object Model API. If you want to use SAX instead of DOM, then you'll need to use SAX includes instead.

// Use the Document Object Model (DOM) API
#include <xercesc/dom/DOM.hpp>

Namespaces

The following line (without semi-colon) will expand to use the correct Xerces namespace. The alternative is to prefix all Xerces code & functions with the XERCES_CPP_NAMESPACE:: namespace.

// Define namespace symbols
XERCES_CPP_NAMESPACE_USE

Initializing Xerces

Initialize Xerces before calling any other part of the library. The DOM implementation used for creating DOM documents is fetched in the next section.

// Initialize Xerces.
XMLPlatformUtils::Initialize();

DOM Implementation

Before we can work with the Document Object Model to create and manipulate documents, we need to get an instance of the DOM implementation code.

// Pointer to our DOMImplementation.
DOMImplementation * p_DOMImplementation = NULL;

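// Get the DOM Implementation (used for creating DOMDocuments).
// Also see: http://www.w3.org/TR/2000/REC-DOM-Level-2-Core-20001113/core.html
// transcode() allocates the XMLCh string, so release it once we are done with it.
XMLCh * p_Features = XMLString::transcode("core");
p_DOMImplementation = DOMImplementationRegistry::getDOMImplementation(p_Features);
XMLString::release(&p_Features);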

Both Xerces and Xalan follow the official specifications for the XML and XSLT languages. These can be found at the W3C.

Question: What the heck is transcode?
Answer: Xerces (and Xalan) have been built with an international audience in mind: multiple languages, multiple platforms and multiple character encodings. The short answer is that transcoding converts text between the char strings your program uses and the XMLCh strings Xerces uses internally. Since this can be a big subject, I will leave the details to the reader to research (this is supposed to be a tutorial for beginners).

Creating an Empty DOM document

The following code creates a DOM document containing a root node (with optional namespace). The root node name must be a valid XML name (i.e. no spaces or weird characters). The "root node" in this example is "Hello_World". This root node owns everything beneath it, just as if it were a separate object... in fact, it is. It's a DOMElement, which derives from the 'DOMNode' class.

// Pointer to our DOMDocument.
DOMDocument * p_DOMDocument = NULL;

// Create an empty DOMDocument.
p_DOMDocument = p_DOMImplementation->createDocument(0, L"Hello_World", 0);

Don't forget: XML is case-sensitive.

If you were to output the document at this point, you would get the following:

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<Hello_World/>

Clean Up

Clean up as usual. Release the document first, then terminate Xerces:

// Cleanup.
p_DOMDocument->release();
XMLPlatformUtils::Terminate();

Problems and Errors

The following is a list of problems I've encountered compiling and linking with Xerces using Visual Studio v9 (2008).

If you get the following error:

error LNK2001: unresolved external symbol
"__declspec(dllimport) public: static wchar_t *
__cdecl xercesc_2_7::XMLString::transcode(char const * const)" (
__imp_?transcode@XMLString@xercesc_2_7@@SAPA_WQBD@Z)

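Modify the following C++ language compiler option. The error shows the linker looking for a transcode that returns a native wchar_t *, which a library built with Visual C++ 6 (such as the msvc_60 binaries used here) does not export, so setting the option to No (/Zc:wchar_t-) should resolve it: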

/Zc:wchar_t (wchar_t Is Native Type)

Downloads

  • Download Visual C++ (1,460KB) source code for this lesson
    Includes minimum files (includes, library and DLL) from Xerces. Win32, Visual Studio 9

Links