The puddle on my XPath

I always tried to stay out of the XML world, but there is no escaping for me now. Currently, I have to deal with the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). This appears to be a popular protocol for exposing public archives. Technically, it boils down to XML messages with lots of different namespaces. To give you an example:

<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
<responseDate>2009-02-26T15:15:22Z</responseDate>
<request verb="GetRecord" metadataPrefix="abc" identifier="AB.abcid1">http://app.archive.nl/oai</request>
<GetRecord>
<record>
<header>
<identifier>AB.abcid1</identifier>
<datestamp>2009-02-24</datestamp>
<setSpec>AB</setSpec>
</header>
<metadata>
<ns1:ns1 xsi:schemaLocation="http://www.archive.nl/abc/1.0 http://www.archive.nl/xsd/abc/ns1.xsd" xmlns:ns1="http://www.archive.nl/abc/1.0">
<ns1:beginTag xml:lang="en-us">
....

Now, how do you select the content between the metadata tags with XPath?

The metadata tag is in the default namespace, so my first guess was to use

[//record/metadata/*]

as XPath. But to my surprise, this does not return the content between the metadata tags. It appears that although the record and metadata tags are in the default namespace, XPath does not link unprefixed names in an expression to the default namespace URI; it treats them as belonging to no namespace at all. The solution is to create an artificial NamespaceContext with a made-up prefix mapped to the default namespace URI. Let’s take, for the sake of argument, ‘oai’ as the prefix for the default “http://www.openarchives.org/OAI/2.0/” namespace URI. The XPath expression would then look like this

[//oai:record/oai:metadata/*].

This returns the content between the metadata tags.
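To make this concrete, here is a minimal sketch in Java using the standard javax.xml.xpath API. The class name OaiXPathDemo, the helper countMatches, and the trimmed-down sample message are made up for illustration; only the ‘oai’ prefix and the namespace URI come from the text above. It assumes the document is parsed namespace-aware, which is what makes the unprefixed expression come up empty.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class OaiXPathDemo {

    static final String OAI_NS = "http://www.openarchives.org/OAI/2.0/";

    // Hypothetical, shortened stand-in for the OAI-PMH response shown earlier.
    static final String SAMPLE =
        "<OAI-PMH xmlns=\"" + OAI_NS + "\">"
      + "<GetRecord><record><metadata>"
      + "<abc:doc xmlns:abc=\"http://www.archive.nl/abc/1.0\">payload</abc:doc>"
      + "</metadata></record></GetRecord>"
      + "</OAI-PMH>";

    // Maps the made-up prefix "oai" to the document's default namespace URI.
    static class OaiNamespaceContext implements NamespaceContext {
        @Override
        public String getNamespaceURI(String prefix) {
            if (prefix == null) {
                throw new IllegalArgumentException("prefix must not be null");
            }
            return "oai".equals(prefix) ? OAI_NS : XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String namespaceURI) {
            return OAI_NS.equals(namespaceURI) ? "oai" : null;
        }

        @Override
        public Iterator<String> getPrefixes(String namespaceURI) {
            String prefix = getPrefix(namespaceURI);
            List<String> prefixes = prefix == null
                ? Collections.<String>emptyList()
                : Collections.singletonList(prefix);
            return prefixes.iterator();
        }
    }

    // Returns how many nodes the expression selects in the given XML.
    static int countMatches(String xml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // without this, namespaces are ignored
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new OaiNamespaceContext());
            NodeList nodes =
                (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Unprefixed names are looked up in no namespace, so nothing matches.
        System.out.println(countMatches(SAMPLE, "//record/metadata/*"));
        // With the artificial prefix, the child of metadata is found.
        System.out.println(countMatches(SAMPLE, "//oai:record/oai:metadata/*"));
    }
}
```

The prefix only has to be consistent between the NamespaceContext and the expression; it does not need to appear anywhere in the XML itself.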

Every technology has its quirks, but I hope not to meet many of them in the XML world. XML appears simple on the surface, but combined with XPath and XSLT you can really find yourself lost. I would rather stick to Java, where I am in my natural habitat.