Referential Integrity in XML

In this post I will try to explain a very useful feature of XSD: the ability to detect referential integrity constraint violations. Let's start off with some simple XML:

<root>
  <type name="A"/>
  <type name="B"/>
  <type name="C"/>

  <item type="C"/>
  <item type="A"/>
</root>

And here is the XSD defining this XML:
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="type">
    <xsd:attribute name="name" type="xsd:string"/>
  </xsd:complexType>

  <xsd:complexType name="item">
    <xsd:attribute name="type" type="xsd:string"/>
  </xsd:complexType>

  <xsd:element name="root">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="type" minOccurs="0" maxOccurs="unbounded" type="type"/>
        <xsd:element name="item" minOccurs="0" maxOccurs="unbounded" type="item"/>
      </xsd:sequence>
    </xsd:complexType>

  </xsd:element>
</xsd:schema>
Again, nothing new. At this stage we have an XSD definition that can be used to test XML documents for conformance to our schema. However, any such validation would be strictly structural. If we want to check that each item's type attribute refers to a name defined by a type element, or that there is only one type element for each name, then we need something more than the above XSD. In short, we need a definition that captures the fact that the following XML should be invalid (since the type D is not defined, and since type C is defined twice):
<root>
  <type name="A"/>
  <type name="B"/>
  <type name="C"/>
  <type name="C"/>

  <item type="D"/>
</root>

This can be done quite simply by using the unique, key and keyref XSD elements. Here is the XSD after the required modifications:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="type">
    <xsd:attribute name="name" type="xsd:string"/>
  </xsd:complexType>

  <xsd:complexType name="item">
    <xsd:attribute name="type" type="xsd:string"/>
  </xsd:complexType>

  <xsd:element name="root">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="type" minOccurs="0" maxOccurs="unbounded" type="type"/>
        <xsd:element name="item" minOccurs="0" maxOccurs="unbounded" type="item"/>
      </xsd:sequence>
    </xsd:complexType>

    <xsd:unique name="unique_type_constraint">
      <xsd:selector xpath="type"/>
      <xsd:field xpath="@name"/>
    </xsd:unique>

    <xsd:key name="typeKey">
      <xsd:selector xpath="type"/>
      <xsd:field xpath="@name"/>
    </xsd:key>

    <xsd:keyref name="type_constraint" refer="typeKey">
      <xsd:selector xpath="item"/>
      <xsd:field xpath="@type"/>
    </xsd:keyref>

  </xsd:element>
</xsd:schema>
What we are doing is defining a 'lookup' using the key element. The key is defined by selecting the name attribute from the type elements under the root element. This 'lookup' is then linked to the type attributes of the item elements under root via the keyref element, whose refer attribute names the key. This restricts the values of the type attributes of the item elements to the values found in the name attributes of the type elements.

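The unique element is used to apply a uniqueness constraint over the values of the name attributes of the type elements. Note that the key element itself also requires the selected values to be present and unique, so the separate unique constraint is shown here mainly for illustration.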
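To make the semantics of these constraints more concrete, here is a rough sketch in plain Python of the checks they describe. It is only an illustration and not how an XSD validator works internally: it uses the standard library's xml.etree.ElementTree instead of a schema, and the names check_references, VALID and INVALID are made up for this example. As a simplification, missing attributes are ignored, whereas a real key would also reject a type element without a name.

```python
import xml.etree.ElementTree as ET
from collections import Counter

VALID = """<root>
  <type name="A"/>
  <type name="B"/>
  <type name="C"/>
  <item type="C"/>
  <item type="A"/>
</root>"""

INVALID = """<root>
  <type name="A"/>
  <type name="B"/>
  <type name="C"/>
  <type name="C"/>
  <item type="D"/>
</root>"""


def check_references(xml_text):
    """Return a list of constraint violations (empty if there are none)."""
    root = ET.fromstring(xml_text)

    # The 'lookup': name attributes of the type elements directly under root
    # (the equivalent of selector "type" and field "@name").
    names = [t.get("name") for t in root.findall("type")]
    names = [n for n in names if n is not None]  # simplification, see above

    problems = []

    # unique / key: no name may occur more than once.
    for name, count in Counter(names).items():
        if count > 1:
            problems.append(f"duplicate type name: {name}")

    # keyref: every item's type attribute must match a defined name.
    defined = set(names)
    for item in root.findall("item"):
        ref = item.get("type")
        if ref is not None and ref not in defined:
            problems.append(f"undefined type reference: {ref}")

    return problems


print(check_references(VALID))
print(check_references(INVALID))
```

The first document produces an empty list, while the second one reports both the duplicate definition of C and the reference to the undefined type D.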

I have written a simple Python script to test the above concepts. This script can be found here. Please note that you will have to install lxml in order to run the script.
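For readers who just want to try this out, the following is a minimal sketch of what such a test could look like with lxml. It is not the script mentioned above, only an assumption of its general shape: the schema is the one from this post, and the names XSD, VALID, INVALID and validate are chosen for this example.

```python
from lxml import etree

# The schema from this post, including the unique, key and keyref constraints.
XSD = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="type">
    <xsd:attribute name="name" type="xsd:string"/>
  </xsd:complexType>
  <xsd:complexType name="item">
    <xsd:attribute name="type" type="xsd:string"/>
  </xsd:complexType>
  <xsd:element name="root">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="type" minOccurs="0" maxOccurs="unbounded" type="type"/>
        <xsd:element name="item" minOccurs="0" maxOccurs="unbounded" type="item"/>
      </xsd:sequence>
    </xsd:complexType>
    <xsd:unique name="unique_type_constraint">
      <xsd:selector xpath="type"/>
      <xsd:field xpath="@name"/>
    </xsd:unique>
    <xsd:key name="typeKey">
      <xsd:selector xpath="type"/>
      <xsd:field xpath="@name"/>
    </xsd:key>
    <xsd:keyref name="type_constraint" refer="typeKey">
      <xsd:selector xpath="item"/>
      <xsd:field xpath="@type"/>
    </xsd:keyref>
  </xsd:element>
</xsd:schema>"""

VALID = """<root>
  <type name="A"/><type name="B"/><type name="C"/>
  <item type="C"/><item type="A"/>
</root>"""

INVALID = """<root>
  <type name="A"/><type name="B"/><type name="C"/><type name="C"/>
  <item type="D"/>
</root>"""

# Compile the schema once; it can then be reused for any number of documents.
schema = etree.XMLSchema(etree.fromstring(XSD))


def validate(xml_text):
    """Return (is_valid, error_messages) for the given XML string."""
    doc = etree.fromstring(xml_text)
    is_valid = schema.validate(doc)
    # error_log holds the errors of the most recent validation run.
    return is_valid, [error.message for error in schema.error_log]


for label, xml_text in (("valid", VALID), ("invalid", INVALID)):
    ok, errors = validate(xml_text)
    print(label, "->", ok)
    for message in errors:
        print("   ", message)
```

The first document should pass, while the second should be rejected with messages about the duplicate key value and the unresolved keyref. The exact wording of the messages comes from the underlying libxml2 library.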
