Creating an Index on an XML Fragment in SQL Server

Imagine you are given a table with a column of XML in it and a few million rows of data. Further imagine that you want to write an efficient query using an attribute of that XML in the WHERE clause of your query.

A short test scenario describes how I recently dealt with the problem. First, a basic table definition. The XML will end up in the SomeContent field.

CREATE TABLE [dbo].[MarcTest]
(
    [ID] [int] NOT NULL,
    [TypeCode] [varchar](20) NOT NULL,
    [SomeContent] [varchar](MAX) NOT NULL
CONSTRAINT [PK_MarcTest] PRIMARY KEY CLUSTERED
    ([ID] ASC)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

Now for some test data. As in my real-life scenario, the schema of the XML data varies from row to row.

INSERT INTO marctest (ID, TypeCode, SomeContent) VALUES(1, ‘CODE1’, ‘<MyDoc><MetaData><EmployeeID>13791</EmployeeID><Name>Marc Moore</Name></MetaData><ContentData><ContentField1>This is some content.</ContentField1><ContentField2>This is some additional content.</ContentField2></ContentData></MyDoc>’)
INSERT INTO marctest (ID, TypeCode, SomeContent) VALUES(2, ‘CODE2’, ‘<MyDoc><MetaData><EmployeeID>4</EmployeeID><Name>Xi Zhang</Name></MetaData><ContentData><ContentField1>This is some content.</ContentField1><ContentField2>This is some additional content.</ContentField2></ContentData></MyDoc>’)
INSERT INTO marctest (ID, TypeCode, SomeContent) VALUES(1001, ‘OTHER1’, ‘<SomeOtherData><Data1>111111</Data1><Data2>222222</Data2>333333<Data3></Data3></SomeOtherData>’)
GO

It’s not possible to create an index directly on an XML column’s XPath expression and the XML indexes offered by SQL Server 2008 have a bad reputation for being bloated. So what to do? The solution that follows was blessed by David, our DBA, as a valid one.

Based on this question and answer about SQL, XML, and XPath, what I first wanted to do was create a computed column based on an XPATH expression like this one:

Unfortunately that does not work. There are a couple of reason for this: The SomeContent column is not an XML column, which doesn’t help, nor does the fact that SQL considers this expression non-deterministic.

Various articles led me to create this scalar function to help solve the problem:

FUNCTION MarcTestEmployeeID
(
@SomeXml xml
)
RETURNS INT
AS
BEGIN

    DECLARE @ret INT = @SomeXml.value(‘(/MyDoc/MetaData/EmployeeID/node())[1]’, ‘int’)
    IF @ret IS NULL
        SET @ret = 0
    RETURN @ret

END

Now SQL was happy and let me create the computed column

ALTER TABLE MarcTest ADD EmployeeID as ([dbo].[MarcTestEmployeeID]([SomeContent]))

Unfortunately, when I attempted to create my index:

CREATE INDEX MarcTestEmployeeID_IDX ON dbo.MarcTest(TypeCode, EmployeeID)

SQL became unhappy once more:

Msg 2729, Level 16, State 1, Line 2
Column ‘EmployeeID’ in table ‘dbo.MarcTest’ cannot be used in an index or statistics or as a partition key because it is non-deterministic.

I’m sure SQL has its own reasons for doing this, but its logic was clearly incorrect as far as my function goes. Given a blob of XML, dbo.MarcTestEmployeeID will always return the same answer for that XML; ergo, it is deterministic. Yet SQL remained unhappy.

Happily, this question and answer about deterministic scalar functions solved the issue with a one-line change to the function definition:

FUNCTION MarcTestEmployeeID
(
@SomeXml xml
)
RETURNS INT
WITH SCHEMABINDING
…

After creating my index, my sample “SELECT *” query returned the following, which is just what I would expect:

Next, I inserted a couple of thousand more records to make things more interesting and ran this query, which is similar to the one I plan to run in the real world:

SELECT * FROM MarcTest WHERE TypeCode = ‘CODE1’ AND EmployeeID = 13791

The execution plan indicates that my new index that’s built on the computed column using the UDF and XML data is being used:

Exactly what was hoped for!

Marc Moore

You might also like

iPhone App Images Needed

Visual Studio Missing Report Data Window

WordPress Fix

Leave a Reply