{"id":787,"date":"2012-07-20T22:27:00","date_gmt":"2012-07-20T22:27:00","guid":{"rendered":"http:\/\/mooredynasty.net\/?p=787"},"modified":"2014-12-18T22:27:48","modified_gmt":"2014-12-18T22:27:48","slug":"creating-an-index-on-an-xml-fragment-in-sql-server","status":"publish","type":"post","link":"https:\/\/mooredynasty.net\/index.php\/2012\/07\/creating-an-index-on-an-xml-fragment-in-sql-server\/","title":{"rendered":"Creating an Index on an XML Fragment in SQL Server"},"content":{"rendered":"<p>Imagine you are given a table with a column of XML in it and a few million rows of data.&#160; Further imagine that you want to write an efficient query using an attribute of that XML in the WHERE clause of your query.<\/p>\n<p>A short test scenario describes how I recently dealt with the problem.&#160; First, a basic table definition.&#160; The XML will end up in the SomeContent field.<\/p>\n<p>CREATE TABLE [dbo].[MarcTest]    <br \/>(     <br \/>&#160;&#160;&#160; [ID] [int] NOT NULL,     <br \/>&#160;&#160;&#160; [TypeCode] [varchar](20) NOT NULL,     <br \/>&#160;&#160;&#160; [SomeContent] [varchar](MAX) NOT NULL     <br \/>CONSTRAINT [PK_MarcTest] PRIMARY KEY CLUSTERED     <br \/>&#160;&#160;&#160; ([ID] ASC)WITH (PAD_INDEX&#160; = OFF, STATISTICS_NORECOMPUTE&#160; = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS&#160; = ON, ALLOW_PAGE_LOCKS&#160; = ON) ON [PRIMARY]     <br \/>) ON [PRIMARY]     <br \/>GO<\/p>\n<p>Now for some test data.&#160; As in my real-life scenario, the schema of the XML data varies from row to row.<\/p>\n<p>INSERT INTO marctest (ID, TypeCode, SomeContent) VALUES(1, &#8216;CODE1&#8217;, &#8216;&lt;MyDoc&gt;&lt;MetaData&gt;&lt;EmployeeID&gt;13791&lt;\/EmployeeID&gt;&lt;Name&gt;Marc Moore&lt;\/Name&gt;&lt;\/MetaData&gt;&lt;ContentData&gt;&lt;ContentField1&gt;This is some content.&lt;\/ContentField1&gt;&lt;ContentField2&gt;This is some additional content.&lt;\/ContentField2&gt;&lt;\/ContentData&gt;&lt;\/MyDoc&gt;&#8217;)    <br \/>INSERT INTO marctest (ID, TypeCode, SomeContent) VALUES(2, &#8216;CODE2&#8217;, &#8216;&lt;MyDoc&gt;&lt;MetaData&gt;&lt;EmployeeID&gt;4&lt;\/EmployeeID&gt;&lt;Name&gt;Xi Zhang&lt;\/Name&gt;&lt;\/MetaData&gt;&lt;ContentData&gt;&lt;ContentField1&gt;This is some content.&lt;\/ContentField1&gt;&lt;ContentField2&gt;This is some additional content.&lt;\/ContentField2&gt;&lt;\/ContentData&gt;&lt;\/MyDoc&gt;&#8217;)     <br \/>INSERT INTO marctest (ID, TypeCode, SomeContent) VALUES(1001, &#8216;OTHER1&#8217;, &#8216;&lt;SomeOtherData&gt;&lt;Data1&gt;111111&lt;\/Data1&gt;&lt;Data2&gt;222222&lt;\/Data2&gt;333333&lt;Data3&gt;&lt;\/Data3&gt;&lt;\/SomeOtherData&gt;&#8217;)     <br \/>GO<\/p>\n<p>It\u2019s not possible to create an index directly on an XML column\u2019s XPath expression and the XML indexes offered by SQL Server 2008 have a bad reputation for being bloated.&#160; So what to do?&#160; The solution that follows was blessed by David, our DBA, as a valid one.<\/p>\n<p>Based on <a href=\"http:\/\/stackoverflow.com\/questions\/899313\/select-values-from-xml-field-in-sql-server-2008\">this question and answer about SQL<\/a>, XML, and XPath, what I first wanted to do was create a computed column based on an XPATH expression like this one:<\/p>\n<p><a href=\"http:\/\/bcs.system.tamus.edu\/images\/MarcBlogs\/image-8-CD28237F74C3.png\"><img loading=\"lazy\" decoding=\"async\" title=\"image\" border=\"0\" alt=\"image\" src=\"http:\/\/bcs.system.tamus.edu\/images\/MarcBlogs\/image_thumb-4-743988998D1A.png\" width=\"574\" height=\"94\" \/><\/a><\/p>\n<p>Unfortunately that does not work.&#160; There are a couple of reason for this: The SomeContent column is not an XML column, which doesn\u2019t help, nor does the fact that SQL considers this expression non-deterministic.<\/p>\n<p>Various articles led me to create this scalar function to help solve the problem:<\/p>\n<p>FUNCTION MarcTestEmployeeID    <br \/>(     <br \/>&#160;&#160;&#160; @SomeXml xml     <br \/>)     <br \/>RETURNS INT     <br \/>AS     <br \/>BEGIN<\/p>\n<p>&#160;&#160;&#160; DECLARE @ret INT = @SomeXml.value(&#8216;(\/MyDoc\/MetaData\/EmployeeID\/node())[1]&#8217;, &#8216;int&#8217;)    <br \/>&#160;&#160;&#160; IF @ret IS NULL     <br \/>&#160;&#160;&#160;&#160;&#160;&#160;&#160; SET @ret = 0     <br \/>&#160;&#160;&#160; RETURN @ret<\/p>\n<p>END<\/p>\n<p>Now SQL was happy and let me create the computed column<\/p>\n<p>ALTER TABLE MarcTest ADD EmployeeID as ([dbo].[MarcTestEmployeeID]([SomeContent]))<\/p>\n<p>Unfortunately, when I attempted to create my index:<\/p>\n<p>CREATE INDEX MarcTestEmployeeID_IDX ON dbo.MarcTest(TypeCode, EmployeeID)<\/p>\n<p>SQL became unhappy once more:<\/p>\n<p>Msg 2729, Level 16, State 1, Line 2    <br \/>Column &#8216;EmployeeID&#8217; in table &#8216;dbo.MarcTest&#8217; cannot be used in an index or statistics or as a partition key because it is non-deterministic.<\/p>\n<p>I\u2019m sure SQL has its own reasons for doing this, but its logic was clearly incorrect as far as my function goes.&#160; Given a blob of XML, dbo.MarcTestEmployeeID will always return the same answer for that XML; ergo, it is deterministic.&#160; Yet SQL remained unhappy.<\/p>\n<p>Happily, <a href=\"http:\/\/stackoverflow.com\/questions\/3651662\/sql-server-deterministic-user-defined-function\">this question and answer about deterministic scalar functions<\/a> solved the issue with a one-line change to the function definition:<\/p>\n<p>FUNCTION MarcTestEmployeeID    <br \/>(     <br \/>&#160;&#160;&#160; @SomeXml xml     <br \/>)     <br \/>RETURNS INT     <br \/><strong>WITH SCHEMABINDING      <br \/><\/strong>\u2026<\/p>\n<p>After creating my index, my sample \u201cSELECT *\u201d query returned the following, which is just what I would expect:<\/p>\n<p><a href=\"http:\/\/bcs.system.tamus.edu\/images\/MarcBlogs\/image-12-38595AEBB6A5.png\"><img loading=\"lazy\" decoding=\"async\" title=\"image\" border=\"0\" alt=\"image\" src=\"http:\/\/bcs.system.tamus.edu\/images\/MarcBlogs\/image_thumb-6-7AD4BB4C3232.png\" width=\"424\" height=\"67\" \/><\/a><\/p>\n<p>Next, I inserted a couple of thousand more records to make things more interesting and ran this query, which is similar to the one I plan to run in the real world:<\/p>\n<p>SELECT * FROM MarcTest WHERE TypeCode = &#8216;CODE1&#8217; AND EmployeeID = 13791<\/p>\n<p>The execution plan indicates that my new index that\u2019s built on the computed column using the UDF and XML data is being used:<\/p>\n<p><a href=\"http:\/\/bcs.system.tamus.edu\/images\/MarcBlogs\/image-17-B9C14C765BE1.png\"><img loading=\"lazy\" decoding=\"async\" title=\"image\" border=\"0\" alt=\"image\" src=\"http:\/\/bcs.system.tamus.edu\/images\/MarcBlogs\/image_thumb-9-AF689DC943FF.png\" width=\"573\" height=\"208\" \/><\/a><\/p>\n<p>Exactly what was hoped for!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Imagine you are given a table with a column of XML in it and a few million rows of data.&#160; Further imagine that you want to write an efficient query using an attribute of that XML in the WHERE clause of your query. A short test scenario describes how I &hellip; <\/p>\n","protected":false},"author":6,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[15],"tags":[],"class_list":["post-787","post","type-post","status-publish","format-standard","hentry","category-development"],"_links":{"self":[{"href":"https:\/\/mooredynasty.net\/index.php\/wp-json\/wp\/v2\/posts\/787","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mooredynasty.net\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mooredynasty.net\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mooredynasty.net\/index.php\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/mooredynasty.net\/index.php\/wp-json\/wp\/v2\/comments?post=787"}],"version-history":[{"count":1,"href":"https:\/\/mooredynasty.net\/index.php\/wp-json\/wp\/v2\/posts\/787\/revisions"}],"predecessor-version":[{"id":788,"href":"https:\/\/mooredynasty.net\/index.php\/wp-json\/wp\/v2\/posts\/787\/revisions\/788"}],"wp:attachment":[{"href":"https:\/\/mooredynasty.net\/index.php\/wp-json\/wp\/v2\/media?parent=787"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mooredynasty.net\/index.php\/wp-json\/wp\/v2\/categories?post=787"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mooredynasty.net\/index.php\/wp-json\/wp\/v2\/tags?post=787"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}