Continuing on with my Partitioning post series, this is part 3.
The partitioning includes several major components of work (and can be linked below);
- partitioning large existing non-partitioned tables
- measuring performance impacts of partitioned aligned indexes
- measuring performance impacts of DML triggers for enforcing partitioned unique indexes
- rebuilding tables that are already partitioned (ie. apply a new partitioning scheme)
- implementing partial backups and restores (via leveraging partitions)
- implementing partition aware index optimisation procedures
- Calculating table partition sizes in advance
This blog post deals with partition aligning non-clustered indexes (unique and non-unique) and then measuring performance impacts of DML triggers for enforcing partitioned unique indexes.
I will blog about the other steps later.
What Are My Alignment Options?
Ok, I assume you have got this far because you now want to align your NC indexes to your table partition scheme. Fine choice!
From a partitioning decision point – NC indexes only come in 2 flavors;
- Unique Indexes (and Primary Keys)
- Non-Unique Indexes
Lets talk the easier non-unique indexes first (before it gets more scary with unique indexes!).
There are rules around how you can align NC indexes. If you partition your table on say ColumnA then to align your NC indexes there are 2 options;
- If the NC index is unique then ColumnA MUST be part of the index key. It can be anywhere in the key, even the last column, but it must be there.
Hang on, wont this change my unique key!? AND for my Primary Key’s wont this impact my Foreign Key(s)?
Yes and Yes. Sorry about that, this is a nasty side effect which will be discussed later.
- If the NC index is not unique then ColumnA can either be part of the index key or within the INCLUDE clause.
Meh, so my index is larger, and that’s generally no big deal.
Create Playpen Table, Partition Function/Scheme and Load Data
Lets create our partitioned play pen table, partition function/scheme and then put some data in it. (We’ll use this table throughout this post.)
--Partition Function EXPANDED INT CREATE PARTITION FUNCTION pf_myorders_expanded (int) AS RANGE RIGHT FOR VALUES(0, 10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000, 100000) GO -- Partition Scheme EXPANDED INT CREATE PARTITION SCHEME ps_myorders_expanded AS PARTITION pf_myorders_expanded ALL TO ([PRIMARY]) GO --Partitioned table CREATE TABLE dbo.myorders ( myorder_id INT NOT NULL , myorder_key INT NOT NULL , myorder_date DATETIME NOT NULL , myorder_amt MONEY NOT NULL , myorder_details NVARCHAR(4000) DEFAULT '' NULL , CONSTRAINT pk_myorderid PRIMARY KEY CLUSTERED (myorder_id ASC) WITH (DATA_COMPRESSION = NONE) ON ps_myorders_expanded(myorder_id) ) GO --Load Data begin INSERT INTO dbo.myorders SELECT TOP 85000 ROW_NUMBER() OVER (ORDER BY o1.object_id) , CAST((ROW_NUMBER() OVER (ORDER BY o1.object_id) * RAND(1) * 2) AS INT) , DATEADD(hh, (ROW_NUMBER() OVER (ORDER BY o1.object_id)) / 5, DATEADD(dd, -730, '2015-05-09 00:43')) , RAND(ROW_NUMBER() OVER (ORDER BY o1.object_id)) * RAND(ROW_NUMBER() OVER (ORDER BY o2.object_id)) * 730 , REPLICATE('X', RAND(o1.object_id) * 1000) FROM master.sys.objects o1 CROSS JOIN master.sys.objects o2 CROSS JOIN master.sys.objects o3 ORDER BY 1 end GO
Partition Aligning Non-Unique Indexes
Lets quickly cover this one off and create an aligned non-unique NC index on the partitioned table. The create statement for the NC index specifies the partition function using myorder_id.
CREATE NONCLUSTERED INDEX myorders_notunique_aligned ON dbo.myorders (myorder_date) ON ps_myorders_expanded (myorder_id) -- On Partition Scheme GO
Some key points here;
- What it does under the hood is essentially create an index b-tree on top of each partition defined by ps_myorders_expanded (myorder_id). In this case there are 11 partitions, so think of them as 11 small NC indexes operating independently. (see previous post on the performance implications of this)
- If the non-unique index create does not specify the partition column, then by default SQL will automatically add the partition key myorder_id to the NC index INCLUDE clause and thereby partition the index that way.
- If you have a fast tempdb, then I suggest adding WITH (SORT_IN_TEMPDB = ON) to all NC index creates as this will speed up the process and prevent unnecessary expansion of the database data files during the build.
- And that’s it people – nothing else special here!
Partition Aligning Unique Indexes
There are gottchas to think about before doing this;
- The partitioning column MUST be part of the index key. It can be anywhere in the key, even the last column, but it must be there.
- You cannot put the partitioning column in the INCLUDE statement.
- If you try to create the index without the partitioning column in the key you will get an error like this;
Msg 1908, Level 16, State 1, Line 146
Column ‘myorder_id’ is partitioning column of the index ‘myorders_unique_aligned’. Partition columns for a unique index must be a subset of the index key.
- If your index was already unique and the partitioning column wasn’t in the key, then by adding the column the index is no longer unique.
- If this was a Primary Key then this can have dramatic impacts to your Foreign Keys and all related tables. (see the Summary for a discussion on this)
- If this is so, you need to look at options to enforce uniqueness, such as a DML Trigger to capture INSERT and UPDATE. And this is where the fun starts.
Now lets create an aligned unique NC index on the partitioned table.
CREATE UNIQUE NONCLUSTERED INDEX myorders_unique_aligned ON dbo.myorders (myorder_key, myorder_id) ON ps_myorders_expanded(myorder_id) -- On Partition Scheme GO
Enforcing Uniqueness via a DML Trigger
There are several ways to ensure that the NC index above remains unique on myorder_key but probably the safest is using an Instead Of Insert DML Trigger.
The things to consider before doing this;
- The trigger must work for both singleton and batch INSERT.
- The trigger must work for both singleton and batch UPDATE.
- The trigger must check if the value for myorder_key already exists
- Ensure there is a supporting index on the search column myorder_key (however this should be a gimmie as that’s the index you wanted to create anyway, right?!)
- Not surprisingly there is a performance overhead incurred to every insert…
Lets create the DML Trigger and then test the performance implications. NOTE that this is only the INSERT trigger.
CREATE TRIGGER dbo.IOI_myorders_unique_aligned ON dbo.myorders instead OF INSERT AS BEGIN SET nocount ON IF EXISTS ( SELECT TOP 1 1 FROM dbo.myorders myo INNER JOIN inserted i ON myo.[myorder_key] = i.[myorder_key] ) BEGIN RAISERROR ('ERROR: A duplicate key has been found in [dbo].[myorders] for [myorder_key]. The insert has been rolled back.',16,1) ROLLBACK TRANSACTION RETURN END ELSE BEGIN INSERT INTO dbo.myorders SELECT * FROM inserted END END GO
Instead of Insert DML Trigger Performance – Singleton
Lets see the impacts of a singleton insert.
SET STATISTICS IO ON GO INSERT INTO [dbo].[myorders] SELECT 100000, 100000, GETDATE(), 100, 'My New Order' GO
Table 'myorders'. Scan count 12, logical reads 18, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'myorders'. Scan count 0, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Singleton Insert Assessment;
- Capturing the insert via the DML trigger, checking for key duplicates in myorder_key and the re-inserting the single row is 60% of the total query cost. Inserting the row is only 40%.
- The logical reads needed to check if it is OK to perform the insert is 18, and the logical reads to actually do the insert is 8. So it costs 55% more to operate the DML Trigger on a singleton insert.
Instead of Insert DML Trigger Performance – Batch
Lets see the impacts of a batch insert with 1,000 rows.
-- Create a Batch of 1000 Rows SELECT top 1000 (ROW_NUMBER() OVER (ORDER BY o1.object_id)) + 200000 AS myorder_id, (ROW_NUMBER() OVER (ORDER BY o1.object_id)) + 200000 AS myorder_key, GETDATE() AS myorder_date, 100 AS myorder_amt, NULL as myorder_desc INTO #myorders_batch FROM sys.objects o1 CROSS JOIN sys.objects o2 GO -- Load the Batch into the Table INSERT INTO [dbo].[myorders] SELECT * FROM #myorders_batch GO
Table 'myorders'. Scan count 12000, logical reads 20000, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'myorders'. Scan count 0, logical reads 6317, physical reads 0, read-ahead reads 14, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Batch Insert Assessment;
- Capturing the insert via the DML trigger, checking for key duplicates in myorder_key and the re-inserting of the 1,000 rows is now a whopping 91% of the total query cost. Inserting the row is now only a mere 9%.
- The logical reads needed to check if it is OK to perform the insert is 20,000, and the logical reads to actually do the insert is 6,317. So now it costs 68% more to operate the DML trigger on a batch insert.
- The cost to execute the DML Trigger seems to increase faster than a linear rate. At 10,000 rows the cost for the DML Trigger check is 99% of the batch and over 50,000 logical reads.
And So in Summary…
If you want to align a unique NC index (or Primary Key) to your partitioned table then the DML Trigger is the safest option as it captures inserts regardless of how they gets to the table (ie. an application, ETL process, adhoc insert, etc).
The DML Trigger will add a overhead (sometimes significant) to your inserts.
If you have a Primary Key and need to add the new partition column then it can have dramatic knock on effects; think about all the related tables, their Primary Keys and the Foreign Keys, etc. If you have a trusted application that you control (ie like a DW/ETL solution) then you could consider removing the FK’s on the affected tables altogether. Still, not great.
Lastly, this DML Trigger only captures INSERTs, if you anticipate UPDATEs could change the key then you need an INSTEAD OF UPDATE trigger. In that trigger you should check using the UPDATE (ColumnA) clause if the unique columns of interest have been updated before performing any key validation actions. This will save IO.
Long short, you need to test it yourself on your table and your data as your mileage may vary. Investing an hour of project time upfront now will help you sleep at night once it goes live later.
Disclaimer: all content on Mr. Fox SQL blog is subject to the disclaimer found here