Performance Impacts of Partitioning DML Triggers

[read this post on Mr. Fox SQL blog]

Continuing on with my Partitioning post series, this is part 3.

The partitioning includes several major components of work (and can be linked below);

This blog post deals with partition aligning non-clustered indexes (unique and non-unique) and then measuring performance impacts of DML triggers for enforcing partitioned unique indexes.

I will blog about the other steps later.

And so, lets get into the nitty gritty of the partitioning details!

What Are My Alignment Options?

Ok, I assume you have got this far because you now want to align your NC indexes to your table partition scheme. Fine choice!

From a partitioning decision point – NC indexes only come in 2 flavors;

Unique Indexes (and Primary Keys)
Non-Unique Indexes

Lets talk the easier non-unique indexes first (before it gets more scary with unique indexes!).

There are rules around how you can align NC indexes. If you partition your table on say ColumnA then to align your NC indexes there are 2 options;

If the NC index is unique then ColumnA MUST be part of the index key. It can be anywhere in the key, even the last column, but it must be there.

Hang on, wont this change my unique key!? AND for my Primary Key’s wont this impact my Foreign Key(s)?

Yes and Yes. Sorry about that, this is a nasty side effect which will be discussed later.

If the NC index is not unique then ColumnA can either be part of the index key or within the INCLUDE clause.

Meh, so my index is larger, and that’s generally no big deal.

Create Playpen Table, Partition Function/Scheme and Load Data

Lets create our partitioned play pen table, partition function/scheme and then put some data in it. (We’ll use this table throughout this post.)

--Partition Function EXPANDED INT
CREATE PARTITION FUNCTION pf_myorders_expanded (int) AS RANGE RIGHT FOR VALUES(0, 10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000, 100000)
GO
-- Partition Scheme EXPANDED INT
CREATE PARTITION SCHEME ps_myorders_expanded AS PARTITION pf_myorders_expanded ALL TO ([PRIMARY])
GO

--Partitioned table
CREATE TABLE dbo.myorders
(
     myorder_id        INT NOT NULL
     , myorder_key     INT NOT NULL
     , myorder_date    DATETIME NOT NULL
     , myorder_amt     MONEY NOT NULL
     , myorder_details NVARCHAR(4000) DEFAULT '' NULL
     , CONSTRAINT pk_myorderid PRIMARY KEY CLUSTERED (myorder_id ASC)
       WITH (DATA_COMPRESSION = NONE)
       ON ps_myorders_expanded(myorder_id)
)
GO

--Load Data
begin
    INSERT INTO dbo.myorders
    SELECT
        TOP 85000
        ROW_NUMBER() OVER (ORDER BY o1.object_id)
        , CAST((ROW_NUMBER() OVER (ORDER BY o1.object_id) * RAND(1) * 2) AS INT)
        , DATEADD(hh, (ROW_NUMBER() OVER (ORDER BY o1.object_id)) / 5, DATEADD(dd, -730, '2015-05-09 00:43'))
        , RAND(ROW_NUMBER() OVER (ORDER BY o1.object_id)) * RAND(ROW_NUMBER() OVER (ORDER BY o2.object_id)) * 730
        , REPLICATE('X', RAND(o1.object_id) * 1000)
    FROM   master.sys.objects o1
           CROSS JOIN master.sys.objects o2
           CROSS JOIN master.sys.objects o3
    ORDER  BY 1
end
GO

Partition Aligning Non-Unique Indexes

Lets quickly cover this one off and create an aligned non-unique NC index on the partitioned table. The create statement for the NC index specifies the partition function using myorder_id.

CREATE NONCLUSTERED INDEX myorders_notunique_aligned
    ON dbo.myorders (myorder_date)
    ON ps_myorders_expanded (myorder_id) -- On Partition Scheme
GO

Some key points here;

What it does under the hood is essentially create an index b-tree on top of each partition defined by ps_myorders_expanded (myorder_id). In this case there are 11 partitions, so think of them as 11 small NC indexes operating independently. (see previous post on the performance implications of this)
If the non-unique index create does not specify the partition column, then by default SQL will automatically add the partition key myorder_id to the NC index INCLUDE clause and thereby partition the index that way.
If you have a fast tempdb, then I suggest adding WITH (SORT_IN_TEMPDB = ON) to all NC index creates as this will speed up the process and prevent unnecessary expansion of the database data files during the build.
And that’s it people – nothing else special here!

Partition Aligning Unique Indexes

There are gottchas to think about before doing this;

The partitioning column MUST be part of the index key. It can be anywhere in the key, even the last column, but it must be there.
You cannot put the partitioning column in the INCLUDE statement.
If you try to create the index without the partitioning column in the key you will get an error like this;

Msg 1908, Level 16, State 1, Line 146
Column ‘myorder_id’ is partitioning column of the index ‘myorders_unique_aligned’. Partition columns for a unique index must be a subset of the index key.

If your index was already unique and the partitioning column wasn’t in the key, then by adding the column the index is no longer unique.
If this was a Primary Key then this can have dramatic impacts to your Foreign Keys and all related tables. (see the Summary for a discussion on this)
If this is so, you need to look at options to enforce uniqueness, such as a DML Trigger to capture INSERT and UPDATE. And this is where the fun starts.

Now lets create an aligned unique NC index on the partitioned table.

CREATE UNIQUE NONCLUSTERED INDEX myorders_unique_aligned
    ON dbo.myorders (myorder_key, myorder_id)
    ON ps_myorders_expanded(myorder_id) -- On Partition Scheme
GO

Enforcing Uniqueness via a DML Trigger

There are several ways to ensure that the NC index above remains unique on myorder_key but probably the safest is using an Instead Of Insert DML Trigger.

The things to consider before doing this;

The trigger must work for both singleton and batch INSERT.
The trigger must work for both singleton and batch UPDATE.
The trigger must check if the value for myorder_key already exists
Ensure there is a supporting index on the search column myorder_key (however this should be a gimmie as that’s the index you wanted to create anyway, right?!)
Not surprisingly there is a performance overhead incurred to every insert…

Lets create the DML Trigger and then test the performance implications. NOTE that this is only the INSERT trigger.

CREATE TRIGGER dbo.IOI_myorders_unique_aligned
ON dbo.myorders
instead OF INSERT
AS
BEGIN
    SET nocount ON

    IF EXISTS
    (
        SELECT TOP 1 1
        FROM   dbo.myorders myo
               INNER JOIN inserted i
                       ON myo.[myorder_key] = i.[myorder_key]
    )
    BEGIN
        RAISERROR ('ERROR: A duplicate key has been found in [dbo].[myorders] for [myorder_key]. The insert has been rolled back.',16,1)
        ROLLBACK TRANSACTION
        RETURN
    END
    ELSE
    BEGIN
        INSERT INTO dbo.myorders
        SELECT *
        FROM   inserted
    END
END
GO

Instead of Insert DML Trigger Performance – Singleton

Lets see the impacts of a singleton insert.

SET STATISTICS IO ON
GO
INSERT    INTO [dbo].[myorders]
SELECT    100000, 100000, GETDATE(), 100, 'My New Order'
GO

Table 'myorders'. Scan count 12, logical reads 18, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 Table 'myorders'. Scan count 0, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Singleton Insert Assessment;

Capturing the insert via the DML trigger, checking for key duplicates in myorder_key and the re-inserting the single row is 60% of the total query cost. Inserting the row is only 40%.
The logical reads needed to check if it is OK to perform the insert is 18, and the logical reads to actually do the insert is 8. So it costs 55% more to operate the DML Trigger on a singleton insert.

Instead of Insert DML Trigger Performance – Batch

Lets see the impacts of a batch insert with 1,000 rows.

-- Create a Batch of 1000 Rows
SELECT    top 1000
        (ROW_NUMBER() OVER (ORDER BY o1.object_id)) + 200000 AS myorder_id,
        (ROW_NUMBER() OVER (ORDER BY o1.object_id)) + 200000 AS myorder_key,
        GETDATE() AS myorder_date,
        100 AS myorder_amt,
        NULL as myorder_desc
INTO    #myorders_batch
FROM    sys.objects o1 CROSS JOIN sys.objects o2
GO
-- Load the Batch into the Table
INSERT    INTO [dbo].[myorders]
SELECT    *
FROM    #myorders_batch
GO

Table 'myorders'. Scan count 12000, logical reads 20000, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 Table 'myorders'. Scan count 0, logical reads 6317, physical reads 0, read-ahead reads 14, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Batch Insert Assessment;

Capturing the insert via the DML trigger, checking for key duplicates in myorder_key and the re-inserting of the 1,000 rows is now a whopping 91% of the total query cost. Inserting the row is now only a mere 9%.
The logical reads needed to check if it is OK to perform the insert is 20,000, and the logical reads to actually do the insert is 6,317. So now it costs 68% more to operate the DML trigger on a batch insert.
The cost to execute the DML Trigger seems to increase faster than a linear rate. At 10,000 rows the cost for the DML Trigger check is 99% of the batch and over 50,000 logical reads.

And So in Summary…

If you want to align a unique NC index (or Primary Key) to your partitioned table then the DML Trigger is the safest option as it captures inserts regardless of how they gets to the table (ie. an application, ETL process, adhoc insert, etc).

The DML Trigger will add a overhead (sometimes significant) to your inserts.

If you have a Primary Key and need to add the new partition column then it can have dramatic knock on effects; think about all the related tables, their Primary Keys and the Foreign Keys, etc. If you have a trusted application that you control (ie like a DW/ETL solution) then you could consider removing the FK’s on the affected tables altogether. Still, not great.

Lastly, this DML Trigger only captures INSERTs, if you anticipate UPDATEs could change the key then you need an INSTEAD OF UPDATE trigger. In that trigger you should check using the UPDATE (ColumnA) clause if the unique columns of interest have been updated before performing any key validation actions. This will save IO.

Long short, you need to test it yourself on your table and your data as your mileage may vary.

Investing an hour of project time upfront now will help you sleep at night once it goes live later.

Disclaimer: all content on Mr. Fox SQL blog is subject to the disclaimer found here

Pingback: Rebuild a Standard Table to a Partitioned Table – Mr. Fox SQL
Pingback: Performance Impacts of Partitioned Aligned Indexes – Mr. Fox SQL
Pingback: Rebuilding Existing Partitioned Tables to a New Partition Scheme – Mr. Fox SQL
Pingback: Implementing Partial Backups and Restores (via Leveraging Partitions) – Mr. Fox SQL
Thomas Franz 7 July 2015 / 6:25 PM

Hint: if you have an IDENTITY column (or sequence) as PK you should consider to partition by this ID instead of e.g. the OrderDate.

You can create a daily / weekly / monthly job that assignes a new filegroup to the partition scheme and splits the partition function by the Max(ID) to this time.

Downside:
– does not work for UNID’s
– you’ll need a separate partition function / schema for each table
– if you insert rows not sorted by the OrderDate (entering an order retrospectively) this order will reside in the “wrong” partition

LikeLike

- Mr. Fox SQL (Rolf Tesmer) 7 July 2015 / 10:39 PM
  
  Thanks Thomas, yes that is an interesting approach and yep if the PK is the partitioning key it would get around the issue with having to use Triggers to enforce it. Though you would have to have pretty close control of your database/application and create the next partition at the exactly right time.
  
  LikeLike
  
Pingback: Implementing Partition Aware Index Optimisation Procedures – Mr. Fox SQL
Pingback: Calculating Table Partition Sizes in Advance – Mr. Fox SQL
Pingback: SQL Saturday Melbourne (#464) – 20 Feb 2016 – Mr. Fox SQL

Mr. Fox SQL

Outfoxing cloud data challenges by being rather damn sneaky…