Performance Impacts of Partitioned Aligned Indexes

[read this post on Mr. Fox SQL blog]

Continuing on with my Partitioning post series, this is part 2.

The partitioning includes several major components of work (and can be linked below);

This blog post deals with deciding whether to align non-clustered indexes to the table (or not) via Measuring Performance Impacts of Partitioned Aligned Indexes.

I will blog about the other steps later.

And so, lets get into the nitty gritty of the partitioning details!

To Align or Not Align – that is the question!

When you leverage partitioning you can apply it to a HEAP table or CLUSTERED (index) table, you don’t have to partition the non-clustered indexes along with it. When you don’t the NC indexes are termed as non-aligned, and (not surprisingly) when they are partitioned they are termed as aligned.

Big deal, I hear you think… well like everything in life there are Pros and Cons. There’s heaps on the web for this already – but a quick summary of the important bits.

	PRO	CON
Aligned NC Index	Generally faster query performance Allows partition switching Allows index optimisation to target a specific partition	Some types of queries are slower
Non-Aligned NC Index	Some types of queries are faster	Does not allow partition switching Entire index must be optimised

Putting aside Partition Switching and Optimisations for a later posts, lets focus on Query Performance.

Query Performance when Using NC Indexes

Lets see this in action. Lets create our play pen table, partition function/scheme and then put some data in it. (We’ll use this table throughout this post.)

Create Playpen Table, Partition Function/Scheme and Load Data

--Partition Function EXPANDED INT
CREATE PARTITION FUNCTION pf_myorders_expanded (int) AS RANGE RIGHT FOR VALUES(0, 10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000, 100000)
GO
-- Partition Scheme EXPANDED INT
CREATE PARTITION SCHEME ps_myorders_expanded AS PARTITION pf_myorders_expanded ALL TO ([PRIMARY])
GO

--Partitioned table
CREATE TABLE dbo.myorders
(
     myorder_id        INT NOT NULL
     , myorder_key     INT NOT NULL
     , myorder_date    DATETIME NOT NULL
     , myorder_amt     MONEY NOT NULL
     , myorder_details NVARCHAR(4000) DEFAULT '' NULL
     , CONSTRAINT pk_myorderid PRIMARY KEY CLUSTERED (myorder_id ASC)
       WITH (DATA_COMPRESSION = NONE)
       ON ps_myorders_expanded(myorder_id)
)
GO

--Load Data
begin
    INSERT INTO dbo.myorders
    SELECT
        TOP 85000
        ROW_NUMBER() OVER (ORDER BY o1.object_id)
        , CAST((ROW_NUMBER() OVER (ORDER BY o1.object_id) * RAND(1) * 2) AS INT)
        , DATEADD(hh, (ROW_NUMBER() OVER (ORDER BY o1.object_id)) / 5, DATEADD(dd, -730, '2015-05-09 00:43'))
        , RAND(ROW_NUMBER() OVER (ORDER BY o1.object_id)) * RAND(ROW_NUMBER() OVER (ORDER BY o2.object_id)) * 730
        , REPLICATE('X', RAND(o1.object_id) * 1000)
    FROM   master.sys.objects o1
           CROSS JOIN master.sys.objects o2
           CROSS JOIN master.sys.objects o3
    ORDER  BY 1
end
GO

Create Non-Aligned Non-Unique NC Index

Now lets create a non-aligned non-unique NC index on the partitioned table. Essentially its just a plain old boring vanilla flavor NC index.

CREATE NONCLUSTERED INDEX myorders_notunique_nonaligned
    ON dbo.myorders (myorder_date)
    ON [PRIMARY] -- Explit filegorup
GO

Some key points here;

If you don’t specify the filegroup name (ie [PRIMARY]) then by default SQL will assume you wanted a partitioned NC index and automatically align it to the same partition function as that of the underlying table. (I hope that’s what you wanted says SQL? Too bad if it takes 6 hours to create it before you notice!)
It does not know (or care) if the underlying table is partitioned or not.

Create Aligned Non-Unique NC Index

Now lets create an aligned non-unique NC index on the partitioned table. This is more special, it has been told to create the NC index on the partition function using myorder_id.

CREATE NONCLUSTERED INDEX myorders_notunique_aligned
    ON dbo.myorders (myorder_date)
    ON ps_myorders_expanded (myorder_id) -- On Partition Scheme
GO

Some key points here;

What is does under the hood is essentailly create an index b-tree on top of each partition defined by ps_myorders_expanded (myorder_id). In this case there are 11 partitions, so think of them as 11 small NC indexes operating independently. (there is a trick here!)

But hang on, above you said “If the NC index is not unique then ColumnA can either be part of the index key or within the INCLUDE clause” which its not so how did this index get created?

By default SQL will automatically add the partition key myorder_id to the NC index INCLUDE clause and thereby partition the index that way. This is not hard to fathom, this table has a clustered index and any NC indexes (like this one) always include the clustered keys regardless if its partitioned or not.

But hang on, above you said “When you leverage partitioning you can apply it to a HEAP table” and a HEAP is not a CLUSTERED so how did this index get created?

Well in this case the NC index will contain a pointer to the ROW ID in the HEAP where that row is kept. (As an aside – this is why when you create/drop a clustered index on a HEAP all NC indexes must be rebuilt as SQL must swap the cluster keys for ROW ID’s.)

So What Does Our Table Structure Look Like?

You can use this handy query to see the NC and clustered partition structures;

SELECT s.NAME AS 'schema'
       , o.NAME AS 'table'
       , CASE o.type
             WHEN 'v' THEN 'View'
             WHEN 'u' THEN 'Table'
             ELSE o.type
         END AS objecttype
       , i.NAME AS indexname
       , i.type_desc
       , p.data_compression_desc
       , ds.type_desc AS DataSpaceTypeDesc
       , p.partition_number
       , pf.NAME AS pf_name
       , ps.NAME AS ps_name
       , CASE
             WHEN partitionds.NAME IS NULL THEN ds.NAME
             ELSE partitionds.NAME
         END AS partition_fg
       , i.is_primary_key
       , i.is_unique
       , p.rows
FROM   sys.indexes i
       INNER JOIN sys.objects o
               ON o.object_id = i.object_id
       INNER JOIN sys.data_spaces ds
               ON DS.data_space_id = i.data_space_id
       LEFT JOIN sys.schemas s
              ON o.schema_id = s.schema_id
       LEFT JOIN sys.partitions p
              ON i.index_id  = p.index_id
             AND i.object_id = p.object_id
       LEFT JOIN sys.destination_data_spaces dds
              ON i.data_space_id    = dds.partition_scheme_id
             AND p.partition_number = dds.destination_id
       LEFT JOIN sys.data_spaces partitionds
              ON dds.data_space_id = partitionds.data_space_id
       LEFT JOIN sys.partition_schemes AS ps
              ON dds.partition_scheme_id = ps.data_space_id
       LEFT JOIN sys.partition_functions AS pf
              ON ps.function_id = pf.function_id
WHERE  o.NAME = 'myorders'
ORDER  BY s.NAME
          , o.NAME
          , i.NAME
          , p.partition_number
GO

Now to compare these NC puppies…!

Measuring NC Index Performance – Test #1

If I run the query below then you can see a noticeable but small difference. FYI – I am just forcing the index (WITH INDEX = *) for the sake of the comparison…

SET statistics IO on
GO
-- FORCE NON-ALIGNED
SELECT    COUNT(*) as Records,
        MAX(myorder_id) as PrimaryOrder,
        SUM(myorder_amt) as OrderAmount
FROM    dbo.myorders WITH (INDEX = myorders_notunique_nonaligned)
WHERE    myorder_date = '2013-05-17 08:43:00.000'
GO
-- FORCE ALIGNED
SELECT    COUNT(*) as Records,
        MAX(myorder_id) as PrimaryOrder,
        SUM(myorder_amt) as OrderAmount
FROM dbo.myorders WITH (INDEX = myorders_notunique_aligned)
WHERE    myorder_date = '2013-05-17 08:43:00.000'
GO

The Non-Aligned seek is 2x more efficient than the Aligned index.

Table 'myorders'. Scan count 1, logical reads 17, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 Table 'myorders'. Scan count 12, logical reads 33, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

…but Gad, Why is it So?

The reason is the aligned NC index is partitioned on the column myorder_id but the WHERE clause is searching on myorder_date.
So despite the column being the leading key segment SQL must scan all 11 partitioned aligned NC index b-trees to find which of those partitions has that search term, whereas the non-aligned index can be searched directly and efficiently.
If we had 15000 partitions it must scan all 15000. (I think you can see where its going.)

Measuring NC Index Performance – Test #2

The above is a targeted seek so its small potatoes, we can easily create a query where the Non-Aligned index is more than 12,000 times more efficient than the Aligned index!

Lets just do a few minor changes to our query. We’ll add a TOP statement, modify the WHERE clause operator and add an ORDER BY (on the NC index key).

SET statistics IO on
GO
-- FORCE NON-ALIGNED
SELECT    TOP 5 *
FROM    dbo.myorders WITH (INDEX = myorders_notunique_nonaligned)
WHERE    myorder_date > '2013-05-17 08:43:00.000'
ORDER BY myorder_date, myorder_id
GO
-- FORCE ALIGNED
SELECT    TOP 5 *
FROM dbo.myorders WITH (INDEX = myorders_notunique_aligned)
WHERE    myorder_date > '2013-05-17 08:43:00.000'
ORDER BY myorder_date, myorder_id
GO

The Non-Aligned index is 12,260 x more efficient than the Aligned index.

Table 'myorders'. Scan count 1, logical reads 21, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'myorders'. Scan count 12, logical reads 257476, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

…but Gad, Why is it So (Again)?

Ok the example is a bit contrived as I am forcing the index, but even if you drop the non-aligned NC index myorders_notunique_nonaligned the SQL optimiser will choose to scan the table and wont use aligned NC index. The difference is still a noticeable 813x slower!
The reason is the aligned NC index is partitioned on the column myorder_id but then sorted within the partition by myorder_date. The WHERE clause is searching on myorder_date and then ordering by myorder_date and asking for a TOP 5.
So SQL must go into each partition and find the rows that meet the criteria, then bring them back into a work table, sort them and then bring back the top 5.

What Can I do to Improve Performance?

Broadly, the only consistent way to improve performance for aligned NC indexes is to help the SQL optimiser by adding the partition key (myorder_id) as part of the WHERE clause, however this is not practical and also may not help (or only partially help) in all cases.

SELECT    COUNT(*) as Records,
        MAX(myorder_id) as PrimaryOrder,
        SUM(myorder_amt) as OrderAmount
FROM    dbo.myorders
WHERE    myorder_date = '2013-05-17 08:43:00.000'
AND myorder_id BETWEEN 0 AND 2000
GO

And So, What are the Decision Points to Align or Not?

If you are doing many singleton pointed queries on your NC indexes (which is typically what they are for) and you don’t care are about partition switching, then you will find non-aligned NC indexes better than aligned NC indexes.

If you plan to deploy hundreds-thousands of partitions then the performance benefit of non-aligned NC indexes better than aligned NC indexes.

If you execute many TOP n statements then the performance benefit of non-aligned NC indexes is better than aligned NC indexes. (or just retain a single non-aligned NC index to support that specific TOP statement – at the cost of partition switching)

Long short, you need to test it yourself on your table and your data as your mileage may vary.

Investing an hour of project time upfront now will help you sleep at night once it goes live later.

Disclaimer: all content on Mr. Fox SQL blog is subject to the disclaimer found here

Mr. Fox SQL

Outfoxing cloud data challenges by being rather damn sneaky…