Continuing on with my Partitioning post series, this is part 2.
The partitioning includes several major components of work (and can be linked below);
- partitioning large existing non-partitioned tables
- measuring performance impacts of partitioned aligned indexes
- measuring performance impacts of DML triggers for enforcing partitioned unique indexes
- rebuilding tables that are already partitioned (ie. apply a new partitioning scheme)
- implementing partial backups and restores (via leveraging partitions)
- implementing partition aware index optimisation procedures
- Calculating table partition sizes in advance
This blog post deals with deciding whether to align non-clustered indexes to the table (or not) via Measuring Performance Impacts of Partitioned Aligned Indexes.
I will blog about the other steps later.
And so, lets get into the nitty gritty of the partitioning details!
To Align or Not Align – that is the question!
When you leverage partitioning you can apply it to a HEAP table or CLUSTERED (index) table, you don’t have to partition the non-clustered indexes along with it. When you don’t the NC indexes are termed as non-aligned, and (not surprisingly) when they are partitioned they are termed as aligned.
Big deal, I hear you think… well like everything in life there are Pros and Cons. There’s heaps on the web for this already – but a quick summary of the important bits.
|Aligned NC Index||
|Non-Aligned NC Index||
Putting aside Partition Switching and Optimisations for a later posts, lets focus on Query Performance.
Query Performance when Using NC Indexes
Lets see this in action. Lets create our play pen table, partition function/scheme and then put some data in it. (We’ll use this table throughout this post.)
Create Playpen Table, Partition Function/Scheme and Load Data
--Partition Function EXPANDED INT CREATE PARTITION FUNCTION pf_myorders_expanded (int) AS RANGE RIGHT FOR VALUES(0, 10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000, 100000) GO -- Partition Scheme EXPANDED INT CREATE PARTITION SCHEME ps_myorders_expanded AS PARTITION pf_myorders_expanded ALL TO ([PRIMARY]) GO --Partitioned table CREATE TABLE dbo.myorders ( myorder_id INT NOT NULL , myorder_key INT NOT NULL , myorder_date DATETIME NOT NULL , myorder_amt MONEY NOT NULL , myorder_details NVARCHAR(4000) DEFAULT '' NULL , CONSTRAINT pk_myorderid PRIMARY KEY CLUSTERED (myorder_id ASC) WITH (DATA_COMPRESSION = NONE) ON ps_myorders_expanded(myorder_id) ) GO --Load Data begin INSERT INTO dbo.myorders SELECT TOP 85000 ROW_NUMBER() OVER (ORDER BY o1.object_id) , CAST((ROW_NUMBER() OVER (ORDER BY o1.object_id) * RAND(1) * 2) AS INT) , DATEADD(hh, (ROW_NUMBER() OVER (ORDER BY o1.object_id)) / 5, DATEADD(dd, -730, '2015-05-09 00:43')) , RAND(ROW_NUMBER() OVER (ORDER BY o1.object_id)) * RAND(ROW_NUMBER() OVER (ORDER BY o2.object_id)) * 730 , REPLICATE('X', RAND(o1.object_id) * 1000) FROM master.sys.objects o1 CROSS JOIN master.sys.objects o2 CROSS JOIN master.sys.objects o3 ORDER BY 1 end GO
Create Non-Aligned Non-Unique NC Index
Now lets create a non-aligned non-unique NC index on the partitioned table. Essentially its just a plain old boring vanilla flavor NC index.
CREATE NONCLUSTERED INDEX myorders_notunique_nonaligned ON dbo.myorders (myorder_date) ON [PRIMARY] -- Explit filegorup GO
Some key points here;
- If you don’t specify the filegroup name (ie [PRIMARY]) then by default SQL will assume you wanted a partitioned NC index and automatically align it to the same partition function as that of the underlying table. (I hope that’s what you wanted says SQL? Too bad if it takes 6 hours to create it before you notice!)
- It does not know (or care) if the underlying table is partitioned or not.
Create Aligned Non-Unique NC Index
Now lets create an aligned non-unique NC index on the partitioned table. This is more special, it has been told to create the NC index on the partition function using myorder_id.
CREATE NONCLUSTERED INDEX myorders_notunique_aligned ON dbo.myorders (myorder_date) ON ps_myorders_expanded (myorder_id) -- On Partition Scheme GO
Some key points here;
- What is does under the hood is essentailly create an index b-tree on top of each partition defined by ps_myorders_expanded (myorder_id). In this case there are 11 partitions, so think of them as 11 small NC indexes operating independently. (there is a trick here!)
But hang on, above you said “If the NC index is not unique then ColumnA can either be part of the index key or within the INCLUDE clause” which its not so how did this index get created?
- By default SQL will automatically add the partition key myorder_id to the NC index INCLUDE clause and thereby partition the index that way. This is not hard to fathom, this table has a clustered index and any NC indexes (like this one) always include the clustered keys regardless if its partitioned or not.
But hang on, above you said “When you leverage partitioning you can apply it to a HEAP table” and a HEAP is not a CLUSTERED so how did this index get created?
- Well in this case the NC index will contain a pointer to the ROW ID in the HEAP where that row is kept. (As an aside – this is why when you create/drop a clustered index on a HEAP all NC indexes must be rebuilt as SQL must swap the cluster keys for ROW ID’s.)
So What Does Our Table Structure Look Like?
You can use this handy query to see the NC and clustered partition structures;
SELECT s.NAME AS 'schema' , o.NAME AS 'table' , CASE o.type WHEN 'v' THEN 'View' WHEN 'u' THEN 'Table' ELSE o.type END AS objecttype , i.NAME AS indexname , i.type_desc , p.data_compression_desc , ds.type_desc AS DataSpaceTypeDesc , p.partition_number , pf.NAME AS pf_name , ps.NAME AS ps_name , CASE WHEN partitionds.NAME IS NULL THEN ds.NAME ELSE partitionds.NAME END AS partition_fg , i.is_primary_key , i.is_unique , p.rows FROM sys.indexes i INNER JOIN sys.objects o ON o.object_id = i.object_id INNER JOIN sys.data_spaces ds ON DS.data_space_id = i.data_space_id LEFT JOIN sys.schemas s ON o.schema_id = s.schema_id LEFT JOIN sys.partitions p ON i.index_id = p.index_id AND i.object_id = p.object_id LEFT JOIN sys.destination_data_spaces dds ON i.data_space_id = dds.partition_scheme_id AND p.partition_number = dds.destination_id LEFT JOIN sys.data_spaces partitionds ON dds.data_space_id = partitionds.data_space_id LEFT JOIN sys.partition_schemes AS ps ON dds.partition_scheme_id = ps.data_space_id LEFT JOIN sys.partition_functions AS pf ON ps.function_id = pf.function_id WHERE o.NAME = 'myorders' ORDER BY s.NAME , o.NAME , i.NAME , p.partition_number GO
Now to compare these NC puppies…!
Measuring NC Index Performance – Test #1
If I run the query below then you can see a noticeable but small difference. FYI – I am just forcing the index (WITH INDEX = *) for the sake of the comparison…
SET statistics IO on GO -- FORCE NON-ALIGNED SELECT COUNT(*) as Records, MAX(myorder_id) as PrimaryOrder, SUM(myorder_amt) as OrderAmount FROM dbo.myorders WITH (INDEX = myorders_notunique_nonaligned) WHERE myorder_date = '2013-05-17 08:43:00.000' GO -- FORCE ALIGNED SELECT COUNT(*) as Records, MAX(myorder_id) as PrimaryOrder, SUM(myorder_amt) as OrderAmount FROM dbo.myorders WITH (INDEX = myorders_notunique_aligned) WHERE myorder_date = '2013-05-17 08:43:00.000' GO
The Non-Aligned seek is 2x more efficient than the Aligned index.
Table 'myorders'. Scan count 1, logical reads 17, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'myorders'. Scan count 12, logical reads 33, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
…but Gad, Why is it So?
- The reason is the aligned NC index is partitioned on the column myorder_id but the WHERE clause is searching on myorder_date.
- So despite the column being the leading key segment SQL must scan all 11 partitioned aligned NC index b-trees to find which of those partitions has that search term, whereas the non-aligned index can be searched directly and efficiently.
- If we had 15000 partitions it must scan all 15000. (I think you can see where its going.)
Measuring NC Index Performance – Test #2
The above is a targeted seek so its small potatoes, we can easily create a query where the Non-Aligned index is more than 12,000 times more efficient than the Aligned index!
Lets just do a few minor changes to our query. We’ll add a TOP statement, modify the WHERE clause operator and add an ORDER BY (on the NC index key).
SET statistics IO on GO -- FORCE NON-ALIGNED SELECT TOP 5 * FROM dbo.myorders WITH (INDEX = myorders_notunique_nonaligned) WHERE myorder_date > '2013-05-17 08:43:00.000' ORDER BY myorder_date, myorder_id GO -- FORCE ALIGNED SELECT TOP 5 * FROM dbo.myorders WITH (INDEX = myorders_notunique_aligned) WHERE myorder_date > '2013-05-17 08:43:00.000' ORDER BY myorder_date, myorder_id GO
The Non-Aligned index is 12,260 x more efficient than the Aligned index.
Table 'myorders'. Scan count 1, logical reads 21, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'myorders'. Scan count 12, logical reads 257476, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
…but Gad, Why is it So (Again)?
- Ok the example is a bit contrived as I am forcing the index, but even if you drop the non-aligned NC index myorders_notunique_nonaligned the SQL optimiser will choose to scan the table and wont use aligned NC index. The difference is still a noticeable 813x slower!
- The reason is the aligned NC index is partitioned on the column myorder_id but then sorted within the partition by myorder_date. The WHERE clause is searching on myorder_date and then ordering by myorder_date and asking for a TOP 5.
- So SQL must go into each partition and find the rows that meet the criteria, then bring them back into a work table, sort them and then bring back the top 5.
What Can I do to Improve Performance?
Broadly, the only consistent way to improve performance for aligned NC indexes is to help the SQL optimiser by adding the partition key (myorder_id) as part of the WHERE clause, however this is not practical and also may not help (or only partially help) in all cases.
SELECT COUNT(*) as Records, MAX(myorder_id) as PrimaryOrder, SUM(myorder_amt) as OrderAmount FROM dbo.myorders WHERE myorder_date = '2013-05-17 08:43:00.000' AND myorder_id BETWEEN 0 AND 2000 GO
And So, What are the Decision Points to Align or Not?
If you are doing many singleton pointed queries on your NC indexes (which is typically what they are for) and you don’t care are about partition switching, then you will find non-aligned NC indexes better than aligned NC indexes.
If you plan to deploy hundreds-thousands of partitions then the performance benefit of non-aligned NC indexes better than aligned NC indexes.
If you execute many TOP n statements then the performance benefit of non-aligned NC indexes is better than aligned NC indexes. (or just retain a single non-aligned NC index to support that specific TOP statement – at the cost of partition switching)
Long short, you need to test it yourself on your table and your data as your mileage may vary.
Investing an hour of project time upfront now will help you sleep at night once it goes live later.
Disclaimer: all content on Mr. Fox SQL blog is subject to the disclaimer found here