Implementing Partition Aware Index Optimisation Procedures

[read this post on Mr. Fox SQL blog]

Continuing on with my Partitioning post series, this is part 6.

The partitioning includes several major components of work (and can be linked below);

This blog post deals with implementing partition aware index optimisation procedures.

And so, lets get into the nitty gritty of the partitioning details!

Why are we doing this at all?

So firstly why might you want to target an index rebuild to a specific partition? Well when you perform an index rebuild its a time, data/log IO and (depending if ONLINE option used) tempdb intensive operation that you want to minimise wherever you can.

Table and Index Partitions allow you to segregate the table into logical/physical collections of data based on a key column and as data is loaded the key will determine in which partition that row is placed. Just like a non-partitioned index as data changes over time fragmentation can be created in the partition in which the updated rows existed.

So the cool thing here – if 80% of your partitioned table is NOT fragmented then why rebuild all perfectly good healthly unfragmented partitions just to optimise the 20% of those that ARE the fragmented baddies?

Enter Partitioned Based Reindexing!

So what are my options?

Essentially the options you have to rebuild a partitioned index are exactly the same as those that are available for non-partitioned indexes, the only additional SQL option is PARTITION = X (where X = the partition number you want to rebuild)

So that means you can issue normal index rebuild options like;

SORT_IN_TEMPDB
DATA_COMPRESSION
MAXDOP
ALL
ONLINE *

(*) What’s interesting is that prior to SQL 2014 online reindexing of partitions was not supported in SQL 2012! You could reindex a partition offline but that of course affects your overall database availability, so thankfully this was a warmly welcomed quiet addition in SQL 2014!

You will also note that PARTITION = X only allows a single value, so you must issue a separate SQL command for each partition you want to rebuild. I personally think it would have been nice to be able to do this PARTITION = (X, Y, Z) but that’s just me efficiently saving key strokes.

Lets Create a Playpen Partitioned Table with Data

Lets create our playpen partitioned table, put in some sample data and then build a fresh clustered and non-clustered index on the table.

--Partition Function &amp;amp;amp;amp;amp;amp;amp;amp;amp; Scheme 
CREATE PARTITION FUNCTION pf_myorders (int) AS RANGE RIGHT FOR VALUES(0, 10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000, 100000)
GO
CREATE PARTITION SCHEME ps_myorders AS PARTITION pf_myorders ALL TO ([PRIMARY])
GO

--Create Table
CREATE TABLE dbo.myorders
(
myorder_id INT NOT NULL
, myorder_key INT NOT NULL
, myorder_date DATETIME NOT NULL
, myorder_amt MONEY NOT NULL
, myorder_details VARCHAR(4000) DEFAULT '' NULL
, myorder_short_details as left(myorder_details, 120)
)
GO

--Load Data
begin
INSERT INTO dbo.myorders
SELECT
TOP 42500
(ROW_NUMBER() OVER (ORDER BY o1.object_id) - 1) * 2
, CAST((ROW_NUMBER() OVER (ORDER BY o3.object_id) * RAND(1) * 2) AS INT)
, DATEADD(hh, (ROW_NUMBER() OVER (ORDER BY o1.object_id)) / 3, '2013-01-01')
, RAND(ROW_NUMBER() OVER (ORDER BY o3.object_id)) * RAND(ROW_NUMBER() OVER (ORDER BY o3.object_id)) * 730
, REPLICATE('X', RAND(o1.object_id) * 1000)
FROM master.sys.objects o1
CROSS JOIN master.sys.objects o2
CROSS JOIN master.sys.objects o3
ORDER BY 1
end
GO

--Create Partition aligned C index on Table
ALTER TABLE dbo.myorders
ADD CONSTRAINT pk_myorderid PRIMARY KEY CLUSTERED 
(
myorder_id ASC
)
WITH
(
SORT_IN_TEMPDB = ON, 
ONLINE = OFF, 
DATA_COMPRESSION = NONE,
MAXDOP = 0,
FILLFACTOR = 100
)
ON [ps_myorders](myorder_id)
GO

--Create Partition aligned NC index
CREATE UNIQUE NONCLUSTERED INDEX uk_myorderid
ON dbo.myorders
(
myorder_key,
myorder_id,
myorder_short_details
)
WITH
(
SORT_IN_TEMPDB = ON, 
ONLINE = OFF, 
DATA_COMPRESSION = NONE,
MAXDOP = 0,
FILLFACTOR = 100
)
ON [ps_myorders](myorder_id)
GO

Now Lets Create Some Nasty Fragmentation!

Lets put a logical spanner in the works and create fragmentation in partitions 7 and 8.

--Create Fragmentation
UPDATE myo
SET myorder_id = myorder_id - 1
FROM dbo.myorders myo
WHERE myorder_date BETWEEN '2014-01-01 00:00:00' and '2014-03-31 23:59:59'
GO

UPDATE myo
SET myorder_key = myorder_key - 1
FROM dbo.myorders myo
WHERE myorder_date BETWEEN '2014-01-01 00:00:00' and '2014-03-31 23:59:59'
GO

What Does the Partition Fragmentation Look Like?

You can check the level of fragmentation at the partition level by running this handy query below.

SELECT s.NAME AS 'schema'
 ,o.NAME AS 'table'
 ,CASE o.type
 WHEN 'v' THEN 'View'
 WHEN 'u' THEN 'Table'
 ELSE o.type
 END AS objecttype
 ,i.NAME AS indexname
 ,i.type_desc
 ,p.partition_number
 ,dmv.Page_Count
 ,p.rows,
 cast(dmv.Avg_Fragmentation_In_Percent as numeric(5, 2)) as AvgFragPct
FROM sys.indexes i
 INNER JOIN sys.objects o
 ON o.object_id = i.object_id
 INNER JOIN sys.schemas s
 ON o.schema_id = s.schema_id
 INNER JOIN sys.partitions p
 ON i.index_id = p.index_id
 AND i.object_id = p.object_id
 INNER JOIN sys.DM_DB_INDEX_PHYSICAL_STATS(DB_ID(), NULL, NULL, NULL, N'LIMITED') AS dmv
 ON dmv.OBJECT_ID = i.object_id
 AND dmv.index_id = i.index_id
 AND dmv.partition_number = p.partition_number
WHERE o.NAME = 'myorders'
ORDER BY s.NAME
 ,o.NAME
 ,i.NAME
 ,p.partition_number

Lets Do Some Index Rebuild Tests!

So lets just revisit the standard vanilla flavor full table index rebuild. Even though the table is partitioned the rebuild command without PARTITION = X works. In this case it simply rebuilds all 12 partitions on the table. Work wise this is equivalent of PARTITION = ALL.

--REBUILD NORMALLY
ALTER INDEX uk_myorderid ON dbo.myorders rebuild
GO
ALTER INDEX pk_myorderid ON dbo.myorders rebuild 
GO

Table 'myorders'. Scan count 12, logical reads 774.
Table 'myorders'. Scan count 12, logical reads 4268.

So what about if I just did partition 8?

--REBUILD SINGLE PARTITION
ALTER INDEX uk_myorderid ON dbo.myorders rebuild partition = 8
GO
ALTER INDEX pk_myorderid ON dbo.myorders rebuild partition = 8
GO

Table 'myorders'. Scan count 1, logical reads 91.
Table 'myorders'. Scan count 1, logical reads 513.

So this represents about 12% of work as compared to the full table reindex. Now imagine your table had 12 billion rows and 196 partitions, suddenly this is pretty economical!

NOTE – if the host machine has multiple cores and SQL is by default allowed to parallelise the rebuild operation then you may see more scans than 1, equal to the query MAXDOP

What’s faster – rebuild every partition, or the table?

So you may think that the work/time taken for the FULL rebuild is broadly inline with the work/time if you did each partition individually?

ALTER INDEX pk_myorderid ON dbo.myorders rebuild WITH (MAXDOP = 1)
GO
ALTER INDEX pk_myorderid ON dbo.myorders rebuild partition = 1 WITH (MAXDOP = 1)
ALTER INDEX pk_myorderid ON dbo.myorders rebuild partition = 2 WITH (MAXDOP = 1)
ALTER INDEX pk_myorderid ON dbo.myorders rebuild partition = 3 WITH (MAXDOP = 1)
ALTER INDEX pk_myorderid ON dbo.myorders rebuild partition = 4 WITH (MAXDOP = 1)
ALTER INDEX pk_myorderid ON dbo.myorders rebuild partition = 5 WITH (MAXDOP = 1)
ALTER INDEX pk_myorderid ON dbo.myorders rebuild partition = 6 WITH (MAXDOP = 1)
ALTER INDEX pk_myorderid ON dbo.myorders rebuild partition = 7 WITH (MAXDOP = 1)
ALTER INDEX pk_myorderid ON dbo.myorders rebuild partition = 8 WITH (MAXDOP = 1)
ALTER INDEX pk_myorderid ON dbo.myorders rebuild partition = 9 WITH (MAXDOP = 1)
ALTER INDEX pk_myorderid ON dbo.myorders rebuild partition = 10 WITH (MAXDOP = 1)
ALTER INDEX pk_myorderid ON dbo.myorders rebuild partition = 11 WITH (MAXDOP = 1)
ALTER INDEX pk_myorderid ON dbo.myorders rebuild partition = 12 WITH (MAXDOP = 1)
GO

The MAXDOP 1 logical reads and time for the two operations is as follows…

FULL REBUILD
- Logical Reads = 4271
- Time ms= 313
EACH INDIVIDUAL PARTITION
- Logical Reads = 4269
- Time ms = 371

So the work is about the same, however it took 16% longer to perform the same reindex operations doing each partition individually. This is hardly surprising considering SQL will need to setup query plans for every partition and execute those plans in serial.

So there you go!

AND of course, as I always say, please test this yourself as your results may vary!

Disclaimer: all content on Mr. Fox SQL blog is subject to the disclaimer found here

Mr. Fox SQL

Outfoxing cloud data challenges by being rather damn sneaky…