Azure SQL IaaS – TempDB Performance on SSD vs Premium Disk

When it comes to deploying SQL Server onto an Azure IaaS VM, it is sometimes difficult to know the best option to deploy the SQL Server tempdb database for your workload.

In many of the SQL templates on the marketplace it is often deployed to the C:\ by default after which you should redeploy the database to either D:\ (local SSD) or to an attached premium disk (P10, P20, P30).  The Microsoft SQL on IaaS Performance Best Practice article states both are possibilities under certain circumstances, however it does not provide empirical workload evidence as which to use when.

For those who have not seen the article – read here – https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-windows-sql-performance/

The key comment of interest is this…

For D-series, Dv2-series, and G-series VMs, the temporary drive on these VMs is SSD-based. If your workload makes heavy use of TempDB (e.g. for temporary objects or complex joins), storing TempDB on the D drive could result in higher TempDB throughput and lower TempDB latency.

…and this…

So I thought lets test a OLTP type SQL workload!

AND SO – lets do some testing to validate this puppy!

Continue reading

Compression Performance with Low Key Selectivity

When it comes to enabling data compression in SQL Server I suspect most people don’t do a lot of testing as to the impacts of either enabling compression and its ongoing maintenance.  I assume most are interested in its ongoing performance for queries, and you know that’s damn fair enough too!

If you read the Microsoft Data Compression Whitepaper (and I mean whos got time to do that!) then it has some interesting technical tidbits burried deep in there which affect the impacts of applying and managing data compression.

https://technet.microsoft.com/en-us/library/dd894051(v=sql.100).aspx

Here are some of the ones of interest…

SQL Server uses statistics on the leading column to distribute work amongst multiple CPUs, thus multiple CPUs are not beneficial when creating, rebuilding, or compressing an index where the leading column of the index has relatively few unique values or when the data is heavily skewed to just a small number of leading key values – only limited effective parallelism will be achieved in this case.


And this…

Compressing or rebuilding a heap with ONLINE set to ON uses a single CPU for compression or rebuild. However, SQL Server first needs to scan the table—the scan is parallelized, and after the table scan is complete, the rest of the compression processing of the heap is single-threaded.


And this…

When a heap is compressed, if there are any nonclustered indexes on the heap, they are rebuilt as follows:
(a) With ONLINE set to OFF, the nonclustered indexes are rebuilt one by one.
(b) With ONLINE set to ON, all the nonclustered indexes are rebuilt simultaneously.
You must account for the workspace required to rebuild the nonclustered indexes, because the space for the uncompressed heap is not released until the rebuild of the nonclustered indexes is complete.

 

That first one is a cracker – it hit me once when compressing a SQL Server table (600M+ rows) on a 64 core Enterprise SQL Server.  After benchmarking several other data compression activities I thought I had a basic “rule of thumb” (based on GB data size and number of rows)… of which just happened to be coincidence!

This also begs the question of why would you use low selectivity indexes?  Well I can think of a few cases – but the one which stands out the most is the identification of a small number of rows within a greater collection – such as an Index on TYPE columns (ie; [ProcessingStatusFlag] CHAR(1) = [P]rocessed, [U]nprocessed, [W]orking, [F]ailed, etc)

… AND SO – lets do some testing to validate this puppy!

Continue reading