Query Azure CosmosDB from a SQL Server Linked Server

Recently I had a requirement to combine data I already had in SQL Server (2016) with JSON document data already stored in Azure CosmosDB.  Both databases were operational and continuously accepting data, so I didn’t want to go to the trouble of building a delta load process between them; instead I just wanted to be able to query directly on demand.

And so – the purpose of this article is to outline the method to connect directly to Azure CosmosDB from SQL Server using a SQL Linked Server.

Finally … SQL & NoSQL … together at last!

For those interested to learn more about Azure CosmosDB, check out my previous blog post here – https://mrfoxsql.wordpress.com/2016/05/11/azure-documentdb-preparing-loading-querying-data/

Or the official documentation here – https://docs.microsoft.com/en-us/azure/cosmos-db/

And so right up front – this solution only works for SQL Server on VM/IaaS – it is not supported for Azure SQL DB (ASDB), mainly because ASDB doesn’t support SQL Linked Servers! (Damn, they say!)
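To give a feel for the end state up front, below is a minimal sketch of the linked server setup.  It assumes the Azure Cosmos DB ODBC driver is installed on the SQL Server box and that a System DSN pointing at the CosmosDB account already exists – the DSN name (CosmosDSN) and the Orders collection are hypothetical.

-- Create the linked server over the generic ODBC OLE DB provider (MSDASQL)
EXEC master.dbo.sp_addlinkedserver
    @server     = N'COSMOSDB',      -- local name for the linked server
    @srvproduct = N'',
    @provider   = N'MSDASQL',
    @datasrc    = N'CosmosDSN';     -- hypothetical System DSN

-- Pass a query through to CosmosDB and surface the JSON documents as rows
-- (the exact query syntax depends on how the ODBC driver maps the schema)
SELECT *
FROM OPENQUERY(COSMOSDB, 'SELECT c.id, c.OrderTotal FROM Orders c');

From there the OPENQUERY resultset can be joined to local SQL tables like any other rowset.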

Continue reading


Streaming Reporting: SQL Change Data Capture (CDC) to Power BI

Extending my previous post about redirecting SQL CDC changes to Azure Event Hub, I have had a few people ask for details/options to stream SQL data into the Power BI API.

Specifically – they were looking for an easy method to leverage the ADD ROWS functionality of the Power BI API so they could push real-time data into a Power BI service dataset.

This method provides the ability to update the Power BI dataset with new rows every few seconds, instead of a Power BI report having to use either the Direct Connect or scheduled data refresh capability, which can be very limiting.

If interested in how the SQL CDC and Event Hubs work together, then read here from my previous post – https://mrfoxsql.wordpress.com/2017/07/12/streaming-etl-send-sql-change-data-capture-cdc-to-azure-event-hub/

The purpose of this post is to quickly show how to extend and explore pushing new SQL data rows via Azure Stream Analytics into Power BI.
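At the heart of the solution is the ASA job query itself.  Below is a hedged sketch of what that can look like – the Event Hub input alias ([cdc-input]) and Power BI output alias ([powerbi-output]) are hypothetical, and it assumes the JSON message carries the CDC [__$operation] code from the change row.

-- Count the DML changes per operation type in 5 second tumbling windows
-- and push each window into the Power BI streaming dataset
SELECT
    [__$operation]      AS OperationType,
    System.Timestamp    AS EventTime,
    COUNT(*)            AS ChangeCount
INTO [powerbi-output]
FROM [cdc-input]
GROUP BY [__$operation], TumblingWindow(second, 5)

Each window then lands as new rows in the Power BI dataset – ASA effectively calls the ADD ROWS functionality for you under the covers.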

And so, let’s get into some CDC to Power BI streaming action!

Continue reading

Streaming ETL: SQL Change Data Capture (CDC) to Azure Event Hub

I had a recent requirement to capture and stream real-time data changes on several SQL database tables from an on-prem SQL Server to Azure for downstream processing.

Specifically we needed to create a streaming ETL solution that …

  1. Captured intermediate DML operations on tables in an on-prem SQL database
  2. Transmitted the data securely and in real-time into Azure
  3. Stored the delta changes as TXT files in Azure Data Lake Store (ADLS)
  4. Visualised the real-time change telemetry on a Power BI dashboard (specifically the number of Inserts, Updates and Deletes over time).

The first part was easy; SQL Server has a feature called Change Data Capture (CDC) which does an amazing job of tracking DML changes to separate system tables.  If you don’t know about CDC then see here – https://docs.microsoft.com/en-us/sql/relational-databases/track-changes/about-change-data-capture-sql-server

The second part wasn’t easy, and after some searching I came across this blog post by Spyros Sakellariadis which gave me inspiration and starter code for my streaming ETL solution.  Excellent post.  See here – https://azure.microsoft.com/en-us/resources/samples/event-hubs-dotnet-import-from-sql/

And so, the final architecture looks something like this…

The solution picks up the SQL data changes from the CDC change tracking system tables, creates JSON messages from the change rows, and then posts the messages to an Azure Event Hub.  Once landed in the Event Hub, an Azure Stream Analytics (ASA) Job distributes the changes into the multiple outputs.
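For reference, enabling CDC and reading the change rows the solution picks up takes only a few lines of T-SQL.  A minimal sketch is below – the dbo.Orders table and its dbo_Orders capture instance are hypothetical.

-- Enable CDC at the database level, then on the table to be tracked
EXEC sys.sp_cdc_enable_db;

EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'Orders',
    @role_name     = NULL;

-- Read all changes between the min and max LSN; __$operation encodes the
-- DML type (1 = delete, 2 = insert, 3/4 = update before/after images)
DECLARE @from_lsn BINARY(10) = sys.fn_cdc_get_min_lsn(N'dbo_Orders');
DECLARE @to_lsn   BINARY(10) = sys.fn_cdc_get_max_lsn();

SELECT *
FROM cdc.fn_cdc_get_all_changes_dbo_Orders(@from_lsn, @to_lsn, N'all');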

What I found pretty cool was that I could transmit SQL delta changes from source to target in as little as 5 seconds end to end!

And so, let’s get into some CDC to Event Hub data streaming action!

Continue reading

What Exactly is the “Data Platform” Nowadays?

A couple of months ago I was presenting at SQL Saturday Melbourne (582) on Azure Cognitive Services and got chatting with some of the other presenters about our sessions.

I co-presented with Krissy Rumpff from the Microsoft Data Platform Team (https://www.linkedin.com/in/krumpff/) – and for those interested our session is here – http://www.sqlsaturday.com/582/Sessions/Details.aspx?sid=56483 …or… you can look at the recording here – https://channel9.msdn.com/Events/Ignite/Australia-2017/DA321

Anyway, what’s interesting is that some of the other presenters were asking why we were presenting on Cognitive Services when in fact this was SQL Saturday?  And, you know, Cognitive is not the Data Platform?

This is actually an interesting point – and since then I have had a pretty good think about what this means – and so this is the purpose of this blog post!

Continue reading

A Collection of SQL Server Help Scripts

Probably every SQL DBA, consultant, architect, etc. out there who has ever worked on or used SQL Server has their own personal collection of SQL Server Help Scripts.

So, not surprisingly, I also have such a collection – and so this is the purpose of this post!

Continue reading

SQL Saturday 582 Melbourne (11 Feb 2017) and Ignite 2017 (14-17 Feb 2017)

For those not aware, there are 2 cool local and national events coming to Melbourne and the Gold Coast.

  • SQL Saturday 582 (Sat 11 Feb 2017).  Melbourne.  For those looking for some great free local SQL (and related!) learning, you simply cannot go past a SQL Saturday anywhere in the world!  And this one right here in Melbourne will again be no exception.  There is a lineup of fantastic local speakers including Microsoft and MVPs as well as international speakers too.  http://www.sqlsaturday.com/582/EventHome.aspx
  • Ignite 2017 (14-17 Feb 2017).  Gold Coast.  For those wanting more than a day and a wider variety of topics, then Ignite is another event you simply cannot miss.  As above, there is a fantastic lineup of local Microsoft and MVP/expert speakers as well as an impressive list of international speakers.  This is a 4 day back-to-back set of technical sessions.  https://msftignite.com.au/

UPDATE: Ignite 2017 has come and gone – and so here’s some links!

 

I want to focus this short blog post on SQL Saturday 582 as there are some great things happening right here locally in Melbourne… 

SQL Saturday 582 – Melbourne

SQL Saturday is an excellent free learning resource for all things SQL Server – all costs are covered by donations and sponsorships.  Some of the excellent sponsors this year are Microsoft, Wardy IT, Idera and PASS.

Some of the session focus areas include SQL 2016 (many deep dives across almost all facets!), SQL DB/DW in Azure, Azure Cognitive Services and Machine Learning, R, Agile Methods, Power BI, Powershell, BIML …and more!

Furthermore – some of the top Microsoft product group (PG) speakers will also be attending:

  • Lindsey Allen
  • Sunil Agarwal
  • Matt Masson
  • Julie Koesmarno
  • Ajay Jagannathan

Sunil will also be helping to run a SQL Clinic on the day where you can talk directly to Microsoft about your biggest pain points or suggestions for the next versions of SQL Server.

For those wanting to come along, here are the links you need to know.  Please go to the website and register to attend.

The event is being held at Monash University (Caulfield Campus, 888 Dandenong Road, Caulfield East, Victoria).

 

The Rise of the Machines…

For those attending – I am co-presenting a pretty cool session with the awesome Krissy Rumpff (from the local Microsoft SQL Team) on Azure Cognitive Services.

Session Details here – http://www.sqlsaturday.com/582/Sessions/Details.aspx?sid=56483

We’ll talk through what they are, why you should care and how to interact with them (via APIs).  We hope to show some pretty fun demos and draw out some relevant use cases – and if time permits will look for some unwilling audience participation!

Feel free to pop in and introduce yourself!

 

I hope to see you all in Melbourne at SQL Saturday!



Azure Cognitive Services API’s with SQL Server 2016 CLR

Azure Cognitive Services is relatively new functionality within Azure that exposes some truly amazing APIs that can do some equally amazing things.

Before I dive into SQL and DLL code to make use of Cognitive Services, let’s take a second to understand what I am talking about – imagine this;

  • A customer walks up to an electronic kiosk in a shopping centre and says “Hi Cortana, I want to book a holiday, I really need a break from this bad weather.  Any ideas on where I should go?”
  • The kiosk is run by a “bot” capable of conversing in 30 languages and which has been trained on understanding context and intent.
  • It recognises you as a 35 yo male who looks and sounds unhappy, and because it recognises your face and voice it knows it has talked to you before in another shopping centre last week when you asked for directions to a Surf Shop clothing store.
  • It also recognises a beach image on your T-Shirt, making note of the link between your previously asked directions and your clothing.
  • As you spoke in English, it replies in English – “Good to see you again.  Now, would you consider a beach holiday to Bali or Thailand?” – Why beach? Well the recommendations engine has determined that is where 35yo male surfer types go when the local weather is bad!
  • You negotiate a package using natural language, and close out the conversation.
  • The “bot” visualises your increased sentiment from the initial baseline and says “I’m glad I could make your day better! Enjoy your flight next week!”

It may sound futuristic – but this is exactly what Cognitive Services (APIs) can do right now – and in my example I have only used 6 of the 21 Azure Cognitive Services!  Microsoft Research has built these powerful Azure ML models and wrapped them up into a single, simple, consumable, publicly available API.

Some other amazing deployments for Cognitive Services…

  • At a trade show, or even a window display at a shopping centre, a company could use emotion detection to see how people are reacting to their products.
  • Facial recognition could be used to find missing children quickly at an amusement park.
  • The APIs can determine the male:female ratio and ages of patrons at a nightclub, and identify VIPs or banned guests.
  • The object recognition capabilities can enable a blind person to read a menu in a restaurant or have their surroundings described to them.

For those not familiar with the Azure Cognitive Services APIs, check out this link which has online demos you can try – https://www.microsoft.com/cognitive-services/en-us/apis

For those not familiar with the Azure Bot Framework, check out this link – https://dev.botframework.com/

Anyway – despite all this, for this post today we’ll just focus on something pretty simple – making use of the Text Analytics API right within SQL Server 2016.
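As a taste of where this ends up, here’s a hedged sketch of the kind of query the post builds towards – dbo.clr_GetSentiment is a hypothetical CLR scalar function wrapping the Text Analytics sentiment endpoint (returning a score between 0 = negative and 1 = positive), and dbo.CustomerReviews is a hypothetical table.

-- Score the sentiment of last week's reviews, in-line within a query
SELECT  ReviewId,
        ReviewText,
        dbo.clr_GetSentiment(ReviewText) AS SentimentScore  -- hypothetical CLR function
FROM    dbo.CustomerReviews                                 -- hypothetical table
WHERE   ReviewDate >= DATEADD(DAY, -7, GETDATE());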

And so, let’s get to scoring some sentiment!

Continue reading

Azure SQL IaaS – TempDB Performance on SSD vs Premium Disk

When it comes to deploying SQL Server onto an Azure IaaS VM, it is sometimes difficult to know the best option to deploy the SQL Server tempdb database for your workload.

In many of the SQL templates on the marketplace it is often deployed to C:\ by default, after which you should redeploy the database to either D:\ (local SSD) or to an attached premium disk (P10, P20, P30).  The Microsoft SQL on IaaS Performance Best Practice article states both are possibilities under certain circumstances, however it does not provide empirical workload evidence as to which to use when.

For those who have not seen the article – read here – https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-windows-sql-performance/

The key comment of interest is this…

For D-series, Dv2-series, and G-series VMs, the temporary drive on these VMs is SSD-based. If your workload makes heavy use of TempDB (e.g. for temporary objects or complex joins), storing TempDB on the D drive could result in higher TempDB throughput and lower TempDB latency.


So I thought – let’s test an OLTP-type SQL workload!

AND SO – let’s do some testing to validate this puppy!
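For context, redeploying tempdb is just a couple of ALTER DATABASE statements.  A minimal sketch is below (paths are illustrative) – the change only takes effect after the SQL Server service restarts, and remember the local SSD is volatile, so the target folder must exist at service start-up (typically via a start-up task).

-- Move the tempdb data and log files to the local SSD (D:\)
ALTER DATABASE tempdb
MODIFY FILE (NAME = tempdev, FILENAME = 'D:\TempDB\tempdb.mdf');

ALTER DATABASE tempdb
MODIFY FILE (NAME = templog, FILENAME = 'D:\TempDB\templog.ldf');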

Continue reading

Compression Performance with Low Key Selectivity

When it comes to enabling data compression in SQL Server, I suspect most people don’t do a lot of testing on the impacts of either enabling compression or its ongoing maintenance.  I assume most are interested in its ongoing query performance – and you know, that’s damn fair enough too!

If you read the Microsoft Data Compression Whitepaper (and I mean, who’s got time to do that!) then you’ll find some interesting technical tidbits buried deep in there which affect the impacts of applying and managing data compression.

https://technet.microsoft.com/en-us/library/dd894051(v=sql.100).aspx

Here are some of the ones of interest…

SQL Server uses statistics on the leading column to distribute work amongst multiple CPUs, thus multiple CPUs are not beneficial when creating, rebuilding, or compressing an index where the leading column of the index has relatively few unique values or when the data is heavily skewed to just a small number of leading key values – only limited effective parallelism will be achieved in this case.


And this…

Compressing or rebuilding a heap with ONLINE set to ON uses a single CPU for compression or rebuild. However, SQL Server first needs to scan the table—the scan is parallelized, and after the table scan is complete, the rest of the compression processing of the heap is single-threaded.


And this…

When a heap is compressed, if there are any nonclustered indexes on the heap, they are rebuilt as follows:
(a) With ONLINE set to OFF, the nonclustered indexes are rebuilt one by one.
(b) With ONLINE set to ON, all the nonclustered indexes are rebuilt simultaneously.
You must account for the workspace required to rebuild the nonclustered indexes, because the space for the uncompressed heap is not released until the rebuild of the nonclustered indexes is complete.

 

That first one is a cracker – it hit me once when compressing a SQL Server table (600M+ rows) on a 64 core Enterprise SQL Server.  After benchmarking several other data compression activities I thought I had a basic “rule of thumb” (based on GB data size and number of rows)… which just happened to be coincidence!

This also begs the question – why would you use low-selectivity indexes at all?  Well, I can think of a few cases – but the one which stands out the most is identifying a small number of rows within a greater collection – such as an index on TYPE columns (ie; [ProcessingStatusFlag] CHAR(1) = [P]rocessed, [U]nprocessed, [W]orking, [F]ailed, etc).
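To make that concrete, here’s an illustrative sketch of exactly that kind of low-selectivity status flag index, followed by a page compression rebuild – the table and index names are hypothetical, and MAXDOP is pinned down given the limited parallelism the whitepaper describes.

-- A low-selectivity index over a status flag column (hypothetical table)
CREATE NONCLUSTERED INDEX IX_Orders_ProcessingStatusFlag
ON dbo.Orders (ProcessingStatusFlag);

-- Page-compress the table; extra CPUs add little benefit here because the
-- leading key has only a handful of unique values
ALTER TABLE dbo.Orders
REBUILD WITH (DATA_COMPRESSION = PAGE, MAXDOP = 1);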

… AND SO – let’s do some testing to validate this puppy!

Continue reading

SQL Filegroup/File Proportional Fill Algorithm

I had a question at work recently where there was some confusion around how SQL Server allocates data across data files within a filegroup in a user database.  There was a mention that data was not being distributed evenly across files and also that a trace flag was needed for SQL Server to distribute data evenly.  I am uncertain if those circumstances were database config related or something else outside of proportional fill.

So I thought I’d do a quick post just to clarify how proportional fill works via demonstration.

SQL Server has used a proportional fill strategy across data files in a filegroup for some time (as long as I care to remember anyway) and this has been pretty well documented in SQL BoL and a number of blog posts on the web already.

Filegroups use a proportional fill strategy across all the files within each filegroup.  As data is written to the filegroup, the SQL Server Database Engine writes an amount proportional to the free space in the file to each file within the filegroup, instead of writing all the data to the first file until it is full and then writing to the next file.

As soon as all the files in a filegroup are full, the Database Engine automatically expands one file at a time in a round-robin manner to allow for more data, provided that the database is set to grow automatically.

https://technet.microsoft.com/en-us/library/ms187087(v=sql.105).aspx

When multiple files are involved, and if these are ideally located on different physical spindles on the underlying disk subsystem, then rather nicely performing data striping can be achieved for the database.  If proportional fill kicks in and starts to focus on files with more free space then you may get hot spots on those files.  However nowadays, with auto-tiering SANs, SSDs and (abstracted) cloud storage (for IaaS deployments), this is beginning to matter less and less.
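To see proportional fill in action, a minimal demo setup along these lines works well – the database name, file names and paths are all illustrative.  With a 1GB and a 2GB empty file, writes should land in roughly a 1:2 ratio, proportional to each file’s free space.

-- Add a filegroup with two differently sized files
ALTER DATABASE DemoDB ADD FILEGROUP FG_Data;

ALTER DATABASE DemoDB ADD FILE
    (NAME = FG_Data_1, FILENAME = 'F:\Data\FG_Data_1.ndf', SIZE = 1GB),
    (NAME = FG_Data_2, FILENAME = 'G:\Data\FG_Data_2.ndf', SIZE = 2GB)
TO FILEGROUP FG_Data;

-- After loading data into a table on FG_Data, compare used space per file
SELECT  name,
        size / 128 AS SizeMB,
        FILEPROPERTY(name, 'SpaceUsed') / 128 AS UsedMB
FROM    sys.database_files
WHERE   name LIKE N'FG_Data%';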

However – let’s get into breaking down the proportional fill algorithm!

Continue reading