The “modern data platform” architecture is becoming increasingly popular as organisations shift towards identifying, collecting and centralising their data assets and embracing a “data driven culture”.
Microsoft Azure has a suite of best-of-breed PaaS based services which can be plugged together by organisations wishing to create large scale Data Lake / Data Warehouse type platforms to host their critical corporate data.
When working with customers going down the Modern Data Platform path I often hear very similar questions:
- What is the most suitable and scalable architecture for my use case?
- How should I logically structure my Data Lake or Data Warehouse?
- What is the most efficient ETL/ELT tool to use?
- How do I manage batch and streaming data simultaneously?
While these are all very valid questions, sorry, but that’s not what this blog is about! (one for another blog, perhaps?)
In my view, what often doesn’t get enough attention up front are the critical aspects of monitoring, auditing and availability. Thankfully, these are generally not too difficult to plug in at any point in the delivery cycle, but as with most things in the cloud there are just so many different options to consider!
So the purpose of this blog is to focus on the key areas of Azure Services Monitoring and Auditing for the Azure Modern Data Platform architecture.
I had a recent requirement to capture and stream real-time data changes on several SQL database tables from an on-prem SQL Server to Azure for downstream processing.
Specifically we needed to create a streaming ETL solution that would…
- Capture intermediate DML operations on tables in an on-prem SQL database
- Transmit the data securely and in real time into Azure
- Store the delta changes as TXT files in Azure Data Lake Store (ADLS)
- Visualise the real-time change telemetry on a Power BI dashboard (specifically the number of Inserts, Updates and Deletes over time).
The first part was easy; SQL Server has a feature called Change Data Capture (CDC) which does an amazing job of tracking DML changes into separate system tables. If you don’t know about CDC then see here – https://docs.microsoft.com/en-us/sql/relational-databases/track-changes/about-change-data-capture-sql-server
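To give a flavour of how CDC works, here’s a minimal sketch of enabling it and reading the change rows (the MyDatabase database and dbo.Orders table are hypothetical examples, not from the actual solution):

```sql
-- Enable CDC at the database level (creates the cdc schema and system objects)
USE MyDatabase;
GO
EXEC sys.sp_cdc_enable_db;
GO

-- Enable CDC on a single table; this creates the change table
-- cdc.dbo_Orders_CT plus the capture and cleanup SQL Agent jobs
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'Orders',    -- hypothetical example table
    @role_name     = NULL;         -- NULL = no gating role on the change data
GO

-- Read all delta rows between two LSN points; __$operation flags the DML type
-- (1 = delete, 2 = insert, 4 = update; pass 'all update old' instead of 'all'
--  to also get 3 = the update before-image)
DECLARE @from_lsn BINARY(10) = sys.fn_cdc_get_min_lsn('dbo_Orders');
DECLARE @to_lsn   BINARY(10) = sys.fn_cdc_get_max_lsn();

SELECT *
FROM cdc.fn_cdc_get_all_changes_dbo_Orders(@from_lsn, @to_lsn, N'all');
```

It’s that query over the change tables which gives you the per-row delta feed to push downstream.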
The second part wasn’t easy, and after some searching I came across this blog post by Spyros Sakellariadis which gave me inspiration and starter code for my streaming ETL solution. Excellent post. See here – https://azure.microsoft.com/en-us/resources/samples/event-hubs-dotnet-import-from-sql/
And so, the final architecture looks something like this…
The solution picks up the SQL data changes from the CDC change tracking system tables, creates JSON messages from the change rows, and then posts the messages to an Azure Event Hub. Once landed in the Event Hub, an Azure Stream Analytics (ASA) job distributes the changes to multiple outputs.
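The ASA job is essentially two SQL-like queries over the same Event Hub input. As an illustrative sketch only (the [cdc-events], [adls-archive] and [pbi-dashboard] aliases are hypothetical names, and I’m assuming each JSON message carries an OperationType field describing the DML change):

```sql
-- Pass every change row straight through to Azure Data Lake Store for archival
SELECT
    *
INTO
    [adls-archive]
FROM
    [cdc-events]

-- Count Inserts / Updates / Deletes in 10 second tumbling windows
-- to drive the real-time Power BI dashboard tiles
SELECT
    OperationType,
    System.Timestamp() AS WindowEnd,
    COUNT(*) AS ChangeCount
INTO
    [pbi-dashboard]
FROM
    [cdc-events]
GROUP BY
    OperationType,
    TumblingWindow(second, 10)
```

Splitting the job this way keeps the raw archive complete while the dashboard only ever receives the small aggregated stream.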
What I found pretty cool was that I could transmit SQL delta changes from source to target in as little as 5 seconds end to end!
And so, let’s get into some CDC to Event Hub data streaming action!
PASS 2015 continues in Seattle, and today was my session at 10:45am on Using Azure Machine Learning (ML) to Predict Seattle House Prices. The background and info on my session is here – http://www.sqlpass.org/summit/2015/Sessions/Details.aspx?sid=7794
Overall I was pretty happy with how it went - and I think everyone who attended had a lot of fun with some of the games and tests I injected into the presentation. Everyone had a chance to be a Real Estate Agent :) - and at the same time learn some great methods around performing Azure ML Regression Predictive Analytics.
BUT – moving right along – I also attended 3 other sessions today; again, I cannot blog about all of them in the time I have, but the one which made me think the most about technology implementations and how they can improve lives was Understanding Real World Big Data Scenarios by Nishant Thacker of Microsoft.
It wasn’t about use cases for big data (that horse has already bolted), but more around really innovative and interesting ways the ecosystem of Azure technologies could be deployed to solve complex business problems, or, more simply, ways to make our lives better!