Streaming ETL: SQL Change Data Capture (CDC) to Azure Event Hub

I had a recent requirement to capture and stream real-time data changes on several SQL database tables from an on-prem SQL Server to Azure for downstream processing.

Specifically we needed to create a streaming ETL solution that …

  1. Captured intermediate DML operations on tables in an on-prem SQL database
  2. Transmit data securely and real-time into Azure
  3. Store the delta changes as TXT files in Azure Data Lake Store (ADLS)
  4. Visualise the real-time change telemetry on a Power BI dashboard (specifically the number of Inserts, Updates, Deletes over time).

The first part was easy; SQL has a feature called Change Data Capture (CDC) which does an amazing job of tracking DML changes to seperate system tables.  If you dont know about CDC then see here – https://docs.microsoft.com/en-us/sql/relational-databases/track-changes/about-change-data-capture-sql-server

The second part wasn’t easy, and after some searching I came across this blog post by Spyros Sakellariadis which gave me inspiration and starter code for my streaming ETL solution.  Excellent post.  See here – https://azure.microsoft.com/en-us/resources/samples/event-hubs-dotnet-import-from-sql/

And so, the final architecture looks something like this…

The solution picks up the SQL data changes from the CDC Change Tracking system tables, creates JSON messages from the change rows, and then posts the message to an Azure Event Hub.  Once landed in the Event Hub an Azure Stream Analytics (ASA) Job distributes the changes into the multiple outputs.

What I found pretty cool was that I could transmit SQL delta changes from source to target in as little as 5 seconds end to end!

And so, lets get into some CDC to Event Hub data streaming action!

Continue reading

PASS 2015 Session Report – Understanding Real World Big Data Scenarios

PASS 2015 continues in Seattle, and today was my session at 1045am on Using Azure Machine Learning (ML) to Predict Seattle House Prices.  The background and info on my session is here http://www.sqlpass.org/summit/2015/Sessions/Details.aspx?sid=7794

Overall I was pretty happy with how it went - and I think everyone who attended had a lot of fun with some of the games and tests I injected into the presentation.  Everyone had a chance to be a Real Estate Agent :) - and at the same time learn some great methods around performing Azure ML Regression Predictive Analytics.

 

BUT – moving right along – I also attended 3 other sessions today, again I cannot blog about all of them in the time I have, but the one which made me think the most about technology implementations and how they can improves lives was Understanding Real World Big Data Scenarios by Nishant Thacker of Microsoft.

It wasn’t about use cases for big data (as this is a horse already bolted), but more around really innovative and interesting ways the ecosystem of Azure technologies could be deployed to solve some complex business problems, or moreso simply ways to make our lives better!

Continue reading