Azure Cognitive Services API’s with SQL Integration Services Packages (SSIS)

[read this post on Mr. Fox SQL blog]

I had a recent requirement to integrate multi-language support into a SQL DW via a SQL SSIS ETL solution.  Specifically the SQL DW platform currently only supported English translation data for all Dimension tables, but the business was expanding internationally so there was a need to include other language translations of the Dimensional attributes.

We wanted to do this without having to manually translate English text attributes that exist already, or new ones that are added or modified over time.  We wanted an automated method that simply “worked“.

Enter Azure Cognitive Services Translator Text API service!

So the purpose of this blog is to outline the code/pattern we used to integrate the Azure Cognitive Services API into SQL SSIS ETL packages.

Primer on Azure Cognitive Services

Azure Cognitive Services has been available in Azure for almost 2 years now.  They are a suite of API’s that expose amazing intelligent AI services which have the ability to do some truly amazing things.  The services cover the 5 core pillars of Vision, Speech, language, knowledge and Search.

As at writing there are almost 30 Azure Cognitive API’s.  You can see the full list of all API’s here – https://azure.microsoft.com/en-us/services/cognitive-services/directory/

I have blogged before on an end-to-end AI scenario which applies several of the Azure Cognitive Services together in a single solution.  You can read about that here – https://mrfoxsql.wordpress.com/2016/09/13/azure-cognitive-services-apis-with-sql-server-2016-clr/

(Interesting aside – when I wrote that original blog post 18 months ago there were only 21 Cognitive API’s, and now there’s 28!  Cannot wait to see where this space goes in the next 18 months!)

 

The Azure Cognitive Services “Translator Text API”

So, going back to my scenario of language translation in SSIS for ETL workloads

My SQL SSIS package leverages the Translator Text API service.  For those who want to learn the secret sauce then I suggest to check here – https://azure.microsoft.com/en-us/services/cognitive-services/translator-text-api/

essentially this API is pretty simple;

  1. It accepts source text, source language and target language.  (The API can translate to/from over 60 different languages.)
  2. You call the API with your request parameters + API Key
  3. The API will respond with the language translation of the source text you sent in
  4. So Simple, so fast, so effective!

To use the Translator Text API service, you need to provision the service in the Azure Portal which is shown here – https://docs.microsoft.com/en-us/azure/cognitive-services/translator/translator-text-how-to-signup

Billing is based on number of characters included in the text to be translated.  There is a FREE tier that allows you to translate up to 2M characters.  (For reference, a typical 30-page document has around 17K characters.).  Overall its very economical for ETL solutions!

What else can this delightful Translator Text API do, I hear you think…

  • Text Transformation (ie normalise loose text like “Dis is 2 strange” into something that, well, non-teenagers can understand!)
  • Custom Transformation (ie translate your specific business domain terms/words that otherwise wouldn’t translate into other languages)
  • Profanity Filtering (ie mark up profanity, or remove it entirely from the string)

The Full API Reference is here – http://docs.microsofttranslator.com/text-translate.html

The Full FAQ is here – https://www.microsoft.com/en-us/translator/faq.aspx

 

SQL SSIS package solution

The SQL SSIS package Data Flow workspace looks like this below.

 

 

 

 

 

 

 

 

The ETL package itself isnt that technologically edgy, in fact its pretty darn basic really.  The interesting bit (for this blog anyway) is the “Script Component” which calls the Azure Cognitive Services API and collects the API response.

Its important to note that this Script Component is defined as a Transformation script and the code has been written in c# (VS2017).

SQL SSIS – Transformation c# “Script Component”

 
#region Namespaces
using System;
using System.IO;
using System.Net;
using System.Runtime.Serialization;
using System.Web;
#endregion

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
 public override void Input0_ProcessInputRow(Input0Buffer Row)
 {
 string AuthKey = Variables.TranslateAuthKey;
 string TranslateFrom = Variables.TranslateFrom;
 string TranslateTo = Variables.TranslateTo;
 string BaseURL = "https://api.microsofttranslator.com/V2/Http.svc/Translate";
 string EnglishProductSubcategoryName = Row.EnglishProductSubcategoryName; // The source text to translate
 
 // Make REST call to Azure Cognitive Service Translator Text API
 string FullURL = BaseURL + "?text=" + HttpUtility.UrlEncode(EnglishProductSubcategoryName) + "&from=" + TranslateFrom + "&to=" + TranslateTo;
 HttpWebRequest httpWebRequest = (HttpWebRequest)WebRequest.Create(FullURL);
 httpWebRequest.Headers.Add("Ocp-Apim-Subscription-Key", AuthKey);
 using (WebResponse response = httpWebRequest.GetResponse())
 using (Stream stream = response.GetResponseStream())
 {
 DataContractSerializer dcs = new DataContractSerializer(Type.GetType("System.String"));
 Row.GermanProductSubcategoryName = (string)dcs.ReadObject(stream); // The translated text value
 }
 }
}

SQL SSIS – Adding Script References to the c# code

To make the script work, you need to ensure you add references in the Script component editor for the following libraries…

  • System.Net.Http
  • System.Net.Http.WebRequest
  • System.Runtime.Serialization
  • System.Web

SQL SSIS – package variables and connections

When you open the solution in VS2017, you will need to update the following…

  1. Package VariableTranslateAuthKey” holds the Translator Text API Key.  You need to deploy the Translator Text API service via the Azure Portal into your Subscription (as per the link in the above section), and grab the API Key.
  2. Package VariableTranslateFrom” holds the Source Language.  Currently this is set as “en” for English language transaction.
  3. Package VariableTranslateTo” holds the Target Language.  Currently this is set as “de” for German language transaction.  This could be any up to 60 languages.
  4. The Connection Manager to point the package to your SQL Server where you want to create some sample Stage + DW tables.  They will be created and loaded with sample data by the SQL SSIS package in your tempdb database

Once done you can just run the SSIS package and it will connect to your SQL Server, provision the database tables in the SQL tempdb database, create some sample data and call out to the Azure Cognitive API to translate!

 

The Translator Text API in Action in SQL SSIS!

So lets do some translation testing – all we need to do is run the SQL SSIS translation package!

On the first run the package will create the Stage and DW tables in tempdb, and create 30 new Stage rows to be translated and loaded into the DW table.

 

 

 

 

 

 

 

 

 

The package takes about 5 seconds to run end to end. So it was translating at a rate of about 6 rows/sec (which also includes all of the key lookups and prep time)

Ok, awesome, but what about if we just add one new Stage data row and one updated Stage data row?

 

And the execution run in SQL SSIS

 

 

 

 

 

 

 

 

And the translation result in SSMS

Ok, so we can translate text, and very quicklyMaybe next time we should add in the Azure Cognitive Text to Speech API so automate actually saying it! 🙂

 

VS2017 Solution Downloads

The Visual Studio 2017 Solution containing the SQL SSIS package I wrote with all of the code and references can be downloaded from my github repo here – https://github.com/rolftesmer/SQLSSIS_AzureCognitiveAPI

The SSIS solution isnt considered “production ready” and is just a quick example of how you can write a SQL SSIS package to call out to external web API’s, and integrate the response into a downstream ETL data flow.

Feel free to download the solution, and modify as needed to suit your scenario.

 

Summary

So there you have it, a pretty simple method to integrate Azure Cognitive API’s right into your SSIS ETL and related data processes.

As at writing, given there are almost 30 Azure Cognitive API’s, then this opens up many other ways you could use the code/pattern in this blog for intelligence processing right within the SQL SSIS ETL services…

  1. Parse a set of picture files in a directory or blob store though the Vision API’s (Computer Vision API, Face API, Content Moderator API, etc) and save the tagged metadata into a database for later processing.
  2. Similar to the above, but this time with sound byte files parsing though the Speech API’s (Translator API, Recognition API, Bing Text to Speech / Speech to Text API’s, etc)
  3. Performing sentiment analysis and key phrase extraction on text you are loading into a database, or reading from a database using the Text Analytics API’s
  4. …and probably a stack more I haven’t thought of so far

Also given that SQL SSIS is such a powerful ETL service, you can of course combine the API response with any other data you may already have in your source or target platforms to create a fully integrated intelligent ETL solution!

 

So as usual, and as I always say, please test this out yourself with your own data as your mileage may vary!


Disclaimer: all content on Mr. Fox SQL blog is subject to the disclaimer found here

Advertisements

5 thoughts on “Azure Cognitive Services API’s with SQL Integration Services Packages (SSIS)

  1. Tony 24 March 2018 / 10:18 PM

    Hi, I like this article and your referenced article https://mrfoxsql.wordpress.com/2016/09/13/azure-cognitive-services-apis-with-sql-server-2016-clr/

    Have you been able to call an API from an Azure DB without setting up a separate, dedicated VM running SQL Server to run either SSIS or CLR functions? I am trying to figure out how I can call an API on one row or a batch of rows in Azure DB. There might end up being different API’s such as Microsoft Cognitive API or other vendor ML API and so I would want to be able to script up a function and then be able to call the function in a SQL statement

    Thanks,
    Tony

    Like

    • Mr. Fox SQL (Rolf Tesmer) 24 March 2018 / 10:50 PM

      Hi Tony – at the moment its unfortunately not possible in Azure SQL DB as it dosent support CLR. However if it fits your use case, you could run it via services in Azure that are external to the DB – like Azure Automation or Logic Apps (which could read the DB, call the API, and write back to the DB). As FYI the new Azure SQL Database Managed Instance (PaaS Service in public preview now) will support CLR – https://docs.microsoft.com/en-us/azure/sql-database/sql-database-features

      Like

      • Tony 24 March 2018 / 11:07 PM

        Great, thanks for the reply Rolf.

        Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s