[UPDATED 7 JUN 2020 – ADDED NEW TEXT ANALYTICS v3.0 FEATURES!]
Just last week we had the fantastic opportunity to present on Azure Cognitive Services at Microsoft Ignite 2017 on the Gold Coast – and we had an absolute blast of a time!
I co-presented with Kristina Rumpff, a Solution Architect in the Data Platform team at Microsoft. I focused on an overview of the Azure Cognitive Services suite along with a deep dive into the Text Analytics service, while Krissy focused on the LUIS service coupled with Azure Bots.
- For those interested – this is our session on Channel 9 (75 mins) – https://channel9.msdn.com/Events/Ignite/Australia-2017/DA321
- We also did an interview for SSW which is here (10 mins) – https://www.youtube.com/watch?v=jILcbJw1nso
- If you just need a refresher on what Cognitive Services is then go here – https://azure.microsoft.com/en-us/services/cognitive-services/
Fast Start to Azure Text Analytics Cognitive APIs
Leading up to, and since, the session, a few people asked me whether there is a pre-canned application they can simply use to call the Text Analytics APIs.
The answer to that is kind of yes…
- Text Analytics MS Docs Ref – https://docs.microsoft.com/en-us/azure/cognitive-services/text-analytics/
- Running Text Analytics in Containers! – https://docs.microsoft.com/en-us/azure/cognitive-services/text-analytics/how-tos/text-analytics-how-to-install-containers
- My previous blog post on embedding the API into SQL 2016 (CLR) – https://mrfoxsql.wordpress.com/2016/09/13/azure-cognitive-services-apis-with-sql-server-2016-clr/
- Build your own pipeline using Azure Logic Apps (mind you, even though it's a graphical workflow service, you still need to build it!) – https://azure.microsoft.com/en-au/services/logic-apps/
- Luckily there is an excellent blog post showing you how to build your very own Logic App to push text to the Text Analytics API – very cool! – https://gautambiztalkblog.com/2017/01/09/logic-app-to-detect-sentiment-and-extract-key-phrases-from-your-text/
However, apart from that, I didn’t find anything else out there that people can quickly leverage to do this for them… so I wrote one!
UPDATE: I have since found this app handy myself when I need to quickly process a stack of once-off text that someone has sent me for downstream reporting. (…AND it was a good chance to learn more C# coding!)
And so, let's see the application in action!
Azure Text Analytics Winforms Application
The application is a C# Windows Forms solution built using Visual Studio 2019. It's pretty simple – it takes in a source file containing lines of text to run against the Text Analytics API and writes several output files containing the results of that assessment.
The full solution and sample files are on GitHub, linked at the end of this post.
These are the key application inputs, processing options and outputs…
- Text Analytics Base URL – is already defaulted to the correct API endpoint
- Text Analytics Version URL – is already defaulted to the correct version
- Text Analytics API Key – this is the key to access your Text Analytics API which has been deployed into Azure. My previous post on Text Analytics shows you how to deploy the service and get your key – https://mrfoxsql.wordpress.com/2016/09/13/azure-cognitive-services-apis-with-sql-server-2016-clr/
- Source Text File – this is the file you want to grade. Text only. Each line is considered a single “gradable” data point, which means the entire line is treated as a single column. Text lines can be anything such as free text, sentences, paragraphs, tweets etc. I have tested up to several thousand lines in an input file. The application only reads a single column of text per line, so if your source data has additional columns (one of which is the text), then you need to extract that text into a separate file first. You will then need to manually merge the graded results back into your master file.
- Output Folder Path – this is the folder location where the assessment output files will be written. Each execution of the application on any input file will generate 4 text output files with the results of the assessment. (see below)
- Split Document into Sentences – for each text line in the Source Text File, a regular expression parses the line and splits it into individual sentences. Thus, in addition to each line being analysed, each individual sentence in that line is also analysed.
- Replace TAB Character – replace any TAB characters with your desired character (this option is always enforced).
- Trim Left | Trim Right – remove leading or trailing spaces
- Remove Hash Tags – remove any #hashtags from the text. I have found that hashtags can skew the grading result if they are present.
- Remove URL – remove any URLs from the text. I have found that URLs can skew the grading result if they are present.
- Text API Operations – select from the Sentiment, Key Phrases, or Named Entity operations. Language is always assumed to be English (en).
- START! – select this to kick off the processing. The dialog window shows the execution status and % complete. Once complete, the output files are written to the output folder path location. If errors are encountered, such as running out of API quota, they are shown in a popup window.
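The post doesn't show the app's cleanup or sentence-splitting regular expressions, so here is a minimal Python sketch (Python rather than the app's C#) of how the Replace TAB, Trim Left | Trim Right, Remove Hash Tags, Remove URL and Split Document into Sentences options could be applied to a single line. The patterns, the function names `clean_line` and `split_sentences`, and the parameter defaults are my own assumptions, not the app's code.

```python
import re

# Hypothetical patterns: the app's own regular expressions are not shown in this post.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
HASHTAG_RE = re.compile(r"#\w+")
# Split after '.', '!' or '?' when followed by whitespace (a simple heuristic).
SENTENCE_RE = re.compile(r"(?<=[.!?])\s+")


def clean_line(line, tab_replacement=" ", trim_left=True, trim_right=True,
               remove_hashtags=True, remove_urls=True):
    """Apply the cleanup options described above to one source line."""
    text = line.replace("\t", tab_replacement)  # TAB replacement is always applied
    if remove_urls:  # strip URLs first so a '#fragment' inside a URL goes with it
        text = URL_RE.sub("", text)
    if remove_hashtags:
        text = HASHTAG_RE.sub("", text)
    text = re.sub(r" {2,}", " ", text)  # collapse gaps left by removed tokens
    if trim_left:
        text = text.lstrip()
    if trim_right:
        text = text.rstrip()
    return text


def split_sentences(text):
    """Split a cleaned line into sentences, dropping empty fragments."""
    return [s.strip() for s in SENTENCE_RE.split(text) if s.strip()]


line = "Loving the new #Azure demo!\tSee https://example.com for more. "
print(split_sentences(clean_line(line)))  # ['Loving the new demo!', 'See for more.']
```

A regex splitter like this is only a heuristic: abbreviations such as "e.g. " will also trigger a split, which is worth keeping in mind when reading the per-sentence results.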
Each execution of the application on any input file generates 4 text output files with the results of the assessment. The application runs at a rate of about 1-2 calls per second (the send rate cannot exceed 100 calls per minute, as this is the API limit).
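One way to stay under a 100 calls per minute limit is a rolling-window throttle. This Python sketch is an assumption about how such throttling could work, not how the app itself does it; the `MinuteRateLimiter` class is hypothetical, and the clock and sleep functions are injectable so the logic can be tested without actually waiting.

```python
import time
from collections import deque


class MinuteRateLimiter:
    """Block before each call so that no more than max_calls start within
    any rolling window of `period` seconds (default: 100 calls per 60 s)."""

    def __init__(self, max_calls=100, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.calls = deque()  # start times of calls still inside the window

    def wait(self):
        while True:
            now = self.clock()
            # Forget calls that have dropped out of the rolling window.
            while self.calls and now - self.calls[0] >= self.period:
                self.calls.popleft()
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # Window is full: sleep until the oldest call expires, then re-check.
            self.sleep(self.period - (now - self.calls[0]))
```

Calling `limiter.wait()` immediately before each API request is enough. A fixed 0.6 second pause between calls would also respect a 100 per minute limit, but the rolling window lets short bursts through while still never starting more than 100 calls in any 60 second span.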
I have benchmarked the app on a Windows 10 desktop at a rate of 10K text predictions (i.e. Sentiment, Key Phrases, Entity) in 90 mins.
- File 1 [AzureTextAPI_SentimentText_YYYYMMDDHHMMSS.txt] – the sentiment score between 0 and 1 for each individual line in the Source Text File, where 0 is negative and 1 is positive. The entire line is graded as a single data point. Each line is also labelled positive, neutral or negative.
- File 2 [AzureTextAPI_SentenceText_YYYYMMDDHHMMSS.txt] – if the “Split Document into Sentences” option was selected, this contains each individual sentence in each line along with the sentiment score of that sentence between 0 and 1, where 0 is negative and 1 is positive. Each sentence is also labelled positive, neutral or negative.
- File 3 [AzureTextAPI_KeyPhrasesText_YYYYMMDDHHMMSS.txt] – the key phrases identified within the text on each individual line in the Source Text File.
- File 4 [AzureTextAPI_EntityText_YYYYMMDDHHMMSS.txt] – the named entities identified within the text on each individual line in the Source Text File.
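All four output names end in the same YYYYMMDDHHMMSS run timestamp. A minimal Python sketch of building those paths, assuming a single timestamp is taken when START! is pressed so that every file from one run groups together; the helper name `output_file_paths` is hypothetical.

```python
import os
from datetime import datetime

# The four output file prefixes listed above.
OUTPUT_PREFIXES = (
    "AzureTextAPI_SentimentText",
    "AzureTextAPI_SentenceText",
    "AzureTextAPI_KeyPhrasesText",
    "AzureTextAPI_EntityText",
)


def output_file_paths(output_folder, run_time=None):
    """Return the four output paths for one run, all sharing one YYYYMMDDHHMMSS stamp."""
    if run_time is None:
        run_time = datetime.now()
    stamp = run_time.strftime("%Y%m%d%H%M%S")
    return [os.path.join(output_folder, f"{prefix}_{stamp}.txt")
            for prefix in OUTPUT_PREFIXES]


for path in output_file_paths("results", datetime(2020, 6, 7, 9, 5, 3)):
    print(path)
```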
To learn more about any of the API features, check out these MS Docs:
- Sentiment – https://docs.microsoft.com/en-us/azure/cognitive-services/text-analytics/how-tos/text-analytics-how-to-sentiment-analysis
- Key Phrases – https://docs.microsoft.com/en-us/azure/cognitive-services/text-analytics/how-tos/text-analytics-how-to-keyword-extraction
- Named Entities – https://docs.microsoft.com/en-us/azure/cognitive-services/text-analytics/how-tos/text-analytics-how-to-entity-linking
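For reference, here is a minimal Python sketch of the kind of request involved, built with the standard library's `urllib.request` against the Text Analytics v3.0 REST routes (`/sentiment`, `/keyPhrases`, `/entities/recognition/general`). The resource URL and key are placeholders, `build_request` is a hypothetical helper, and I'm assuming general entity recognition rather than entity linking (which uses `/entities/linking`). Check the docs above for the per-request document and size limits.

```python
import json
import urllib.request

# Text Analytics v3.0 REST routes, keyed by a short operation name (names are mine).
OPERATION_PATHS = {
    "sentiment": "sentiment",
    "keyphrases": "keyPhrases",
    "entities": "entities/recognition/general",
}


def build_request(base_url, api_key, operation, lines, version="v3.0"):
    """Build (but do not send) a POST request for one batch of text lines."""
    url = f"{base_url.rstrip('/')}/text/analytics/{version}/{OPERATION_PATHS[operation]}"
    body = {
        "documents": [
            # Each document gets a unique id; language is fixed to English as in the app.
            {"id": str(i), "language": "en", "text": text}
            for i, text in enumerate(lines, start=1)
        ]
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request("https://<your-resource>.cognitiveservices.azure.com/",
                    "<your-key>", "sentiment", ["Great talk!", "Too long."])
```

Sending it is then `urllib.request.urlopen(req)`. Each document in the JSON response comes back with the same `id` it was sent with, which is one straightforward way to tie graded results back to their source lines.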
Using Power BI to Visualise Results
Each of the files has a unique ID for each analysed line and graded result, so you can easily load them into your favourite reporting tool and do any type of reporting you want! I'll assume you know how to use Power BI, so I won't cover that here.
Load all 4 output text files into Power BI and join them in the modelling section. Each of the output files has headers, so it's pretty obvious which columns in each file join to the others.
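Power BI does the joining for you in the model, but the underlying relationship is a simple one-to-many join on the ID column (one sentiment row per line, potentially many key phrase rows per line). This Python sketch shows that logic on tab-delimited files; the column names `ID`, `Score` and `KeyPhrase` and the tab delimiter are assumptions, so check the headers in your own output files.

```python
import csv
import io


def read_tsv(handle):
    """Read a tab-delimited output file (with a header row) into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def join_on_id(left_rows, right_rows, key="ID"):
    """Left-join two result sets on a shared ID column (one-to-many on the right).

    Columns present in both inputs (other than the key) take the right-hand value.
    """
    by_id = {}
    for row in right_rows:
        by_id.setdefault(row[key], []).append(row)
    joined = []
    for row in left_rows:
        for match in by_id.get(row[key], [None]):
            merged = dict(row)
            if match is not None:
                merged.update({k: v for k, v in match.items() if k != key})
            joined.append(merged)
    return joined


# Hypothetical sample data standing in for two of the output files.
sentiment = read_tsv(io.StringIO("ID\tScore\n1\t0.9\n2\t0.1\n"))
phrases = read_tsv(io.StringIO("ID\tKeyPhrase\n1\tAzure\n1\tIgnite\n"))
for row in join_on_id(sentiment, phrases):
    print(row)
```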
Once done, you can apply naming and formatting customisations to the columns and tables as desired, or even add new DAX calculated measures. After that you can use the polished model to build out some pretty cool reporting visualisations!
Wrap Up and Summary
So that's it folks: a simple C# WinForms application to grade text using the Azure Cognitive Services Text Analytics APIs for Sentiment, Key Phrases and Named Entities, coupled with a simple method to visualise the graded output.
AND of course, as I always say, please test this yourself as your results may vary!
Text Analytics Winforms Application Code
NOTE ON CODE: I am not a seasoned app developer, which you may get just a hint of once you see my “code” – so go easy on me! As such, the application code is provided free of charge, without support or warranty of any kind. The code has not been thoroughly tested and is not considered production ready. It can be reused in any way you wish. Please check my disclaimer below to learn more.
- The GitHub repo with the Visual Studio solution is here – https://github.com/rolftesmer/AzureTextAnalyticsAPIApp
- If you just want to run the application, then in the GitHub repo above I have a single setup.exe installer. Just run this to install the app, and you are good to go!
- The GitHub repo has a sample file you can use to do some test grading. See under the “@SampleFilesToScore” folder. It also contains some sample output files so you can see what these look like too.
Disclaimer: all content on Mr. Fox SQL blog is subject to the disclaimer found here