A couple of months ago I was presenting at SQL Saturday Melbourne (582) on Azure Cognitive Services and got chatting with some of the other presenters about our sessions.
I co-presented with Krissy Rumpff from the Microsoft Data Platform Team (https://www.linkedin.com/in/krumpff/) – and for those interested, our session is here – http://www.sqlsaturday.com/582/Sessions/Details.aspx?sid=56483 …or… you can look at the recording here – https://channel9.msdn.com/Events/Ignite/Australia-2017/DA321
Anyway, what’s interesting is that some of the other presenters were asking why we were presenting on Cognitive Services when this was, in fact, SQL Saturday. After all, Cognitive Services is not part of the Data Platform, is it?
This is actually an interesting point – and since then I have had a pretty good think about what this means – and so this is the purpose of this blog post!
What was the Data Platform – before now that is?
Historically, up to perhaps a few years back, when customers, peers, colleagues, etc talked about the “Data Platform” they were typically referring to an RDBMS providing the core layer on which the majority of corporate data of known value was hosted. Providing this were your typical products such as SQL Server, Oracle, etc.
Over time these RDBMS engines were extended with components like ETL services, reporting services, master data and data quality services, cube/model services, etc. The RDBMS products have (without a doubt) expanded to include lots of non-traditional capabilities like XML, JSON, CLR, R and the like, but what I’d observed with customers was that fundamentally the Data Platform for them was still considered relational at its core.
Sure – other technologies like Big Data and NoSQL were popping up here too – however these were sparsely deployed, and at the customers I’d worked with they were often not considered part of the primary Data Platform, but more of a special project deployment or unique need.
What is the Data Platform – today?
Before I can answer this question, first a bit on my own career path!
As they say, you are the sum of your experiences. My career hasn’t had much in the way of twists and turns, and has instead followed a reasonably straight path.
My roles have been…
1. DEC RDB DBA – yeah, if you’re wondering! https://en.wikipedia.org/wiki/Oracle_Rdb
2. Oracle DBA
3. SQL DBA
4. SQL Consultant
5. SQL BI/DW Solution Architect
6. Data Solution Architect
From my Solution Architecture days (#5) onwards I’d noted a shift in customer awareness of how other technologies could help a business grow, innovate and differentiate itself. Big Data in particular was gaining awareness and adoption alongside customers’ core relational workloads. Before this time I just didn’t see that much interest, or that many opportunities, in this space.
It was also around this time that Public Cloud providers were just getting a foothold and growing at incredible rates. Public Cloud platforms, such as Azure, had introduced new layers of infrastructure obfuscation in the form of Platform as a Service (PaaS), which made it possible to architect and rapidly deploy a highly scalable data solution that would rarely have been considered for on-prem due to cost and complexity.
To learn more about PaaS (as well as IaaS and SaaS) see here – https://azure.microsoft.com/en-au/overview/what-is-paas/
What this meant is that innovation – in particular in Azure, ISVs, data services/products, and data related infrastructure – accelerated dramatically and changed the definition of what now comprises the “Data Platform”.
When I talk to customers today about the "Data Platform" it almost always encompasses a range of data services across a mix of On-Prem, IaaS, PaaS and SaaS.
The decision of which data service to use comes down to simply selecting an existing off-the-shelf cloud service that matches the business requirements. In the past, by contrast, it may have meant trying to fit an obvious NoSQL use case into a traditional RDBMS platform because that was all you had available in your Data Platform.
I now see the “Data Platform” as much broader than ever before, and as including many other “non-traditional” data services…
- Relational Database Platform (RDBMS) (e.g. SQL Server, Azure SQL DB/DW, Oracle, SAP HANA, MySQL)
- NoSQL (e.g. DocumentDB, MongoDB, Cassandra)
- Big Data Solutions (e.g. Hadoop, Data Lake)
- Intelligent Data (e.g. Cognitive Services, Machine Learning, Deep Learning)
- Data Ingestion/Management (e.g. Event Hub, IoT Hub, Stream Analytics, Polybase, Data Catalog, Data Factory)
Just outside the periphery of these are additional services such as Bots and Workflow (e.g. Logic Apps, Flow), which are not “Data Platform” per se but are worth mentioning. Some will probably successfully argue these could and should be part of the Data Platform. Time will tell.
One thing is for certain: this makes life pretty darn interesting for us “Data Professionals”, right? Should I… Could I… and How Can I… ever keep across such a vast and constantly evolving data space?
What does that mean for today’s “Data Professionals”?
Up to this point the “Data Professional” was generally the SME who had grazed comfortably within the well-defined edges of the traditional “Data Platform” space of yesteryear. However, as mentioned above, those edges are becoming hazy and less well defined, so stating that you are a “Data [Platform] Professional” can suddenly be met with a degree of confusion...
“Ok, that’s great, but what part of the data platform do you specialise in?”
I’d been thinking about this for a while – and then happened across these two rather well-timed articles which in my view summed this up pretty nicely – so I won’t add too much more commentary here!
- The challenge with being a “data professional” – [Eugene Meidinger] – http://www.sqlgene.com/2017/03/30/the-challenge-with-being-a-data-professional/
- What makes a Data Platform Professional – [Kevin Feasel] – https://36chambers.wordpress.com/2017/04/03/what-makes-a-data-platform-professional/
Just to round out my thoughts here – my comment back into the blog was along these lines…
Yes agree for “data professionals” you cannot be an expert in all [data technologies] – but you need to have an awareness of it all, so I kind of think the term “Data Professional” is workable. I agree you need 1-2 SME skills that form your deep knowledge backbone and then continue to review if that skill is relevant/up-to-date. For the rest [of the technical data skills] – get a gauge on broad industry shifts and ensure you understand why it’s occurring and its impact on your chosen SME skills, and adjust accordingly. Clearly themes are either beginning to form (or already formed) in our data related industry – such as Hyper-Scale IaaS/PaaS Cloud Services (ie Azure/AWS/Google), recognition of “Data as an Asset” and “Data Gravity” (ie Data Lake / Big Data, etc), and a new wave around “Intelligent Data” (ie ML/AI/Bots/Cognitive). Ex. If your SME skill is “SQL Server + Data Warehousing” then be conscious of how cloud PaaS services can help/hurt you, how to integrate DW with DL [Data Lake], and how to prepare data pipelines for downstream ML [Machine Learning] services. Review yearly and be prepared to change quickly!
So… what’s next for the Data Platform?
You know, it’s actually really hard to say what’s going to change in, say, the next 1-3 years – but hey, what the hell, I’ll give it a shot!
- Public Cloud (such as Azure) will become the default setting for customers and any new data deployments. Fewer data workloads, new or existing, will land or stay within the on-prem Data Platform. Some may hold in Private Cloud, others will deploy a Hybrid Cloud, but any expansion of the on-prem footprint will slow.
- There may be 3 (or possibly 4) Hyper-Scale Public Cloud providers in the market in the next few years with modern Data Platform capability – with potentially a smattering of specialist cloud data service providers (i.e. those offering a narrow but very deep range of specialist cloud services unique to them).
- PaaS will continue to grow and take over data services traditionally only available on IaaS (i.e. any popular or well-regarded data service that only has an IaaS deployment model today is likely to have a PaaS option tomorrow, thus removing the reliance on any infrastructure or management overheads required to run that service. Once this has become the norm, providers of that service are likely to release new features to PaaS first, with those features eventually finding their way to IaaS last).
- More highly complex Intelligent Data services will be obfuscated down to scalable, multi-tenanted, publicly available pay-for-use API services (i.e. this is essentially the democratisation or commoditisation of complex machine learning or intelligence services, making these available to anyone, anywhere, anytime).
- Specialist SaaS providers will move more into the IaaS and PaaS management space and offer up pre-built end-to-end business solutions (including SLA/support) comprising Data Pipelines or applications built using one or more public clouds (i.e. many customers have similar needs, so industry providers will build out specific targeted ISV solutions and package these as SaaS. Customers won’t know, or potentially even care, where this resides).
So in summary – what this means is that cloud will get bigger fast and will obfuscate away more of the traditional “hands on” deployment and configuration activities that “Data Platform Professionals” used to do.
Given this potential industry shift – it is now more important than ever before to continue to elevate yourself further up the data value chain whenever you can. Be aware of the breadth of all data services, but become a deep expert in 2-3 data services.
Nobody has a crystal ball – and so as I always say – do your own research as your results may vary!
Disclaimer: all content on Mr. Fox SQL blog is subject to the disclaimer found here