-
From Public to Private – Securing Databricks with VNet Injection
Introduction When you create a standard Databricks workspace, it just works, and for many development or test environments that's perfectly fine; it's something I've always used on this blog, as I make an example and then destroy it. The default setup gives you a managed network, quick deployment, and enough security for non-sensitive workloads (HTTPS…
-
You Can’t Put the Genie Back Into the Bottle – Genie AI/BI
AI services are everywhere now, even (apparently) in washing machine cycles (SAMSUNG Series 6 AI Energy). Whether you like it or not, those two letters (A.I.) are on the lips of every manager you will come across for the foreseeable future. If you're reading this blog, you have no doubt heard many conversations stating…
-
Data Warehousing with Databricks – Part 2 Transforming Data with DBT
Introduction In the previous post (Data Warehousing with Databricks – Part 1 Dynamic Extract and Load – Cookie Codes) I demonstrated how Azure Data Factory can take various distributed data sources and bring them into a centralised location via a metadata-driven process. As demonstrated in that post, we have our data in a centralised…
-
Find me a Friend – Social Networks and Graph
Introduction In the last post I discussed how certain recommendation algorithms work, with examples of various ways you could code a movie recommendation. I thought it would be interesting to take another approach to this kind of problem and demonstrate how a social network identifies people who…
-
Databricks – Connecting to GitHub
Introduction This post will continue the theme of Databricks. The posts so far have given an introduction to Spark / Databricks as a concept and covered a couple of administrative tasks (creating a service principal / mounting / accessing storage). This post is similar in nature. It will show how you can…
-
Databricks – Mounting Storage and Importing Files
Introduction In my last post (Learning Databricks – Accessing a Data Lake Using a Service Principal – Cookie Codes), I described the process for setting up a service principal in Azure and how you can store its credentials securely and access them via Databricks. This post is going to continue from that point and use…
-
Learning Databricks – Accessing a Data Lake Using a Service Principal
Introduction One of the first tasks to learn when using Databricks is how to access data contained in a data lake. I will be using Azure as my cloud provider in these posts, and therefore my data lake will be Azure Data Lake (Gen2). This post will show the necessary steps that you need…
-
The Data LakeHouse – Introducing Databricks
Introduction One of the goals in making this blog was to give myself a space to learn new technologies I haven't used much in the past, and hopefully provide interesting examples to help others learn them too. I have never used Databricks before but have heard a lot about it; I simply just…