Introduction
When you create a standard Databricks workspace, it just works, and for many development or test environments that’s perfectly fine; it’s what I’ve always used on this blog, spinning up an example and then destroying it. The default setup gives you a managed network, quick deployment, and enough security for non-sensitive workloads (HTTPS and AD authentication).
However, enterprise customers often face a different reality: strict security policies, regulatory requirements, and the need to control every packet of network traffic. That’s where VNet injection comes in. This post explains the network architecture of Databricks and how you can customise the network security of a Databricks service to meet enterprise policies.
Control Plane and Data Plane – Securing the Back End
In Classic Databricks (let’s not worry about serverless for now), a Databricks service actually consists of several sub-components. Two important concepts are the Control Plane and the Data Plane;

On the diagram above you can see the deployed Databricks service has two components. These are known as the “Back End” in various Databricks networking documents, whereas a user connecting to a workspace is known as the “Front End”.
- The Control Plane in Databricks is like air traffic control: it orchestrates, monitors, and directs the Data Plane, ensuring everything runs safely and efficiently without carrying the data itself. These services are managed by Databricks.
- The Data Plane in Databricks is like a plane in the sky: it carries the payload, executes the mission, and follows the control plane’s instructions while reporting its status. This is managed by the company using Databricks and is focused on data operations, compute and storage.
These two planes talk to each other over the public network (the internet) via HTTPS. Enterprise customers will often prefer private-only traffic to reduce exposure risks.
How can we do this? The answer is to use some additional Azure infrastructure: virtual networks, subnets and Private Links;

With this approach the Data Plane no longer uses a public network to communicate with the Control Plane, instead using a private link from a virtual network.
Databricks requires two (very poorly named) subnets in order for this approach to work.
- The “Public Subnet” (worst name ever) is entirely private but contains the compute drivers (the hosts, or heads, of a cluster).
- The “Private Subnet” contains the worker nodes.
Remember, Databricks operates via distributed compute, where one driver can distribute jobs across many worker nodes. One really important thing to consider when setting up these subnets is that you must allocate enough addresses for the number of workers and hosts you will use. Overestimating the address ranges in the subnets is certainly a better approach than underestimating them.
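The sizing maths is worth a quick sanity check: a subnet with prefix length /n gives 2^(32−n) addresses, and Azure reserves five of those in every subnet. A minimal sketch (the /24 prefix is just an example):

```shell
#!/bin/sh
# Rough capacity check for a Databricks subnet.
# Azure reserves 5 addresses in every subnet: the network address,
# the default gateway, two Azure DNS addresses, and broadcast.
prefix=24                          # e.g. a 10.0.1.0/24 subnet
total=$(( 1 << (32 - prefix) ))    # 2^(32-24) = 256 addresses
usable=$(( total - 5 ))            # 251 left for cluster nodes
echo "/$prefix subnet: $total addresses, $usable usable for nodes"
```

So a /24 “private” subnet tops out at roughly 250 workers, which is why erring on the large side costs little and saves a painful re-deployment later.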
These subnets can then use an Azure private link to connect to the Control Plane – eliminating public network communications.
Creating the VNet
Prior to creating the workspace, we first need the virtual network and subnets to exist. Log into Azure and create a new virtual network. Set up three subnets;

I’ve left the defaults, but you can see I’ve created public (256 potential hosts), private (256 potential workers) and private link (64 different connections – think storage accounts, Salesforce connections etc.) subnets.
For my trivial example this is fine; for a large company you would likely want more public and private addresses. Please also note the ranges are different for each of these: 10.0.1.x, 10.0.2.x and 10.0.3.x.
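For anyone who prefers the CLI to the portal, the same network could be sketched with the Azure CLI. The resource group, location and prefix lengths here are assumptions to match the sizes above; note the two Databricks subnets must each have an NSG attached and be delegated to `Microsoft.Databricks/workspaces`:

```shell
# Assumed names - substitute your own resource group and region.
RG=databricks-net-demo
LOC=uksouth
VNET=databricks_network_security_example_vnet

# NSG required on both Databricks subnets (rules are managed later).
az network nsg create -g $RG -n databricks-nsg

az network vnet create -g $RG -n $VNET -l $LOC \
  --address-prefixes 10.0.0.0/16

# "Public" subnet - drivers/hosts. Delegated to Databricks.
az network vnet subnet create -g $RG --vnet-name $VNET \
  -n public-subnet --address-prefixes 10.0.1.0/24 \
  --network-security-group databricks-nsg \
  --delegations Microsoft.Databricks/workspaces

# "Private" subnet - worker nodes. Also delegated.
az network vnet subnet create -g $RG --vnet-name $VNET \
  -n private-subnet --address-prefixes 10.0.2.0/24 \
  --network-security-group databricks-nsg \
  --delegations Microsoft.Databricks/workspaces

# Subnet for private endpoints - no delegation needed.
az network vnet subnet create -g $RG --vnet-name $VNET \
  -n privatelink-subnet --address-prefixes 10.0.3.0/26
```

This is a sketch, not a definitive script – check the flags against your Azure CLI version before running it.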
Here you can see my created VNet – Named databricks_network_security_example_vnet (catchy…)

Configuring the Workspace
We can now create a Databricks workspace and configure it to use this VNet. Create a new Azure Databricks service, name it and the managed resource group something appropriate, then click on the Networking tab;
We want Secure Cluster Connectivity and Deploy to a VNet to be selected;

This brings up a large number of options. First, let’s enter our VNet and subnet details;

Change the Required NSG Rules setting to “No Azure Databricks Rules”, as we are going to use a private link;

Leave everything else as it is… “WHAT? What about this option?” – I hear you say;

This option relates to the “Front End” – a user connecting to a Databricks workspace from a public network via HTTPS and AD authentication – not to public IP addresses in clusters, or public network communication between the Data and Control Planes. We will revisit it in a bit.
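For reference, the portal settings above can be expressed with the Azure CLI’s `databricks` extension. The names are assumptions carried over from the earlier steps, and public network access is deliberately left enabled for now, exactly as in the portal:

```shell
# Requires the extension: az extension add --name databricks
# Workspace/resource group names are assumptions.
az databricks workspace create \
  --resource-group databricks-net-demo \
  --name databricks-net-demo-ws \
  --location uksouth \
  --sku premium \
  --vnet databricks_network_security_example_vnet \
  --public-subnet public-subnet \
  --private-subnet private-subnet \
  --enable-no-public-ip true \
  --required-nsg-rules NoAzureDatabricksRules \
  --public-network-access Enabled
```

A sketch under those assumptions; the premium SKU is used because the private link features require it.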
We now need to configure our private endpoint. Add a new private endpoint, pointing to our PrivateLink subnet;

We are going to use a private DNS zone – it is this DNS service that will route the connection from the public (or private) subnet to the private link endpoint and then on to the Control Plane. We don’t have an external DNS, so let’s use this private one.
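The endpoint and DNS steps can also be sketched from the CLI. The endpoint name, connection name and workspace name are assumptions; the `databricks_ui_api` sub-resource and the `privatelink.azuredatabricks.net` zone are the standard ones for Azure Databricks private link:

```shell
# Assumed names from the earlier steps.
RG=databricks-net-demo
WS_ID=$(az databricks workspace show -g $RG \
  -n databricks-net-demo-ws --query id -o tsv)

# Back-end private endpoint in the PrivateLink subnet.
az network private-endpoint create -g $RG -n databricks-pe \
  --vnet-name databricks_network_security_example_vnet \
  --subnet privatelink-subnet \
  --private-connection-resource-id "$WS_ID" \
  --group-id databricks_ui_api \
  --connection-name databricks-pe-conn

# Private DNS zone so the subnets resolve the workspace URL
# to the private endpoint rather than a public address.
az network private-dns zone create -g $RG \
  -n privatelink.azuredatabricks.net
az network private-dns link vnet create -g $RG \
  --zone-name privatelink.azuredatabricks.net -n dns-link \
  --virtual-network databricks_network_security_example_vnet \
  --registration-enabled false
az network private-endpoint dns-zone-group create -g $RG \
  --endpoint-name databricks-pe -n default \
  --private-dns-zone privatelink.azuredatabricks.net \
  --zone-name databricks
```

Again a hedged sketch – the portal does the DNS zone group wiring for you when you pick “Private DNS Zone”.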
For the purpose of this example I don’t care about encryption or updates, so after this I can create the workspace.
Once the workspace is created you can see it is hosted on a VNet, with no public IPs allowed;

Let’s check the workspace can be launched and that we can create a cluster, thereby showing that the Data Plane and Control Plane are able to communicate with each other via this private link.
Here we are logged in and starting a cluster;

We can look in the Databricks managed resource group (where on-demand resources are held) and inspect the actual VM itself;

Here you can see there is no public IP shown, and the private address is one from our subnets. This is a single-node cluster, so it is just a host, and therefore sits on the “public” subnet (really a terrible name!).
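The same check can be made from the CLI by listing IP addresses for the VMs in the managed resource group (the group name below is an assumption – yours will be whatever you set at workspace creation); the `publicIpAddresses` lists should come back empty:

```shell
# Managed resource group name is an assumption.
MRG=databricks-managed-rg

# Human-readable table of private/public IPs per cluster VM.
az vm list-ip-addresses -g $MRG -o table

# Or pull out just the public IP lists - expect empty arrays.
az vm list-ip-addresses -g $MRG \
  --query "[].virtualMachine.network.publicIpAddresses" -o json
```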
We can also execute a basic query on this cluster and return results, further confirming the communication between the Data Plane and Control Plane is OK;

At this point we have removed public network communication between the Data and Control Planes.
Accessing the Workspace – Securing the Front End
Anyone who knows anything about network security (I can assure you, you almost certainly know more than me!) would now be saying: yeah, that’s great, but a user logging into this workspace is doing it from a public network. And that’s true.
While we may have locked down the communication internally within the Databricks service, if someone could steal my identity they could log in from anywhere and download data, say as a CSV via the GUI. In an enterprise setting this would also need further security.
What we want is to configure it so that someone is only allowed to log in to the workspace if they are on our virtual network, and not from any public access point.
There are a few ways of doing this;
- Using a virtual machine that is on our VNet
- Allowing access to the VNet via a VPN client (e.g. from my laptop I could connect to the VNet via an encrypted tunnel)
For the purposes of this example I am going to show the virtual machine approach, using it as a jumpbox – a user will need to go onto the virtual machine and then access the workspace from a browser there. The VPN route is also perfectly viable.
Create a VM in our VNet
Here I have set up a VM – on the Networking tab I have selected the VNet we created earlier, and I’ve just used the default subnet;
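The jumpbox could equally be created from the CLI. The VM name, image alias and credentials below are assumptions, and I’m assuming the VNet still has its `default` subnet:

```shell
# Jumpbox VM on the same VNet (names/image are assumptions).
az vm create -g databricks-net-demo -n jumpbox-vm \
  --image Win2022Datacenter \
  --vnet-name databricks_network_security_example_vnet \
  --subnet default \
  --admin-username azureuser \
  --admin-password '<choose-a-strong-password>'
```

Note the jumpbox itself still gets a public IP by default so you can RDP onto it; in a real enterprise setup you would front it with Azure Bastion or a VPN rather than exposing RDP.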

If I log onto the VM I can access the Databricks workspace (you may need some DNS config). At this point I can still access it from any public network location with my correct AD authentication;

Block Public Network Access on the Workspace
Let’s revisit the workspace in the Azure portal and change the setting we mentioned earlier, “Allow Public Network Access”, to disabled;

This will redeploy the workspace, so it may take a few minutes.
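The same change can be sketched from the CLI (workspace and resource group names are assumptions, as before):

```shell
# Block public network access to the workspace front end.
az databricks workspace update \
  --resource-group databricks-net-demo \
  --name databricks-net-demo-ws \
  --public-network-access Disabled
```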
If I now try to connect to the workspace I receive an error;

Create Private Endpoints
If I go to my Databricks workspace resource in the portal, I can now create a couple of private endpoint connections;
The first one is for browser authentication;

and is associated with my VNet and the subnet which my VM is using;

We then need to repeat this process with an endpoint to access the Control Plane (the Databricks UI API);
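Both front-end endpoints can be sketched with the CLI too. The endpoint and connection names are assumptions; the sub-resource (group) IDs `browser_authentication` and `databricks_ui_api` are the standard Databricks private link ones:

```shell
# Front-end private endpoints in the subnet the jumpbox uses.
RG=databricks-net-demo
VNET=databricks_network_security_example_vnet
WS_ID=$(az databricks workspace show -g $RG \
  -n databricks-net-demo-ws --query id -o tsv)

# Browser (SSO) authentication endpoint.
az network private-endpoint create -g $RG -n browser-auth-pe \
  --vnet-name $VNET --subnet default \
  --private-connection-resource-id "$WS_ID" \
  --group-id browser_authentication \
  --connection-name browser-auth-conn

# Workspace UI / API endpoint.
az network private-endpoint create -g $RG -n frontend-ui-pe \
  --vnet-name $VNET --subnet default \
  --private-connection-resource-id "$WS_ID" \
  --group-id databricks_ui_api \
  --connection-name frontend-ui-conn
```

As with the back-end endpoint, these need DNS records in the `privatelink.azuredatabricks.net` zone so the VM resolves the workspace URL privately.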

After these are set up I can log back onto the VM that is hosted on the VNet, and I am now able to access Databricks. This is because the VM has a private link to the Databricks workspace; if I try from my laptop (outside the VNet, with no private endpoints) I am blocked.
Here you can see the connection from the VM;

Here you can see the rejection from my laptop over the public network;

We have secured the front-end access: only a machine on the VNet with a private endpoint can now log into the workspace. If someone stole my identity, password and MFA, they still couldn’t access the workspace via a public endpoint.
Conclusion
If you’ve made it this far – well done. In this example I have shown how the back end and front end of Databricks can be secured, as well as explaining a little bit about the classic Databricks architecture.
There are a few more security issues one might need to consider;
- Setting up private endpoints to the storage accounts that are used (this is very similar: a private endpoint created on the storage account with a dfs sub-resource, pointing at the Azure Databricks VNet and subnet)
- Encryption of Data
- Securing serverless compute
What’s Next
I might revisit this topic in the future and talk about how serverless compute fits in with this architecture and how that can be secured. It’s probably worth looking at how this configuration could be automated as IaC (perhaps in Terraform). It might also be worth looking at how to use a VPN client instead of a jumpbox VM to connect to a locked-down private front end.
On a security note I should add I have removed all the resources shown in this post.