Azure Data Lake Firewall - Databricks

Securing your Azure Data Lake is a must. In this article learn how to use Databricks VNet Injection to leverage the firewall built into Azure Storage.

We live in interesting times! On the one hand, we see an insatiable need for more and more data so we can solve interesting business problems using powerful machine learning algorithms and analytics. On the other hand, we see a tightening grip on the security measures put in place to protect that same data, since a breach can cause almost irreparable harm. This is a multi-faceted problem that requires a multi-pronged approach to locking down our Azure Data Lakes.

For the purposes of this article, we're going to take a first step: enabling the Azure Data Lake's Firewall and Virtual Network settings as an initial protection layer.

When you add a new instance of the Azure Databricks Service to your resource group, a new "Databricks appliance" is deployed as a separate resource group in your subscription. This is a managed resource group that is populated with its own VNet, security group, and storage account. Databricks uses this managed resource group to create new VMs as you spin clusters up and down.

At first glance you might think you can just add your Databricks VNet and subnets to the Data Lake Storage Account and call it a day. The problem is that a subnet can only be added to the Storage Account's Virtual Network settings if it has the 'Microsoft.Storage' service endpoint enabled. If you try to enable that endpoint on the managed Databricks subnets, it will throw the following exception:

Failed to enable service points. The scope {your databricks vnet's resource Id} cannot perform write operation because following scopes are locked: {your managed databricks resource id}. Please remove the lock and try again.

What's happening is that the managed resource group is locked, so its VNet cannot be modified. The workaround is to redeploy your Azure Databricks Service into your own virtual network.
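To make the service endpoint requirement concrete, here is a minimal sketch that checks whether a subnet description lists the 'Microsoft.Storage' endpoint. It assumes the subnet is represented as a dictionary shaped like the JSON that Azure Resource Manager returns for a subnet (a `properties.serviceEndpoints` list). The function name, subnet names, and address prefixes are hypothetical and only there for illustration.

```python
def has_storage_endpoint(subnet: dict) -> bool:
    """Return True if the subnet lists the Microsoft.Storage service endpoint."""
    # serviceEndpoints may be missing or null when no endpoint is enabled.
    endpoints = subnet.get("properties", {}).get("serviceEndpoints") or []
    return any(ep.get("service") == "Microsoft.Storage" for ep in endpoints)


# Hypothetical subnet payloads, shaped like ARM subnet JSON.
managed_subnet = {
    "name": "managed-public-subnet",
    "properties": {"addressPrefix": "10.139.0.0/18"},
}
own_subnet = {
    "name": "databricks-public",
    "properties": {
        "addressPrefix": "10.0.1.0/24",
        "serviceEndpoints": [{"service": "Microsoft.Storage"}],
    },
}

print(has_storage_endpoint(managed_subnet))  # False
print(has_storage_endpoint(own_subnet))      # True
```

A subnet in the locked managed resource group would stay in the first state, which is why it cannot be added to the storage firewall.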

Azure Databricks - VNet Injection

In order to deploy the Azure Databricks Service into your virtual network (also called VNet Injection) you will need to specify the following parameters during the creation of the service:

  • Virtual Network
  • Public Subnet Name
  • Public Subnet CIDR Range
  • Private Subnet Name
  • Private Subnet CIDR Range

[Image: Databricks Virtual Network Integration]

This will provision two new subnets (public and private) within your virtual network, along with a network security group that must be applied to them.
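Before filling in the two CIDR ranges, it can help to sanity-check them. The sketch below uses only Python's standard `ipaddress` module to confirm that both subnets sit inside the VNet's address space and do not overlap each other. The function name and the example ranges are hypothetical, and the check does not cover any size limits that Azure Databricks itself may impose on the subnets.

```python
import ipaddress
from typing import List


def check_subnets(vnet_cidr: str, public_cidr: str, private_cidr: str) -> List[str]:
    """Return a list of problems found with the proposed subnet ranges."""
    vnet = ipaddress.ip_network(vnet_cidr)
    public = ipaddress.ip_network(public_cidr)
    private = ipaddress.ip_network(private_cidr)

    problems = []
    # Each subnet must fall entirely within the VNet address space.
    for label, subnet in (("public", public), ("private", private)):
        if not subnet.subnet_of(vnet):
            problems.append(f"{label} subnet {subnet} is outside VNet {vnet}")
    # The two subnets must not share any addresses.
    if public.overlaps(private):
        problems.append(f"public subnet {public} overlaps private subnet {private}")
    return problems


# Hypothetical address ranges for illustration.
print(check_subnets("10.0.0.0/16", "10.0.1.0/24", "10.0.2.0/24"))    # []
print(check_subnets("10.0.0.0/16", "10.0.1.0/24", "10.0.1.128/25"))  # one overlap problem
```

An empty list means the ranges are at least internally consistent; the deployment itself remains the final check.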

Unfortunately, you cannot switch an existing Azure Databricks Service over to VNet Injection. You will need to recreate your Azure Databricks Service.

The good news, however, is that the two new subnets are now within your control. You can enable the 'Microsoft.Storage' service endpoint on them and then add them to your Azure Storage account's (Data Lake) Firewall & Virtual Networks settings.
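As a rough illustration of what that firewall configuration looks like, the sketch below builds the `networkAcls` section of a storage account definition as a plain Python dictionary, following the shape used in Azure Resource Manager templates. The helper names, subscription ID, resource group, VNet, and subnet names are all hypothetical placeholders. Setting `bypass` to `AzureServices` is an assumption on my part, not something this setup requires.

```python
import json
from typing import Dict, List


def subnet_id(subscription_id: str, resource_group: str, vnet: str, subnet: str) -> str:
    """Build the resource ID of a subnet inside a virtual network."""
    return (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Network/virtualNetworks/{vnet}/subnets/{subnet}"
    )


def build_network_acls(subnet_ids: List[str]) -> Dict:
    """Deny traffic by default and allow only the listed subnets."""
    return {
        "defaultAction": "Deny",    # block anything not explicitly allowed
        "bypass": "AzureServices",  # assumption: keep trusted Azure services working
        "virtualNetworkRules": [{"id": sid, "action": "Allow"} for sid in subnet_ids],
        "ipRules": [],
    }


# Hypothetical names; substitute the values from your own deployment.
ids = [
    subnet_id("00000000-0000-0000-0000-000000000000", "my-rg", "my-vnet", name)
    for name in ("databricks-public", "databricks-private")
]
acls = build_network_acls(ids)
print(json.dumps(acls, indent=2))
```

Both subnets are listed because the VNet Injection deployment creates a public and a private subnet.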

Image by MasterTux from Pixabay