Azure Data Lake Gen 2 Firewall - SQL Data Warehouse

If you are using SQL Data Warehouse with Polybase tables backed by Azure Data Lake Gen 2 without a firewall enabled, you are at risk. There are a few steps to get it work properly. See how in this article.

Azure Data Lake Gen 2 Firewall - SQL Data Warehouse

If you were to ask your resident DBA if it was ok if you hosted your SQL Data Warehouse on the wide open internet, I don't think you'd have to guess what he would say. If you're leveraging Polybase against your Azure Data Lake Gen 2 without a firewall enabled, you're essentially doing the same thing.

The first step to adding the firewall is to head on over to your Azure Storage Account and navigate to the Firewall and virtual Networks link. Ensure you select "Allow Access from: Selected networks". Now you need to select the checkbox that says "Allow trusted Microsoft to access this storage account." Ensure you click the save button to apply your changes.

Trusted Microsoft Services Exception

Azure SQL Server (including Azure SQL Data Warehouse) IS a trusted Microsoft resource, however there are some additional steps you need take in order for this to function properly.

  1. Register your Azure SQL Server with Azure Active Directory. If you have an existing SQL Server, you can use this script (exclude braces{} below):
Connect-AzAccount 
Select-AzSubscription -SubscriptionId {your subscriptionId}
Set-AzSqlServer -ResourceGroupName {your RG Name} -ServerName {your sql server name} -AssignIdentity

2. Have someone with Owner permission apply the "Storage Blob Data Contributor" to the SQL Server that was registered in the previous step.

3. Create the master key for the SQL Data Warehouse if you have not yet already done so:

CREATE MASTER KEY [ENCRYPTION BY PASSWORD = 'your password'];

4. This one is SUPER critical: Create a database scoped credential with the exact identity "Managed Service Identity"

CREATE DATABASE SCOPED CREDENTIAL msi_cred WITH IDENTITY = 'Managed Service Identity';

5. Create your external data source using Polybase. Ensure the credential you specify matches the credential above.

CREATE EXTERNAL DATA SOURCE ext_datasource_with_abfss WITH (TYPE = hadoop, LOCATION = 'abfss://yourfile@mystorageaccount.dfs.core.windows.net', CREDENTIAL = msi_cred);

At this point, you should be able to query your Polybase external tables with the firewall enabled!