Azure Firewall is a cloud-native, stateful, managed firewall service with built-in high availability, with incredible automated scaling and throughput capabilities, and many security features. In some cases, though, the price point can be a bit high to get started if they’re being used for basic L3/L4 rules, routing, and NAT capabilities. For this reason, last year the “Basic” Azure Firewall SKU was released.
There are two costs associated with Azure Firewall:
- Firewall base cost (per unit)
- Data Processed
The base cost per firewall unit is as follows (in East US, as of September 2023):
- Premium: $1,277/mo
- Standard: $912/mo
- Basic: $288/mo
As for the data processing costs, they are as follows:
- Premium: $0.016/GB ($16/TB)
- Standard: $0.016/GB ($16/TB)
- Basic: $0.065/GB ($65/TB)
You will note that with the lower base cost, comes a higher processing charge, so don’t get too excited thinking you can just swap production to a Basic SKU and save money, make sure and do your homework.
The features and comparisons of these SKUs is well documented at https://learn.microsoft.com/en-us/azure/firewall/choose-firewall-sku
You can look at a more in-depth list of features and capabilities of the Firewall Basic SKU here https://learn.microsoft.com/en-us/azure/firewall/basic-features.
You will notice that the documentation states that the Firewall Basic SKU says that it’s “recommended for environments with an estimated throughput of 250 Mbps”, this got me thinking though, how much throughput can it actually handle? Understandably this SKU is primarily designed for lower level or lab environments, but what if I need to do a data load or something, is it really limited to 250 Mbps?
To test this theory, I built a little lab environment:
- Hub network with an Azure Firewall Basic SKU
- Two spoke networks with a Route Table that point all traffic through the Hub Firewall
- A Windows Client OS in the “client” vnet
- A Linux server in the “server-vnet” network
- Both VMs are configured as DS4_v4 so they have plenty of CPU and are capable of up to 10Gbps so we can be sure they won’t be a bottleneck in the testing.
Now that I have the lab environment build, I wanted to test the throughput of the firewall in two ways:
- SNAT the traffic through the Firewall to the Internet
- Route the traffic between spokes
Internet Speed Test:
For the first test, I chose to simply use everyone’s favorite at-home test – speedtest.net. To verify that we’re using the firewall as the path to the internet, I also took a screenshot of the public IP associated with it to compare to the speed test.
You will note that the IP address in both the pip and on the speed test are the same, and also that it looks like we’re getting 1Gbps through the Firewall Basic SKU to the internet.
Intranet Speed Test:
The second test is between spoke networks, routed through the hub. For this I configured my Linux server with the OpenSpeedTest docker container, you will notice the private IP in the address bar. This is a very cool project that I also run at home for quick diagnostics and load testing.
In this test, the client OS is traversing the peer to the hub network, then being routed by the Azure Firewall across the peer to the server network; with this test I got roughly 1Gbps as well, with <1 ms latency.
While payloads like speedtest.net are surely not what I would consider a thorough load test, these are the types of results I was looking for in this quick test. If we were testing something for production, we would want to include things like max connections, speed per connection, total bandwidth, scaling metrics, etc.
The throughput available in Azure Firewall is impressive, with the Standard SKU capable of up to 30 Gbps and the Premium up to 100 Gbps (see https://learn.microsoft.com/en-us/azure/firewall/firewall-performance for more information). The Firewall Basic SKU though still holds its own, showing that it’s capable of 1 Gbps throughput which in many cases of lower-level or lab environments is more than enough. While I didn’t create an auto-deploy ARM template for this lab environment, let me know if it would be useful and I’ll put something together on GitHub.
I frequently work with organizations who are migrating data from an on-premises datacenter into Azure. Undoubtably the question will come up “should we use an Azure Data Dox to ship that much data?”, and most of the time the room echoes in a resounding “Yes!”.
I’ve been working in Azure for many years and have seen a lot of data migrations, and while data box is a wonderful service and is yet another way that Microsoft enables and empowers customers to do what’s best for them, it’s just that, an option for what might be the best fit. I write the same thing almost every month to various people, and figured it was time to post it to use as a reference.
Note: My thoughts here are in no way indented to conflict with the official product documentation and are rather a more experience-based thought experiment to accelerate time-to-value in regard to data migration, at the bottom of this post there are links to two great official pieces of documentation that are more technically focused, please give those a read as well.
Most of the time when we think about uploading or downloading data to and from the internet, we think in terms of gigabytes – typically single-digit gigabytes at that. Even what the home-ISPs like to reference as the bandwidth heavy services like movie streaming, will typically use less than 7GB/hr. With that in mind, when we think of the amount of data that is used in an enterprise, we’re typically talking Terabytes or even for those very large organizations – Petabytes. When we talk about migrating that amount of data to a different physical location (for example, Azure) it seems outlandish to think about moving it online – or is it?
Azure Data Box
If you haven’t taken a look at the Azure Data Box Family of offerings, I highly suggest it. There are 4 different offerings of Data Box:
- Data Box Disk: 8 TB SSD for offline transfer
- Data Box: 100 TB appliance for offline transfer
- Data Box Heavy: 1 PB appliance for offline transfer
- Data Box Gateway: A VM appliance storage gateway used for managed online data transfer.
These devices ship to your location for a nominal fee, you load up the data, then ship it back, and Microsoft loads the data into the destination you choose. The idea is that up to a 40Gbps network connection on your local network is going to be much faster than it would be to send this data over the Internet, VPN, or ExpressRoute connection and is a great option.
Offline Transfer Considerations
I challenge everyone to think through this process though when considering an offline migration. Specifically, we need to think about how long it will take to get the process approved (among other factors) to move your company’s data using a shipping carrier. I’ve worked with organizations where the policy for this type of process requires a private courier, active GPS, and someone following the truck along the entire route (I’ve even seen requirements for armed guards or police escort), among many other requirements from various departments within the organization.
Let’s look at the most common components of this process that might influence the timeline of your data migration.
- Privacy & Legal Team Approvals: Depending on the data, privacy and legal may need to be involved to inspect the process for data device handling, determine who has visibility into the data, how it is destroyed upon completion of the ingestion, and potentially even determine insurance implications.
- Security Approval: From a technical controls perspective they will want to make sure proper encryption is used at the data level and hardware level, determine who controls the keys for encryption, ensure device attestation, and even certify these devices to be plugged into the datacenter based on the controls in place for certain hardware vendors.
- Ordering & Shipping: The process of receiving your Data Box takes up to 10 business days, depending on availability and other factors.
- Loading the Data: There are two points that are important here, the first is how fast can the data be retrieved (e.g., is the data passed through a source that only has a 1 Gb link, are there disk throughput limitations, do you need to limit the transfer rate to not impact other workloads, etc.). The second point to consider is write throughput on the Data Box itself, while there is ample network connectivity with each device, the larger devices are designed for capacity rather than performance and while there is good throughput, they are not designed for high I/O which is important for datasets with smaller file sizes.
- Shipping to Microsoft: Standard shipping time applies to shipping the device back to Microsoft, typically a few days.
- Microsoft transferring the data: After the device is received it is inspected for damage, then setup to copy the data to the destination you selected when you requested the Data Box – this could be a few hours to a few days depending on availability, data size, I/O size, and both the type of Data Box itself and the target storage location.
(Time to Legal Approval) + (Time to Privacy Approval) + (Time to Security Approval) + (Ordering & Shipping Time) + (Time to Load the Data) + (Shipping Time) + (Time to Unload the Data)
When thinking about these lead times it’s important to be honest with yourself. How long after you send the email, or meeting invite, will it take to get full approval from Legal, Security, and Privacy? In most cases, this is typically a few weeks and depends on the organizational processes and sensitivity of the data, sometimes up to a few months.
For example, let’s say it takes 1 month for full approval to ship the data, which is certainly a reasonable timeframe. Let’s also assume it takes 2 days to get the Data Box hooked up in the datacenter, and that you’re copying 50 TB at 5 Gbps over the LAN. With a generalized timeline, this operation would roughly look like the following:
Example: 50 TB, 5 Gbps LAN Offline Transfer with Data Box
1 Month for approval + 8 Days for shipping + 2 Days for setup + 2 Days for data copy (~26 hr. for actual data movement) + 2 days to prep for shipping + 3 days for shipping + 1 day for receiving + 1 day for copying data (likely less)
30 days + 8 days + 2 days + 2 days + 2 days + 3 days + 1 day + 1 day = ~49 days
Now let’s assume that same data was copied “online” (Internet, ExpressRoute, VPN, etc.) at even just 100 Mbps averaged across the day. In most cases organizations would be able to leverage more bandwidth than this, but it makes for easy calculations. If you copied 50 TB online, at 100Mbps, it would take ~53.5 days. In this scenario the time to copy the data online vs offline is very close, and without any of the fuss of approvals and shipping. If you assume you can use 125 Mbps of bandwidth you’re looking at ~42.5 days which is even faster than the offline mode.
At this point I’m sure there are a few people saying “yes, but what if I had a LOT of data, say 1 PB!”. I’ve done many multi-PB data migrations to Azure and have seen them go both online and offline, let’s do the calculation and see how it looks. While it may not be the case for everyone, in my experience with the increase of the dataset size comes longer approval lead times for various reasons. Additionally, these types of organizations typically also have more bandwidth capacity – again, these are generalized numbers, but in my personal experience they are realistic.
NOTE: Data Box Heavy requires a QSFP+ compatible cable, which I find is not as common in most datacenters, make sure you have one on-hand prior to receiving the device.
For this calculation let’s assume 2 PB of data that can be copied on the LAN at 10 Gbps. Keep in mind that if there was actually 2PB of data you’d need 3 Data Boxes because you get 770 TB of usable space after overhead per Data Box Heavy. Take note though, that I’m not taking the multiple Data boxes into account in the calculation, which would realistically extend the timeline.
Example: 2 PB, 10 Gbps LAN Offline Transfer with Data Box Heavy
2.5 Months for approval + 8 Days for shipping + 2 Days for setup + 22 Days for data copy + 2 days to prep for shipping + 3 days for shipping + 1 day for receiving + 4 days for copying data
75 days + 8 days + 2 days + 22 days + 2 days + 3 days + 1 day + 4 days = ~117 days (~3.9 months)
Like I said earlier, typically if an organization has this much data they have much more bandwidth – 2 Gbps for this operation would not be unreasonable to assume as a generalization. Given 2 Gbps bandwidth, it would take ~107 days to copy this data online compared to ~117 days copying it offline.
However, I will say that I’ve been in situations where an organization had other limitations such as the total available capacity on a firewall or edge router, and they would have to upgrade at significant expense to be able to handle an extra 2 Gbps so they could only do something like 250 Mbps. At that speed it would take 874 days to copy and at those speeds with that much data it certainly does not make sense to move the data online, and using a Data Box would be much more efficient to copy the data offline.
NOTE: Data Box will not ship across international borders (except countries within the European Union), please see the FAQ reference link if that is a requirement for your data transfer.
If you are going to copy the data online, there are various ways to accomplish this task. In general, I see AzCopy, Azure Data Factory, Azure Data Box Gateway, or depending on the target storage location any number of other tools used for online data movement.
There are some considerations when choosing your tooling such as cost (of the tool only, ingress bandwidth to Azure is free), performance, manageability and whether there is data churn that needs to continuously be uploaded after the initial import. Keep in mind that you can also control your bandwidth with online copies and for example use less bandwidth during business hours and more at night, and some of these tools will help facilitate that for you.
I won’t go into depth on this decision process but let me know if I should write another blog on that topic.
The two reference links below have wonderful information about choosing a data transfer solution, and as noted earlier I HIGHLY suggest reviewing them as well. The purpose of this blog was to talk about some of the processes and procedures that’s typically not addressed when looking purely at the technology.
- Choose an Azure solution for data transfer
- Data transfer for large datasets with moderate to high network bandwidth
I hope going through these scenarios was helpful when considering methods for data transfer into Azure. My goal here was not to go in depth on anything in particular, but more think through the process. As takeaways, here are a few points to keep in mind about transferring large amounts of data into Azure.
- Be honest with yourself about approval timelines for shipping your company’s (and/or customer’s) data.
- Use a file transfer calculator to see how long it would actually take to transfer X data at Y speeds – it’s probably not as long as you think.
- For good reason, there will likely be a lot of meetings, documentation, email threads, and other time-consuming activities for shipping data physically – and that should also count for something in terms of cost.
- There will likely also be some of the aforementioned procedural work for online data migration, but in most cases not nearly as much.
- Online is not always going to work out, sometimes Data Box is going to be the best fit.
Recently, I posted the “Shared Storage Options in Azure: Part 1 – Azure Shared Disks” blog post, the first in the 5-part series. Today I’m posting Part 2 – IaaS Storage Server. While this post will be fairly rudimentary insofar as Azure technical complexity, this is most certainly an option when considering shared storage options in Azure and one that is still fairly common with a number of configuration options. In this scenario, we will be looking at using a dedicated Virtual Machine to provide shared storage through various methods. As I write subsequent posts in this series, I will update this post with the links to each of them.
- Part 1: Azure Shared Disks
- Part 2: IaaS Storage Server
- Part 3: Azure Storage Services
- Part 4: Azure NetApp Files
- Part 5: Conclusion
Virtual Machine Configuration Options:
While it may not seem vitally important, the VM SKU you choose can impact your ability to provide storage capabilities in areas such as Disk Type, Capacity, IOPS, or Network Throughput. You can view the list of VM SKUs available on Azure at this link. As an example, I’ve clicked into the General Purpose, Dv3/Dvs3 series and you can see there are two tables that show upper limits of the SKUs in that family.
In the limits for each VM you can see there are differences between Max Cached and Temp Storage Throughput, Max Burst Cached and Temp Storage Throughput, Max uncached Disk Throughput, and Max Burst uncached Disk Throughput. All of these represent very different I/O patterns, so make sure to look carefully at the numbers.
Below are a few links to read more on disk caching and bursting:
- Disk Caching: https://docs.microsoft.com/en-us/azure/virtual-machines/premium-storage-performance#disk-caching
- Disk Bursting: Managed disk bursting – Azure Virtual Machines | Microsoft Docs
You’ll notice when you look at VM SKUs that there is an L-Series which is “storage optimized”. This may not always be the best fit for your workload, but it does have some amazing capabilities. The outstanding feature of the L-Series VMs are the locally mapped NVMe drives which as of the time of writing this post on the L80s_v2 SKU can offer 19.2TB of storage at 3.8 Million IOPS / 20,000 MBPS.
The benefits of these VMs are the extremely low latency, and high throughput local storage, but the caveat to that specific NVMe storage is that it is ephemeral. Data on those disks does not persist a reboot. This means it’s incredibly good at serving from a local cache, tempdb files, etc. though its not storage that you can use for things like a File Server backend (without some fancy start-up scripts, please don’t do this…). You will note that the maximum uncached throughput is 80,000 IOPS / 2,000 MBPS for the VM, which is the same as all of the other high spec VMs. As I am writing this, no Azure VM allows for more than that for uncached throughput – this includes Ultra Disks (more on that later).
For more information on the LSv2 series, you can read more here: Lsv2-series – Azure Virtual Machines | Microsoft Docs
Additional Links on Azure VM Storage Design:
- Azure Premium Storage: Design for high performance – Azure Virtual Machines | Microsoft Docs
- Virtual machine and disk performance – Linux – Azure Virtual Machines | Microsoft Docs
Networking capabilities of the Virtual Machine are also important design decisions when considering shared storage, both in total throughput and latency. You’ll notice in the VM SKU charts I posted above when talking about performance there are two sections for networking, Max NICs and Expected network bandwidth Mbps. It’s important to know that these are VM SKU limitations, which may influence your design.
Expected network bandwidth is pretty straight forward, but I want to clarify that the number of Network Interfaces you mount to a VM does not change this number. For example, if your expected network bandwidth is 3200 Mbps and you have an SMB share running on that single NIC, adding a second NIC and using SMB multi-channel WILL NOT increase the total bandwidth for the VM. In that case you could expect each NIC to potentially run at 1,600 Mbps.
The last networking feature to take into consideration is Accelerated Networking. This feature allows for SR-IOV (Single Root I/O Virtualization), which by bypassing the host CPU and offloading the network traffic directly to the Network Interface can dramatically increase performance by reducing latency, jitter, and CPU utilization.
Accelerated Networking is not available on every VM though, which makes it an important design decision. It’s available on most General Purpose VMs now, but make sure to check the list of supported instance types. If you’re running a Linux VM, you’ll also need to make sure it’s a supported distribution for Accelerated Networking.
In an obvious step, the next design decision is the storage that you attach to your VM. There are two major decision types when selecting disks for you VM – disk type, and disk size.
Image Reference: https://docs.microsoft.com/en-us/azure/virtual-machines/disks-types
As the table above shows, there are three types of Managed Disks (https://docs.microsoft.com/en-us/azure/virtual-machines/managed-disks-overview ) in Azure. At the time of writing this, Premium/Standard SSD and Standard HDD all have a limit of 32TB per disk. The performance characteristics are very different, but I also want to point out the difference in the pricing model because I see folks make this mistake very often.
|Disk Type:||Capacity Cost:||Transaction Cost:|
|Ultra SSD||Highest (Capacity/Throughput)||None|
Transaction costs can be important on a machine whose sole purpose is to function as a storage server. Make sure you look into this before a passing glance shows the price of a Standard SSD lower than a Premium SSD. For example, here is the Azure Calculator output of a 1 TB disk across all four types that averages 10 IOPS * ((10*60*60*24*30)/10,000) = 2,592 transaction units.
Sample Standard Disk Pricing:
Sample Standard SSD Pricing:
Sample Premium SSD Pricing:
Sample Ultra Disk Pricing:
The above example is just an example, but you get the idea. Pricing gets strange around Ultra Disk due to the ability to configure performance (more on that later). Though there is a calculable break-even point for disks that have transaction costs versus those that have a higher provisioned cost.
For example, if you run an E30 (1024 GB) Standard SSD at full throttle (500 IOPS) the monthly cost will be ~$336, compared to ~$135 for a P30 (1024 GB) Premium SSD, with which you get x10 the performance. The second design decision is disk capacity. While this seems like a no-brainer (provision the capacity needed, right?) it’s important to remember that with Managed Disks in Azure, the performance scales with, and is tied to, the capacity of the disk.
You’ll note in the above image the Disk Size scales proportionally with both the Provisioned IOPS and Provisioned Throughput. This is to say that if you need more performance out of your disk, you scale it up and add capacity.
The last note on capacity is this, if you need more than 32TB of storage on a single VM, you simply add another disk and use your mechanism for combining that storage (Storage Spaces, RAID, etc.). This same method can be used to further tweak your total IOPS, but make sure you take into consideration the cost of each disk, capacity, and performance before doing this – most often it’s an insignificant cost to simply scale-up to the next size disk. Last but not least, I want to briefly talk about Ultra Disks – these things are amazing!
Unlike with the other disk types, this configuration allows you to select your disk size and performance (IOPS AND Throughput) independently! I recently worked on a design where the customer needed 60,000 IOPS, but only needed a few TB of capacity, this is the perfect scenario for Ultra Disks. They were actually able to get more performance, for less cost compared to using Premium SSDs.
To conclude this section, I want to note two design constraints when selecting disks for your VM.
- The VM SKU is still limited to a certain number of IOPS, Throughput and Disk Count. Adding together the total performance of your disks, cannot exceed the maximum performance of the VM. If the VM SKU supports 10,000 IOPS and you add 3x 60,000 IOPS Ultra Disks, you will be charged for all three of those Ultra Disks at their provisioned performance tiers but will only be able to get 10,000 IOPS out of the VM.
- All of the hardware performance may still be subject to the performance of the access protocol or configuration, more on this in the next section.
Additional Reading on Storage:
- Disk Types: Select a disk type for Azure IaaS VMs – managed disks – Azure Virtual Machines | Microsoft Docs
Software Configuration and Access Protocols:
As we come to the last section of this post, we get to the area that aligns with the purpose of this blog series – shared storage. In this section I’m going to cover some of the most common configurations and access types for shared storage in IaaS. This is by no means an exhaustive list, rather what I find most common.
Scale-Out File Server (SoFS):
First up is Sale-Out File Server, this is a software configuration inside Windows Server that is typically used with SMB shares. SoFS was introduced in Windows 2012, uses Windows Failover Clustering, and is considered a “converged” storage deployment. It’s also worth noting that this can run on S2S (Storage Space Direct), which is the method I recommend using with modern Windows Server Operating Systems. Scale-Out File Server is designed to provide scale-out file shares that are continuously available for file-based server application storage. It provides the ability to share the same folder from multiple nodes of the same cluster. It can be deployed in two configuration options, for Application Data or General Purpose. See the additional reading below for the documentation on setup guidance.
- Storage Spaces Direct: Storage Spaces Direct overview | Microsoft Docs
- Scale-Out File Server: Scale-Out File Server for application data overview | Microsoft Docs
- Setup guide for 2-node SSD for RDS UPD: Deploy a two-node Storage Spaces Direct SOFS for UPD storage in Azure | Microsoft Docs
Now into the access protocols – SMB has been the go-to file services protocol on Windows for quite some time now. In modern Operating Systems, SMB v3.* is an absolutely phenomenal protocol. It allows for incredible performance using things like SMB Direct (RDMA), Increasing MTU, and SMB Multichannel which can use multiple NICs simultaneously for the same file transfer to increase throughput. It also has a list of security mechanisms such as Pre-Auth Integrity, AES Encryption, Request Signing, etc. There is more information on the SMB v3 protocol below, if you’re interested, or still think of SMB in the way we did 20 years ago – check it out. The Microsoft SQL Server team even supports SQL hosting databases on remote SMB v3 shares.
NFS has been a similar staple as a file server protocol for a long while also, and whether you’re running Windows or Linux can be used in your Azure IaaS VM for shared storage. For organizations that prefer an IaaS route compared to PaaS, I’ve seen many use this as a cornerstone configuration for their Azure Deployments. Additionally, a number of HPC (High Performance Compute) workloads, such as Azure CycleCloud (HPC orchestration) or the popular Genomics Workflow Management System, Cromwell on Azure prefer the use of NFS.
- Create NFS Ubuntu Linux Server volume – Azure Kubernetes Service | Microsoft Docs
- azure-quickstart-templates/nfs-ha-cluster-ubuntu at master · Azure/azure-quickstart-templates (github.com)
While I would not recommend the use of custom block storage on top of a VM in Azure if you have a choice, some applications do still have this requirement in which case iSCSI is also an option for shared storage in Azure.
That’s it! We’ve reached the end of Part 2. Okay, here we go with the Pros and Cons for using an IaaS Virtual Machine for your shared storage configuration on Azure.
Pros and Cons:
- More control, greater flexibility of protocols and configuration.
- Depending on the use case, potentially greater performance at a lower cost (becoming more and more unlikely).
- Ability to migrate workloads as-is and use existing storage configurations.
- Ability to use older, or more “traditional” protocols and configurations.
- Allows for the use of Shared Disks.
- Significantly more management overhead as compared to PaaS.
- More complex configurations, and cost calculations compared to PaaS.
- Higher potential for operational failure with the higher number of components.
- Broader attack surface, and more security responsibilities.
Alright, that’s it for Part 2 of this blog series – Shared Storage on IaaS Virtual Machines. Please reach out to me in the comments, on LinkedIn, or Twitter with any questions about this post, the series, or anything else!
- Part 1: Azure Shared Disks
- Part 2: IaaS Storage Server
- Part 3: Azure Storage Services
- Part 4: Azure NetApp Files
- Part 5: Conclusion
In the past, I’ve written a few blog posts about setting up different types of VPNs with Azure.
- Azure Point-to-Site VPN with RADIUS Authentication « The Tech L33T
- Azure Web Apps with Cost Effective, Private and Hybrid Connectivity « The Tech L33T
- Azure Site-to-Site VPN with PFSense « The Tech L33T
Since the market is now full of customers who are running Palo Alto Firewalls, today I want to blog on how to setup a Site-to-Site (S2S) IPSec VPN to Azure from an on-premises Palo Alto Firewall. For the content in this post I’m running PAN-OS 10.0.0.1 on a VM-50 in Hyper-V, but the tunnel configuration will be more or less the same across deployment types (though if it changes in a newer version of PAN-OS let me know in the comments and I’ll update the post).
Alright, let’s jump into it! The first thing we need to do is setup the Azure side of things, which means starting with a virtual network (vnet). A virtual network is a regional networking concept in Azure, which means it cannot span multiple regions. I’m going to use “East US” below, but you can use whichever region makes the most sense to your business since the core networking capabilities shown below are available in all Azure regions.
With this configuration I’m going to use 10.0.0.0/16 as the overall address space in the Virtual Network, I’m also going to configure two subnets. The “hub” subnet is where I will host any resources. In my case, I’ll be hosting a server there to test connectivity across the tunnel. The “GatewaySubnet” is actually a required name for a subnet that will later house our Virtual Network Gateway (PaaS VPN Appliance). This subnet could be created later in the portal interface for the Virtual Network (I used this method in my PFSense VPN blog post), but I’m creating it ahead of time. Note that this subnet is name and case sensitive. The gateway subnet does not need a full /24, (requirements for the subnet here), it will do for my quick demo environment.
Now that we have the Virtual Network deployed, we need to create the Virtual Network Gateway. You’ll notice that once we choose to deploy it in the “vpn-vnet” network that we created, it will automatically recognize the “GatewaySubnet” and will deploy into that subnet. Here we will choose a VPN Gateway type, and since I’ll be using a route-based VPN, select that configuration option. I won’t be using BGP or an active-active configuration in this environment so I’ll leave those disabled. Validate, and create the VPN Gateway which will serve as the VPN appliance in Azure. This deployment typically takes 20-30 minutes so go crab a cup of coffee and check those dreaded emails.
Alright, now that the Virtual Network Gateway is created we want to create “connection” to configure the settings needed on the Azure side for the site-to-site VPN.
Here we’ll name the connection, set the connection type to “Site-to-Site (IPSec)”, set a PSK (please don’t use “SuperSecretPassword123″…) and set the IKE Protocol to IKEv2. You’ll notice that you need to set a Local Network Gateway, we’ll do that next.
Let’s go configure a new Local Network Gateway, the LNG is a resource object that represents the on-premises side of the tunnel. You’ll need the public IP of the Palo Alto firewall (or otherwise NAT device), as well as the local network that you want to advertise across the tunnel to Azure.
Once that’s complete we can finish creating the connection, and see that it now shows up as a site-to-site connection on the Virtual Network Gateway, but since the other side isn’t yet setup the status is unknown. If you go to the “Overview” tab, you’ll notice it has the IP of the LNG you created as well as the public IP of the Virtual Network Gateway – you will want to copy this down as you’ll need it when you setup the IPSec tunnel on the Palo Alto.
Alright, things are just about done now on the Azure side. The last thing I want to do is kick off the deployment of a VM in the “hub” subnet that we can use to test the functionality of the tunnel. I’m going to deploy a cheap B1s Ubuntu VM. It doesn’t need a public IP and a basic Network Security Group (NSG) will do since there is a default rule that allows all from inside the Virtual Network (traffic sourced from the Virtual Network Gateway included).
Now that the test VM is deploying, let’s go deploy the Palo Alto side of the tunnel. The first thing you’ll need to do is create a Tunnel Interface (Network –> Interfaces –> Tunnel –> New). In accordance with best practices, I created a new Security Zone specifically for Azure and assigned that tunnel interface. You’ll note that it will deploy a sub interface that we’ll be referencing later. I’m just using the default virtual router for this lab, but you should use whatever makes sense in your environment.
Next we need to create an IKE Gateway. Since we set the Azure VNG to use IKEv2, we can use that setting here also. You want to select the interface that is publicly-facing to attach the IKE Gateway, in my case it is ethernet 1/2 but your configuration may vary. Typically you’ll have the IP address of the interface as an object and you can select that in the box below, but in my case my WAN interface is using DHCP from my ISP so I leave it as “none”.
It is important to point out though, that if your Palo Alto doesn’t have a public IP and is behind some other sort of device providing NAT, you’ll want to use the uplink interface and select the “local IP address” private IP object of that interface. I suspect this is an unlikely scenario, but I’ll call it out just in case.
The peer address is the public IP address of the Virtual Network Gateway of which we took note a few steps prior, and the PSK is whatever we set on the connection in Azure. Lastly, make sure the Liveness Check is enabled on the Advanced Options Screen.
Next we need an IPSec Crypto Profile. AES-256-CBC is a supported algorithm for Azure Virtual Network Gateways, so we’ll use that along with sha1 auth and set the lifetime to 8400 seconds which is longer than lifetime of the Azure VNG so it will be the one renewing the keys.
Now we put it all together, create a new IPSec Tunnel and use the tunnel interface we created, along with the IKE Gateway and IPSec Crypto Profile.
Now that the tunnel is created, we need to make appropriate configurations to allow for routing across the tunnel. Since I’m not using dynamic routing in this environment, I’ll go in and add a static route to the virtual router I’m using to advertise the address space we created in Azure to send out the tunnel interface.
Great! Now at this point I went ahead and grabbed the IP of the Ubuntu VM I created earlier (which was 10.0.1.4) and did a ping test. Unfortunately they all failed, what’s missing?
Yes yes, I did commit the changes (which always seems to get me) but after looking at the traffic logs I can see the deny action taking place on the default interzone security policy. Yes I could have not mentioned this, but hey, now if it doesn’t work perfectly for the first time for you – you can be assured you’re in good company.
Alright, if you recall we created the tunnel interface in its own Security Zone so I’ll need to create a Security Policy from my Internal Zone to the Azure Zone. You can use whatever profiles you need here, I’m just going to completely open interzone communication between the two for my lab environment. If you want machines in Azure to be able to initiate connections as well remember you’ll need to modify the rule to allow traffic in that direction as well.
Here we go, now I should have everything in order. Let’s go kick off another ping test and check a few things to make sure that the tunnel came up and shows connected on both sides of things. It looks like the new Allow Azure Security Policy is working, and I see my ping application traffic passing!
Before I go pull up the Windows Terminal screen I want to quickly check the tunnel status on both sides.
Success!!! Before I call it, I want to try a two more things so I’ll SSH into the Ubuntu VM, install Apache, edit the default web page and open it in a local browser.
At this point I do want to call out the troubleshooting capabilities for Azure VPN Gateway. There is a “VPN Troubleshoot” functionality that’s a part of Azure Network Watcher that’s built into the view of the VPN Gateway. You can select the gateway on which you’d like to run diagnostics, select a storage account where it will store the sampled data, and let it run. If there are any issues with the connection this will list them out for you. It will also list some specifics of the connection itself so if you want to dig into those you can go look at the files written to the blob storage account after the troubleshooting action is complete to get information like packets, bytes, current bandwidth, peak bandwidth, last connected time, and CPU utilization of the gateway. For further troubleshooting tips you can also visit the documentation on troubleshooting site-to-site VPNs with Azure VPN Gateways.
That’s it, all done! The site-to-site VPN is all setup. The VPN Gateway in Azure makes the process very easy and the Palo Alto side isn’t too bad either once you know what’s needed for the configuration.
If you have any questions, comments, or suggestions for future blog posts please feel free to comment blow, or reach out on LinkedIn or Twitter. I hope I’ve made your day a little bit easier!
Reading Time: 5 minutes(Edit: I’ve also now posted about how to do this with a Palo Alto Firewall as well, you can see that post here: https://hansencloud.com/2020/11/18/azure-site-to-site-vpn-with-palo-alto-firewall/ )
If you’re like me, you like to have a little bit more control over your network (home or business) than is available with the ISP-provided router – enter PFSense. The Netgate Appliances work very well and I’ve worked with plenty of home networks, as well as small and medium businesses that have used them as their cost-effective Router/Firewall/VPN appliance combination.
Now, how do we setup a Site-to-Site VPN with our infrastructure hosted in Azure with PFSense. It’s actually pretty easy! Let’s jump into it.
The first thing we need to do in Azure is setup a virtual network (or vnet). A virtual network is a regional construct, meaning that it cannot span multiple regions. I’m going to choose the “West US” region for now since that’s where i’ll be building my resources after this is configured.
In this particular setup I’m using 10.2.0.0/16 as my virtual network address space and have designated two subnets for a later project.
After the virtual network is configured, we’ll need to create a “Gateway Subnet” which is a specific prerequisite to deploying a VPN Gateway into the virtual network.
Alright, the network and subnets are all setup in Azure. Now let’s go create a Virtual Network Gateway to act as our PaaS VPN appliance. I’m going to be using a route-based VPN, so I’ll use that VPN type and choose the virtual network that we just created. You’ll notice that it also selects the special GatewaySubnet that was created for the VPN Gateway. I’m going to create a new public IP address to be associated with the VPN Gateway, and won’t be using active-active or BGP for this lab so I’ll leave those disabled.
The VPN Gateway takes about 20-30 minutes to deploy so go ahead and go stretch and grab a cup of coffee. Once it’s done we’ll go in and create a “connection”, which we’ll designate as “site-to-site IPSec”, then set the PSK that will be used here and on the on-premises appliance.
You’ll need to also create a local network gateway which is an Azure resource that represents the information about the on-premises VPN appliance and IP ranges. You’ll want to enter the public IP of the PFSense appliance on-premises and whatever address space you’d like to add to the route table of the virtual network so it knows to route those addresses across the VPN through the gateway.
Once that’s done we’ll go grab the public IP of the VPN Gateway from the overview page so we can go setup the PFSense side of the VPN.
Alright, now let’s go setup an IPSec VPN in PFSense. Open the IPSec VPN settings page and let’s create a Phase 1 configuration.
I will want to select the Authentication Method of Mutual PSK and enter the PSK we setup on the Connection on the VPN Gateway in the “Pre-Shared Key” field. I’m going to be connecting to some other resources with this setting so I am using both AES 128 and 256 with relative SHA 256 hashes and both DH groups 2 and 14. I recommend reviewing the documentation on cryptographic requirements and Azure VPN gateways though for reference.
Next let’s create a Phase 2 configuration for the IPSec VPN. I’m designating 10.0.0.0/8 to the VPN assuming that I may expand my Azure environment at some later point. If we wanted to be exact this would be the 10.2.0.0/16 address space that we configured in our virtual network. For the Encryption Algorithms required, please see the cryptographic requirements documentation noted above. If you wanted to automatically ping a host in Azure to bring up/keep up the tunnel you can configure that here as well.
Now that we’ve created both the Phase 1 and Phase 2 configurations we can “apply changes” to add those changes to the running configuration.
After the settings have saved the tunnel will take a minute to come up, you may take this time to spin up a quick VM in your Azure virtual network to use for testing connectivity. Once that tunnel comes up you can see the connection statistics on the IPSec Status page. Similarly if you look at the overview page of the site-to-site connection we created on the VPN Gateway on Azure you can see the tunnel status and connection statistics.
For troubleshooting purposes, there is a “VPN Troubleshoot” functionality that’s a part of Azure Network Watcher that’s built into the view of the VPN Gateway. You can select the gateway on which you’d like to run diagnostics, select a storage account where it will store the sampled data, and let it run. If there are any issues with the connection this will list them out for you. It will also list some specifics of the connection itself so if you want to dig into those you can go look at the files written to the blob storage account after the troubleshooting action is complete to get information like packets, bytes, current bandwidth, peak bandwidth, last connected time, and CPU utilization of the gateway. For further troubleshooting tips you can also visit the documentation on troubleshooting site-to-site VPNs with Azure VPN Gateways.
That’s it, all done! The site-to-site VPN is all setup. The VPN gateway in Azure really makes this process very easy, and the PFSense side is fairly easy to setup as well.
If you have any questions or suggestions for future blog posts feel free to comment below, or reach out to me via email, twitter, or LinkedIn. I hope I’ve made your day at least a little bit easier!
Reading Time: 4 minutesFor the money, it’s hard to beat the Azure VPN Gateway. Until recently though, Point-to-Site VPNs were a bit clunky because they needed mutual certificate authentication. It wasn’t bad, but it certainly wasn’t good. Thankfully, Microsoft now allows RADIUS backed authentication. This post is how you impliment said configuration.
To start off, here is my environment information I’m using to setup this configuration.
Virtual Network: “raidus-vnet”
Virtual Network Address Space: 10.1.0.0/24
Virtual Network VM Subnet: 10.1.0.0/28
Virtual Network Gateway Subnet: 10.1.0.16/28
VPN Gateway SKU: VpnGW1
VPN Client Address Pool: 172.28.10.0/24
Domain Controler/NPS Server Static IP: 10.1.0.10
Virtual Network (VNET) Setup:
You most likely already have a VNET where you will be configuring this setup, but if you don’t you need to create one with two subnets. One subnet for infrastructure, and one “Gateway Subnet”. The Gateway Subnet will be used automatically, and is required, when you configure the VPN Gateway.
VPN Gateway Setup:
The Azure VPN Gateway is just about as easy as it gets to configure and to managed (sometimes to a fault). The only caveat you need to be aware of in this scenerio, is that RADIUS Point-to-Site authentication is only available on the SKU “VPNGW1” and above. You’ll then need to choose the vnet where you have created the VPN Gateway, and create a Public IP Address resource.
This will take anywhere from 20-45 minutes to provision, as noted. While that’s running, you can provision your NPS (Network Policy Server) VM. This being a test environment, I provisioned a VM to be both the domain controler and the NPS box. Make sure to set a static IP on the NPS box’s NIC in Azure, you’ll need a static for your VPN configuration. I used 10.1.0.10.
After complete, you will need to configure the VPN Gateway’s Point-to-Site configuration. Choose “RADIUS authentication”, enter in the static IP of the will-be NPS server, and set a Server Secret. This being a test environment, my password is obviously not as secure as I hope yours would be.
Now, go back into that VM that was created earlier and install the NPS role.
After it’s installed, you need to create a Network Policy with a condiational access clause (I used a group in AD) and tell it what security type you want to allow.
Next, you’ll need to create a Client Access Policy. Here you need the IP of the VPN Gateway you created, and the shared secret. Here is the interesting bit, you can’t view the IP of your VPN Gateway in the Gateway Subnet. If you looked at “Connected Devices” in the VNET the VPN Gateway doesn’t show any IP. I know that the VPN Gateway is deployed (behind the scenes) as an H/A pair, but I would assume they’re using a floating IP that they could surface. Anyways, there is no real way to find it – but it looks like (after testing with a dozen different deployments) it uses the 4th available IP in the subnet. This subnet being a 10.1.0.16/28, the 4th IP is 10.1.0.21. This is the IP that goes in the address of the RADIUS Client.
Next, I configure NPS Accounting. You don’t have to do this, but I think it helps for the sake of connection logging and for troubleshooting. You can log a few different ways, and choose here just to use a text file to a subfolder I created called “AzureVPN”.
Generate VPN Client Package:
Now that everything is set, you need to generate a VPN Client Package to distribute to your users.
After it is installed, you can see the VPN Connection in the VPN list and users can logon using their domain credentials.
After logging in, we can go back and look at the accounting log which shows us the successfull authentication of that user.
There we go, connecting to an Azure VPN Gateway with RADIUS authentication using domain credentials. I think we can all thank Microsoft for this one, and not having to do cert management anymore.
I hope I’ve made your day at least a little bit easier.
Reading Time: 2 minutesThe concept of a full-proxy architecture, along with SSL Bridging has seemed to confuse a good majority of people to whom I’ve attempted to explain. In that light, here we go. I could write a long drawn-out explanation of this process (and will, if requested) but most folks reading this want a quick answer. Let’s proceed.
A few things to note:
- “Full Proxy Architecture”, this means that clients or servers on either side of the F5 never talk to each other. The client thinks the F5’s endpoint (iApp) is the server, and the server thinks the F5 is the client. They never talk to each other.
- “SSL Bridging”, this means Client -> F5 is encrypted, then decrypted for processing, then re-encrypted, and F5 -> server is encrypted.
- “F5” is actually a company name, this products have many other names, such as F5 BIG-IP LTM ADC.
- It is a networking device, not a server, you can’t RDP to it like some people have assumed (although you can SSH into the management system and the TMSH data plane).
There is typically some confusion around what certs are on what box and whether or not they match. If they use the F5, the answer is – it doesn’t matter. They ONLY need to care about, and trust the cert that’s applied by the SSL Bridging profile attached the iApp that corresponds with the endpoint for that app. In the example I’ve drawn below (thanks to a fancy bright-link board) I show that the source client (which can be a server if you want), the F5, and the destination server all have different certs. Though, again all that matters to the anyone besides the F5 is the cert that the F5 uses. Note that the steps are numbered in green.
I hope this makes your day at least a little bit easier.
Server 2012 R2 “does not have a network adapter available to create a virtual switch” when configuring VDI
Reading Time: 2 minutesI recently ran into this issue when doing an all-in-one VDI install, on top of a server that had been used for other things in the past. The “quick start” VDI option is supposed to essentially do everything for you, but I ran into this issue.
“The Server does not have a network adapter available to create a virtual switch”
Taking a quick look here, I do have a vSwitch. Why is it complaining?
It turns out that the installer isn’t actually complaining about the fact that there is no vSwitch, it’s complaining that there IS a vSwitch. It needs it to be a “blank slate” so it can manage it and do it’s thing. I’m not a fan of this, because I intend to manage my VDI environment using SCVMM, and the VDI component itself won’t have a whole lot to say about it. Nevertheless, this is how you get past this. Delete any vSwitches.
All gone, now try the installer again.
There we go, now we’re off onto the next step without any errors. Have fun!
I hope I’ve made your day, at least a little bit easier.