
Azure Overlapping Network Design, VNET-to-VNET NAT

Reading Time: 7 minutes

Overview:

From day one in Azure (which for me was over a decade ago), one of the critical design requirements has been to choose a network address space that doesn't overlap with what's configured on-premises or in any other connected environment. Admittedly, this has gotten significantly easier over the years with new features like the ability to resize subnets, and even the entire address space of a virtual network. However, a few services that have been gaining popularity make this challenging: Azure Kubernetes Service (AKS) and Azure Databricks, for example, typically call for a large number of addresses to make sure they can scale appropriately. If an organization hands out a dozen /16s, it quickly becomes an issue for routable networking, so let's take a look at how to approach this architecturally.

Design Options:

Disconnected:

For some situations, a non-connected environment may fit the design requirements. If the AKS, Databricks, or other workload doesn't need hybrid connectivity, then don't connect it! I've found this to be especially true with Azure Databricks, given that the majority of its traffic is outbound (including its standard connection to the management plane). There are certainly scenarios where this won't work, which I won't exhaustively list here, but remember that cloud networking isn't on-prem networking, and not everything has to be connected.

Connected:

Ultimately, as an industry we're moving toward IPv6, which, as I've told my students, "gives us enough addresses to theoretically address each grain of sand on the earth" (reality notwithstanding). At the time of writing, IPv6 is certainly making headway across Azure services, but it isn't yet pervasive enough to serve as the basis of all Azure virtual networking.

One step closer to a standard connected network would be using Azure Private Link Service. When I talk about Azure Private Link, most people assume Private Link Endpoints, but Private Link Service is a whole other beast. It was originally designed with vendors in mind who want to provide their customers access to a hosted solution within Azure without having to rely on non-overlapping address spaces for peering or VPNs.

Source: https://learn.microsoft.com/en-us/azure/private-link/private-link-service-overview

This solution leverages Azure's Software Defined Networking (SDN) fabric to provide an endpoint in the "consumer" network and a load balancer in the "provider" network, behind which sits the configured service. The consumer uses an IP address in their own network, and on the other side the traffic is routed to the backend service. This works with any IP address configuration and could certainly serve something like a web application hosted on AKS, with the Private Link Service functioning as the ingress conduit.
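
To make that concrete, here is a minimal sketch of the two halves using the azure-mgmt-network Python SDK. The resource names, region, subscription, and resource IDs are all illustrative assumptions, and the provider side presumes an existing Standard internal load balancer (for example, in front of an AKS ingress):

```python
# A hedged sketch of Private Link Service: provider exposes a service,
# consumer reaches it from their own (possibly overlapping) address space.
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

SUB = "<subscription-id>"  # assumption
net = NetworkManagementClient(DefaultAzureCredential(), SUB)

# Provider side: expose the internal load balancer frontend as a
# Private Link Service (the NAT for inbound flows happens in this subnet).
pls = net.private_link_services.begin_create_or_update(
    "rg-provider", "pls-ingress", {
        "location": "eastus2",
        "load_balancer_frontend_ip_configurations": [{"id": "<ilb-frontend-id>"}],
        "ip_configurations": [{
            "name": "nat-ipconfig",
            "subnet": {"id": "<provider-subnet-id>"},
            "private_ip_allocation_method": "Dynamic",
        }],
    },
).result()

# Consumer side: a private endpoint that lives in the consumer's own
# address space, regardless of overlap with the provider network.
net.private_endpoints.begin_create_or_update(
    "rg-consumer", "pe-to-provider", {
        "location": "eastus2",
        "subnet": {"id": "<consumer-subnet-id>"},
        "private_link_service_connections": [{
            "name": "to-pls-ingress",
            "private_link_service_id": pls.id,
        }],
    },
).result()
```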

If neither of the above satisfies the design requirements, we're now at the point where we assume we need full bi-directional routed connectivity. Unless we're okay with handing out large network chunks, we're pretty much down to NAT. With that said, what NAT capabilities are available in Azure?

Solution:

What I started doing with customers about three years ago was leveraging Azure Firewall and an "intermediate network", which allows for a couple of things:

Note: The ingress and egress flow representations are not to suggest that traffic flows from one deployment to the next, but rather how a single flow would operate.

  1. This solution allows for intentionally overlapping network spaces. While that sounds strange, it makes things much easier from an architectural perspective: if an engineering build request comes in for AKS or Azure Databricks, give them "the /16". It will be important to choose an address space that isn't used anywhere else in your network so that your intermediary network doesn't have routing conflicts. Say you chose 10.30.0.0/16: don't use that network anywhere else, and you can then reuse it dozens, or even hundreds, of times.
  2. Each deployment's intermediate network will be able to host routable service ingress and only requires a small address space. The documentation states that Azure Firewall needs a /26 subnet so that it has additional addresses available when it scales out; at the time of writing it will let you deploy into a smaller subnet, but I still suggest using a /26. You will additionally need a subnet for your Application Gateway, and you may want to leave a few extra addresses for things like private endpoints (see the sizing sketch just after this list).
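
To make the sizing concrete, here is a small sketch (Python standard library only) that carves a hypothetical /24 intermediary network into the subnets described above. The 10.31.5.0/24 range and every subnet name except AzureFirewallSubnet (which Azure mandates) are illustrative assumptions:

```python
import ipaddress

# Hypothetical routable /24 for a single deployment's intermediary VNet.
intermediary = ipaddress.ip_network("10.31.5.0/24")

# Carve it into four /26s: one for Azure Firewall, one for the Application
# Gateway, and the rest for private endpoints / future use.
subnets = list(intermediary.subnets(new_prefix=26))
plan = {
    "AzureFirewallSubnet": subnets[0],  # this subnet name is mandated by Azure
    "snet-appgw": subnets[1],
    "snet-private-endpoints": subnets[2],
    "snet-spare": subnets[3],
}

# Azure reserves 5 addresses per subnet, so a /26 nets out at 59 usable IPs.
for name, prefix in plan.items():
    print(f"{name:24} {prefix}  ({prefix.num_addresses - 5} usable)")
```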

The design is such that the backend service network is peered to the intermediary network and has a route table sending 0.0.0.0/0 traffic to the Azure Firewall for outbound SNAT. The backend service knows how to reach the Azure Firewall because the two networks are peered together, which appends a system route to each network's routing table. Azure Firewall can then either SNAT the traffic to the internet or, if it's deployed in the forced-tunnel configuration, send the traffic on to a central router based on propagated or assigned routes on the Azure Firewall subnet.
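
Here's a hedged sketch of that egress plumbing with the azure-mgmt-network Python SDK; the subscription, resource group, VNet names, region, and firewall IP are illustrative assumptions:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

SUB, RG = "<subscription-id>", "rg-aks-deployment-01"  # assumptions
FW_PRIVATE_IP = "10.31.5.4"  # first usable address in AzureFirewallSubnet

net = NetworkManagementClient(DefaultAzureCredential(), SUB)

# Route table: send everything to the firewall, which SNATs it outbound.
# (Associating this table with the backend subnets is a separate subnet
# update, omitted here for brevity.)
net.route_tables.begin_create_or_update(RG, "rt-backend-egress", {
    "location": "eastus2",
    "routes": [{
        "name": "default-via-firewall",
        "address_prefix": "0.0.0.0/0",
        "next_hop_type": "VirtualAppliance",
        "next_hop_ip_address": FW_PRIVATE_IP,
    }],
}).result()

# Peer backend -> intermediary; a mirror peering is also required on the
# intermediary side (omitted). The peering is what adds the system routes
# that let the backend reach the firewall in the first place.
intermediary_id = (
    f"/subscriptions/{SUB}/resourceGroups/{RG}"
    "/providers/Microsoft.Network/virtualNetworks/vnet-intermediary-01"
)
net.virtual_network_peerings.begin_create_or_update(
    RG, "vnet-backend-01", "peer-to-intermediary", {
        "remote_virtual_network": {"id": intermediary_id},
        "allow_virtual_network_access": True,
        "allow_forwarded_traffic": True,
    },
).result()
```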

Since the Azure Firewall unfortunately can't DNAT traffic coming from private source addresses, we will use Application Gateway to process inbound traffic. When I started proposing this design there was a slight caveat that this had to be web traffic, but thankfully App Gateway now supports TCP (layer 4) listeners, which makes the design much more flexible. Since the intermediary network is designed to be within the contiguous routing domain of the hybrid environment, traffic can be routed to the listener on the App Gateway and then passed back across the non-transitive peering to the backend service.
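
As a quick way to sanity-check that inbound path, here is a small inspection sketch using the azure-mgmt-network SDK; the gateway and resource group names are assumptions, and the listeners property (the layer-4 TCP/TLS listeners) only appears on recent API versions, with older SDKs exposing only http_listeners:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

net = NetworkManagementClient(DefaultAzureCredential(), "<subscription-id>")
appgw = net.application_gateways.get("rg-aks-deployment-01", "agw-intermediary-01")

# Confirm the layer-4 listener and that the backend pool targets the
# ingress IP inside the overlapping backend network.
for lsn in (appgw.listeners or []):  # TCP/TLS (layer 4) listeners
    print("L4 listener:", lsn.name, lsn.protocol)
for pool in (appgw.backend_address_pools or []):
    targets = [a.ip_address or a.fqdn for a in (pool.backend_addresses or [])]
    print("backend pool:", pool.name, "->", targets)
```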

Let's think about how all this could look in a network architecture.

This design can scale horizontally as far as you'd like, and you can vary the addresses you use. A single instance scales very nicely as well; I've seen designs preparing for upwards of 10,000 machines across AKS clusters.
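
As a sketch of what that horizontal scaling looks like on paper, the loop below stamps out an address plan where every deployment reuses the same overlapping backend /16 while each intermediary VNet gets a unique routable /24; the specific ranges are assumptions:

```python
import ipaddress

# Every deployment reuses the same isolated backend range (per item 1 above),
# while each intermediary VNet gets a unique routable /24 from a dedicated pool.
BACKEND_RANGE = ipaddress.ip_network("10.30.0.0/16")      # intentionally reused
INTERMEDIARY_POOL = ipaddress.ip_network("10.31.0.0/20")  # room for 16 deployments

for i, intermediary in enumerate(INTERMEDIARY_POOL.subnets(new_prefix=24), start=1):
    print(f"deployment-{i:02d}: backend {BACKEND_RANGE} (isolated), "
          f"intermediary {intermediary} (routable)")
```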

Considerations:

With a functional design specification, let’s consider some additional points:

  1. Who is going to manage the Application Gateway?

Since the App Gateway is simply passing traffic back to what is likely an ingress controller, it will seldom have to be managed; in my experience it's a fairly "set it and forget it" configuration. Regardless, it needs to be determined who will manage this resource.

  2. What about scalability and performance?

We've already discussed scalability from an address perspective, but let's consider a few other points. While Azure Firewall doesn't expose metrics showing how many SNAT ports are in use when it's SNAT'ing through a forced tunnel, it does when SNAT'ing to the internet. Additionally, Azure Firewall will auto-scale based on CPU utilization, throughput, and total connections, and in my experience there is a correlation between SNAT port utilization and at least one of these metrics. The Firewall initially has roughly 2.5 Gbps of capacity available and will auto-scale up to 30 Gbps on the Standard SKU and 100 Gbps on the Premium SKU.
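
If you want to keep an eye on SNAT pressure, here is a hedged sketch using the azure-monitor-query Python SDK; the firewall resource ID is illustrative, and SNATPortUtilization is my understanding of the internal name behind the portal's "SNAT port utilization" metric:

```python
from datetime import timedelta
from azure.identity import DefaultAzureCredential
from azure.monitor.query import MetricsQueryClient

FW_ID = ("/subscriptions/<subscription-id>/resourceGroups/rg-aks-deployment-01"
         "/providers/Microsoft.Network/azureFirewalls/fw-intermediary-01")

client = MetricsQueryClient(DefaultAzureCredential())
result = client.query_resource(
    FW_ID,
    metric_names=["SNATPortUtilization"],  # assumed metric name
    timespan=timedelta(hours=1),
    granularity=timedelta(minutes=5),
    aggregations=["Average", "Maximum"],
)

# Print the 5-minute average and peak SNAT port utilization for the last hour.
for metric in result.metrics:
    for series in metric.timeseries:
        for point in series.data:
            print(point.timestamp, point.average, point.maximum)
```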

Ingress traffic is typically very minimal in this architecture, and I've never seen anything close to concerning in terms of connection counts or throughput. The Application Gateway can handle roughly 20,000 to 50,000 connections per instance (depending on the configuration) and can auto-scale as needed.

  3. Cost

The cost with this design pattern can vary, and will take into account a few things: the Azure Firewall (a per-hour charge plus data processed, with the Standard and Premium SKUs priced differently), the Application Gateway (a per-hour charge plus capacity units that scale with traffic), data transfer across the VNet peering, and simply how many of these deployments you stamp out, since each carries its own firewall and gateway.

Conclusion:

This architecture allows your network address space to scale while meeting the needs of the workload owners who are requesting these larger deployments. I've seen this design in practice many times and it works very well; hopefully it will prove useful to you too.

I hope I’ve made your day a little bit easier!

If you have any questions, comments, or suggestions for future blog posts please feel free to comment below or reach out on LinkedIn.
