
Azure Overlapping Network Design, VNET-to-VNET NAT

Reading Time: 7 minutes

Overview:

From day one in Azure (which for me was over a decade ago), one of the critical design requirements has been to choose a network address space that doesn't overlap with what's configured on-premises or in any other connected environment. Admittedly, this has gotten significantly easier over the years with new features like the ability to resize subnets, and even the entire address space of a virtual network. However, a few services that have been gaining popularity make this challenging: Azure Kubernetes Service (AKS) and Azure Databricks, for example, typically call for a large number of addresses to make sure they can scale appropriately. If an organization hands out a dozen /16s, it quickly becomes an issue for routable networking, so let's take a look at how to approach this architecturally.

Design Options:

Disconnected:

For some situations, a non-connected environment may fit the design requirements. If the AKS, Databricks, or other workload doesn't need hybrid connectivity, then don't connect it! I've found this to be especially true with Azure Databricks, given that the majority of its traffic is outbound (including its standard connection to the management plane). There are certainly scenarios where this won't work, which I won't exhaustively list here, but remember that cloud networking isn't on-prem networking, and not everything has to be connected.

Connected:

Ultimately, as an industry we're moving toward IPv6, which, as I've told my students, "gives us enough addresses to theoretically address each grain of sand on the earth" (reality notwithstanding). At the time of writing, IPv6 is certainly making headway across Azure services, but it isn't yet pervasive enough to serve as the basis of all Azure virtual networking.

One step closer to a standard connected network would be using Azure Private Link Service. When I talk about Azure Private Link, most people assume Private Link Endpoints, but Private Link Service is a whole other beast. It was originally designed with vendors in mind who want to provide their customers access to a hosted solution within Azure without having to rely on non-overlapping address spaces for peering or VPNs.

Source: https://learn.microsoft.com/en-us/azure/private-link/private-link-service-overview

This solution leverages Azure's Software Defined Networking (SDN) fabric to provide an endpoint in the "consumer" network and a load balancer in the "provider" network, behind which sits the configured service. The consumer uses an IP address in their own network, and on the other side the traffic is routed to the backend service. This works with any IP address configuration and could certainly serve something like a web application hosted on AKS, with the Private Link Service functioning as the ingress conduit.
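
To make that concrete, here is a minimal sketch of the two halves using the azure-mgmt-network Python SDK. The resource names, region, subscription, and resource IDs are all illustrative assumptions, and the provider side presumes an existing Standard internal load balancer (for example, in front of an AKS ingress):

```python
# A hedged sketch of Private Link Service: provider exposes a service,
# consumer reaches it from their own (possibly overlapping) address space.
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

SUB = "<subscription-id>"  # assumption
net = NetworkManagementClient(DefaultAzureCredential(), SUB)

# Provider side: expose the internal load balancer frontend as a
# Private Link Service (the NAT for inbound flows happens in this subnet).
pls = net.private_link_services.begin_create_or_update(
    "rg-provider", "pls-ingress", {
        "location": "eastus2",
        "load_balancer_frontend_ip_configurations": [{"id": "<ilb-frontend-id>"}],
        "ip_configurations": [{
            "name": "nat-ipconfig",
            "subnet": {"id": "<provider-subnet-id>"},
            "private_ip_allocation_method": "Dynamic",
        }],
    },
).result()

# Consumer side: a private endpoint that lives in the consumer's own
# address space, regardless of overlap with the provider network.
net.private_endpoints.begin_create_or_update(
    "rg-consumer", "pe-to-provider", {
        "location": "eastus2",
        "subnet": {"id": "<consumer-subnet-id>"},
        "private_link_service_connections": [{
            "name": "to-pls-ingress",
            "private_link_service_id": pls.id,
        }],
    },
).result()
```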

If neither of the above satisfies the design requirements, we're now at the point where we assume we need full bi-directional routed connectivity. Unless we're okay with handing out large network chunks, we're pretty much down to NAT. With that said, what NAT capabilities are available in Azure?

Solution:

What I started doing with customers about three years ago was leveraging Azure Firewall and an "intermediate network", which allows for a couple of things:

Note: The ingress and egress flow representations are not to suggest that traffic flows from one deployment to the next, but rather how a single flow would operate.

  1. This solution allows for intentionally overlapping network spaces. While that sounds strange, it makes things much easier from an architectural perspective: if an engineering build request comes in for AKS or Azure Databricks, give them "the /16". It will be important to choose an address space that isn't used anywhere else in your network so that your intermediary network doesn't have routing conflicts. Say you chose 10.30.0.0/16: don't use that network anywhere else, and you can then reuse it dozens, or even hundreds, of times.
  2. Each deployment's intermediate network will be able to host routable service ingress and only requires a small address space. The documentation states that Azure Firewall needs a /26 subnet so that it has additional addresses available when it scales out; at the time of writing it will let you deploy into a smaller subnet, but I still suggest using a /26. You will additionally need a subnet for your Application Gateway, and you may want to leave a few extra addresses for things like private endpoints (see the sizing sketch just after this list).
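
To make the sizing concrete, here is a small sketch (Python standard library only) that carves a hypothetical /24 intermediary network into the subnets described above. The 10.31.5.0/24 range and every subnet name except AzureFirewallSubnet (which Azure mandates) are illustrative assumptions:

```python
import ipaddress

# Hypothetical routable /24 for a single deployment's intermediary VNet.
intermediary = ipaddress.ip_network("10.31.5.0/24")

# Carve it into four /26s: one for Azure Firewall, one for the Application
# Gateway, and the rest for private endpoints / future use.
subnets = list(intermediary.subnets(new_prefix=26))
plan = {
    "AzureFirewallSubnet": subnets[0],  # this subnet name is mandated by Azure
    "snet-appgw": subnets[1],
    "snet-private-endpoints": subnets[2],
    "snet-spare": subnets[3],
}

# Azure reserves 5 addresses per subnet, so a /26 nets out at 59 usable IPs.
for name, prefix in plan.items():
    print(f"{name:24} {prefix}  ({prefix.num_addresses - 5} usable)")
```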

The design is such that the backend service network is peered to the intermediary network and has a route table sending 0.0.0.0/0 traffic to the Azure Firewall for outbound SNAT. The backend service knows how to reach the Azure Firewall because the two networks are peered together, which appends a system route to each network's routing table. Azure Firewall can then either SNAT the traffic to the internet or, if it's deployed in the forced-tunnel configuration, send the traffic on to a central router based on propagated or assigned routes on the Azure Firewall subnet.
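
Here's a hedged sketch of that egress plumbing with the azure-mgmt-network Python SDK; the subscription, resource group, VNet names, region, and firewall IP are illustrative assumptions:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

SUB, RG = "<subscription-id>", "rg-aks-deployment-01"  # assumptions
FW_PRIVATE_IP = "10.31.5.4"  # first usable address in AzureFirewallSubnet

net = NetworkManagementClient(DefaultAzureCredential(), SUB)

# Route table: send everything to the firewall, which SNATs it outbound.
# (Associating this table with the backend subnets is a separate subnet
# update, omitted here for brevity.)
net.route_tables.begin_create_or_update(RG, "rt-backend-egress", {
    "location": "eastus2",
    "routes": [{
        "name": "default-via-firewall",
        "address_prefix": "0.0.0.0/0",
        "next_hop_type": "VirtualAppliance",
        "next_hop_ip_address": FW_PRIVATE_IP,
    }],
}).result()

# Peer backend -> intermediary; a mirror peering is also required on the
# intermediary side (omitted). The peering is what adds the system routes
# that let the backend reach the firewall in the first place.
intermediary_id = (
    f"/subscriptions/{SUB}/resourceGroups/{RG}"
    "/providers/Microsoft.Network/virtualNetworks/vnet-intermediary-01"
)
net.virtual_network_peerings.begin_create_or_update(
    RG, "vnet-backend-01", "peer-to-intermediary", {
        "remote_virtual_network": {"id": intermediary_id},
        "allow_virtual_network_access": True,
        "allow_forwarded_traffic": True,
    },
).result()
```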

Since the Azure Firewall unfortunately can't DNAT traffic coming from private source addresses, we will use Application Gateway to process inbound traffic. When I started proposing this design there was a slight caveat that this had to be web traffic, but thankfully App Gateway now supports TCP (layer 4) listeners, which makes the design much more flexible. Since the intermediary network is designed to be within the contiguous routing domain of the hybrid environment, traffic can be routed to the listener on the App Gateway and then passed back across the non-transitive peering to the backend service.
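
As a quick way to sanity-check that inbound path, here is a small inspection sketch using the azure-mgmt-network SDK; the gateway and resource group names are assumptions, and the listeners property (the layer-4 TCP/TLS listeners) only appears on recent API versions, with older SDKs exposing only http_listeners:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

net = NetworkManagementClient(DefaultAzureCredential(), "<subscription-id>")
appgw = net.application_gateways.get("rg-aks-deployment-01", "agw-intermediary-01")

# Confirm the layer-4 listener and that the backend pool targets the
# ingress IP inside the overlapping backend network.
for lsn in (appgw.listeners or []):  # TCP/TLS (layer 4) listeners
    print("L4 listener:", lsn.name, lsn.protocol)
for pool in (appgw.backend_address_pools or []):
    targets = [a.ip_address or a.fqdn for a in (pool.backend_addresses or [])]
    print("backend pool:", pool.name, "->", targets)
```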

Let's think about how all this could look in a network architecture.

This design can scale horizontally as far as you'd like, and you can vary the addresses you use. A single instance scales very nicely as well; I've seen designs preparing for upwards of 10,000 machines across AKS clusters.
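
As a sketch of what that horizontal scaling looks like on paper, the loop below stamps out an address plan where every deployment reuses the same overlapping backend /16 while each intermediary VNet gets a unique routable /24; the specific ranges are assumptions:

```python
import ipaddress

# Every deployment reuses the same isolated backend range (per item 1 above),
# while each intermediary VNet gets a unique routable /24 from a dedicated pool.
BACKEND_RANGE = ipaddress.ip_network("10.30.0.0/16")      # intentionally reused
INTERMEDIARY_POOL = ipaddress.ip_network("10.31.0.0/20")  # room for 16 deployments

for i, intermediary in enumerate(INTERMEDIARY_POOL.subnets(new_prefix=24), start=1):
    print(f"deployment-{i:02d}: backend {BACKEND_RANGE} (isolated), "
          f"intermediary {intermediary} (routable)")
```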

Considerations:

With a functional design specification, let’s consider some additional points:

  1. Who is going to manage the Application Gateway?

Since the App Gateway is simply passing traffic back to what is likely an ingress controller, it will seldom have to be managed; in my experience it's a fairly "set it and forget it" configuration. Regardless, it needs to be determined who will manage this resource.

  2. What about scalability and performance?

We've already discussed scalability from an address perspective, but let's consider a few other points. While Azure Firewall doesn't expose metrics showing how many SNAT ports are in use when it's SNAT'ing through a forced tunnel, it does when SNAT'ing to the internet. Additionally, Azure Firewall will auto-scale based on CPU utilization, throughput, and total connections, and in my experience there is a correlation between SNAT port utilization and at least one of these metrics. The Firewall initially has roughly 2.5 Gbps of capacity available and will auto-scale up to 30 Gbps on the Standard SKU and 100 Gbps on the Premium SKU.
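
If you want to keep an eye on SNAT pressure, here is a hedged sketch using the azure-monitor-query Python SDK; the firewall resource ID is illustrative, and SNATPortUtilization is my understanding of the internal name behind the portal's "SNAT port utilization" metric:

```python
from datetime import timedelta
from azure.identity import DefaultAzureCredential
from azure.monitor.query import MetricsQueryClient

FW_ID = ("/subscriptions/<subscription-id>/resourceGroups/rg-aks-deployment-01"
         "/providers/Microsoft.Network/azureFirewalls/fw-intermediary-01")

client = MetricsQueryClient(DefaultAzureCredential())
result = client.query_resource(
    FW_ID,
    metric_names=["SNATPortUtilization"],  # assumed metric name
    timespan=timedelta(hours=1),
    granularity=timedelta(minutes=5),
    aggregations=["Average", "Maximum"],
)

# Print the 5-minute average and peak SNAT port utilization for the last hour.
for metric in result.metrics:
    for series in metric.timeseries:
        for point in series.data:
            print(point.timestamp, point.average, point.maximum)
```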

Ingress traffic is typically very minimal in this architecture, and I've never seen anything close to concerning in terms of connection counts or throughput. The Application Gateway can handle roughly 20,000 to 50,000 connections per instance (depending on the configuration) and can auto-scale as needed.

  3. Cost

The cost with this design pattern can vary, and will take into account a few things: the Azure Firewall (a per-hour charge plus data processed, with the Standard and Premium SKUs priced differently), the Application Gateway (a per-hour charge plus capacity units that scale with traffic), data transfer across the VNet peering, and simply how many of these deployments you stamp out, since each carries its own firewall and gateway.

Conclusion:

This architecture allows your network address space to scale while meeting the needs of the workload owners who are requesting these larger deployments. I've seen this design in practice many times and it works very well; hopefully it will prove useful to you too.

I hope I’ve made your day a little bit easier!

If you have any questions, comments, or suggestions for future blog posts please feel free to comment below or reach out on LinkedIn.
