Elastic VMware Service on AWS: A new chapter of VMware for Amazon Web Services

Amazon Web Services (AWS) has introduced the Elastic VMware Service, an exciting new option for organizations seeking seamless hybrid cloud solutions. This service aims to bring the power of VMware’s virtualization suite into AWS’s extensive cloud ecosystem, offering businesses a pathway to modernize their IT infrastructure while leveraging existing VMware investments.

Key Features and Benefits

The Elastic VMware Service is designed to deliver a managed VMware environment natively integrated with AWS. This enables enterprises to run, manage, and scale VMware workloads directly within AWS, simplifying operations for businesses already using VMware tools like vSphere, NSX, and vSAN. Some key benefits include:

  • Native AWS Integration: Organizations can connect VMware workloads with native AWS services like S3, RDS, or AI/ML tools, unlocking new possibilities for innovation.
  • Operational Consistency: By using familiar VMware tools, IT teams can extend their on-premises environments to the cloud without extensive retraining.
  • Elasticity and Scalability: The service offers scalable resources that adapt to workload demands, optimizing performance and cost efficiency.
  • Simplified Migration: Workloads can be migrated with minimal downtime, enabling smoother transitions to a hybrid cloud model.

Challenges and Opportunities

While the Elastic VMware Service holds promise, it’s essential to recognize that it’s an early-stage offering. Here are some key points to consider:

Parity with Established Services

Compared to mature competitors like Nutanix Cloud Clusters (NC2) on AWS, the Elastic VMware Service is still in its infancy. Features such as advanced disaster recovery, multi-cloud connectivity and deep integrations may take time to reach parity. Enterprises evaluating the service should weigh their immediate needs against the current capabilities of Elastic VMware Service and factor in its growth trajectory.

Global Availability

Another critical factor is the service’s geographic availability. Initially, AWS is likely to roll out Elastic VMware Service in select regions, with global expansion following over time. For organizations in regions like Japan, this could mean a delay in adoption until the service becomes widely available. Businesses planning global deployments should account for these regional limitations and monitor AWS’s roadmap for updates.

A Positive Step Forward

Despite these challenges, the introduction of Elastic VMware Service is a testament to AWS’s commitment to hybrid cloud innovation. By addressing enterprise needs for flexibility, scalability, and operational consistency, the service positions itself as a compelling choice for VMware-centric organizations exploring the cloud.

As the service matures, it’s expected to add more features, expand regional availability, and strengthen its competitive position. Enterprises interested in early adoption should engage with AWS to understand the roadmap and ensure alignment with their strategic goals.

Conclusion

The Elastic VMware Service on AWS marks an exciting development in the hybrid cloud space. While it’s a promising start, enterprises should remain mindful of its evolving nature and regional rollout timeline. With time and continued investment, Elastic VMware Service has the potential to become a cornerstone for VMware workloads in the cloud—offering both innovation and operational efficiency.


ESXi 8.0 on ARM!

The latest update to the ESXi-ARM Fling, now based on ESXi 8.0 Update 3b, is available for download and installation. The Fling provides an experimental version of VMware’s hypervisor tailored for ARM-based platforms such as the Raspberry Pi. Note, however, that it is not limited to the Raspberry Pi and could potentially run on other ARM-based systems.

This release includes several new features and enhancements, such as support for vSphere Distributed Services Engine (Project Monterey) on ARM and improvements in stability and compatibility.

So, this is fun simply from an enthusiast perspective, but does it have real-world applications? Well, yes, one could argue this could be used in use cases like edge computing and remote office/branch office (ROBO) deployments where the workloads on top of ESXi aren’t too demanding. A particularly exciting use case could be using Raspberry Pi devices as an inexpensive vSAN Witness, enabling advanced storage configurations in a cost-effective way.

Installing it, as usual, requires jumping through a few hoops, but all in all it’s not that challenging to get started. Please refer to the installation instructions here:

https://higherlogicdownload.s3.amazonaws.com/BROADCOM/092f2b51-ca4c-4dca-abc0-070f25ade760/UploadedImages/Flings_Content/VMware-ESXi-Arm-Fling-PDFs.zip

I’ve opted to install this on my Raspberry Pi 4 (8GB), housed in an old Sharp MZ-2000 case. The Pi has two micro-HDMI outputs, but only micro-HDMI 0 is used; the other shows the standard stretched four pixels.

L2 extension from on-prem VMware cluster to Nutanix Cloud Clusters (NC2) on AWS using Cisco CSR1000V as on-prem VTEP

This guide has been written in cooperation with Steve Loh, Advisory Solutions Architect, Network & Security at Nutanix in Singapore, gentleman extraordinaire and master of the Cisco Dark Arts.

Introduction

Organizations frequently extend on-premises networks to the cloud to retain connectivity between virtual machines (VMs) during migrations. This is called L2 or Layer 2 extension. With an extended network, VMs connected to the same subnet can communicate as usual even when some of them have been migrated to the cloud and others are still on-premises awaiting migration.

L2 extension is sometimes used on a more permanent basis when the same network segment contains VMs to be migrated as well as appliances which must remain on-prem. If the organization doesn’t want to change the IP addresses of any of these entities but still needs some of them migrated, it might choose to maintain L2 extension of some subnets indefinitely. However, this is generally considered a risk and is not recommended.

Architecture diagram

This is a graphical representation of the network stretch from on-prem VMware to Nutanix Cloud Clusters in AWS which is covered in this blog post. Other configurations are also possible, especially when Nutanix is also deployed on-prem.

Video of process

For those who prefer watching a demo video of this process to reading the blog, please refer to the video below.

Limitations and considerations

While L2 extension is a useful tool for migrations, please keep the following points in mind when deciding whether or not to utilize this feature:

  • L2 extension will complicate routing and thereby also complicate troubleshooting in case there are issues
  • L2 extension may introduce additional network latency. This takes the shape of trombone routing, where traffic needs to go from the cloud via a gateway on the on-premises side and then back to the cloud again. Nutanix Flow Policy Based Routing (PBR) may be used to alleviate this.
  • If all routing goes via a single default gateway, either on-premises or in the cloud, and the network connecting the on-premises DC with the cloud environment goes down, the VMs on the side without the default gateway will no longer be able to communicate with networks other than their own
  • The Nutanix L2 extension gateway appliance does not support redundant configurations at time of writing
  • A Nutanix L2 extension gateway can support up to five network extensions
  • A single Prism Central instance can support up to five Nutanix L2 extension gateways
  • Always keep MTU sizes in mind when configuring L2 extension to avoid unnecessary packet fragmentation. MTU settings can be configured when extending a network.
  • Even though VMs are connected to an extended network, if the current version of Move is used for migration, VM IP addresses will not be retained. A Move release in the near future will enable IP retention when migrating from VMware ESXi to NC2 on AWS.
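The two scaling limits above (five extensions per gateway, five gateways per Prism Central instance, as stated at time of writing) can be turned into a quick sizing check when planning which subnets to stretch. This is a minimal sketch: the constants simply mirror the numbers in the list and should be re-checked against current Nutanix documentation, and the function name is hypothetical.

```python
import math

# Limits as stated in the list above (at time of writing); re-check against current docs.
MAX_EXTENSIONS_PER_GATEWAY = 5
MAX_GATEWAYS_PER_PRISM_CENTRAL = 5


def gateways_needed(networks_to_extend: int) -> int:
    """Return how many L2 extension gateways are needed for the given number of networks."""
    if networks_to_extend < 0:
        raise ValueError("number of networks cannot be negative")
    gateways = math.ceil(networks_to_extend / MAX_EXTENSIONS_PER_GATEWAY)
    if gateways > MAX_GATEWAYS_PER_PRISM_CENTRAL:
        max_networks = MAX_EXTENSIONS_PER_GATEWAY * MAX_GATEWAYS_PER_PRISM_CENTRAL
        raise ValueError(
            f"{networks_to_extend} networks exceed the {max_networks} "
            "that a single Prism Central instance can extend"
        )
    return gateways


if __name__ == "__main__":
    for n in (1, 5, 6, 25):
        print(n, "network(s) ->", gateways_needed(n), "gateway(s)")
```

Since the gateway appliance does not support redundant configurations, the count is a minimum for capacity only; it says nothing about availability.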

Types of L2 extension

Various methods of extending a network exist. This blog will cover one of these cases – on-premises VMware with Cisco CSR1000V as VTEP to Nutanix Cloud Clusters (NC2) on AWS with a Nutanix gateway appliance.

On-premises VLANs or Flow overlay networks can be extended using Nutanix GW appliances to Flow overlay networks in NC2. It is also possible to extend using Nutanix VPN appliances in case the network underlay is not secure (directly over the internet). Finally, when the on-premises environment does not run Nutanix, using a virtual or physical router with VXLAN and VTEP capabilities is possible. This blog focuses on the last use case as it is a commonly discussed topic among customers considering NC2 and L2 extension.

Routing

When extending networks, the location of the default gateway and the routing to and from VMs on an extended network become important to understand. Customers used to extending networks with VMware HCX or NSX Autonomous Edge are familiar with the concept of trombone routing over a default gateway located on-premises. With Nutanix it is possible to use Policy Based Routing (PBR) to control how routing should be performed for different networks. In many ways, Nutanix PBR offers more detailed tuning of routes than VMware Mobility Optimized Networking (MON) in HCX.

A key difference between extending networks with HCX vs. with Nutanix is that with HCX the extended network appears as a single entity, although it exists on both sides (on-prem and cloud). The default gateway would generally be on-prem and both DHCP and DNS traffic would be handled by on-prem network entities, regardless of whether a VM was on-prem or in the cloud.

For L2 extension with Nutanix, things work a bit differently. The on-prem network will be manually recreated as an overlay network on NC2, with the same default gateway as the on-prem network but with a different DHCP range. The on-prem and cloud networks are then connected through a Nutanix GW appliance, which Prism Central deploys as a VM on the NC2 cluster.

Prerequisites

This guide assumes that an on-premises VMware vSphere 7.x environment and an NC2 version 6.8.1 cluster are already present. It also assumes that the on-prem and NC2 environments are connected over an L3 routed network, such as a site-to-site (S2S) VPN or AWS Direct Connect (DX). The two environments have full IP reachability and can ping each other.

In this case we are extending a VLAN which has been configured as a port group with VLAN tagging on standard vSwitches on the ESXi hosts.

Overview of steps

  1. Recreate the network to be extended using Flow overlay networking on NC2
  2. Deploy the Nutanix gateway appliance on NC2
  3. Deploy the Cisco CSR1000V in the on-premises VMware cluster
  4. Enable Promiscuous mode and Forged transmits on the vSwitch portgroup of the VLAN to be extended
  5. Register the CSR1000V routable IP address as a Remote Gateway in NC2
  6. Configure the CSR1000V IP leg on the VLAN to be extended and set up VNI and other settings required to extend the network
  7. Extend the network from NC2 with the CSR1000V as the on-prem VTEP

In the demo video included in this post we also perform a migration of a VM from on-prem to NC2 and verify connectivity with ICMP.

Step 1: Recreate the network to be extended using Flow overlay networking on NC2

Access Prism Central, navigate to Network and Security. Select “Create VPC” to add a new Nutanix Flow VPC to hold the subnet we want to extend.

After the VPC has been created, go to Subnets and create a subnet in the newly created VPC with settings matching the network which will be extended. In this case we create “VPC-C” and a subnet with a CIDR of “10.42.3.0/24”. The default gateway is configured to be the same as on-prem but the DHCP range is set to not overlap.
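The important rule in this step is that both sides share the CIDR and default gateway while their DHCP pools must not overlap. A small pre-flight check of the addressing plan can catch mistakes before the subnet is created. This is a minimal sketch using Python’s standard `ipaddress` module; only the CIDR 10.42.3.0/24 comes from this walkthrough, while the gateway 10.42.3.1 and both pool ranges are hypothetical example values.

```python
from ipaddress import IPv4Address, IPv4Network


def check_extended_subnet(
    cidr: str,
    gateway: str,
    onprem_pool: tuple[str, str],
    nc2_pool: tuple[str, str],
) -> list[str]:
    """Return problems found in the subnet plan; an empty list means no issues found.

    Pools are inclusive (first, last) address pairs given as strings.
    """
    net = IPv4Network(cidr)
    gw = IPv4Address(gateway)
    problems = []
    if gw not in net:
        problems.append(f"gateway {gw} is outside {net}")

    pools = {}
    for name, (first, last) in (("on-prem", onprem_pool), ("NC2", nc2_pool)):
        start, end = IPv4Address(first), IPv4Address(last)
        pools[name] = (start, end)
        if start > end:
            problems.append(f"{name} pool starts after it ends")
        if start not in net or end not in net:
            problems.append(f"{name} pool is not inside {net}")
        if start <= gw <= end:
            problems.append(f"{name} pool contains the gateway {gw}")

    (a_start, a_end), (b_start, b_end) = pools["on-prem"], pools["NC2"]
    # Two inclusive ranges overlap when each one starts before the other ends.
    if a_start <= b_end and b_start <= a_end:
        problems.append("on-prem and NC2 DHCP pools overlap")
    return problems


if __name__ == "__main__":
    # Only the CIDR comes from the walkthrough; gateway and pools are hypothetical.
    issues = check_extended_subnet(
        "10.42.3.0/24",
        "10.42.3.1",
        onprem_pool=("10.42.3.10", "10.42.3.99"),
        nc2_pool=("10.42.3.100", "10.42.3.199"),
    )
    print(issues or "no issues found")
```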

Step 2: Deploy the Nutanix gateway appliance on NC2

In Prism Central, navigate to “Network and Security” and select “Connectivity”. From here click “Create Gateway” and select “Local” to create the gateway on the NC2 side.

Add a name, set the Gateway Attachment to VPC and select the VPC which was just created in the previous step.

For Gateway Service, select VTEP and allow NC2 to automatically assign a Floating IP from the AWS VPC CIDR. This IP will be accessible from the on-prem environment and will be used as the anchor point for the L2 extension (L2E) when configuring the CSR1000V in a later step.

Note that a new VM (the gateway appliance) will automatically be deployed on the NC2 cluster by Prism Central.

Step 3: Deploy the Cisco CSR1000V in the on-premises VMware cluster

Deploy the Cisco appliance on the VMware cluster and select the first network interface to connect to the routable underlay network (IP connectivity to NC2) and the second and third interfaces to connect into the port group of the VLAN to be extended.

In this case VL420 is the routable underlay network and VL423 is the VLAN to be extended.

Configure an IP address on the management network and make note of it as we will use it in a subsequent step. In this case we use “10.42.0.106” as the management IP address on VL420.

Step 4: Enable Promiscuous mode and Forged transmits on the vSwitch portgroup of the VLAN to be extended

In order to pass traffic from the on-premises network to the NC2 network it is necessary to enable Promiscuous mode and Forged transmits on the vSwitch port group on the VMware cluster. In this case we are using standard vSwitches.
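Because standard vSwitches are configured per host, the security policy has to be changed on every ESXi host that carries the port group. Besides the vSphere Client, this can be done with `esxcli` in the ESXi Shell. The sketch below only assembles that command string (Python is used here to keep all examples in one language); the port group name "VL423" is an assumption based on the VLAN name used in this walkthrough, so substitute the real port group name.

```python
import shlex


def security_policy_command(
    portgroup: str, promiscuous: bool = True, forged_transmits: bool = True
) -> str:
    """Build the esxcli command that sets the security policy of a standard vSwitch port group."""

    def flag(value: bool) -> str:
        return "true" if value else "false"

    return (
        "esxcli network vswitch standard portgroup policy security set"
        f" --portgroup-name={shlex.quote(portgroup)}"  # quote names containing spaces
        f" --allow-promiscuous={flag(promiscuous)}"
        f" --allow-forged-transmits={flag(forged_transmits)}"
    )


if __name__ == "__main__":
    # "VL423" is an assumed port group name; run the output on each ESXi host.
    print(security_policy_command("VL423"))
```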

Step 5: Register the CSR1000V routable IP address as a Remote Gateway in NC2

We need to create a representation of the on-premises CSR1000V appliance in NC2 so that we can refer to it when extending the network in a later step. This is essentially just a matter of adding in the IP address as a “Remote Gateway”.

In Prism Central, navigate to “Network and Security”, select “Connectivity” and “Create Gateway”. Select “Remote” and add the details for the on-prem Cisco appliance. Give it a name, select “VTEP” as the “Gateway Service” and add the IP address. Let the VxLAN port remain as “4789”.

Step 6: Configure the CSR1000V IP leg on the VLAN to be extended and set up VNI and other settings required to extend the network

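In this step we configure the CSR1000V over SSH. To enable SSH, you may first need to do the following through the console of the virtual appliance. SSH also requires a domain name and an RSA key pair; if the appliance already has keys, skip the crypto line, as it will otherwise ask before replacing them. The domain name “lab.local” is only an example.

Enable SSH

en
conf t
ip domain name lab.local
crypto key generate rsa modulus 2048
username cisco privilege 15 password Password1!
line vty 0 4
login local
transport input ssh
end

Once SSH is available, SSH to the appliance as the user “cisco” with password “Password1!” and complete the remaining configurations.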

Configure interface in VLAN to be extended

Configure the 2nd interface (Gi2) as a leg into the VLAN to be extended by giving it an IP address and enabling the interface.

CSR1000V3#
CSR1000V3#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
CSR1000V3(config)#
CSR1000V3(config)#int gi2
CSR1000V3(config-if)#
CSR1000V3(config-if)#ip address 10.42.3.2 255.255.255.0
CSR1000V3(config-if)#no shut

Configure NVE 1 and the VNI to be used, and link them to the NC2 gateway IP

For ingress-replication, use the Floating IP from the AWS VPC CIDR range which was assigned to the gateway appliance after deploying on NC2.

CSR1000V3#
CSR1000V3#conf term
Enter configuration commands, one per line.  End with CNTL/Z.
CSR1000V3(config)#int NVE 1
CSR1000V3(config-if)#no shutdown
CSR1000V3(config-if)#source-interface gigabitEthernet 1
CSR1000V3(config-if)#member vni 4300
CSR1000V3(config-if-nve-vni)#ingress-replication 10.70.177.192
CSR1000V3(config-if-nve-vni)#exit
CSR1000V3(config-if)#end
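The `member vni 4300` line sets the VXLAN Network Identifier that both VTEPs put on every encapsulated frame, which is why the same VNI must be entered on the Nutanix side later. To illustrate what that identifier looks like on the wire, here is a small sketch that packs the 8-byte VXLAN header defined in RFC 7348. It is illustrative only; the CSR1000V and the Nutanix gateway build these headers themselves, and the range check reflects the protocol’s 24-bit field rather than any platform-specific limits.

```python
import struct

VXLAN_UDP_PORT = 4789  # IANA-assigned VXLAN port, the default kept in Step 5
VXLAN_FLAG_VALID_VNI = 0x08  # "I" flag: the VNI field is valid


def vxlan_header(vni: int) -> bytes:
    """Pack the 8-byte VXLAN header (RFC 7348) for a given VNI."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits (0-16777215)")
    # Word 1: 8 flag bits followed by 24 reserved bits.
    # Word 2: 24-bit VNI followed by 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VALID_VNI << 24, vni << 8)


def vni_from_header(header: bytes) -> int:
    """Extract the VNI from a packed VXLAN header."""
    _flags_word, vni_word = struct.unpack("!II", header[:8])
    return vni_word >> 8


if __name__ == "__main__":
    print(vxlan_header(4300).hex())  # prints 080000000010cc00
```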

Configure bridge domain and L2E via the 3rd interface (Gi3)

CSR1000V3#
CSR1000V3#conf t
Enter configuration commands, one per line. End with CNTL/Z.
CSR1000V3(config)#bridge-domain 12
CSR1000V3(config-bdomain)#member VNI 4300
CSR1000V3(config-bdomain)#member gigabitEthernet 3 service-instance 1
CSR1000V3(config-bdomain-efp)#end

CSR1000V3#conf t
Enter configuration commands, one per line. End with CNTL/Z.
CSR1000V3(config)#interface gigabitEthernet 3
CSR1000V3(config-if)#no shut
CSR1000V3(config-if)#service instance 1 ethernet
CSR1000V3(config-if-srv)#encapsulation untagged
CSR1000V3(config-if-srv)#no shut
CSR1000V3(config-if-srv)#end
CSR1000V3#

Configure a default route

We point the default route at the default gateway of the underlay network (10.42.0.1), which we use to connect to AWS and NC2 on AWS.

CSR1000V3#
CSR1000V3#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
CSR1000V3(config)#ip route 0.0.0.0 0.0.0.0 10.42.0.1
CSR1000V3(config)#end
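When the same extension has to be built more than once (another VLAN, another lab), it can help to render the Step 6 commands from a handful of parameters instead of retyping them. The sketch below is hypothetical tooling, not part of the original procedure: it reproduces the NVE, service instance, bridge-domain and default route commands from this step, with the walkthrough’s values (VNI 4300, Floating IP 10.70.177.192, next hop 10.42.0.1) as inputs. It leaves out SSH setup and the Gi2 address, and it configures the service instance before adding it to the bridge domain. Review the output before pasting it into the CSR1000V.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass
class L2ExtensionParams:
    vni: int                 # must match the VNI entered in Prism Central (Step 7)
    nc2_vtep_ip: str         # Floating IP assigned to the Nutanix gateway (Step 2)
    underlay_gateway: str    # next hop for the default route
    bridge_domain: int = 12
    underlay_if: str = "GigabitEthernet1"
    extended_if: str = "GigabitEthernet3"
    service_instance: int = 1


def render_csr_config(p: L2ExtensionParams) -> str:
    """Render the Step 6 VXLAN and bridge-domain configuration as IOS XE config text."""
    if not 0 < p.vni < 2**24:
        raise ValueError("VNI must be a non-zero 24-bit value")
    vtep = IPv4Address(p.nc2_vtep_ip)  # raises ValueError if malformed
    next_hop = IPv4Address(p.underlay_gateway)
    lines = [
        "interface nve1",
        " no shutdown",
        f" source-interface {p.underlay_if}",
        f" member vni {p.vni}",
        f"  ingress-replication {vtep}",
        "!",
        f"interface {p.extended_if}",
        " no shutdown",
        f" service instance {p.service_instance} ethernet",
        "  encapsulation untagged",
        "!",
        f"bridge-domain {p.bridge_domain}",
        f" member vni {p.vni}",
        f" member {p.extended_if} service-instance {p.service_instance}",
        "!",
        f"ip route 0.0.0.0 0.0.0.0 {next_hop}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    params = L2ExtensionParams(
        vni=4300, nc2_vtep_ip="10.70.177.192", underlay_gateway="10.42.0.1"
    )
    print(render_csr_config(params))
```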

Step 7: Extend the network from NC2 with the CSR1000V as the on-prem VTEP

In Prism Central, navigate to “Network and Security” and select “Subnets”. From here we will create the network extension.

Click the subnet to be extended and then select “Extend”.

If we were extending a network between two Nutanix clusters we would select “Across Availability Zones”, but in this case we extend from a pure VMware environment using a 3rd party (Cisco) appliance, so we select “To A Third-Party Data Center”.

Select the CSR1000V as the remote VTEP gateway and the VPC which contains the subnet we want to extend.

For “Local IP Address”, enter a free IP in the subnet to be extended. This will be the leg the Nutanix gateway appliance extends into that subnet.
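Picking the “Local IP Address” comes down to finding an address in the subnet that is neither assigned nor inside any DHCP pool. A minimal sketch of that check with Python’s `ipaddress` module follows. The CSR1000V address 10.42.3.2 comes from Step 6, while the gateway 10.42.3.1 and the pool ranges are hypothetical; with these inputs the function returns 10.42.3.3, the address used for the gateway in this walkthrough. It only checks the plan, so confirm the address is really unused on the network as well.

```python
from ipaddress import IPv4Address, IPv4Network


def first_free_ip(
    cidr: str, in_use: set[str], dhcp_pools: list[tuple[str, str]]
) -> str:
    """Return the lowest usable host address that is neither in use nor in a DHCP pool."""
    used = {IPv4Address(ip) for ip in in_use}
    pools = [(IPv4Address(first), IPv4Address(last)) for first, last in dhcp_pools]
    for host in IPv4Network(cidr).hosts():  # skips network and broadcast addresses
        if host in used:
            continue
        if any(start <= host <= end for start, end in pools):
            continue
        return str(host)
    raise LookupError(f"no free address left in {cidr}")


if __name__ == "__main__":
    # 10.42.3.2 is the CSR1000V leg from Step 6; the gateway and pools are hypothetical.
    print(first_free_ip(
        "10.42.3.0/24",
        in_use={"10.42.3.1", "10.42.3.2"},
        dhcp_pools=[("10.42.3.10", "10.42.3.99"), ("10.42.3.100", "10.42.3.199")],
    ))
```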

Also set the VxLAN VNI we used when configuring the CSR1000V earlier. Adjust the MTU to ensure there is no packet fragmentation. The default is 1392 but this will vary depending on the connectivity provider between on-prem and the AWS cloud.
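The reason the MTU needs adjusting is that VXLAN encapsulation adds 50 bytes inside the underlay’s IP MTU: a 20-byte outer IPv4 header, an 8-byte UDP header, the 8-byte VXLAN header and the 14-byte inner Ethernet header (assuming an IPv4 underlay and untagged inner frames, as in this setup). The sketch below computes the largest inner MTU that avoids fragmentation. Any further overhead on the path, such as IPsec encapsulation on an S2S VPN, depends on the provider and cipher, so it is an input you have to measure or look up; the 58 bytes in the example are purely hypothetical. Note that the default of 1392 already sits below the 1450 bytes a clean 1500-byte path would allow.

```python
# Bytes VXLAN adds inside the underlay IP MTU (IPv4 outer header, untagged inner frame).
OUTER_IPV4 = 20
OUTER_UDP = 8
VXLAN_HEADER = 8
INNER_ETHERNET = 14
VXLAN_OVERHEAD = OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER + INNER_ETHERNET  # 50

IPV4_MINIMUM_MTU = 68  # smallest MTU every IPv4 link must support (RFC 791)


def max_extended_mtu(underlay_path_mtu: int, extra_overhead: int = 0) -> int:
    """Largest inner IP MTU that avoids fragmenting VXLAN packets on the underlay.

    extra_overhead covers anything else on the path (for example IPsec VPN
    encapsulation); its value depends on the provider and must be measured.
    """
    mtu = underlay_path_mtu - VXLAN_OVERHEAD - extra_overhead
    if mtu < IPV4_MINIMUM_MTU:
        raise ValueError(f"resulting MTU {mtu} is below the IPv4 minimum")
    return mtu


if __name__ == "__main__":
    print(max_extended_mtu(1500))      # clean 1500-byte underlay path
    print(max_extended_mtu(1500, 58))  # hypothetical 58 bytes of VPN overhead
```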

Configuration of the Layer 2 extension is now complete. In the following section we verify connectivity from on-prem to the cloud using the newly extended network.

Verifying connectivity

As a first step, it’s good to ping the IP address in the extended network which is assigned to the Nutanix Gateway appliance. We can verify that the “Local IP Address” is configured for the gateway VM by navigating to “VM” in Prism Central and checking that “10.42.3.3” shows up as an IP for the gateway.

Pinging this IP address from a VM in the on-premises VMware environment shows that it can reach the gateway appliance across the extended network without problems. The local VM has an IP of 10.42.3.99, in the same VLAN which has been extended. Latency is about 5ms across the S2S VPN + L2 extension.

Next, I migrated a VM from the on-prem VMware environment to NC2 using Move. After migration it was assigned an IP of “10.42.3.106”, as shown in the screenshot below.

Pinging this VM from on-prem also works just fine.

Conclusion

That concludes the walkthrough of configuring L2 extension from on-premises VMware with Cisco CSR1000V to Nutanix Cloud Clusters (NC2). Hopefully this was helpful.

For reference, please see the links below to Nutanix and Cisco pages about L2 extension.

How to find OEM ESXi installer ISOs on Broadcom’s webpage

If you’re using Dell servers for your DC or home lab, there used to be an option to download the ESXi ISO file directly from the Dell support site under “Enterprise Software”. This year the ability to download ISO images pre-loaded with Dell drivers has been removed from the Dell support site and moved to Broadcom.

It can be a challenge to find these, so please refer to the below steps to get your ISO images downloaded:

Step 1: Navigate to the Broadcom Support portal. Log in or create an account if you have not already done so.

Step 2: Next to your username on the top right-hand side, click the cloud icon and select VMware Cloud Foundation.

Step 3: You should now have a side-menu to the left. Click on My Downloads.

Step 4: We actually have to search here to find the downloads we’re looking for. Use the search bar on the top right-hand side and enter “VMware vSphere”. Choose VMware vSphere when the results show up.

Step 5: Select ESXi and then navigate to the Custom ISOs tab.

Step 6: Download your favorite ISO image with drivers pre-loaded for your particular brand of server 🙂

Still on 7.0? What in vSphere 8.0 would make the shift worthwhile?

For those who have remained on 7.0, perhaps it is time to look at what new features were introduced in 8.0 to see if an upgrade is in order. This is especially pertinent as the 7.0 u3 release will be EOL in October of 2024. So what’s new? The 8.0 release introduces a good few updates and improvements. Please refer to the below for an overview:

1. Performance Enhancements

  • Distributed Services Engine: vSphere 8 introduces support for Data Processing Units (DPUs), which take over infrastructure tasks such as networking and security from the host CPU, improving efficiency and performance.
  • NVMe Over Fabrics (NVMe-oF): Enhanced support for NVMe devices boosts storage performance for modern workloads.
  • vSphere Memory Monitoring and Remediation (vMMR): Provides improved memory utilization insights for better troubleshooting and optimization.

2. Scalability and Operations

  • Increased Limits: The release supports up to 10,000 active vSphere Pods in a vSphere Cluster, enhancing scalability for modern applications.
  • vSphere Lifecycle Manager (vLCM): Updates include streamlined host upgrade workflows and enhanced compatibility management.

3. Security Enhancements

  • Enhanced TPM Support: Virtual TPM devices are automatically replaced during cloning or deployment, aiding in Windows 11 compliance.
  • Deprecation of Legacy Protocols: Integrated Windows Authentication (IWA) and NPIV are being phased out in favor of more secure alternatives.

4. Upgrade Considerations

  • Deprecation of Legacy BIOS: UEFI is now strongly recommended for ESXi hosts, as some may fail to boot in legacy BIOS mode.
  • Driver and VIB Compatibility: Certain drivers (e.g., nmlx4_en and older lpfc devices) and legacy VIBs are no longer supported, requiring replacements before upgrading.

5. Application Modernization

  • Kubernetes and Containers: With the enhanced vSphere with Tanzu, support for Kubernetes-native workloads is better integrated, aligning with enterprise modernization goals.

6. Simplified Licensing

  • VMware introduced a new licensing structure tailored for organizations to leverage flexible consumption models, reducing overhead for hybrid and multi-cloud deployments. This is a point of contention for many, as some have seen price hikes.