VMware home lab: 6 months with the new setup

In the spring of 2021 I wanted a proper VMware lab setup at home. The primary reason was, and still is, having an environment in which to learn and experiment with the latest VMware and AWS solutions. I strongly believe that actual hands-on experience is the gateway to real knowledge, no matter how well the documentation may be written.

To that end I went about listing what would be needed to make this dream of a home lab come true. The lack of space meant that the setup would end up in my bedroom and therefore needed to be quiet. That removed most 2nd hand enterprise servers from the list, possibly with the exception of the VRTX chassis from Dell, which I would still REALLY want for a home lab, but it’s way too expensive – even 2nd hand.

Requirements:

  • As compatible with the VMware HCL as possible (as-is or via Flings)
  • Quiet (no enterprise servers)
  • Energy efficient
  • Not too big (another nail in the coffin for full-depth 19″ servers)
  • Reasonable performance
  • Ability to run vSAN
  • 10Gbps networking

Server hardware

Initially I considered the Intel NUCs and Skull / Ghost Canyon mini-PCs as these are very popular among home-lab enthusiasts. However, the 10Gbps requirement necessitated a PCIe slot and the models supporting this from Intel are very expensive.

The SuperMicro E300-9D was also on the list, but it too tends to get expensive and is a bit hard to get on short notice where I live.

Therefore, a custom build sounded more and more like the right fit for this setup. In the end I settled on the parts below. The list contains all the parts used for the ESXi nodes (costs are per node), minus the network cards, which are listed separately in the networking section below.

Part         Brand                                   Cost (JPY)
Mobo         ASRock Intel H410M-ITX/ac I219V             12,980
CPU          Core i5 10400 BOX (6c w. graphics)          20,290
RAM          TEAM DDR4 2666MHz PC4-21300 (2×32GB)        33,780
M.2 cache    WD Black 500GB SSD M.2-2280 SN750            9,580
2.5″ drive   SanDisk 2.5″ SSD Ultra 3D 1TB               13,110
PSU          Thermaltake Smart 500W -STANDARD             4,756
Case         Cooler Master H100 Mini Tower                7,023
Total                                                   101,519

Mainboard and case

The choice of mainboard came down to the onboard network chipset. The ESXi installer refuses to continue if it can’t find a supported network adapter, and initially I only had the onboard NIC and no 10Gbps cards. Unfortunately, the release of vSphere 7.x restricted hardware support significantly. I had planned an AMD build this time, but most AMD mainboards come with Realtek onboard NICs, which are no longer recognized by the ESXi installer. Another consideration was size and expansion options. An ITX form factor meant that the size of the PC case could be reduced while still having a PCIe slot for a 10Gbps NIC.

The Cooler Master H100 case has a single big fan which makes it pretty quiet. Its small size also makes it an ideal case for this small-footprint lab environment. It even comes with LEDs in the fan which are hooked up to the reset button on the case to switch between colors (or to turn it off completely).

CPU

Due to the onboard NIC support the build was restricted to an Intel CPU. Gen 11 had been released but Gen 10 CPUs were still perfectly fine and could be had for less money. Obviously, there was no plan to add a discrete GPU so the CPU also had to come with built-in graphics. The Core i5 10400 seemed to meet all criteria while having a good cost / performance balance.

Memory

The little ASRock H410M-ITX/ac mainboard supports up to 64GB of RAM and I filled it up from the start. One can never have too much RAM. With three nodes we get a total of 192GB, which will be sufficient for most tasks. There will likely come a day when a single workload (looking at you NSX-T!!) will require more. This is the only area which I feel could become a limitation soon. For that day I’ll likely have to add a box with more memory specifically for covering that workload.

Storage

A vSAN environment was one of the goals for the lab, and with an NVMe PCIe SSD as the cache tier and a 2.5″ drive as the capacity tier this was accomplished. It was a bit scary ordering these parts without knowing if they would be recognized in vCenter as usable for vSAN, but in the end there was no issue at all. They were all recognized immediately and could be assigned to the vSAN storage pool.

For the actual ESXi install I was going to use a USB disk initially but ended up re-using some old 2.5″ and 3.5″ spinning rust drives for the hypervisor install. These are not part of the cost calculation above as I just used whatever was lying around at home. The cost of these is negligible though.

Performance of the vSAN cluster isn’t too bad for using consumer hardware 🙂

Network hardware

To ensure vSAN performance and to support the 10Gbps internet router uplink, a 10Gbps managed switch was required. Copper 10Gbps ports get very expensive, so SFP+ would be the way to go. Mikrotik has a good 8+1 port switch / router in their CRS309-1G-8S+IN model. In the end this was a good fit for the home lab because not only does it have 8x 10Gbps SFP+ ports, it is also fanless and the software supports several advanced features, like BGP.

I’m still happy with the choice 6 months later. It’s a great switch but it took a while to get used to it. Most of us probably come from a Cisco or Juniper background. The configuration for the Mikrotik is completely different and won’t be intuitive for the majority of users.

CRS309-1G-8S+IN

On the server side I wanted something which would be guaranteed to work with ESXi, so a 10Gbps card which is on the HCL was a must. Intel has a lot of cards on the list and their X520 series can be found pretty easily. In the end I got three X520-DA2 (dual port) cards and they have worked perfectly so far.

There is also a 1Gbps managed Dell X1026P switch to allow for additional networking options with NSX-T. With the Mikrotik 10Gbps switch in place, the Dell switch is mostly an addition for corner cases. It does help when attaching other devices which don’t support 10Gbps though.

The Mikrotik has a permanent VPN connection to an AWS Transit Gateway and from there to various VPCs and sometimes the odd VMware Cloud on AWS SDDC.

Installation media etc.

These servers still require custom installation media for the installation to work, primarily for the onboard Intel networking and the USB network Fling. An explanation of how to create custom media can be found in the post “Create custom ESXi installer image with additional network drivers”.

vCenter is hosted on an NFS share from a separate server. This is done so it can be on shared storage for the cluster while staying separate from the vSAN while the environment is being built.

ESXi is installed over PXE to allow for fully automated installations.

Conclusion

That’s it – a fully functional VMware lab. Quiet and with reasonably high performance. Also, RGB LEDs add at least 20% extra performance – a bit like red paint on a sports car 😉

Resizing a Linux partition: Photon OS VM on vSphere

Adding disk space to a Linux VM can be a lot more complex than expected. Below is an explanation of how to extend the root partition of a Photon OS VM running on vSphere. The resize is done without unmounting the partition, which is possible in part because the filesystem is Ext4. The VM does need to be rebooted once after changing the disk size in vSphere, however. Otherwise it won’t realize it now has a larger disk.

Process

  • Increase size of disk in vSphere
  • Reboot the VM so it recognizes the new disk size
  • Use fdisk to delete and re-create the root partition
  • Use resize2fs to expand the filesystem to fill the larger partition
  • Update fstab and grub with the new partition ID (PARTUUID), or the VM won’t boot

For Photon OS this process is extra easy as the root partition is at the end of the partition table and it doesn’t use an “Extended” partition. It’s possible to resize partitions with an Extended partition as well, but it takes a bit more work.

Note: These commands can easily break your system. Don’t try it on a machine where you value the data unless you have a solid backup of everything before attempting a resize.

Video covering the steps shown below

Step one is to change the disk size in vCenter

Bumped up the VM disk size from 80 to 375GB in vCenter

Reboot

In order for the Linux VM to recognize that it has a larger disk it needs to be rebooted.

root@stress-vm-01 [ ~ ]# reboot

Prior to modifying the partitions, verify which disk to modify

After rebooting, log back into the VM. We want to modify the root “/” partition and with “lsblk” we can verify that it is labeled “sda3”

root@stress-vm-01 [ ~ ]# lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda      8:0    0  375G  0 disk 
├─sda1   8:1    0    4M  0 part 
├─sda2   8:2    0   10M  0 part /boot/efi
└─sda3   8:3    0   80G  0 part /

Launch fdisk

We use “fdisk” to modify the partitions and tell it to look at “/dev/sda” rather than “/dev/sda3”. This is because we want to see the entire disk, not just the partition we will modify

root@stress-vm-01 [ ~ ]# fdisk /dev/sda

Welcome to fdisk (util-linux 2.36).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

GPT PMBR size mismatch (167772159 != 786431999) will be corrected by write.

Command (m for help):
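
The two numbers in the “GPT PMBR size mismatch” warning can be checked by hand. As a rough sketch, assuming 512-byte sectors (as fdisk reports for this disk) and that the protective MBR records the disk size in sectors minus one, the old 80 GiB and the new 375 GiB disk sizes give exactly those values. The function name is only illustrative.

```python
SECTOR_SIZE = 512  # bytes per logical sector, as reported by fdisk for this disk
GIB = 1024 ** 3

def pmbr_size(disk_gib: int) -> int:
    """Sectors covered by the protective MBR entry: every sector except LBA 0."""
    total_sectors = disk_gib * GIB // SECTOR_SIZE
    return total_sectors - 1

old_size = pmbr_size(80)    # value recorded when the disk was 80 GiB
new_size = pmbr_size(375)   # value expected for the enlarged disk
print(old_size, new_size)   # 167772159 786431999
```

The mismatch is therefore expected after growing the virtual disk, and fdisk corrects it when the partition table is written.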

Print partition information

We can see that the partition we want to modify (“/dev/sda3”) is at the end of the partition table. This makes it easy as we don’t have to shift any other partitions around to make space for the new, larger partition.

Command (m for help): p

Disk /dev/sda: 375 GiB, 402653184000 bytes, 786432000 sectors
Disk model: Virtual disk    
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 2C13B474-2D24-4FE6-9905-D3A52DB28C9E

Device     Start       End   Sectors Size Type
/dev/sda1   2048     10239      8192   4M BIOS boot
/dev/sda2  10240     30719     20480  10M EFI System
/dev/sda3  30720 167772126 167741407  80G Linux filesystem

Command (m for help):

Delete the last partition (number 3)

Command (m for help): d
Partition number (1-3, default 3): 

Partition 3 has been deleted.

Command (m for help):

Recreate the partition

Here we use “n” to create a new partition, starting it at the exact same place as the old partition: “30720”. Fdisk will automatically suggest we end the new partition at the last usable sector of the disk: “786431966”. Pressing enter will accept this value and create the partition.

We can also see that the partition contains an ext4 signature – this is why we can resize the partition while it still is mounted. Answer “N” when asked whether to remove the signature, as the existing filesystem has to be kept.

Command (m for help): n
Partition number (3-128, default 3): 
First sector (30720-786431966, default 30720): 
Last sector, +/-sectors or +/-size{K,M,G,T,P} (30720-786431966, default 786431966): 

Created a new partition 3 of type 'Linux filesystem' and of size 375 GiB.
Partition #3 contains a ext4 signature.

Do you want to remove the signature? [Y]es/[N]o: N

Command (m for help):
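
The default last sector offered by fdisk can be sanity-checked. This is a minimal sketch, assuming the standard GPT layout in which the backup partition entry array (128 entries of 128 bytes) and the backup header occupy the final 33 sectors of the disk. The other numbers are taken from the fdisk output.

```python
SECTOR_SIZE = 512          # bytes, from the fdisk output
TOTAL_SECTORS = 786432000  # disk size in sectors, from the fdisk output
START_SECTOR = 30720       # unchanged start of /dev/sda3

# Assumption: standard GPT layout with 128 entries of 128 bytes each.
backup_entry_sectors = 128 * 128 // SECTOR_SIZE   # 32 sectors
backup_header_sectors = 1

# Sectors are numbered from 0, so the last one on the disk is TOTAL_SECTORS - 1.
last_usable = TOTAL_SECTORS - 1 - backup_entry_sectors - backup_header_sectors
partition_sectors = last_usable - START_SECTOR + 1
size_gib = partition_sectors * SECTOR_SIZE / 1024 ** 3

print(last_usable)          # 786431966
print(partition_sectors)    # 786401247
print(round(size_gib, 2))   # 374.99
```

The size comes out slightly below 375 GiB because of the space taken by the first two partitions and the GPT structures; fdisk rounds it to 375 GiB in its summary.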

Print the updated partition table

Note that it is not yet written to disk. This is just a preview.

Command (m for help): p

Disk /dev/sda: 375 GiB, 402653184000 bytes, 786432000 sectors
Disk model: Virtual disk    
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 2C13B474-2D24-4FE6-9905-D3A52DB28C9E

Device     Start       End   Sectors  Size Type
/dev/sda1   2048     10239      8192    4M BIOS boot
/dev/sda2  10240     30719     20480   10M EFI System
/dev/sda3  30720 786431966 786401247  375G Linux filesystem

Command (m for help):

Writing the partition table to disk

Command (m for help): w
The partition table has been altered.
Syncing disks.

Verifying the current size of the root “/” partition

root@stress-vm-01 [ ~ ]# df -h .
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda3        79G  1.1G   74G   2% /
root@stress-vm-01 [ ~ ]# 

Resizing on the fly (without unmounting)

root@stress-vm-01 [ ~ ]# resize2fs /dev/sda3
resize2fs 1.45.6 (20-Mar-2020)
Filesystem at /dev/sda3 is mounted on /; on-line resizing required
old_desc_blocks = 10, new_desc_blocks = 47
The filesystem on /dev/sda3 is now 98300155 (4k) blocks long.
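
The block count reported by resize2fs follows from the partition size. This is a small sketch, assuming the 4 KiB filesystem block size shown in the output and that a partial block at the end of the partition is simply left unused.

```python
SECTOR_SIZE = 512              # bytes per sector, from fdisk
BLOCK_SIZE = 4096              # ext4 block size, the "(4k)" in the resize2fs output
PARTITION_SECTORS = 786401247  # sector count of the new /dev/sda3, from fdisk

partition_bytes = PARTITION_SECTORS * SECTOR_SIZE
full_blocks = partition_bytes // BLOCK_SIZE   # whole blocks that fit in the partition
print(full_blocks)                            # 98300155
```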

Verifying the new partition size

root@stress-vm-01 [ ~ ]# df -h .
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda3       369G  1.1G  352G   1% /

Verify the new partition ID (“PARTUUID”)

root@stress-vm-01 [ ~ ]# blkid
/dev/sda2: SEC_TYPE="msdos" UUID="53EC-9755" BLOCK_SIZE="512" TYPE="vfat" PARTUUID="0a2847cf-9e9d-4d1a-9393-490e1b2459bf"
/dev/sda3: UUID="9cb30e86-d563-478d-8eeb-16f2449cb608" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="5e0b1089-595c-4f42-8d4b-4b06220cd6c7"
/dev/sda1: PARTUUID="d2bf275a-1df1-4aa6-adbf-8b5f6c4cac3a"

Update /etc/fstab and /boot/grub/grub.cfg

Use your favorite editor (vi / vim / nano). Look for the old partition ID (PARTUUID) and update it to match the new one shown by blkid. Note that grub.cfg may have a slightly different name or location if you aren’t using Photon OS.

root@stress-vm-01 [ ~ ]# vi /etc/fstab 
root@stress-vm-01 [ ~ ]# vi /boot/grub/grub.cfg 
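
Editing by hand works fine, but the substitution is mechanical and could be scripted. Below is a hedged sketch in Python that swaps one PARTUUID for another in a piece of text. The old PARTUUID and the sample fstab line are made up for illustration (the original value isn’t shown in this post); the new PARTUUID is the one blkid reported for /dev/sda3. Take a backup of both files before changing them.

```python
import re

def replace_partuuid(text: str, old: str, new: str) -> str:
    """Return text with every occurrence of PARTUUID=<old> changed to PARTUUID=<new>."""
    pattern = re.compile(r"PARTUUID=" + re.escape(old), re.IGNORECASE)
    return pattern.sub("PARTUUID=" + new, text)

# Hypothetical old value and fstab line, for illustration only.
old_id = "11111111-2222-3333-4444-555555555555"
new_id = "5e0b1089-595c-4f42-8d4b-4b06220cd6c7"  # from the blkid output
sample = f"PARTUUID={old_id}    /    ext4    defaults    1    1\n"

updated = replace_partuuid(sample, old_id, new_id)
print(updated, end="")
```

To apply it for real, read the contents of /etc/fstab and /boot/grub/grub.cfg, pass them through the function and write the result back. The same pattern also matches the “root=PARTUUID=...” form used on a kernel command line.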

All done!

Showing the before and after size of the root partition after a successful resize

Create custom ESXi installer image with additional network drivers

If one tries to install ESXi on a server equipped with network cards which are not on the official VMware HCL (Hardware Compatibility List), the installer will stop at the network card detection step and refuse to continue with a happy “No network adapters detected”. This is very common for those of us running VMware home labs on non-enterprise hardware.

However, VMware does provide additional network drivers in the shape of Flings. They can be downloaded but how does one create a custom ESXi installer ISO file using those drivers?

To make this process easy and fully automated, I created a PowerShell script which downloads the ESXi image (whichever version is required), the community network drivers and the USB network driver Fling. It then bundles everything up into an ISO file which can be written to a USB stick or CD / DVD for installation.

What if you need a different version of ESXi or different drivers?

No problem, the script can be customized to accommodate this as well. Please refer to the YouTube video below for details.

Script for ESXi ISO creation

Location of the script on GitHub: link

Video showing how to use and customize the script

Using NSX Autonomous Edge to extend L2 networks from on-prem to VMware Cloud on AWS

This is a quick, practical and unofficial guide showing how to use NSX Autonomous Edge to do L2 extension / stretching of VLANs from on-prem to VMware Cloud on AWS.

Note: The guide covers how to do L2 network extension using NSX Autonomous Edge. It doesn’t cover the deployment or use of HCX or HLM for migrations.

Why do L2 extension?

One use case for L2 extension is for live migration of workloads to the cloud. If the on-prem network is L2 extended / stretched there will be no interruption to service while migrating and no need to change IP or MAC address on the VM being migrated.

Why NSX Autonomous Edge?

VMware offers a very powerful tool – HCX (Hybrid Cloud Extension) to make both L2 extension and migrations of workloads a breeze. It is also provided free of charge when purchasing VMware Cloud on AWS. Why would one use another solution?

  1. No need to have a vSphere Enterprise Plus license

L2 extension with HCX requires Distributed vSwitches, and those in turn are only available with the top level vSphere Enterprise Plus license. Many customers only have the Standard vSphere license and therefore can’t use HCX for L2 extension (although they can use it for the migration itself, which will be shown later in this post). NSX Autonomous Edge works just fine with standard vSwitches and therefore the Standard vSphere license is enough.

  2. Active / standby HA capabilities

HCX doesn’t include active / standby redundancy. Sure, you can enable HA and even FT on the cluster, but FT maxes out at 4 VMs / cluster and HA might not be enough if your VMs are completely reliant on HCX for connectivity. NSX Autonomous Edge allows two appliances to be deployed in an HA configuration.

Configuration diagram (what are we creating?)

We have an on-prem environment with multiple VLANs, two of which we want to stretch to VMware Cloud on AWS and then migrate a VM across, verifying that it can be used throughout the migration. In this case we use NSX Autonomous Edge for the L2 extension of the networks while using HCX for the actual migration.

End state, prior to removing on-prem networks

Prerequisites

  • A deployed VMware Cloud on AWS SDDC environment
  • Open firewall rules on your SDDC to allow traffic from your on-prem DC network (create a management GW firewall rule and add your IP as allowed to access vCenter, HCX, etc.)
  • If HCX is used for vMotion: A deployed HCX environment and service mesh (configuration of HCX is out of scope for this guide)

Summary of configuration steps

  • Enable L2 VPN on your VMC on AWS SDDC
  • Download the NSX Autonomous Edge appliance from VMware
  • Download the L2 VPN Peer code from the VMC on AWS console
  • Create two new port groups for the NSX Autonomous Edge appliance
  • Deploy the NSX Autonomous Edge appliance
  • L2 VPN link-up
  • Add the extended network segments in the VMC on AWS console
  • VM migration using HCX (HCX deployment not shown in this guide)

Video of the setup process

As an alternative / addition to the guide below, feel free to refer to the video below. It covers the same steps but quicker and in a slightly different order. The outcome is the same however.

Enable L2 VPN on your VMC on AWS SDDC

  1. Navigate to “Networking & Security”, Click “VPN” and go to the “Layer 2” tab
  2. Click “Add VPN tunnel”
  3. Set the “Local IP address” to be the VMC on AWS public IP
  4. Set the “Remote public IP” to be the public IP address of your on-prem network
  5. Set the “Remote private IP” to be the internal IP you intend to assign the NSX Autonomous Edge appliance when deploying it in a later step

Configuring the L2 VPN server side in VMC on AWS

Download the NSX Autonomous Edge appliance

After deploying the L2 VPN in VMC on AWS there will be a pop-up with links for downloading the virtual appliance files as well as a link to a deployment guide

Download the L2 VPN Peer code from the VMC on AWS console

Download the Peer code for your new L2 VPN from the “Download config” link on the L2 VPN page in the VMware Cloud on AWS console. It will be available after creating the VPN in the previous step and can be saved as a text file

Create two new port groups for the NSX Autonomous Edge appliance

This is for your on-prem vSphere environment. The official VMware deployment guide suggests creating a port group for the “uplink” and another for the “trunk”. The uplink provides internet access through which the L2 VPN is created. The “trunk” port connects to all VLANs you wish to extend.

In this case I used an existing port group with internet access for the uplink and only created a new one for the trunk.

For the “trunk” port group: Since this PG needs to talk to all VLANs you wish to extend, please create it under a vSwitch whose uplinks have those VLANs tagged.

A port group would normally only have a single VLAN set. How do we “catch them all”? Simply set “4095” as the VLAN number. Also set the port group to “Accept” the following:

  • Promiscuous mode
  • MAC Address changes
  • Forged transmits

Deploy the NSX Autonomous Edge appliance

The NSX Autonomous Edge can be deployed as an OVF template. In the on-prem vSphere environment, select deploy OVF template

Browse to where you downloaded the NSX Autonomous Edge appliance from the VMware support page. The downloaded appliance files will likely contain several appliance types using the same base disks. I used the “NSX-l2t-client-large” appliance:

For the network settings:

  • Use any network with a route to the internet and good throughput as the “Public” network.
  • The “Trunk” network should be the port group with VLAN 4095 and the changed security settings we created earlier.
  • The “HA Interface” should be whatever network you wish to use for HA. In this case HA wasn’t used as it was a test deployment, so the same network as “Public” was selected.

For the Customize template part, enter the following:

  • Passwords: Desired passwords
  • Uplink interface: Set the IP you wish the appliance to have on your local network (match with what you set for the “Remote Private IP” in the L2 VPN settings in VMC on AWS at the beginning)
  • L2T: Set the public IP address shown in VMC on AWS console for your L2 VPN and use the Peering code downloaded when creating the L2 VPN at the start.
Use the Public IP (“Local IP Address”) as the “Peer address” when deploying the appliance
Peering code: Make sure to copy the whole thing. It often ends in equals signs and they have to be copied in too(!)
Example of template customization for our test deployment

Enable TCP Loose Setting: Check this box to keep any existing connections alive during migration, for example an SSH session to the VM being migrated.

The Sub interfaces: This is the most vital part and the easiest place to make mistakes. For the Sub interfaces, add your VLAN number followed by the tunnel ID in brackets. This will assign each VLAN a tunnel ID and we will use it on the other end (the cloud side) to separate out the VLANs.

They should be written as: VLAN(tunnel-ID). For example, VLAN 100 with tunnel ID 22 would look like this: 100(22). For our lab we extend VLANs 701 and 702 and assign them tunnel IDs which match the VLAN numbers. For multiple VLANs, use a comma followed by a space to separate them. Don’t use ranges. Enter each VLAN with its respective tunnel ID individually.
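
Since this field is easy to get wrong, a tiny helper can generate the string. The sketch below is only an illustration of the format described above; the function name is made up and it is not part of any VMware tooling.

```python
def format_sub_interfaces(vlan_to_tunnel: dict[int, int]) -> str:
    """Build the sub interface string: VLAN(tunnel-ID) entries separated by ', '."""
    entries = []
    for vlan, tunnel_id in vlan_to_tunnel.items():
        if not 1 <= vlan <= 4094:
            raise ValueError(f"VLAN {vlan} is outside the valid range 1-4094")
        entries.append(f"{vlan}({tunnel_id})")
    return ", ".join(entries)

print(format_sub_interfaces({701: 701, 702: 702}))  # 701(701), 702(702)
print(format_sub_interfaces({100: 22}))             # 100(22)
```

The first line of output is what was entered for this lab, with tunnel IDs matching the VLAN numbers.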

HA index: Funny detail – HA is optional, but the appliance won’t boot if you don’t set the HA index on your initial appliance. Even if you don’t intend to use HA, please set this to “0”. The HA section is not marked as “required” by the wizard when deploying, so it is fully possible to deploy a non-functioning appliance.

L2 VPN link-up

The L2 VPN tunnel will connect automatically using the settings provided when deploying the appliance. Open the console of the L2 VPN appliance, log in with “admin / <your password>” and issue the command “show service l2vpn”. After a moment the link will come up (provided that the settings used during deployment were correct).

Viewing the L2 VPN tunnel status from the appliance console in vCenter on-prem

In the VMC on AWS console the VPN can also be seen to change status from “Down” to “Success”

Add the extended network segments in the VMC on AWS console

Under the L2 VPN settings tab in the VMC on AWS console it is now time to add the VLANs we want to extend from on-prem. In this example we will add the single VLAN 702 which we gave the tunnel ID “702” during the NSX Autonomous Edge deployment

Adding the VLANs we wish to extend from on-prem in the VMC on AWS console

The extended network can now be viewed under “Segments” in the VMC on AWS console and will be listed as type “Extended”

VM migration using HCX

Now the network has been extended and we can test it by migrating a VM from on-prem to VMC on AWS. To verify that it works we’ll run Xonotic – an open source shooter – on the VM and keep a game going throughout the migration.

Starting a nice round of Xonotic deathmatch from our local machine to the VM we are about to migrate to the cloud

Verifying that our HCX link to the VMC on AWS environment is up

Starting the migration by right-clicking the VM in our local on-prem vCenter environment

Selecting our L2 extended network segment as the target network for the virtual machine

Monitoring the migration from the HCX console

If the VM is pinged continuously during migration: Once the migration is complete the ping time will go from sub-millisecond to around 35ms (migrating from Tokyo to Seoul in this case)

Throughout migration – and of course after the migration is done – our Xonotic game session is still running, albeit with a new 35ms lag after migration 🙂

Conclusion

That’s it – the network is now extended from on-prem. VMs can be migrated using vMotion via HCX or using HLM (Hybrid Linked Mode) with their existing IPs and uninterrupted service.

Any VMs migrated can be pinged throughout migration and if the “Enable TCP loose Setting” was checked any existing TCP sessions would continue uninterrupted.

Also, any VMs deployed to the extended network on the VMC on AWS side would be able to use DHCP, DNS, etc. served on-prem through the L2 tunnel.

If you followed along this far: Thank you and I hope you now have a fully working L2 extended network to the cloud!

AWS switch role “Invalid information in one or more fields. Check your information or contact your administrator”

You are trying to switch roles in the AWS Console using the role “OrganizationAccountAccessRole” but get the error “Invalid information in one or more fields. Check your information or contact your administrator”.

The role “OrganizationAccountAccessRole” is only added automatically if the account was created from AWS Organizations. If instead you have invited a pre-existing AWS account to join an org, you have to create the “OrganizationAccountAccessRole” manually in that account, specifying “Another AWS Account” as the trusted entity and entering the ID of the account from which the role will be assumed.

Official instructions here: https://docs.aws.amazon.com/organizations/latest/userguide/orgs_manage_accounts_access.html
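
For reference, the trust relationship on a manually created role looks roughly like the sketch below, which builds the policy document in Python and prints it as JSON. The account ID 111122223333 is a placeholder for the account you will switch from (typically the organization’s management account), and the function name is made up for illustration. The policy structure itself follows the standard IAM format.

```python
import json

def build_trust_policy(trusted_account_id: str) -> dict:
    """Trust policy letting principals in trusted_account_id assume the role."""
    if not (trusted_account_id.isdigit() and len(trusted_account_id) == 12):
        raise ValueError("AWS account IDs are 12 digits")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
            }
        ],
    }

# Placeholder account ID, for illustration only.
policy = build_trust_policy("111122223333")
print(json.dumps(policy, indent=2))
```

The role also needs a permissions policy. The role that AWS Organizations creates automatically has AdministratorAccess attached, so attach the same if you want the manually created role to behave identically.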