Migrating VMs from on-premises vSphere to VMware Cloud on AWS using NetApp SnapMirror

Note: This blog post is part of the 2022 edition of the vExpert Japan Advent Calendar series for the 9th of December.

Migration from an on-premises environment to VMware Cloud on AWS can be done in a variety of ways. The most commonly used (and also recommended) method is VMware Hybrid Cloud Extension (HCX). However, if the VMs are stored on a NetApp ONTAP appliance in the on-prem environment, the volume the VMs reside on can easily be copied to the cloud using SnapMirror. Once copied, the volume can be mounted to VMware Cloud on AWS and the VMs imported. This can be a useful migration method provided some downtime is acceptable.

Tip: If you are just testing things out, NetApp offers a downloadable virtual ONTAP appliance which can be deployed with all features enabled for 60 days.

Prerequisites

  • Since SnapMirror is a licensed feature, please make sure a license is available in the on-prem environment. FSx for NetApp ONTAP includes SnapMirror functionality.
  • SnapMirror only works between a limited range of ONTAP versions. Verify that the on-prem array is compatible with FSxN. The version of FSxN at the time of writing is “NetApp Release 9.11.1P3”. Verify your version (“version” command from CLI) and compare with the list for “SnapMirror DR relationships” provided by NetApp here: https://docs.netapp.com/us-en/ontap/data-protection/compatible-ontap-versions-snapmirror-concept.html#snapmirror-synchronous-relationships
  • Ensure the FSxN ENIs have a security group assigned allowing ICMP and TCP (in and outbound) on ports 11104 and 11105
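The security group requirement above can also be expressed as data. Below is a minimal Python sketch, my own helper rather than anything from AWS, that builds the EC2 ingress permissions for ICMP plus TCP 11104/11105; the commented-out boto3 call (which does exist in the EC2 client) shows where it would be applied, with a placeholder group ID and CIDR.

```python
def snapmirror_ingress_rules(source_cidr):
    """Build EC2 IpPermissions allowing ICMP and TCP 11104/11105
    from the given source CIDR (the on-prem intercluster network)."""
    rules = [{
        "IpProtocol": "icmp",
        "FromPort": -1,
        "ToPort": -1,
        "IpRanges": [{"CidrIp": source_cidr, "Description": "ICMP for SnapMirror peering"}],
    }]
    for port in (11104, 11105):  # SnapMirror intercluster ports
        rules.append({
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": source_cidr, "Description": "SnapMirror intercluster"}],
        })
    return rules

# import boto3
# ec2 = boto3.client("ec2")
# ec2.authorize_security_group_ingress(
#     GroupId="sg-0123456789abcdef0",  # placeholder: the FSxN ENI security group
#     IpPermissions=snapmirror_ingress_rules("10.70.0.0/16"),  # placeholder CIDR
# )
```

Remember the rules are needed in both directions (inbound and outbound) on the FSxN ENIs.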

Outline of steps

  1. Create an FSx for NetApp ONTAP (FSxN) file system
  2. Create a target volume in FSxN
  3. Set up cluster peering between on-prem ONTAP and FSxN
  4. Set up Storage VM (SVM) peering between on-prem ONTAP and FSxN
  5. Configure SnapMirror and Initialize the data sync
  6. Break the mirror (we’ll deal with the 7 years of bad luck in a future blog post)
  7. Add an NFS mount point for the FSxN volume
  8. Mount the volume on VMware Cloud on AWS
  9. Import the VMs into vCenter
  10. Configure network for the VMs
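For reference, the ONTAP CLI commands behind steps 2, 5 and 6 can be templated from a handful of parameters. This is an illustrative sketch only (the function name is mine, and the defaults mirror the examples further down this post):

```python
def migration_commands(src_svm, src_vol, dst_svm, dst_vol, size="200g"):
    """Render the ONTAP commands for creating the DP target volume,
    creating and initializing the SnapMirror relationship, and finally
    breaking the mirror."""
    return [
        # Step 2: the target volume must be type DP for SnapMirror
        f"vol create -vserver {dst_svm} -volume {dst_vol} "
        f"-aggregate aggr1 -size {size} -type DP -tiering-policy all",
        # Step 5: create and initialize the relationship
        f"snapmirror create -source-path {src_svm}:{src_vol} "
        f"-destination-path {dst_svm}:{dst_vol} -vserver {dst_svm} -throttle unlimited",
        f"snapmirror initialize -destination-path {dst_svm}:{dst_vol} "
        f"-source-path {src_svm}:{src_vol}",
        # Step 6: break the mirror once the baseline transfer is done
        f"snapmirror break -destination-path {dst_svm}:{dst_vol}",
    ]
```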

Architecture diagram

The peering relationship between NetApp ONTAP on-prem and FSxN requires private connectivity. The diagram shows Direct Connect, but a VPN terminating at the TGW can also be used.

Video of the process

This video shows all the steps outlined previously, with the exception of creating the FSxN file system, although that is a very simple process and hardly worth covering in detail anyway.

Commands

Open SSH sessions to both the on-premises ONTAP array and FSxN. The FSxN username will be “fsxadmin”. If not known, the password can be (re)set through the “Actions” menu under “Update file system” after selecting the FSxN file system in the AWS Console.

Step 1: [FSxN] Create the file system in AWS

The steps for this are straightforward and already covered in detail here: https://docs.aws.amazon.com/fsx/latest/ONTAPGuide/getting-started-step1.html

Step 2: [FSxN] Create the target volume

Note that the volume is listed as “DP” for Data Protection. This is required for SnapMirror.

FsxId0e4a2ca9c02326f50::> vol create -vserver svm-fsxn-multi-az-2 -volume snapmirrorDest -aggregate aggr1 -size 200g -type DP -tiering-policy all
[Job 1097] Job succeeded: Successful

FsxId0e4a2ca9c02326f50::>
FsxId0e4a2ca9c02326f50::>
FsxId0e4a2ca9c02326f50::> vol show
Vserver   Volume       Aggregate    State      Type       Size  Available Used%
--------- ------------ ------------ ---------- ---- ---------- ---------- -----

svm-fsxn-multi-az-2
          onprem_vm_volume_clone
                       aggr1        online     RW         40GB    36.64GB    3%
svm-fsxn-multi-az-2
          snapmirrorDest
                       aggr1        online     DP        200GB    200.0GB    0%
svm-fsxn-multi-az-2
          svm_fsxn_multi_az_2_root
                       aggr1        online     RW          1GB    972.1MB    0%
8 entries were displayed.

FsxId0e4a2ca9c02326f50::>
FsxId0e4a2ca9c02326f50::>
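If you script this step, picking the DP volumes out of “vol show” output is straightforward. A hedged little parser, assuming the wrapped layout shown above (vserver line, volume-name line, then a detail line containing the state and type):

```python
def dp_volumes(vol_show_output):
    """Return the names of type-DP volumes from captured 'vol show' output.

    Assumes the wrapped layout where the volume name sits alone on the
    line before its detail line (aggregate, state, type, sizes)."""
    names = []
    prev = ""
    for line in vol_show_output.splitlines():
        fields = line.split()
        if "DP" in fields and "online" in fields:
            names.append(prev)          # detail line: previous line held the name
        elif len(fields) == 1:
            prev = fields[0]            # vserver or volume name on its own line
    return names
```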

Step 3a: [On-prem] Create the cluster peering relationship

Get the intercluster IP addresses from the on-prem environment

JWR-ONTAP::> network interface show -role intercluster
            Logical    Status     Network            Current       Current Is
Vserver     Interface  Admin/Oper Address/Mask       Node          Port    Home
----------- ---------- ---------- ------------------ ------------- ------- ----
JWR-ONTAP
            Intercluster-IF-1
                         up/up    10.70.1.121/24     JWR-ONTAP-01  e0a     true
            Intercluster-IF-2
                         up/up    10.70.1.122/24     JWR-ONTAP-01  e0b     true
2 entries were displayed.
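These addresses feed directly into the “cluster peer create” commands that follow. If you want to grab them programmatically from captured output, a small sketch (my own helper, not a NetApp tool) could be:

```python
import re

def intercluster_ips(show_output):
    """Extract intercluster LIF addresses from the output of
    'network interface show -role intercluster'."""
    return re.findall(r"up/up\s+(\d+\.\d+\.\d+\.\d+)/\d+", show_output)
```

Joining the result with `", ".join(...)` gives the value for the `-peer-addrs` argument.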

Step 3b: [FSxN] Create the cluster peering relationship

FsxId0e4a2ca9c02326f50::> cluster peer create -address-family ipv4 -peer-addrs 10.70.1.121, 10.70.1.122

Notice: Use a generated passphrase or choose a passphrase of 8 or more characters. To ensure the authenticity of the peering relationship, use a phrase or sequence of characters that would be hard to guess.

Enter the passphrase:
Confirm the passphrase:

Notice: Now use the same passphrase in the "cluster peer create" command in the other cluster.

FsxId0e4a2ca9c02326f50::> cluster peer show
Peer Cluster Name         Cluster Serial Number Availability   Authentication
------------------------- --------------------- -------------- --------------
JWR-ONTAP                 1-80-000011           Available      ok

Step 3c: [FSxN] Create the cluster peering relationship

Get the intercluster IP addresses from the FSxN environment

FsxId0e4a2ca9c02326f50::> network interface show -role intercluster
            Logical    Status     Network            Current       Current Is
Vserver     Interface  Admin/Oper Address/Mask       Node          Port    Home
----------- ---------- ---------- ------------------ ------------- ------- ----
FsxId0e4a2ca9c02326f50
            inter_1      up/up    172.16.0.163/24    FsxId0e4a2ca9c02326f50-01
                                                                   e0e     true
            inter_2      up/up    172.16.1.169/24    FsxId0e4a2ca9c02326f50-02
                                                                   e0e     true
2 entries were displayed.

Step 3d: [On-prem] Create the cluster peering relationship

Use the same passphrase as entered with the cluster peer create command on the FSxN side in Step 3b.

JWR-ONTAP::> cluster peer create -address-family ipv4 -peer-addrs 172.16.0.163, 172.16.1.169

Step 4a: [FSxN] Create the Storage VM (SVM) peering relationship

FsxId0e4a2ca9c02326f50::> vserver peer create -vserver svm-fsxn-multi-az-2 -peer-vserver svm0 -peer-cluster JWR-ONTAP -applications snapmirror -local-name onprem

Info: [Job 145] 'vserver peer create' job queued

Step 4b: [On-prem] Create the Storage VM (SVM) peering relationship

After the peer accept command completes, verify the relationship using “vserver peer show-all”.

JWR-ONTAP::> vserver peer accept -vserver svm0 -peer-vserver svm-fsxn-multi-az-2 -local-name fsxn-peer

Step 5a: [FSxN] Create the SnapMirror relationship

FsxId0e4a2ca9c02326f50::> snapmirror create -source-path onprem:vmware -destination-path svm-fsxn-multi-az-2:snapmirrorDest -vserver svm-fsxn-multi-az-2 -throttle unlimited
Operation succeeded: snapmirror create for the relationship with destination "svm-fsxn-multi-az-2:snapmirrorDest".

FsxId0e4a2ca9c02326f50::> snapmirror show
                                                                       Progress
Source            Destination Mirror  Relationship   Total             Last
Path        Type  Path        State   Status         Progress  Healthy Updated
----------- ---- ------------ ------- -------------- --------- ------- --------
onprem:vmware
XDP  svm-fsxn-multi-az-2:snapmirrorDest
Uninitialized
Idle           -         true    -

Step 5b: [FSxN] Initialize the SnapMirror relationship

This will start the data copy from on-prem to AWS.

FsxId0e4a2ca9c02326f50::> snapmirror initialize -destination-path svm-fsxn-multi-az-2:snapmirrorDest -source-path onprem:vmware
Operation is queued: snapmirror initialize of destination "svm-fsxn-multi-az-2:snapmirrorDest".

FsxId0e4a2ca9c02326f50::> snapmirror show
                                                                     Progress
Source            Destination Mirror  Relationship   Total             Last
Path        Type  Path        State   Status         Progress  Healthy Updated
----------- ---- ------------ ------- -------------- --------- ------- --------
onprem:vmware
          XDP  svm-fsxn-multi-az-2:snapmirrorDest
                            Uninitialized
                                    Transferring   0B        true    09/20 08:55:05

FsxId0e4a2ca9c02326f50::> snapmirror show
                                                                     Progress
Source            Destination Mirror  Relationship   Total             Last
Path        Type  Path        State   Status         Progress  Healthy Updated
----------- ---- ------------ ------- -------------- --------- ------- --------
onprem:vmware
          XDP  svm-fsxn-multi-az-2:snapmirrorDest
                            Snapmirrored
                                    Finalizing     0B        true    09/20 08:58:46

FsxId0e4a2ca9c02326f50::> snapmirror show
                                                                     Progress
Source            Destination Mirror  Relationship   Total             Last
Path        Type  Path        State   Status         Progress  Healthy Updated
----------- ---- ------------ ------- -------------- --------- ------- --------
onprem:vmware
          XDP  svm-fsxn-multi-az-2:snapmirrorDest
                            Snapmirrored
                                    Idle           -         true    -

FsxId0e4a2ca9c02326f50::>
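Before breaking the mirror it is worth confirming the relationship has reached the Snapmirrored state with an Idle status, as in the last capture above. A rough way to check that from captured “snapmirror show” output (a sketch, not an official parser):

```python
def mirror_ready(snapmirror_show):
    """Return True once a relationship reports 'Snapmirrored' state with
    'Idle' status, i.e. the baseline transfer is done and it is safe to
    run 'snapmirror break'."""
    # Collapse the wrapped multi-line output into one whitespace-normalized string
    normalized = " ".join(snapmirror_show.split())
    return "Snapmirrored Idle" in normalized
```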

Step 6: [FSxN] Break the mirror

FsxId0e4a2ca9c02326f50::> snapmirror break -destination-path svm-fsxn-multi-az-2:snapmirrorDest
Operation succeeded: snapmirror break for destination "svm-fsxn-multi-az-2:snapmirrorDest".

Step 7: [FSxN] Add an NFS mount point for the FSxN volume

FsxId0e4a2ca9c02326f50::> volume mount -volume snapmirrorDest -junction-path /fsxn-snapmirror-volume

Step 8: [VMC] Mount the FSxN volume in VMware Cloud on AWS

Step 9: [VMC] Import the VMs into vCenter in VMware Cloud on AWS

This can be done manually as per the screenshot below, or automated with a script.

Manual import of VMs from the FSxN volume into VMware Cloud on AWS

Importing using a Python script (initial release – may have rough edges): https://github.com/jonas-werner/vmware-vm-import-from-datastore/blob/main/registerVm.py
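For context, the heavy lifting in a script like that is pyVmomi’s RegisterVM_Task, which expects a “[datastore] folder/file.vmx” path. A tiny sketch of the path builder, with the pyVmomi call shown commented out (the folder, resource pool and host lookups are assumed to exist already and the names here are placeholders):

```python
def vmx_datastore_path(datastore, folder, vmx_file):
    """Build the '[datastore] folder/vm.vmx' path that RegisterVM_Task expects."""
    return f"[{datastore}] {folder}/{vmx_file}"

# With pyVmomi, assuming a connected ServiceInstance and inventory lookups:
# task = vm_folder.RegisterVM_Task(
#     path=vmx_datastore_path("fsxn-snapmirror-volume", "myVm", "myVm.vmx"),
#     asTemplate=False,
#     pool=resource_pool,   # the Compute-ResourcePool on the VMC side
#     host=esxi_host)
```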

Video on how to use the script can be found here:

Step 10: [VMC] Configure the VM network prior to powering on

Wrap-up

That’s all there is to migrating VMs using SnapMirror between on-prem VMware and VMware Cloud on AWS environments. Hopefully this has been useful. Thank you for reading!

Presentation at the Japan VMUG meeting

On November 29th I presented at the Japan VMUG (VMware User Group) meeting. The title was “How to protect legacy OSes and increase the security of virtual machines in VMware Cloud on AWS – Jonas Werner”. The recording is below.

There were plenty of other sessions as well. If you are interested, please check out those recordings too: https://github.com/gowatana/japan-vmug-vexpert-talks/blob/main/README.md#21-2022%E5%B9%B411%E6%9C%8829%E6%97%A5%E7%81%AB1800—2000-recording-playlist

Tutorial for deploying and configuring VMware HCX in both on-premises and VMware Cloud on AWS with service mesh creation and L2 extension

Deploying HCX (VMware Hybrid Cloud Extension) is considered complex and difficult by most. It doesn’t help that it’s usually one of those things you only do once, so it doesn’t pay to spend a lot of effort learning it. However, as with everything, it’s not hard once you know how to do it. This video aims to show how to deploy HCX both in VMC (VMware Cloud on AWS) and in the on-premises DC or lab.

It covers both creating the service mesh over the internet and creating it over a private connection, such as DX (AWS Direct Connect) or a VPN.

A VPN cannot be used for L2 Extension if it is terminated on the VMC SDDC. In this tutorial I’ll use a VPN which is terminated on an AWS TGW which is in turn peered with a VTGW connected to the SDDC we’re attaching to.

Video chapters

  1. Switching vCenter to private IP and deploying HCX Cloud in VMC: https://youtu.be/ho2DY-TP-SA?t=43
  2. Initial SDDC firewall configuration: https://youtu.be/ho2DY-TP-SA?t=97
  3. Switching HCX to private IP and adding HCX firewall rules: https://youtu.be/ho2DY-TP-SA?t=405
  4. Downloading and deploying HCX for the on-prem DC side: https://youtu.be/ho2DY-TP-SA?t=585
  5. Adding HCX license, linking on-prem HCX with vCenter: https://youtu.be/ho2DY-TP-SA?t=740
  6. HCX site pairing between HCX Connector and HCX Cloud: https://youtu.be/ho2DY-TP-SA?t=959
  7. Creating HCX Network and Compute profiles: https://youtu.be/ho2DY-TP-SA?t=1011
  8. Choice: Deploy service mesh over public IP or private IP: https://youtu.be/ho2DY-TP-SA?t=1374
  9. Deploy service mesh over public IP: https://youtu.be/ho2DY-TP-SA?t=1399
  10. Live migrating a VM to AWS: https://youtu.be/ho2DY-TP-SA?t=1679
  11. Deploy service mesh over private IP (DX, VPN to TGW): https://youtu.be/ho2DY-TP-SA?t=1789

Some architecture diagrams for reference

Connecting all over the public internet is one method
The best performance may be had over a dedicated DX Private VIF to the SDDC
Separating the management traffic over a VPN while doing the L2 Extension over the internet is a bit of a hybrid
For the setup used in the tutorial I use a VPN to a TGW which is peered with a VTGW

Migrate VMware VMs from an on-prem DC to VMware Cloud on AWS (VMC) using Veeam Backup and Replication

When migrating from an on-premises DC to VMware Cloud on AWS it is usually recommended to use VMware Hybrid Cloud Extension (HCX). However, in some cases the IT team managing the on-prem DC is already using Veeam for backup and wants to use that solution for the migration as well.

They may also prefer Veeam over HCX, as HCX often requires professional services assistance for setup and migration planning. In addition, since HCX is primarily a migration tool, the customer is unlikely to have prior experience setting it up, and while it is an excellent tool there is a learning curve to get started.

Migrating with Veeam vs. Migrating with HCX

Veeam Backup & Replication                    | VMware Hybrid Cloud Extension (HCX)
----------------------------------------------|----------------------------------------------
Licensed (non-free) solution                  | Free with VMware Cloud on AWS
Arguably easy to set up and configure         | Arguably challenging to set up and configure
Can do offline migrations of VMs, single      | Can do online migrations (no downtime),
or in bulk                                    | offline migrations, bulk migrations and
                                              | online migrations in bulk (RAV), etc.
Cannot do L2 extension                        | Can do L2 extension of VLANs if they are
                                              | connected to a vDS
Can be used for backup of VMs after they      | Is primarily used for migration. Does not
have been migrated                            | have backup functionality
Support for migrating from older on-prem      | At time of writing, full support for on-prem
vSphere environments                          | vSphere 6.5 or newer. Limited support for
                                              | vSphere 6.0 up to March 12th, 2023

What we are building

This guide covers installing and configuring a single Veeam Backup & Replication installation in the on-prem VMware environment and linking it to both the vCenter on-prem and the one in VMware Cloud on AWS. Finally, we do an offline migration of a VM to the cloud to prove that it works.

Prerequisites

The guide assumes the following is already set up and available

  • On-premises vSphere environment with admin access (7.0 used in this example)
  • Windows Server VM to be used for Veeam install
    • Min spec here
    • Windows Server 2019 was used for this guide
    • Note: I initially used 2 vCPU, 4 GB RAM and a 60 GB HDD for my Veeam VM, but during the first migration the entire thing stalled and wouldn’t finish. After changing to 4 vCPU, 32 GB RAM and a 170 GB HDD the migration finished quickly and with no errors. I recommend assigning as many resources as is practical to the Veeam VM to facilitate and speed up the migration
  • One VMware Cloud on AWS (VMC) Software Defined Datacenter (SDDC)
  • Private IP connectivity to the VMC SDDC
    • Use Direct Connect (DX) or VPN but it must be private IP connectivity or it won’t work
    • For this setup I used a VPN to a TGW, then a peering to a VMware Transit Connect (VTGW) which had an attachment to the SDDC, but any private connectivity setup will be OK
  • A test VM to use for migration

Downloading and installing Veeam

Unless you already have a licensed copy, sign up for a trial license and then download Veeam Backup & Replication from here. Version 11.0.1.1216 was used in this guide.

In your on-premises vSphere environment, create or select a Windows Server VM to use for the Veeam installation. The VM spec used for this install is as follows:

Run the install with default settings (next, next, next, etc.)

Register the on-prem vCenter in Veeam

Navigate to “Inventory” at the bottom left, then “Virtual Infrastructure” and click “Add Server” to register the on-prem vCenter server

Listing VMs in the on-prem vSphere environment after the vCenter server has been registered in the Veeam Backup & Recovery console

Switching on-prem connectivity to VMware Cloud on AWS SDDC to use private IP addresses

For this setup there is a VPN from the on-premises DC to the SDDC (via a TGW and VTGW in this case) but the SDDC FQDN is still configured to return the public IP address. Let’s verify by pinging the FQDN

Switching the SDDC to return the private IP is easy. In the VMware Cloud on AWS web console, navigate to “Settings” and flip the IP to return from public to private

Ping the vCenter FQDN again to verify that private IP is returned by DNS and that we can ping it successfully over the VPN

All looks good. The private IP is returned. Time to register the VMware Cloud on AWS vCenter instance in the Veeam console
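A quick way to sanity-check the switch without pinging is to confirm the resolved address is a private (RFC 1918) one. A small sketch; the FQDN in the comment is a placeholder:

```python
import ipaddress

def is_private_ip(ip):
    """True if the given address string is a private (RFC 1918) address."""
    return ipaddress.ip_address(ip).is_private

# Resolve the SDDC vCenter FQDN and check it, e.g.:
# import socket
# is_private_ip(socket.gethostbyname("vcenter.sddc-x-x-x-x.vmwarevmc.com"))  # placeholder FQDN
```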

Registering the VMC vCenter instance with Veeam

Just use the same method as when adding the on-premises vCenter server: navigate to “Inventory” at the bottom left, then “Virtual Infrastructure” and click “Add Server” to register the VMware Cloud on AWS vCenter server with Veeam

Note: If the SDDC vCenter had not been switched to use a private IP there will be an error in listing the data stores. Subsequently when migrating a VM the target data store won’t be listed and the migration can’t be started

After adding the VMware Cloud on AWS SDDC vCenter the resource pools will be visible in the Veeam console

Now both vSphere environments are registered. Time to migrate a VM to the cloud!

Migrating a VM to VMware Cloud on AWS

Below is both a video and a series of screenshots describing the migration / replication job creation for the VM.

Creating some test files on the source VM to be migrated

Navigate to “Inventory” using the bottom left menu, click the on-premises vCenter server / Cluster and locate a VM to migrate in the on-premises DC VM inventory. Right-click the VM to migrate and create a replication job

When selecting the target for the replication, be sure to expand the VMware Cloud on AWS cluster and select one of the ESXi servers. Selecting the cluster is not enough to list the required resources, like storage volumes

Since VMC is a managed environment, we want to select the customer side of the storage, folders and resource pools

If you checked the box for remapping the network, it is even possible to select a target VLAN for the VM to be connected to on the cloud side!

Check “Run the job when I click finish”, then move to the “Home” tab to view the “Running jobs”

The migration of the test VM finished in less than 9 minutes

In the vCenter client for VMware Cloud on AWS we can verify that the replicated VM is present

After logging in and listing the files, we can verify that the VM is not only working but also has the test files present in the home directory

Thank you for reading! Hopefully this has provided an easy-to-understand summary of the steps required for a successful migration / replication of VMs to VMC using Veeam

Leveraging OpenStack for Deep Learning & Machine Learning with GPU pass-through

As part of preparing for OpenStack days in Tokyo 2017 I built an environment to show how GPU pass-through can be used on OpenStack as a means of providing instances ready for Machine learning and Deep learning. This is a rundown of the process

Introduction

Deep Learning and Machine Learning have in recent years grown to become increasingly vital in the advancement of humanity in key areas such as life sciences, medicine and artificial intelligence. Traditionally it has been difficult and costly to create scalable, self-service environments that enable developers and end users alike to leverage these technological advancements. In this post we’ll look at the practical steps for enabling GPU-powered virtual instances on Red Hat OpenStack. These can in turn be utilized by research staff to run in-house or commercial software for Deep Learning and Machine Learning.

Benefits

Virtual instances for Deep Learning and Machine Learning become easy and quick to create and consume. The addition of GPU powered Nova compute nodes can be done smoothly with no impact to existing cloud infrastructure. Users can choose from multiple GPU types and virtual machine types and the Nova Scheduler will be aware of where the required GPU resources reside for instance creation.

Prerequisites

This post describes how to modify key OpenStack services on an already deployed cloud to allow for GPU pass-through and subsequent assignment to virtual instances. As such it assumes an already functional Red Hat OpenStack overcloud is available. The environment used for the example in this document is running Red Hat OSP10 (Newton) on Dell EMC PowerEdge servers. The GPU enabled servers used for this example are PowerEdge C4130’s with NVIDIA M60 GPUs.

Process outline

After a Nova compute node with GPUs has been added to the cluster using Ironic bare-metal provisioning, the following steps are taken:

  • Disabling the Nouveau driver on the GPU compute node
  • Enabling IOMMU in the kernel boot options
  • Modifying the Nova compute service to allow PCIe pass-through
  • Modifying the Nova scheduler service to filter on the GPU ID
  • Creating a flavor utilizing the GPU ID

Each step is described in more detail below.

Disabling the Nouveau driver on the GPU compute node

On the Undercloud, list the current Overcloud server nodes

[stack@toksc-osp10b-dir-01 ~]$ nova list

+--------------------------------------+-------------------------+--------+------------+-------------+---------------------+
| ID                                   | Name                    | Status | Task State | Power State | Networks            |
+--------------------------------------+-------------------------+--------+------------+-------------+---------------------+
| 8449f79f-fc17-4927-a2f3-5aefc7692154 | overcloud-cephstorage-0 | ACTIVE | -          | Running     | ctlplane=192.0.2.14 |
| ac063e8d-9762-4f2a-bf19-bd90de726be4 | overcloud-cephstorage-1 | ACTIVE | -          | Running     | ctlplane=192.0.2.9  |
| b7410a12-b752-455c-8146-d856f9e6c5ab | overcloud-cephstorage-2 | ACTIVE | -          | Running     | ctlplane=192.0.2.12 |
| 4853962d-4fd8-466d-bcdb-c62df41bd953 | overcloud-cephstorage-3 | ACTIVE | -          | Running     | ctlplane=192.0.2.17 |
| 6ceb66b4-3b70-4171-ba4a-e0eff1f677a9 | overcloud-compute-0     | ACTIVE | -          | Running     | ctlplane=192.0.2.16 |
| 00c7d048-d9dd-4279-9919-7d1c86974c46 | overcloud-compute-1     | ACTIVE | -          | Running     | ctlplane=192.0.2.19 |
| 2700095a-319c-4b5d-8b17-96ddadca96f9 | overcloud-compute-2     | ACTIVE | -          | Running     | ctlplane=192.0.2.21 |
| 0d210259-44a7-4804-b084-f2af1506305b | overcloud-compute-3     | ACTIVE | -          | Running     | ctlplane=192.0.2.15 |
| e469714f-ce40-4b55-921e-bcadcb2ae231 | overcloud-compute-4     | ACTIVE | -          | Running     | ctlplane=192.0.2.10 |
| fefd2dcd-5bf7-4ac5-a7a4-ed9f70c63155 | overcloud-compute-5     | ACTIVE | -          | Running     | ctlplane=192.0.2.13 |
| 085cce69-216b-4090-b825-bdcc4f5d6efa | overcloud-compute-6     | ACTIVE | -          | Running     | ctlplane=192.0.2.20 |
| 64065ea7-9e69-47fe-ad87-ed787f671621 | overcloud-compute-7     | ACTIVE | -          | Running     | ctlplane=192.0.2.18 |
| cff03230-4751-462f-a6b4-6578bd5b9602 | overcloud-controller-0  | ACTIVE | -          | Running     | ctlplane=192.0.2.22 |
| 333b84fc-142c-40cb-9b8d-1566f7a6a384 | overcloud-controller-1  | ACTIVE | -          | Running     | ctlplane=192.0.2.24 |
| 20ffdd99-330f-4164-831b-394eaa540133 | overcloud-controller-2  | ACTIVE | -          | Running     | ctlplane=192.0.2.11 |
+--------------------------------------+-------------------------+--------+------------+-------------+---------------------+

Compute nodes 6 and 7 are equipped with NVIDIA M60 GPU cards. Node 6 will be used for this example.

From the Undercloud, SSH to the GPU compute node:

[stack@toksc-osp10b-dir-01 ~]$ ssh heat-admin@192.0.2.20
Last login: Tue May 30 06:36:38 2017 from gateway
[heat-admin@overcloud-compute-6 ~]$
[heat-admin@overcloud-compute-6 ~]$

Verify that the NVIDIA GPU cards are present and recognized:
[heat-admin@overcloud-compute-6 ~]$ lspci -nn | grep NVIDIA
04:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204GL [Tesla M60] [10de:13f2] (rev a1)
05:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204GL [Tesla M60] [10de:13f2] (rev a1)
84:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204GL [Tesla M60] [10de:13f2] (rev a1)
85:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204GL [Tesla M60] [10de:13f2] (rev a1)

Use the device ID obtained in the previous command to check if the Nouveau driver is currently in use for the GPUs:
[heat-admin@overcloud-compute-6 ~]$ lspci -nnk -d 10de:13f2
04:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204GL [Tesla M60] [10de:13f2] (rev a1)
                Subsystem: NVIDIA Corporation Device [10de:115e]
                Kernel driver in use: nouveau
                Kernel modules: nouveau

 

Disable the Nouveau driver and enable IOMMU in the kernel boot options:

[heat-admin@overcloud-compute-6 ~]$ sudo su -
Last login: 火  5月 30 06:37:02 UTC 2017 on pts/0
[root@overcloud-compute-6 ~]#
[root@overcloud-compute-6 ~]# cd /boot/grub2/

Make a backup of the grub.cfg file before modifying it:
[root@overcloud-compute-6 grub2]# cp -p grub.cfg grub.cfg.orig.`date +%Y-%m-%d_%H-%M`
[root@overcloud-compute-6 grub2]# vi grub.cfg

Modify the following line and append the Nouveau blacklist and Intel IOMMU options:
linux16 /boot/vmlinuz-3.10.0-514.2.2.el7.x86_64 root=UUID=a69bf0c7-8d41-42c5-b1f0-e64719aa7ffb ro console=tty0 console=ttyS0,115200n8 crashkernel=auto rhgb quiet

After modification:
linux16 /boot/vmlinuz-3.10.0-514.2.2.el7.x86_64 root=UUID=a69bf0c7-8d41-42c5-b1f0-e64719aa7ffb ro console=tty0 console=ttyS0,115200n8 crashkernel=auto rhgb quiet modprobe.blacklist=nouveau intel_iommu=on iommu=pt

Also modify the rescue boot option:
linux16 /boot/vmlinuz-0-rescue-e1622fe8eb7d44d0a2d57ce6991b2120 root=UUID=a69bf0c7-8d41-42c5-b1f0-e64719aa7ffb ro console=tty0 console=ttyS0,115200n8 crashkernel=auto rhgb quiet

After modification:
linux16 /boot/vmlinuz-0-rescue-e1622fe8eb7d44d0a2d57ce6991b2120 root=UUID=a69bf0c7-8d41-42c5-b1f0-e64719aa7ffb ro console=tty0 console=ttyS0,115200n8 crashkernel=auto rhgb quiet modprobe.blacklist=nouveau intel_iommu=on iommu=pt
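The edit above is mechanical enough to script. A hedged sketch (my own helper) that appends the options to “linux16” entries and is safe to run twice:

```python
GPU_OPTS = "modprobe.blacklist=nouveau intel_iommu=on iommu=pt"

def patch_grub_line(line):
    """Append the Nouveau blacklist and IOMMU options to a linux16 boot
    entry; other lines and already-patched entries pass through unchanged."""
    if line.lstrip().startswith("linux16") and GPU_OPTS not in line:
        return line.rstrip() + " " + GPU_OPTS
    return line

# Apply to every line of grub.cfg (after making the backup shown above):
# patched = "\n".join(patch_grub_line(l) for l in open("/boot/grub2/grub.cfg"))
```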

Make the same modifications to “/etc/default/grub”:
[heat-admin@overcloud-compute-6 ~]$ vi /etc/default/grub

Re-generate the GRUB configuration files with grub2-mkconfig:
[root@overcloud-compute-6 grub2]# grub2-mkconfig -o /boot/grub2/grub.cfg
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-3.10.0-514.2.2.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-514.2.2.el7.x86_64.img
Found linux image: /boot/vmlinuz-0-rescue-e1622fe8eb7d44d0a2d57ce6991b2120
Found initrd image: /boot/initramfs-0-rescue-e1622fe8eb7d44d0a2d57ce6991b2120.img
done

Reboot the Nova compute node:
[root@overcloud-compute-6 grub2]# reboot
PolicyKit daemon disconnected from the bus.
We are no longer a registered authentication agent.
Connection to 192.0.2.20 closed by remote host.
Connection to 192.0.2.20 closed.

After the reboot is complete, SSH to the node to verify that the Nouveau module is no longer active for the GPUs:

[stack@toksc-osp10b-dir-01 ~]$ ssh heat-admin@192.0.2.20
Last login: Tue May 30 07:39:42 2017 from 192.0.2.1
[heat-admin@overcloud-compute-6 ~]$
[heat-admin@overcloud-compute-6 ~]$
[heat-admin@overcloud-compute-6 ~]$
[heat-admin@overcloud-compute-6 ~]$ lspci -nnk -d 10de:13f2
04:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204GL [Tesla M60] [10de:13f2] (rev a1)
                Subsystem: NVIDIA Corporation Device [10de:115e]
                Kernel modules: nouveau

The Kernel module is present but not listed as being active. PCIe pass-through is now possible.

Modifying the Nova compute service to allow PCIe pass-through

From the Undercloud, SSH to the compute node and become root with sudo:

[stack@toksc-osp10b-dir-01 ~]$ ssh heat-admin@192.0.2.20
[heat-admin@overcloud-compute-6 ~]$ sudo su -
Last login: 火  5月 30 07:40:13 UTC 2017 on pts/0

Backup the nova.conf file and edit the configuration file:
[root@overcloud-compute-6 ~]# cd /etc/nova
[root@overcloud-compute-6 nova]# cp -p nova.conf nova.conf.orig.`date +%Y-%m-%d_%H-%M`
[root@overcloud-compute-6 nova]# vi nova.conf

Add the following two lines at the beginning of the “[DEFAULT]” section:
pci_alias = { "vendor_id":"10de", "product_id":"13f2", "device_type":"type-PCI", "name":"m60" }
pci_passthrough_whitelist = { "vendor_id": "10de", "product_id": "13f2" }

Note:
The values for “vendor_id” and “product_id” can be found in the output of “lspci -nn | grep NVIDIA” as shown earlier. Note that the PCIe alias and whitelist are defined on a vendor/product basis. This means no data specific to each PCIe device is required, and new cards of the same type can be added and used without modifying the configuration file.

The value for “name” is arbitrary, but since it will be used to filter on the GPU type later, a brief, descriptive name is suggested as best practice. A value of “m60” is used in this example.
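Since both nova.conf values are JSON, a quick consistency check between the alias and the whitelist can catch typos before restarting services. An illustrative sketch (my helper, not part of Nova):

```python
import json

def alias_matches_whitelist(alias_line, whitelist_line):
    """Both nova.conf values are JSON; the alias and the whitelist must
    agree on vendor_id/product_id or scheduling will never match a host."""
    alias = json.loads(alias_line.split("=", 1)[1])
    whitelist = json.loads(whitelist_line.split("=", 1)[1])
    return (alias["vendor_id"], alias["product_id"]) == \
           (whitelist["vendor_id"], whitelist["product_id"])
```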

Restart the Nova compute service:

[root@overcloud-compute-6 nova]# systemctl restart openstack-nova-compute.service

Modifying the Nova scheduler service to filter on the GPU ID

On each of the Nova controller nodes, perform the following steps.
From the Undercloud, SSH to the controller node and become root with sudo:

[stack@toksc-osp10b-dir-01 ~]$ ssh heat-admin@192.0.2.22
[heat-admin@overcloud-controller-0 ~]$ sudo su -
Last login: 火  5月 30 07:40:13 UTC 2017 on pts/0

Create a backup and then modify the nova.conf configuration file:
[root@overcloud-controller-0 ~]# cd /etc/nova
[root@overcloud-controller-0 nova]# cp -p nova.conf nova.conf.orig.`date +%Y-%m-%d_%H-%M`
[root@overcloud-controller-0 nova]# vi nova.conf

Add the following three lines at the beginning of the “[DEFAULT]” section:
pci_alias = { "vendor_id":"10de", "product_id":"13f2", "device_type":"type-PCI", "name":"m60" }
pci_passthrough_whitelist = { "vendor_id": "10de", "product_id": "13f2" }
scheduler_default_filters = RetryFilter, AvailabilityZoneFilter, RamFilter, DiskFilter, ComputeFilter, ComputeCapabilitiesFilter, ImagePropertiesFilter, ServerGroupAntiAffinityFilter, ServerGroupAffinityFilter, PciPassthroughFilter

Note: Ensure to match the values for “vendor_id”, “product_id” and “name” with those used while modifying the nova.conf file on the Nova compute node.

Note: Also change “scheduler_use_baremetal_filters” from “False” to “True”

Restart the nova-scheduler service:

[root@overcloud-controller-0 nova]# systemctl restart openstack-nova-scheduler.service

Creating a flavor utilizing the GPU ID

The only step remaining is to create a flavor that utilizes the GPU. For this, a flavor containing a PCIe alias matching the “name” value in the nova.conf files will be created.
Create the base flavor without the PCIe passthrough alias:

[stack@toksc-osp10b-dir-01 ~]$ openstack flavor create gpu-mid-01 --ram 4096 --disk 15 --vcpus 4
+----------------------------+--------------------------------------+
| Field                      | Value                                |
+----------------------------+--------------------------------------+
| OS-FLV-DISABLED:disabled   | False                                |
| OS-FLV-EXT-DATA:ephemeral  | 0                                    |
| disk                       | 15                                   |
| id                         | 04447428-3944-4909-99d5-d5eaf6e83191 |
| name                       | gpu-mid-01                           |
| os-flavor-access:is_public | True                                 |
| properties                 |                                      |
| ram                        | 4096                                 |
| rxtx_factor                | 1.0                                  |
| swap                       |                                      |
| vcpus                      | 4                                    |
+----------------------------+--------------------------------------+

Check that the flavor has been created correctly:
[stack@toksc-osp10b-dir-01 ~]$ openstack flavor list
+--------------------------------------+------------+------+------+-----------+-------+-----------+
| ID                                   | Name       |  RAM | Disk | Ephemeral | VCPUs | Is Public |
+--------------------------------------+------------+------+------+-----------+-------+-----------+
| 04447428-3944-4909-99d5-d5eaf6e83191 | gpu-mid-01 | 4096 |   15 |         0 |     4 | True      |
+--------------------------------------+------------+------+------+-----------+-------+-----------+

Add the PCIe passthrough alias information to the flavor:
[stack@toksc-osp10b-dir-01 ~]$ openstack flavor set gpu-mid-01 --property "pci_passthrough:alias"="m60:1"

Note: The “m60:1” value indicates that one (1) of the specified resource, in this case a GPU, is requested. If more than one GPU is required for a particular flavor, just modify the value: for example, “m60:2” for a dual-GPU flavor.
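The alias:count syntax is simple enough to validate programmatically before setting it on a flavor. A tiny illustrative parser (my own helper, not a Nova API):

```python
def parse_alias_spec(spec):
    """Split a 'pci_passthrough:alias' value such as 'm60:2' into
    (alias_name, requested_count); the count defaults to 1."""
    name, _, count = spec.partition(":")
    return name, int(count) if count else 1
```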

Verify that the flavor has been modified correctly:

[stack@toksc-osp10b-dir-01 ~]$ nova flavor-show gpu-mid-01
+----------------------------+--------------------------------------+
| Property                   | Value                                |
+----------------------------+--------------------------------------+
| OS-FLV-DISABLED:disabled   | False                                |
| OS-FLV-EXT-DATA:ephemeral  | 0                                    |
| disk                       | 15                                   |
| extra_specs                | {"pci_passthrough:alias": "m60"}     |
| id                         | 04447428-3944-4909-99d5-d5eaf6e83191 |
| name                       | gpu-mid-01                           |
| os-flavor-access:is_public | True                                 |
| ram                        | 4096                                 |
| rxtx_factor                | 1.0                                  |
| swap                       |                                      |
| vcpus                      | 4                                    |
+----------------------------+--------------------------------------+

That is all. Instances with the GPU flavor can now be created via the command line or the Horizon web interface.