Category Archives: VMware

Datrium Design – Architecture Matters

Lame Joke: What do you get when you stick an NVMe-based SSD into an All-Flash Array or Hyper-Converged node?

Genuine Answer: A Bottleneck of course!

As flash technologies advance and increase in performance, existing (and upcoming) network infrastructure cannot keep up with next-gen flash and storage-class memory technologies such as 3D XPoint. To put rough numbers on it, 10GbE tops out around 1.25 GB/s, while a single modern NVMe drive can sustain several GB/s of reads, so it only takes a drive or two to saturate the link.
This chart compares saturation rates of 10GbE, 40GbE, and 100GbE against various flash offerings.

 

Datrium was founded by ex-founders and principal architects of companies like Data Domain and VMware, so it's safe to say they know a thing or two about architecture. Their approach to overcoming some of the shortcomings in traditional converged and hyperconverged (HCI) platforms boils down to the following shift in architecture design:

Move the I/O processing to stateless compute nodes

Architectural Overview
There are basically two components to Datrium’s Open Convergence architecture.

Compute Nodes
Compute Nodes are servers of any brand the customer would like to use. The more RAM and flash these servers have, the more powerful the overall architecture. Each compute node gets Datrium's DVX software installed into the userspace on the hypervisor.
Every compute node is responsible for data services (deduplication, compression, erasure coding, and encryption). These nodes pull copies of data from the Data Nodes (the next component, which we will address shortly) and hold that data in a stateless fashion before it is persisted back to the Data Nodes.

Data Nodes
The DVX Data Nodes are hybrid or all-flash disk enclosures purchased from Datrium (you can't use your own Data Nodes). Since all data is processed on the compute nodes, there is no data processing happening at the data node layer. The data nodes simply hold durable copies that are only read when the data is not available in flash/cache on the compute nodes, and the data that resides on them is heavily protected for resiliency.

Open Convergence is Datrium’s marketing term for this improved architecture, but taking the marketing out of the discussion, here is how Datrium solves for business outcomes:

  1. Simpler than HyperConverged
    – Zero HCI Cluster configuration or cluster sprawl
    – Independently and Simply provision compute or storage
    – Flexibly support any mix of hosts or hypervisors
    – No vendor lock-in on compute resources. Use existing compute hardware
  2. Faster than All-Flash Arrays
    – Flash is on the server, where it performs much faster
    – No Controller Bottlenecks
    – Performance scales with each server
  3. No Backup Silos
    – One console for VM consolidation and data protection
    – Reduce Management time for Backup, DR, Copy Data Management
    – Eliminate dedicated backup devices


If you need a lightning fast, resilient, scalable, cloud-enabled architecture, Datrium might be exactly what you need. Because in the end,  Architecture Matters.

 

pRDM and vRDM to VMDK Migrations

I was assisting an amazing client in moving some VMs off an older storage array and onto a newer storage platform. They had some VMs with physical RDMs (pRDMs) attached, and we wanted them living as VMDKs on the new SAN.
Traditionally, I have always shut down the VM, removed the pRDM, re-added it as a vRDM, and then done the migration, but I found an awesome write-up on a few separate ways of doing this.
(Credit of the following content goes to Cormac Hogan of VMware)
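
Before touching anything, I like to get a quick inventory of which VMs still have RDMs attached. One rough-and-ready way from the ESXi shell (just a sketch; it relies on the default naming of the mapping descriptor files) is to search the datastores for the RDM pointer files:

# RDM mapping (pointer) files live on VMFS even though the data stays on the LUN.
# Virtual-mode RDMs are typically named *-rdm.vmdk, physical-mode *-rdmp.vmdk.
find /vmfs/volumes/ -name "*-rdm.vmdk" -o -name "*-rdmp.vmdk"

Anything that shows up here still has a mapping file on VMFS, which means the data itself is still sitting on the original LUN.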

VM with Physical (Pass-Thru) RDMs (Powered On – Storage vMotion):

  • If I try to change the format to thin or thick, then no Storage vMotion is allowed.
  • If I choose not to do any conversion, only the pRDM mapping file is moved from the source VMFS datastore to the destination VMFS datastore – the data stays on the original LUN.

VM with Virtual (non Pass-Thru) RDMs (Powered On – Storage vMotion):

  • On a migrate, if I choose to convert the format in the advanced view, the vRDM is converted to a VMDK on the destination VMFS datastore.
  • If I choose not to do any conversion, only the vRDM mapping file is moved from the source VMFS datastore to the destination VMFS datastore – the data stays on the original LUN (same behaviour as pRDM).

VM with Physical (Pass-Thru) RDMs (Powered Off – Cold Migration):

  • On a migrate, if I choose to change the format (via the advanced view), the pRDM is converted to a VMDK on the destination VMFS datastore.
  • If I choose not to do any conversion, only the pRDM mapping file is moved from the source VMFS datastore to the destination VMFS datastore – the data stays on the original LUN.

VM with Virtual (non Pass-Thru) RDMs (Powered Off – Cold Migration):

  • On a migrate, if I choose to convert the format in the advanced view, the vRDM is converted to a VMDK on the destination VMFS datastore.
  • If I choose not to do any conversion, only the vRDM mapping file is moved from the source VMFS datastore to the destination VMFS datastore – the data stays on the original LUN (same behaviour as pRDM).

vSphere Web Client Integration Plugin Not Working

When trying to manage your vSphere environment using the web client (or being forced to in 6.5 and later), the Web Client Integration Plugin is required to make use of many features the web client has to offer, like remote console, enhanced authentication, and deploying OVF appliances.

If you have downloaded and installed the plugin, but IE, Chrome, or Firefox do not activate the plugin, it can most likely be resolved by doing one of the following:

  1. Add the vCenter FQDN to the trusted site list:
    For vSphere 6.0-6.5: https://vCenter_FQDN
    For vSphere 5.5: https://vCenter_FQDN:9443
  2. Add the vCenter FQDN to the Local Intranet list (IE & Chrome)
  3. Uninstall Plugin, Clear Cache/Cookies, Reinstall Plugin, and Repeat option 1

 

HPE Proliant G7 Servers and vSphere 6.5 Purple Screen of Death

Upgrading to ESXi 6.5 on HPE G7 servers will crash the host, cause you to scream, and require you to waste your time building a custom ISO that HPE could easily have provided.
Best practice is to use the vendor's custom ISOs that have the hardware drivers integrated, so I used HPE's latest custom ISO.

HPE G7 Server support is being dropped by both HPE and VMware. In fact, vSphere 6.5 is supposedly the last version that will support the G7s. Knowing this info, I assumed upgrading from ESXi 6.0 to 6.5 on G7 would work, but I found out quickly that after the upgrade the hosts would “Purple Screen of Death” (PSOD) right after boot.

The Error: "PF Exception 14 in world 67667:sfcb-smx IP 0x0 addr 0x0"

The Issue: There are incompatible drivers in the customized ISO from HPE. Yes, there is more than one driver with issues.

The Workarounds: There are various workarounds. Some I have personally found to work; others are resolutions I read about after I dealt with this, so I was not able to verify that they do indeed work, but I will list them nevertheless. Upgrading the firmware, BIOS, etc. did not resolve the issue.
Note: All these workarounds require a fresh install of ESXi. Running an upgrade does not remove the incompatible drivers, and the host doesn't stay alive long enough before crashing to remove them manually via SSH.

Solution 1: Use VMware's Standard ISO Media
While this goes against many best practices, VMware doesn't include many vendor drivers in its ISO builds, so the offending drivers never get installed to crash the system. While you can certainly use this method, you will want to follow up and manually install the appropriate driver VIBs from HPE, as sketched below.
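
As a rough sketch of that follow-up step (the datastore path and bundle file name below are only placeholders for whatever HPE offline bundle you download), the HPE VIBs can be layered on afterwards from the ESXi shell:

# Install the HPE driver/tooling VIBs from an offline bundle copied to a datastore
# (use the full path to the .zip; a reboot is usually required afterwards)
esxcli software vib install -d /vmfs/volumes/datastore1/hpe-offline-bundle.zip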

Solution 2: Build your own Custom ISO
This takes a bit more work, but it is probably the most comprehensive path to resolution. You will basically remove the offending driver from the HPE customized 6.5 image and inject the older version from the 6.0 bundle. The following are instructions for doing this.

Create Custom VMware ESXi Media

Prerequisites:

  • VMware PowerCLI installed on a Windows machine
  • The HPE Custom ESXi 6.5 offline bundle (ZIP) and the HPE Custom ESXi 6.0 offline bundle (ZIP) downloaded locally (the steps below assume C:\ESXi\HPE-6_5.zip and C:\ESXi\HPE-6_0.zip)

Instructions:

  • Launch vSphere PowerCLI

  • Add the HP ESXi 6.5 image bundle
    Add-EsxSoftwareDepot -DepotUrl C:\ESXi\HPE-6_5.zip

  • Check the Profile
    Get-EsxImageProfile

  • Copy the Profile
    New-EsxImageProfile -CloneProfile HPE-ESXi-6.5.0-OS-Release-6* -Name "G7-ESXi"


    When prompted for a Vendor, enter "HPE Custom"

  • Check the Profile
    Get-EsxImageProfile

  • Remove the driver from the image
    Remove-EsxSoftwarePackage G7-ESXi hpe-smx-provider

  • Add the HP ESXi 6.0 image bundle
    Add-EsxSoftwareDepot -DepotUrl C:\ESXi\HPE-6_0.zip
  • Check the Profile
    Get-EsxImageProfile

  • View both drivers in the two bundles
    Get-EsxSoftwarePackage | findstr smx

  • Add the necessary driver into the custom build
    Add-EsxSoftwarePackage -ImageProfile G7-ESXi -SoftwarePackage "hpe-smx-provider 600.03.11.00.9-2768847"

  • Convert your custom bundle to ISO
    Export-EsxImageProfile -ImageProfile G7-ESXi -ExportToIso -FilePath "C:\ESXi\G7-ESXi.iso"

  • Now take that ISO file that was created and use it to do a FRESH INSTALL. (Remember, upgrade will not work).
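
Once the freshly installed host is back up, it's easy to sanity-check which smx provider version actually landed on the box (a quick check from the ESXi shell):

# List installed VIBs and filter for the smx provider
esxcli software vib list | grep -i smx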

While attempting to upgrade an ESXi host from 6.0.0 build 3073146 to the latest 6.x build (6.0.0 Update 2, build 4192238) via the CLI (see my post here about patching via CLI),

I got the following error:

[DependencyError]
VIB VMware_bootbank_esx-base_6.0.0-2.43.4192238 requires vsan >= 6.0.0-2.43, but the requirement cannot be satisfied within the ImageProfile.
VIB VMware_bootbank_esx-base_6.0.0-2.43.4192238 requires vsan << 6.0.0-2.44, but the requirement cannot be satisfied within the ImageProfile.
Please refer to the log file for more details.

The exact build numbers in the error may differ in your environment, but the issue is the same. I found this KB from VMware and decided to make a post that gets right to the point: VMware KB

This error occurs because the newest version of VSAN (which is built into ESXi) is looking for a specific base hypervisor build (esx-base). In order to run the update successfully, you'll need to specify the image profile contained in the offline bundle you are using. It's actually a lot easier than it may sound.

First, let's find the image profile contained in the offline bundle you will be using. Run the following command, pointing -d at the .zip bundle you uploaded to a datastore on the host.

esxcli software sources profile list -d <location_of_the_esxi_zip_bundle_on_the_datastore>

The command lists the image profiles contained in the bundle.

The Name value is the profile you will need to add to your update command.
So in my case, the update command looks like this:

esxcli software profile update -d /vmfs/volumes/datastore1/VMware-ESXi-6.0.0.update02-4192238.x86_64-Dell_Customized-offline-bundle-A04.zip -p Dell-ESXi-6.0U2-4192238-A04

It should update and finish with no errors.

The final step is to issue a reboot command, and you are done.
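
If you are running these commands by hand over SSH, it's also worth wrapping the update in maintenance mode. A minimal sketch (assuming the VMs have already been evacuated or powered off):

# Enter maintenance mode before patching
esxcli system maintenanceMode set --enable true
# ...run the esxcli software profile update command shown above...
reboot
# After the host comes back, exit maintenance mode
esxcli system maintenanceMode set --enable false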

Create Bootable VMware ESXi Installer USB Drive

Getting ESXi installed on a server today is more often done through the server's BMC (iLO, iDRAC, CMC, etc.), but this guide might be helpful when installing vSphere on a standalone server. The tool of choice for any bootable USB is my friend Rufus.

There are three things you will need to do this:

  • Download Rufus Here
  • Download whatever .ISO image you want to be bootable (whether it's Windows, ESXi, or Linux).
  • Use a decent-quality USB flash drive (1GB or larger). For some reason, I run into cheap thumb drives that will not boot anything. If your boot drive doesn't work, try a different flash drive.

 

Here are the easy steps:

  • Insert your blank (or soon to be formatted) flash drive into your PC
  • Open Rufus

  • Under Device, select the flash drive you wish to format and use
  • Select MBR partition Scheme for BIOS or UEFI
  • Filesystem = Fat32
  • Use default Cluster Size (4096 bytes)
  • Click the icon next to FreeDOS and select your ISO image
  • Rename the New Volume Label to whatever you wish to see when you insert the flash drive into a PC
  • Click Start

  • When prompted to replace menu.c32, select Yes

  • Finally, click Yes to the warning that this flash drive will be formatted (destroyed)

 

That's it. It will take a couple of minutes, but you should have a bootable flash drive.

System logs are stored on non-persistent storage

As customers start to deploy ESXi on smaller SD cards or boot from SAN, they encounter the following error after installing a new host:

“System logs are stored on non-persistent storage”

This error just indicates that you need to save your scratch logs to another location (shared storage or local disk). The process is super easy. To change the location, use one of the following methods:

Verifying the Location of System Logs in vSphere Client

To verify the location:

  1. In vSphere Client, select the host in the inventory panel.
  2. Click the Configuration tab, then click Advanced Settings under Software.
  3. Ensure that Syslog.global.logDir points to a persistent location. The directory should be specified as [datastorename] path_to_file, where the path is relative to the datastore. For example, [datastore1] /systemlogs.
  4. If the Syslog.global.logDir field is empty or explicitly points to a scratch partition, make sure that the field ScratchConfig.CurrentScratchLocation shows a location on persistent storage.

Verifying the Location of System Logs in vSphere Web Client

To verify the location:

  1. Browse to the host in the vSphere Web Client navigator.
  2. Click the Manage tab, then click Settings.
  3. Under System, click Advanced System Settings.
  4. Ensure that Syslog.global.logDir points to a persistent location.
  5. If the field Syslog.global.logDir is empty or points to a scratch partition, make sure that the field ScratchConfig.CurrentScratchLocation shows a location on persistent storage.
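
The same check and fix can also be done from the ESXi shell with esxcli (a sketch; the datastore path below is just an example):

# Show the current syslog configuration, including the local log directory
esxcli system syslog config get
# Point the log directory at a persistent datastore location and reload syslog
esxcli system syslog config set --logdir=/vmfs/volumes/datastore1/systemlogs
esxcli system syslog reload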

No image profile is found on the host or image profile is empty. An image profile is required to install or remove VIBs. To install an image profile, use the esxcli image profile install command

While upgrading an ESXi 6 host for a customer last night, I ran into the following error when trying to patch via Update Manager:
"No image profile is found on the host or image profile is empty. An image profile is required to install or remove VIBs. To install an image profile, use the esxcli image profile install command."

I tried various things, such as rebooting the host and manually patching via esxcli (see my previous post on patching via CLI), but nothing seemed to work.

The server was a Dell R620, and after some searching, I found that it had a corrupt image profile. This can be fixed by replacing the corrupt image file with a known good one from another host. (The hosts don't have to be identical, but I would try to keep to the same CPU family, Intel vs. AMD.) Here is how to do it.

  1. On the working ESXi host, copy the following image file: imgdb.tgz
    cp /bootbank/imgdb.tgz /vmfs/volumes/<An Accessible LUN>

  2.  On the corrupt host, copy the file imgdb.tgz from the working host to /tmp:
    cp /vmfs/volumes/<An Accessible LUN>/imgdb.tgz /tmp

  3. Change Directories to /tmp
    cd /tmp

  4. Extract file you just copied
    tar -xzf imgdb.tgz

  5. Copy the working profile files to the profile directory
    cp /tmp/var/db/esximg/profiles/* /var/db/esximg/profiles/

  6. Copy the working VIBs to the VIB repository
    cp /tmp/var/db/esximg/vibs/* /var/db/esximg/vibs/

  7. Remove the corrupt imgdb.tgz from the bootbank
    rm /bootbank/imgdb.tgz

  8. Move the working copy of imgdb.tgz into the bootbank
    cp /tmp/imgdb.tgz /bootbank/

  9. Make Config Backup
    /sbin/auto-backup.sh

  10. Reboot the host
    reboot
  11. Update host using Update Manager again
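
Before handing the host back to Update Manager, you can optionally confirm from the shell that it can read its image profile again (a quick sanity check):

# Should now return the host's image profile instead of erroring out
esxcli software profile get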

VMware vMotion Error: The Operation is not Supported on the Object

While trying to vMotion (Host and Storage), I kept getting the following error:
“The operation is not supported on this object”.

I noticed their switches were negotiating the vMotion NICs at 100 Mb, which is unsupported by VMware. After messing with the customer's switch, I was able to set those ports to 1000/Full. But after doing so, I was still getting the error, and nobody online had a solution. After messing with it for 45 minutes, I was able to resolve it by disabling vMotion on the vmkernel NIC and then re-enabling it. I assume the vMotion setting needed to be reset now that the NICs were running at 1000 Mb.
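
If you run into the same thing, both the negotiated link speed and the vMotion toggle can be handled from the ESXi shell as well (a sketch; vmk1 is just an example, substitute whichever vmkernel port carries vMotion in your environment):

# Confirm the physical NICs negotiated 1000/Full rather than 100 Mb
esxcli network nic list
# Disable and re-enable vMotion on the vMotion vmkernel interface
vim-cmd hostsvc/vmotion/vnic_unset vmk1
vim-cmd hostsvc/vmotion/vnic_set vmk1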

Hope this helps

ESXi "Error loading /k.b00" "Fatal error: 33 (Inconsistent Data)"

I was deploying ESXi 6 on a new server, booting off a USB thumb drive where I had put the ESXi installer (created with Rufus), when I got the following error just a few seconds into the install:

Error loading /k.b00
Compressed MD5: 23a1XXXXXXXXXX
Decompressed MD5: 00000000000000000000000000
Fatal error: 33 (Inconsistent data)

Turned out to be a bad USB drive.
Bad (usually cheap, generic) drives work well for storing files but, in my experience, lack the ability to be used as install media or "live CDs." I am not sure what makes one drive work over another, but I assume it has to do with the controller interface on those drives. Before blaming the drive, though, it is worth ruling out a corrupted download, as sketched below.
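
A quick way to do that (assuming a Linux box; the ISO file name below is just an example) is to compare the installer's checksum against the one published on the VMware download page:

# Compare this hash to the SHA256 listed on the download page
sha256sum VMware-ESXi-6.0-installer.iso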