Category Archives: Storage

Finding Raw Device Mappings (RDMs) used in your VMware vSphere Environment

Cleaning up legacy storage and vSphere environments is always fun, especially when you think you have everything moved off an old array, only to find that your production database goes offline when that array is unplugged (totally made-up scenario, did not happen to me 🙂).

The slow way to approach this would be to go through every VM, one by one, check the disks associated with each VM, and then cross-reference LUN numbers on the SAN, etc. Or, you could use PowerCLI and find that info in a snap.

For instructions on how to install PowerCLI, see my previous post here

  1. Connect to your vCenter Server through PowerCLI by using the following command and entering appropriate vSphere credentials:

Connect-VIServer <vCenter-IP-or-FQDN>

If you see an invalid certificate error, you will need to set PowerCLI to disregard self-signed certs:

Set-PowerCLIConfiguration -InvalidCertificateAction ignore -confirm:$false

  2. Run the following command to produce a list of VMs with RDMs

Get-VM | Get-HardDisk -DiskType "RawPhysical","RawVirtual" | Select Parent,Name,DiskType,ScsiCanonicalName,DeviceName | fl

The output will look similar to this (sorry, I didn't have any additional RDMs when making this tutorial, so no real screenshot).

  3. Finally, if you would like to save the output to a file, use the following command

Get-VM | Get-HardDisk -DiskType "RawPhysical","RawVirtual" | Select Parent,Name,DiskType,ScsiCanonicalName,DeviceName | fl | Out-File -FilePath RDM-list.txt
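
If you would rather have something you can sort and filter in Excel, the same pipeline can feed Export-Csv instead of Out-File. This is just a minor variation on the command above (the added CapacityGB column assumes a reasonably recent PowerCLI release):

Get-VM | Get-HardDisk -DiskType "RawPhysical","RawVirtual" | Select Parent,Name,DiskType,ScsiCanonicalName,DeviceName,CapacityGB | Export-Csv -Path RDM-list.csv -NoTypeInformation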

 

Infinio Accelerator: Server-Side Caching for Insane Acceleration

Server-side caching isn’t a totally new concept, but it is a hot market right now as storage providers try to push the speed limits of their respective platforms. 3D XPoint is all the water-cooler craze, even if the product isn’t yet available to its full potential.

Infinio is a server-side caching solution I have been benchmarking as a potential offering to customers, and I have been very impressed with the quick results. Being able to reduce read latency (a 400% improvement in my case) in as little as 15 minutes is what sold me.

Infinio Accelerator is built on three fundamental principles:

  1. The highest performance storage architecture is one where the
    hottest data is co-located with applications in the server
    As storage media has become increasingly faster, culminating in the
    ubiquity of flash devices, the network has become the new bottleneck. An
    architecture that serves I/O server-side provides performance that is
    significantly better than relying on lengthy round-trips to and from even
    the highest performing network-based storage. By serving most I/O with
    server-side speed, as well as reducing demands on centralized arrays,
    Infinio can deliver 10X the IOPS and 20X lower latency of typical storage
    environments.
  2. A “memory-first” architecture is required to realize the best
    storage performance
    RAM is orders of magnitude faster than flash and SSDs, but is price prohibitive
    for most datasets. Infinio’s solution to this problem is a
    content-based architecture, whose inline deduplication enables RAM to
    cache 5X-10X more data than its physical capacity. The option of evicting
    from RAM to a server-side flash tier (which may comprise PCIe flash, SSDs,
    or NVMe devices) offers additional caching capacity. By creating a tiered
    cache such as this, Infinio makes it practical to reduce the storage
    requirements on the server side to just 10% of the dataset. Long-term
    industry trends such as storage-class memory are another indication that a
    memory-first architecture is appropriate for this application.
  3. Delivering storage performance should be 100% headache-free
    Infinio’s software enables the use of server-side RAM and flash to be
    transparent to storage environments, supporting the use of native storage features like snapshots and clones, as well as VMware integrations like
    VAAI and DRS. The introduction of Infinio begins to provide value
    immediately after a non-disruptive, no-reboot, 15-minute installation. This
    is in sharp contrast to server-side flash devices used alone, which can
    provide impressive performance results, but require significant
    maintenance and cumbersome data protection.

What does Infinio do exactly?

Infinio Accelerator is a software-based server-side cache that provides high
performance to any storage system in a VMware environment. It increases
IOPS and decreases latency by caching a copy of the hottest data on server-side
resources such as RAM and flash devices. Native inline deduplication
ensures that all local storage resources are used as efficiently as possible,
reducing the cost of performance. Results can be seen instantly following the
non-disruptive, 15-minute installation that doesn’t require any downtime, data
migration, or reboots. Since roughly 70% of I/O requests are reads (on average), most of your reads will come directly from super-fast RAM.

How does it actually work?

Infinio is built on VMware’s VAIO (vSphere APIs for I/O Filters) framework,
which is the fastest and most secure way to intercept I/O coming from a virtual
machine. Its benefits can be realized on any storage that VMware supports; in
addition, integration with VMware features like DRS, SDRS, VAAI and vMotion
all continue to function the same way once Infinio is installed. Finally, future
storage innovation that VMware releases will be available immediately through
I/O Filter integration.

In short, Infinio is the most cost-effective and easiest way to add storage
performance to a VMware environment. By bringing performance closer to
applications, Infinio delivers:
  • 20X decrease in latency
  • 10X increase in throughput
  • Reduced storage performance costs ($/IOPS) and capacity costs ($/GB)

Final Thoughts

Honestly, there could not be an easier solution that provides results as dramatic as server-side caching. Deploying Infinio when you are in a performance jam provides immediate relief, and it should be part of your performance-enhancing arsenal. There is a free trial as well, and remember, there is no downtime to install or uninstall Infinio in your environment.

Please reach out to me, or to your Solution Provider (NetWize IT Solutions in my case), to learn more and test-drive Infinio Accelerator.

Datrium Design – Architecture Matters

Lame Joke: What do you get when you stick NVMe-based SSDs onto an All-Flash Array or Hyper-Converged node?

Genuine Answer: A Bottleneck of course!

As flash technologies advance and increase in performance, existing (and even upcoming) network infrastructure cannot meet the demands of next-gen storage media such as 3D XPoint.
This chart compares saturation rates of 10GbE, 40GbE, and 100GbE with various flash offerings.
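
As a rough back-of-the-envelope example: 10GbE tops out around 1.25 GB/s of theoretical throughput (10 Gb/s ÷ 8 bits per byte), 40GbE around 5 GB/s, and 100GbE around 12.5 GB/s, while a single modern NVMe SSD can sustain roughly 3 GB/s of sequential reads. In other words, two or three NVMe devices can saturate a 40GbE link before the array controller even enters the picture (treat these numbers as illustrative, since protocol overhead and drive models vary).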

 

Datrium was founded by ex-founders and principal architects of companies like Data Domain and VMware, so it’s safe to say they know a thing or two about architecture. Their approach to overcoming some of the shortcomings in traditional Converged and HyperConverged (HCI) platforms boils down to the following shift in architecture design:

Move the I/O processing to stateless compute nodes

Architectural Overview
There are basically two components to Datrium’s Open Convergence architecture.

Compute Nodes
Compute Nodes are servers of any brand the customer would like to use. The more RAM and flash these servers have, the more powerful the overall architecture. Each server node gets Datrium’s DVX software installed into user space on the hypervisor.
Every compute node is responsible for data services (deduplication, compression, erasure coding, and encryption). These nodes pull copies of data from the Data Nodes (the next component, addressed shortly) and keep that data locally in a stateless fashion before it is sent to the Data Nodes.

Data Nodes
The DVX Data Nodes are hybrid or all-flash disk enclosures purchased from Datrium (you can’t use your own Data Nodes). Since all data is processed on the compute nodes, there is no data processing happening at the data node layer. The data nodes hold data that is only accessed if copies are not available in flash/cache on the compute nodes, and the data that resides on them is heavily protected for resiliency.

Open Convergence is Datrium’s marketing term for this improved architecture, but taking the marketing out of the discussion, here is how Datrium solves for business outcomes:

  1. Simpler than HyperConverged
    – Zero HCI Cluster configuration or cluster sprawl
    – Independently and Simply provision compute or storage
    – Flexibly support any mix of hosts or hypervisors
    – No vendor lock-in on compute resources. Use existing compute hardware
  2. Faster than All-Flash Arrays
    – Flash is on the server, where it performs much faster
    – No Controller Bottlenecks
    – Performance scales with each server
  3. No Backup Silos
    – One console for VM consolidation and data protection
    – Reduce Management time for Backup, DR, Copy Data Management
    – Eliminate dedicated backup devices


If you need a lightning-fast, resilient, scalable, cloud-enabled architecture, Datrium might be exactly what you need. Because in the end, Architecture Matters.

 

pRDM and vRDM to VMDK Migrations

I was assisting an amazing client in moving some VMs off an older storage array and onto a newer storage platform. They had some VMs with Physical RDMs (pRDMs) attached, and we wanted those disks living as VMDKs on the new SAN.
Traditionally, I have always shut down the VM, removed the pRDM, re-added it as a vRDM, and then done the migration, but I found an awesome write-up on a few separate ways of doing this (a PowerCLI sketch of the scripted approach follows the scenarios below).
(Credit of the following content goes to Cormac Hogan of VMware)

VM with Physical (Pass-Thru) RDMs (Powered On – Storage vMotion):

  • If I try to change the format to thin or thick, then no Storage vMotion allowed.
  • If I chose not to do any conversion, only the pRDM mapping file is moved from the source VMFS datastore to the destination VMFS datastore – the data stays on the original LUN.

 

VM with Virtual (non Pass-Thru) RDMs (Powered On – Storage vMotion):

  • On a migrate, if I chose to convert the format in the advanced view, the vRDM is converted to a VMDK on the destination VMFS datastore.
  • If I chose not to do any conversion, only the vRDM mapping file is moved from the source VMFS datastore to the destination VMFS datastore – the data stays on the original LUN (same behaviour as pRDM)

 

VM with Physical (Pass-Thru) RDMs (Powered Off – Cold Migration):

  • On a migrate, if I chose to change the format (via the advanced view), the pRDM is converted to a VMDK on the destination VMFS datastore.
  • If I chose not to do any conversion, only the pRDM mapping file is moved from the source VMFS datastore to the destination VMFS datastore – the data stays on the original LUN

 

VM with Virtual (non Pass-Thru) RDMs (Powered Off – Cold Migration):

  • On a migrate, if I chose to convert the format in the advanced view, the vRDM is converted to a VMDK on the destination VMFS datastore.
  • If I chose not to do any conversion, only the vRDM mapping file is moved from the source VMFS datastore to the destination VMFS datastore – the data stays on the original LUN (same behaviour as pRDM).
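
If you want to script the powered-on vRDM scenario above, a minimal PowerCLI sketch would look something like this (the VM and datastore names are made up; test against a lab VM before touching production):

# Confirm which disks on the VM are virtual RDMs
$vm = Get-VM "MyDatabaseVM"
$vm | Get-HardDisk -DiskType "RawVirtual" | Select Name,DiskType,ScsiCanonicalName

# Storage vMotion with a format conversion, which converts the vRDMs to VMDKs on the target datastore
Move-VM -VM $vm -Datastore (Get-Datastore "NewSAN-DS01") -DiskStorageFormat Thin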

How to collect EMC SP NAR Data/Logs

Collecting EMC NAR Data

 

Login to Unisphere

Select your system from the List

Hover over System and click Statistics

 

 Enable Performance and Data Logging


Select the box to enable Periodic Archiving
Select the box to stop after 6 or 7 days
Click Start (then Yes/OK to the following prompts, and click X to close the window after it starts)

After the 6-7 days of running, retrieve the NAR file by doing the following:

Go back to the Statistics Page where you started the data logging, and select Retrieve Archive

 

 You will need to get files from both SPs, so start with SP A.
Select the files that were created during the date range you ran the logging, browse to a location on your computer to save them, and click Retrieve.

 

Repeat those steps on SP B.

 

After you have the files saved, I will get you an FTP link to upload them to be analyzed.
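
If you prefer the command line (or Unisphere is being uncooperative), the same archives can be listed and retrieved with Navisphere Secure CLI. These switches are from memory, so verify them against naviseccli analyzer -help or the Unisphere CLI guide for your FLARE/VNX OE release before relying on them:

naviseccli -h <SPA_IP> analyzer -archive -list
naviseccli -h <SPA_IP> analyzer -archiveretrieve -file <filename.nar> -location C:\NAR -o

Repeat against the SP B IP to grab its NAR files as well.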

VMware Storage I/O Control (SIOC) – A Blessing and a Curse

I am taking this content straight from an email I just sent a customer, so the content isn’t well polished. But the email took me long enough to write that I decided to post it here for others.

Storage I/O Control (SIOC) is a mechanism to prevent one VM from hogging all the I/O resources and making the other VMs wait for their I/O requests to be completed. By default, it gives every VM on a datastore fair and equal I/O shares, and it gauges fairness based on latency. So if you have two VMs (VM1 and VM2) and VM1’s latency hits a specified threshold (30ms is the default), SIOC will actually SLOW VM2’s I/O access and give scheduler resources back to VM1 until fair sharing is equalized again. This is different than QoS, but I’m sure you see some similarities.

So that sounds great, right? (Really, it is great.) But it is not always effective and can be detrimental in certain circumstances. I’ll try to explain.

First let me preface this by explaining two concepts, which  you may already be aware of.

  1. Hypervisors work via scheduled processes. Every VM waits its turn in the scheduler to receive the CPU cycle or memory page it requested.
  2. Every volume you create and map to a host is given a LUN ID (the volume is the LUN) and each LUN has access to schedulers. All the VMs in this volume/LUN take turns for their I/O requests. This is why best practice dictates you put a maximum of 10-15 VMs per volume, or far fewer if those are resource-intensive VMs. The more VMs in the LUN, the longer each VM has to wait for its I/O requests. (Note: setting resource shares doesn’t solve this, it just guarantees one VM will have priority over another.)

There are certain scenarios where SIOC can possibly make things worse. The scenario you might be running into is the following:
You have a SAN capable of tiered storage, which is really amazing when you think about how it all works. What’s even more incredible is that you are able to have different RAID types striped across the same physical disks. (Hot data lives on 15k drives in a RAID 10 stripe, and as it becomes warm, it moves into a RAID 5 stripe across those same physical 15k drives.)

Let’s take our VM1 and VM2, both residing on the same LUN. We have enabled SIOC on that LUN. VM1 is a high-resource VM that is crucial to your business and VM2 is just a Test/Dev server. Most of VM1’s blocks reside on your 15k disks in RAID 10, but a few of its less-hot blocks have moved to RAID 5, still on those 15k drives. Again, data on VM1 is almost always hot.
VM2, on the other hand, has some of its blocks on the 15k drives and some on the slower 7k drives, since that data is hardly ever accessed.

One day you log into VM2 and fire up an application whose data is on those 7k drives. That data takes longer to retrieve, naturally, since it’s sitting on the slowest media, and the time it takes to queue up and process that I/O request (latency) is much greater than the time it’s taking VM1 to process its requests.

What happens is that SIOC’s mechanism kicks in, because the latency on retrieving data for VM2 violates its “fair access” logic. So it throttles down the I/O of VM1 (your production server) to try to decrease the latency VM2 is seeing. You have essentially killed the performance of the VM that needs it the most. Now imagine this happening across all your VMs, VMDKs, bits, and blocks; it becomes a traffic nightmare. SIOC can throttle a VM down so much, waiting for the latency to decrease on the other VMs, that everything starts timing out, whereas if you weren’t using SIOC, things would be humming along as usual and VM2 would just take its sweet time processing data from the slow drives.

I am sure you were aware of most of these concepts, and what I have described is somewhat over-simplified, but hopefully that makes sense. Sharing workloads across the same physical drives can make SIOC a nightmare. If you are careful about what workloads you place in which LUN, then SIOC can be great, even on tiered storage. On an older EMC or NetApp array where you used to carve out specific disks for specific volumes, SIOC would also be great.
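
If you want to quickly see where SIOC is turned on and what the congestion threshold is set to (or disable it on a datastore while you sort out workload placement), PowerCLI exposes both. A quick sketch, with a placeholder datastore name:

Get-Datastore | Select Name,StorageIOControlEnabled,CongestionThresholdMillisecond

Set-Datastore -Datastore (Get-Datastore "Prod-DS01") -StorageIOControlEnabled $false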

Dell Compellent’s best practice is to use this feature with caution, just as others have stated as well.

 

Dell Storage Manager (DSM) Deployment

Dell Compellent’s Enterprise Manager is growing up and has been rebranded Dell Storage Manager, since it can now manage SC and PS storage. DSM is available as a VMware appliance, and that is what we will use to deploy DSM.

First things first – You’ll need to get the download link from CoPilot, as it is not publicly available in Knowledge Center.

Once you have the DSMVirtualAppliance-16.xxxx.zip file, extract it and deploy the OVF file as you would any other appliance in VMware.
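
If you would rather script the OVF deployment than click through the vSphere wizard, PowerCLI’s Import-VApp can handle it. A rough sketch, with hypothetical host, datastore, and path names:

Import-VApp -Source "C:\Downloads\DSMVirtualAppliance-16.xxxx\DSMVirtualAppliance.ovf" -VMHost (Get-VMHost "esxi01.lab.local") -Datastore (Get-Datastore "DS01") -Name "DSM-VA" -DiskStorageFormat Thin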

Once deployed and running, you have a few options:
1- Download the Client, Admin Guide, etc (Do this by going to https://appliance_IP)
2- Run the Setup (https://appliance_IP/setup)

We are going to run the setup
Start by hitting the URL https://appliance_IP/setup

Username: config
Password: dell

 

Add your existing SC and PS storage systems and you’re ready to rock and roll.

Active Directory Integration with Enterprise Manager

If I had to list the top 10 questions asked by new Compellent customers after a deployment, whether they can log in with Active Directory credentials would certainly be one. The answer is yes. And luckily, nowadays it’s an easy yes. In the past, it would have been easy to lie and say it’s not possible due to the complexity of the setup requirements, but now it is super straightforward. If you are looking for AD authentication, here we go…

Prereqs
– Each Controller should have an FQDN
– Each Controller should have an A Record in DNS
– Each Controller’s A Record should have a matching Reverse Lookup / PTR record

I am assuming most can handle the basic DNS prereqs, which is why I am not outlining those (though a quick way to verify them is shown below), but I may add them to the step-by-step guide in the future.
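
As a quick sanity check on those DNS prereqs, you can run something like the following from any domain-joined Windows box (the controller name and IP are made up for illustration):

Resolve-DnsName SC22-A.EXLab.local
Resolve-DnsName 10.10.10.21

The first should return the controller’s A record, and the second should return the matching PTR record.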

Step 1 – Make Sure each Controller has DNS entries to your internal AD DNS Server

Open Storage Center

Expand Controllers

Right Click on Controller and select Properties

Click IP Tab and go to DNS – Make sure your internal DNS servers are entered there

Repeat this step for the other controller

Step 2 – Configure AD Authentication Services

In Storage Center, go to Storage Management – System – Access – Configure Authentication

Enable External Directory Services and enter the FQDN of each controller, separated by spaces

  • In the Directory Type dropdown, choose Active Directory.
  • In the URI field, make sure the FQDN name(s) of the AD Domain Server(s) are entered. Each FQDN should be prefaced by “ldap://” and names should be separated by spaces, e.g. “ldap://JS24.EXLab.local ldap://JS25.EXLab.local”. Note: Storage Center AD integration is not site-aware, meaning it cannot automatically detect a domain and its associated domain controllers. To use a specific domain controller, it must be defined in the URI field. Storage Center will try to authenticate to domain controllers in the order they are defined in this field. If a domain controller becomes inaccessible, Storage Center will try the next domain controller in the list.
  • Note: Storage Center AD Integration supports authentication against a ReadOnly Domain Controller (RODC).
  • In the Server Connection Timeout field enter 30
  • In the Base DN field enter the distinguished name of the domain. For example, if your domain is EXLab.local, the distinguished name is “dc=EXLab,dc=local”.
  • (Optional) In the Relative Base field enter the location (relative distinguished name) where the Storage Center Active Directory object should be created. The default is CN=Computers.
  • In the Storage Center Hostname field enter the Storage Center name followed by the domain name. This will be the FQDN of the Storage Center (i.e. SC22.EXLab.local).
  • In the LDAP Domain field enter the name of the domain (i.e. EXLab.local).
  • In the Auth Bind Username field enter the AD service account (created prior to setup) with rights to search the directory. The format of this field is username@domain.
  • In the Auth Bind Password field enter the service account password.

Test – if the test fails, troubleshoot DNS, then Continue

Configure Kerberos Authentication
The values displayed will be the default values, and in most cases, can be left as is. If the defaults are modified, all values should be entered in UPPERCASE.

  • In the Domain Realms field enter the domain name (i.e. EXLAB.LOCAL)
  • In the KDC Hostname field specify a Kerberos server (this is usually a domain controller).
  • In the Password Renew Rate (Days) field leave the value at 15
  • Continue

Enter credentials for a domain user that has rights to join objects to the domain. This one-time operation does not require a service account.

Click Join Now and then Finish Now

Dell DPACK 2.0

For those not familiar with the “Dell Performance Analysis Collection Kit” (DPACK), it is a pretty incredible tool that allows you to visualize the current storage and server workloads from the perspective of the host. It is a great tool for planning Capacity and I/O requirements, and can even be used to troubleshoot problems with your storage and storage network. DPACK measures the following:
- Disk I/O
- Throughput (Which is more important than I/O when we're talking FLASH)
- Capacity
- Memory Consumption on your Servers
- CPU Utilization on your Servers
- Network Traffic
- Queue Depths

Version 2.0 lets you view real-time statistics, instead of having to wait the 24 hours previously required before results were available. It also gives you better views into your data for additional insight, and presents that data in a form executives can appreciate when you go to them with a PO request.

Some other things to know about DPACK 2.0 are:
– Uses HTML 5 for viewing real-time data in a browser
– Generate PDFs of data collected to present to management
- DPACK compresses analyzed data and transmits it to the server every 5 minutes
- Uses secure SSL on port 443
- DPACK will continue to run and collect data in the event it loses the ability to upload
- DPACK isn't performance-impacting and can be (and should be) run during business hours

So how do you get started with DPACK 2.0? It’s best to call up your local Dell storage team and discuss DPACK. Netwize, a great resource for any Dell data center needs, helps customers run DPACK all the time. I work for Netwize, so please feel free to reach out to us and we will get you set up.

Running Dell DPACK longer than 24hrs

If you have ever used the Dell DPACK utility to analyze your storage, you’ll find that the application only gives you the ability to scan for 24 hours. Dell does this because they claim that, statistically, the DPACK results don’t vary much from one day to the next. Having run hundreds and hundreds of DPACK scans for many customers, I find that more often than not, the results from different days are different enough to warrant longer monitoring times. My advice is to run your DPACK for 3-4 days, and here is how you do it:

First download the DPACK tool from http://dell.com/dpack
Extract the Software
Open a Command Prompt and change directory to the extracted DPACK folder
Run the following command: dellpack.exe /extended

Now you should be able to change the monitoring duration before starting the collection.