If you are running version 4.2(1)SV2(1.1) or later of the Nexus 1000v switch, consider enabling the vTracker feature.  It is available for both the essentials (free) version and the advanced version of the 1kv.  vTracker gives network administrators a high level of visibility into the virtual environment without having to log into the vSphere client.

Cisco documentation on vTracker.

If vTracker has not yet been enabled, log into the Nexus 1000v and run the following command:

feature vtracker

Make sure to save the running-config after enabling the feature.
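
For reference, the whole thing takes only a few commands from the NX-OS CLI (assuming you have the necessary admin rights on the VSM):

configure terminal
feature vtracker
copy running-config startup-config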

Here are some examples from my home lab showing the kind of info you can retrieve with vTracker.  I tried to paste these as preformatted text, but WordPress didn’t preserve the spacing correctly so I used images.

show vtracker module-view pnic

show vtracker upstream-view

show vtracker vm-view vnic

show vtracker vm-view info

show vtracker vmotion-view last 5

Last week I had some time to set up VMware VSA (vSphere Storage Appliance) in my home lab.  VSA allows you to use local host storage (SATA or SAS drives) as shared storage with a built-in NFS server and automatic replication between cluster hosts.

To be officially supported by VMware, the storage controller in your servers has to be on the hardware compatibility list.  There are many controllers from LSI, Dell, HP, and others listed.  From what I could determine, software-based onboard RAID is not supported.  The hosts must have a minimum of 4 gigabit NICs and 6GB of RAM, with a maximum of 72GB of RAM; higher amounts of RAM should work but are not tested by VMware.  A 2.0GHz dual-core CPU is the lowest supported processor configuration.

For my lab environment, I already have 2 HP servers configured with shared iSCSI storage running ESXi 5.1, but they do not have local RAID controllers or 4 NICs each (only 2 NICs).  I decided to create a “nested ESXi” install so that I could create 3 nodes for a VSA cluster.  I deployed 3 new VMs with 2 vCPU sockets, 6GB RAM, 4 x E1000 NICs, and 250GB of disk space (thin).  I created a new VLAN on my lab switch and Nexus 1000v distributed switch to use for the VSA back-end network.  I also created a new port profile on the Nexus 1000v that was configured as a VLAN trunk instead of an access port, so that my nested ESXi guest VMs could access all of the VLANs with dot1q tagging.
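
For reference, a trunking vethernet port profile on the 1000v looks roughly like this (the profile name and VLAN range below are placeholders rather than my exact lab values):

port-profile type vethernet nested-esxi-trunk
  vmware port-group
  switchport mode trunk
  switchport trunk allowed vlan 1-100
  no shutdown
  state enabled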

To allow nested ESXi to work with 64-bit guests, I had to enable hardware virtualization in the VM settings through the vSphere Web Client.
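
If you would rather not use the Web Client, my understanding is that the checkbox simply adds the following line to the VM’s .vmx file (this applies to vSphere 5.1 with virtual hardware version 9; verify against VMware’s nested ESXi guidance before relying on it):

vhv.enable = "TRUE"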


I then installed ESXi 5.1 on each of the 3 VMs, using the local 250GB storage.  I assigned static management IP addresses and hostnames, and added DNS entries for the new servers.  Once the 3 servers were online, I added the first host to my existing vCenter server temporarily so that I could use my existing Server 2008 R2 template to deploy a new VM to use for the VSA vCenter.  You don’t need to set up a new vCenter just for VSA; you can use an existing one as long as it is version 5.1.  I wanted to create a new one to simulate a typical customer deployment where there is no existing vCenter.  If you don’t have an existing template, you can skip adding the host to an existing vCenter and just create a new empty VM directly on the server using the vSphere Client.  Install Server 2008 R2 and follow standard procedures for preparing a server for vCenter 5.1.  Make sure you join the VM to the domain.

I installed a new vCenter (version 5.1.0b) onto the 2008 R2 VM using the simple install method, since this was just a test environment.  For a production install you may want to use the traditional component-based install method.  After vCenter and its relevant components were installed, I installed the VSA manager software on the server.

Before I could proceed with creating the new VSA cluster, I had to edit a properties file in C:\Program Files\VMware\Infrastructure\tomcat\webapps\VSAManager\WEB-INF\classes\ on my VSA vCenter server.  Since this was a nested ESXi environment, EVC mode would not work.  By default VSA requires EVC to be enabled, but you can override it in the properties file.  I had to change the evc.config value from “true” to “false”.  For brownfield deployments (where VMs are already running on the VSA hosts), you have to change the evc.config.baseline value to “highest” instead of “lowest”.  This includes environments where vCenter has already been installed onto one of the VSA hosts, as in my lab example here.  Since I was disabling EVC altogether, I didn’t have to change that value.  Greenfield deployments do not require any changes, but that means you can’t have any VMs running on the hosts that are to be configured for VSA (vCenter must be physical or hosted on another vSphere host outside of VSA).
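
After the edit, the relevant entry in the properties file looked like this (standard key=value properties syntax):

evc.config=false

A brownfield deployment that keeps EVC enabled would instead leave evc.config set to true and change evc.config.baseline to highest.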

After changing the file, I had to restart the VirtualCenter Management Webservices and VirtualCenter Server services.  Then I added my 3 hosts to vCenter in a new Datacenter object.  I did not create a cluster yet; I just left the hosts as standalone.  There was one more setting that I needed to change before I could run the VSA wizard: the VMFS heap size on the host that was running my vCenter 5.1 VM.  This must be done ahead of time for any host being added to the VSA cluster that is already running VMs.  If the host does not have any running VMs, you can let the wizard change it automatically for you and it will reboot the host.  To change the heap size, go to the Advanced System Settings for the host, filter for VMFS, and change the parameter to 256.
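
If you prefer the command line, the same change can be made with esxcli over SSH; the advanced option should be VMFS3.MaxHeapSizeMB, but double-check the option path on your build:

esxcli system settings advanced set -o /VMFS3/MaxHeapSizeMB -i 256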

Now I was ready to create the VSA cluster.  To get to the VSA installer wizard, open the regular vSphere Client and connect to vCenter.  Select the datacenter that contains the 3 hosts (or 2 hosts if you are making a 2-node cluster) for VSA, then click on the VSA Manager tab at the far right.  If you don’t see the tab, the plugin may not be enabled yet; check the Manage Plugins window to see if it is disabled.  The installer will ask you to select your hosts and will let you know if there is a configuration problem.  For brownfield deployments where you have existing VM networks, vMotion, fault tolerance, etc. enabled, you will have to make sure the network configuration is prepared ahead of time.  In my case, I had not changed the default ESXi network configuration (Management Network and VM Network only) on vmnic0, so I let the wizard do the work for me.

If you get a pop-up warning about VSA deleting local data on the hosts, this does not refer to the local VMFS datastore.  It only refers to previously configured VSA storage that may exist on the host.  See the VMware KB article for more information:

The network configuration for a 3-node VSA cluster requires 10 static IP addresses, or 7 static IP addresses + 3 DHCP addresses.  I did not use DHCP for my lab build.  This does not include the static IPs for vCenter and for ESXi host management (a total of 14 if you include those).  For each host you will need a VSA management IP, a vSphere Feature (vMotion) IP, and a back-end network IP.  The back-end network should be on a separate VLAN with a different subnet if possible.  You also need one VSA cluster network IP address.
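
To break down the 10 static addresses for a 3-node cluster:

  • 3 x VSA management IPs
  • 3 x vSphere Feature (vMotion) IPs
  • 3 x back-end network IPs
  • 1 x VSA cluster network IP

Adding the 3 ESXi host management IPs and the vCenter IP brings the total to 14.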

You will have to decide how much storage to allocate to VSA as part of the installer wizard.  If there are no local VMs on the hosts, you will be able to use most of the storage for VSA (I had 13GB free on each local datastore when VSA was maxed out).  Since I had a 40GB vCenter VM on host 1, I could not use the full amount initially for VSA.  The size can be increased later, although during the process there will be a performance impact on any VMs running on the VSA datastores.


After deciding on a cluster size, the installer automatically takes care of the rest of the process.  It will make the heap size change on any hosts that still need it and reboot them, then the process picks up when the hosts are connected back to vCenter.  It automatically sets up the vSwitches, VM networks, and vMotion.  You will have a new cluster called VSA HA Cluster that has HA enabled (but DRS is not automatically enabled).  EVC mode will be enabled unless you changed the properties file to disable it.  The wizard sets HA admission control to 33% reserved CPU and memory for a 3-node cluster, and it also changes the HA restart priority for the VSA appliances to High.  It also enables VM Monitoring with a failure interval of 60 seconds, minimum uptime of 120 seconds, and a maximum of 3 per-VM resets in a 72-hour window.  VM monitoring is not something that I normally enable unless there are VMs that “lock up” on a regular basis, so it was interesting to see that VMware turns it on by default here.

During the installer process, I saw a triggered alarm in the Web Client that said “vSphere HA virtual machine failover failed”.  I’m not sure why I got this alarm, because there was no apparent host failure.  If there is a critical error during the install process, the installer will back out the changes made to the hosts (with the exception of the heap size change; in my case vSwitch0 was also left with vmnic2 as the uplink instead of vmnic0).  I went through this back-out process once, because I initially forgot to add the back-end VLAN to my lab switch and the VSA appliances couldn’t talk to each other over the back-end network.  There is no progress indicator for the back-out process; you just have to wait about 10 minutes while VSA finishes cleaning everything up.  The second time I ran the installer, after adding the VLAN, everything went smoothly.


The VSA appliances run SUSE Linux Enterprise Server 11 SP2 and they have VMware tools installed.  Each appliance exports one NFS shared datastore to the cluster, which is half the size of the local VSA space on the host.  The other half of the space is used as a mirror destination for another VSA host.  If one appliance fails, the replica for its NFS share is mounted read/write on the surviving appliance.

After my VSA cluster was online, I used Storage vMotion to migrate the vCenter VM to one of the new datastores.  This allows the VSA cluster size to be increased to use the local VMFS space that vCenter was previously consuming.  It also provides vMotion and HA capabilities for vCenter.  However, there are some downsides to moving vCenter to the VSA datastore.  Should the VSA datastore that stores the vCenter VM files fail to come online, there will be no way to access VSA Manager.  Any troubleshooting would have to be done with the command line directly on the VSA appliance.  I did test a full power-down of the VSA cluster after moving vCenter to the datastore.  I shut down the vCenter VM first, then shut down one VSA appliance at a time.  With the VSA appliances offline, all VMs that are using those NFS datastores will show as inaccessible.  I then powered the VSA appliances back on, waited for the NFS datastores to become available, and finally powered up vCenter.

Another issue with moving vCenter to the VSA datastore is that you cannot change the networking configuration of the cluster without moving vCenter off to a different location.  All VMs except for the VSA appliances have to be powered off on the VSA datastores before you can change the network configuration.

In order to do maintenance work on one of the VSA cluster hosts, you first have to put the VSA appliance on that particular host in appliance maintenance mode.  This is selected from VSA Manager in the appliances view.  You can only put one appliance into maintenance mode at a time; this is true for both 2-node and 3-node clusters.  When the appliance enters maintenance mode, it will shut down and the datastore replica will become active on the paired appliance that is still running.  You can then move any remaining VMs off the host (or shut them down), and put the host into maintenance mode the usual way.  The datastores that are affected by the powered-off appliance will show degraded operation in VSA Manager:

Degraded datastores are still accessible by hosts, but they are not being replicated, so there is a risk of data loss while the VSA appliance remains in maintenance mode.  When the maintenance on the host is completed, first take the host out of maintenance mode.  Then you will have to power on the VSA appliance that was previously shut down.  Wait for the appliance to boot, which takes a few minutes.  Then you can go to the VSA Manager interface and take the appliance out of maintenance mode.

Shortly after the appliance exits maintenance mode, you will see a data synchronization task appear in the recent tasks pane.  VSA has to sync all of the data block changes that occurred to the datastore during the maintenance window.  Until this synchronization is completed, the datastore will still be exported by the appliance that was online during maintenance mode.

Although it may be tempting to put the next appliance into maintenance mode while the synchronization is happening, I would not recommend trying this.  In my lab environment I had one of the NFS datastores go offline when I put another appliance into maintenance mode before the sync had completed.  Make sure you allow plenty of time for each host’s maintenance window, especially if there is a high change rate on the datastores, which means synchronization will take longer.

This week I had to set up a couple of new Catalyst 6504-E switches with VSS Sup720 supervisors, and after configuring VSS successfully I wanted to upgrade the IOS software to the latest 12.2 release.

First I copied the .bin file from scp: to sup-bootdisk: and then from sup-bootdisk: to slavesup-bootdisk:
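
The copy commands were along these lines (the SCP server address and username below are placeholders):

copy scp://admin@192.0.2.10/s72033-ipbasek9-mz.122-33.SXJ5.bin sup-bootdisk:
copy sup-bootdisk:s72033-ipbasek9-mz.122-33.SXJ5.bin slavesup-bootdisk: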

Then, following the eFSU instructions, I ran

issu loadversion sup-bootdisk:s72033-ipbasek9-mz.122-33.SXJ5.bin

which failed with the following error message:

% CV [ bootdisk:s72033-ipbasek9-mz.122-33.SXJ5.bin ] must be named first in BOOT [ bootdisk:s72033-ipbasek9-mz.122-33.SXJ5.bin ]

After some searching on Google, I found a suggestion on the Cisco support forums that the VSS switch configuration may not have a proper boot variable.  I checked, and sure enough there was no “boot system” line configured at all.  This was a brand new switch pair that had just been set up to use VSS, so I don’t know why the boot variable was missing, but it was.

I added the line “boot system flash sup-bootdisk:s72033-ipbasek9-mz.122-33.SXJ4.bin”, saved the running config to startup-config, and tried the issu command again.  This time, loadversion worked and the standby switch rebooted with the new IOS version.

Next, I ran “issu runversion”, which makes the standby switch active (running the new IOS) while the previously active switch reloads into standby mode (still running the old IOS).

After that I ran “issu acceptversion” to stop the 45-minute countdown clock that would otherwise automatically roll back the IOS upgrade.  Stopping the clock gives you as much time as you need to verify switch functionality before committing.

Next, I ran “issu commitversion”, which reloads the standby switch with the new IOS software.  After this, both supervisors are running the new version, but the active switch is the one that was the standby prior to the upgrade.

Lastly, I ran “redundancy force-switchover” which swapped the roles of active and standby so that switch 1 was active again.

Throughout this process I used “show issu state detail” to check on the progress, as well as “show redundancy”.
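
For quick reference, the complete command sequence for the eFSU upgrade described above was:

issu loadversion sup-bootdisk:s72033-ipbasek9-mz.122-33.SXJ5.bin
issu runversion
issu acceptversion
issu commitversion
redundancy force-switchover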

I noticed an abnormality after the upgrade was completed.  When I displayed the running-config, there were two “boot system” entries instead of one:

boot system sup-bootdisk:s72033-ipbasek9-mz.122-33.SXJ5.bin
boot system flash sup-bootdisk:s72033-ipbasek9-mz.122-33.SXJ4.bin

I removed the old version line with “no boot system flash sup-bootdisk:s72033-ipbasek9-mz.122-33.SXJ4.bin” and saved the running config.

I was validating an ESXi environment configuration this morning for a customer and had a problem with one of the vmnic uplinks from a host.  The switch port was configured for VLAN trunking the same as the ports used for the other vmnic uplinks, but for some reason the ESXi host would only show a single observed network instead of multiple networks.  The CDP information pop-up for that vmnic said the switch port VLAN was 10 when it should have said 1 (native VLAN 1 is the default for Cisco trunk ports).  Oddly, CDP also kept showing a different Port ID every few seconds.  In my past experience I had never seen the Port ID change dynamically unless somebody moved the uplink cable to a different switch port.

I tried removing the vmnic from the vSwitch and adding it back, but that did not fix the problem.  I tried rebooting the host, and even disabling CDP and then re-enabling it in bidirectional mode from the ESXi command line interface.  We tested the uplink by removing all the other vmnics from the vSwitch, and ping was not working for one of the trunked VLANs, so clearly there was a network configuration issue somewhere.

We checked the switch configuration again and noticed that in the interface status table, the switch port for the vmnic in question showed “monitoring” instead of “connected.”  This tipped us off to the fact that there was an active SPAN session using that interface.  “show run | inc destination” will also show you whether any interfaces or VLANs are set up as SPAN destinations.  Once we deleted the SPAN session, CDP behaved normally on the ESXi host and we were able to send VM traffic through that uplink.
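
On a Catalyst switch, a couple of quick commands will confirm and remove an active SPAN session (the session number below is just an example):

show monitor session all
no monitor session 1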

So, if you happen to encounter a vmnic with seemingly random changes to the Port ID value from the Cisco Discovery Protocol, check to make sure the port is not configured as a monitor destination.  Even though the running configuration (show run int Gi##) for each port may appear to be the same, that doesn’t necessarily mean each port is behaving the same way.

A couple weeks ago I was setting up 3 new rack-mount UCS servers (C210 M2) with the LSI 1064E mezzanine adapter.  I was unable to create any RAID volumes on the server using the LSI option ROM utility, and had to instead use the Cisco server configuration CD.  This is the first time I have had any problems using the built-in LSI RAID configuration interface.  Read on for more details.

The servers were brand new and had identical hardware configurations: Xeon E5649, 48GB RAM, 2 x 146GB 15k SAS drives, and a 4-port Broadcom iSCSI expansion NIC.  They shipped with firmware package 1.4.2.

I went through my normal process for setting up new C-series servers, which starts with configuring the CIMC (Cisco Integrated Management Controller) by attaching a monitor and keyboard to the server and pressing F8 during POST.  After that is set up I disconnect the physical peripherals and use the CIMC KVM to control the server.

Next I went into the server BIOS and made a couple of changes to optimize for ESXi 5.  I left the mass storage configuration in Enhanced mode, which is what Cisco requires if you are using an add-on RAID controller with a SATA optical drive.  Some C-series configurations do not have any add-in adapter and can only use the built-in motherboard software RAID, which requires changing the mass storage configuration in the BIOS.

I installed the latest firmware, 1.4.3c.2, using the Cisco host upgrade utility ISO, which can sequentially upgrade all the components including the BIOS, CIMC, NICs, LOM, and RAID controller.  There were no issues with the upgrade; some components, like the NICs and RAID controller, did not have a newer revision available on the ISO.

After that I waited for the LSI option ROM during POST and entered the utility by pressing CTRL+C when prompted.  I was able to view the controller configuration and the attached SAS disks with their model numbers and sizes.  However, the menu setting for RAID properties was grayed out and unavailable:

I tried power cycling the server and re-entering the utility, but had the same problem.  I tried changing the mass storage configuration in the BIOS to AHCI, Legacy, and RAID, but none of those made a difference so I set it back to Enhanced.  I tried installing the LSI firmware from an older C-series host upgrade ISO but that did not make any difference.

After doing some searching online and not finding any useful tips I decided to try using the Cisco server configuration CD that ships in the box with the C210 servers.  It boots into a GUI environment:

Once the utility loaded, I clicked on Server Configuration and then RAID Configuration.  I used the default automatic setup with redundancy, which created a RAID-1 volume with the two SAS drives.

I verified in the system logs that the RAID volume was created successfully.  After that I rebooted the server, removed the configuration CD, and continued with the installation of ESXi 5.0 update 1.

I had the opportunity today to set up NIC teaming on a UCS B200 M2 blade with the Cisco M81KR adapter.  In the past, software NIC teaming was not an option with the Cisco VIC adapter.  You could use the fabric failover feature of the VIC and provision a single vNIC that had redundancy, but if you provisioned multiple vNICs to Windows there was no way to combine them into a team.  Traditional server NICs made by Intel, Broadcom, etc. usually have teaming software available because the adapter has no hardware-level failover capability.

Cisco support forum post by Robert Burns about the driver:

Cisco installation guide:

The blade I worked on today was running firmware version 2.0(1w) and had Windows 2008 R2 SP1 installed for an operating system.  After installing Windows, I installed the Cisco VIC drivers from the 2.0(1f) driver ISO available on the Cisco download site.

Next, you will need the Windows Utility ISO 2.0(1b) available on the Cisco download site.  The utility ISO also has the LSI RAID management utility which is useful if you are using local disks.

Launch a command prompt with Administrator privileges, change to the folder with the teaming files, then run:

enictool -l (displays available NICs for teaming)

enictool -i “nic name” (display advanced information about a specific NIC)

enictool -p “F:\Network\Cisco\M81KR\NicTeaming\W2K8R2\x64” (install the teaming driver INF)

enictool -c “nic0-a” “nic0-b” -m 3 (create team of 2 NICs with Active/Active TLB mode)

Then run enictool -l again to verify the new team was created.

Make sure you retain a local copy of the teaming utility on the server in case you need to add/delete teams in the future, or if you need to uninstall the driver.

The installation guide explains some of the different options for enictool but is not comprehensive.  If you just run enictool with no flags it will also display all of the different options.

Configure TCP/IP settings as necessary for the new adapter.  You can also rename it from the default name that is assigned.
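
If you want to script that part, something like the following netsh commands should work on 2008 R2 (the interface name, team name, and addresses below are placeholders):

netsh interface set interface name="Local Area Connection 5" newname="Team0"
netsh interface ip set address "Team0" static 192.168.10.50 255.255.255.0 192.168.10.1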

I haven’t tested failover yet with the software teaming driver but most likely I will be doing so tomorrow.

Last week I had the great opportunity to attend VMware Partner Exchange 2012 in Las Vegas.  I was joined by a number of other Varrow employees from management, pre-sales, and post-sales.

I signed up for sessions before the conference but did not have any scheduled activities or boot camps on Monday the 13th so I used the day to take advantage of VMware’s hands-on labs.  They had 28 different labs to choose from, covering the whole range of VMware products and solutions.  I tried to focus on labs where I had little previous knowledge or exposure, and therefore got a lot of value out of some quality hands-on time with real life scenarios presented by VMware.

The five labs I attended on Monday were:

  • HOL09 – Improve Troubleshooting and Performance Tuning for Your Virtual Environment
  • HOL10 – Advanced Troubleshooting and Performance Tuning for Your Virtual Environment
  • HOL01 – Building Your Hybrid Cloud
  • HOL07 – Using Virtual Distributed Switches and Network I/O Control in Your Network
  • HOL25 – Cisco – Deploying vCloud Director with Nexus 1000V


I also attended 2 more labs on Thursday:

  • HOL28 – Simplifying Patch and IT Tasks on Your Physical and Virtual Machines
  • HOL05 – Datacenter Migration and Disaster Recovery Protection for Your Virtual Environment


The labs were my first hands-on experience with vCloud Director and vCenter Protect.  Also my knowledge of SRM was very limited so I appreciated getting to run through some example scenarios with the latest SRM 5.0.  vCenter Orchestrator and vCenter Chargeback were also briefly featured in the HOL01 lab.  Getting to see NetFlow in action on a vCenter Distributed Switch 5.0 was very cool, and I can see why a lot of network administrators will enjoy having that in environments that aren’t using the Nexus 1000V.  Most of the labs were timed very well, and I usually only had about 10 minutes left over working at a steady pace.

Most of the rest of my time at Partner Exchange was spent in the break-out sessions.  These were available on Tuesday, Wednesday, and Thursday.  There were general sessions on Tuesday and Wednesday for all attendees that were both informative and motivating.  On Tuesday I attended these sessions:

  • SRM 5 Demo – New Features in Action and Q&A
  • Metering and Billing in Cloud with vCenter Chargeback
  • Selling vSphere Storage Appliance (VSA) Successfully
  • Oracle Databases on vSphere5 Best Practices
  • Everything Back-up – VMware vSphere, vCloud, and View


I learned the most in the Chargeback and VSA sessions, since I had never used either of these products before in a lab or production environment.  As more companies are looking to deliver IT as a service, Chargeback really is a great way to account for those costs based on usage.  VSA is a very new product for VMware and as such has some limitations, but the new 1.5 version will address the biggest ones and hopefully result in more deployments in the SMB market where a full-fledged networked storage array is not required.

On Wednesday I attended the following sessions:

  • Design, Deploy, Optimize SQL Server on VMware vSphere 5
  • Virtualizing Unified Communications Systems with vSphere and View
  • Design, Deploy and Optimize Exchange 2010 on vSphere


All three of these sessions were Tier 1 application focused, which is one of the areas VMware hopes to grow in 2012.  A lot of companies still rely on dedicated physical servers for their tier 1 mission-critical applications.  In these sessions VMware wanted to share with their partners that they have extensively tested and benchmarked tier 1 applications on vSphere 5.0 and they are confident it can handle the performance and uptime demands.

On Wednesday I also passed my VCP-510 exam with PearsonVue, which was held right there at the conference.  Those who currently have a VCP4 certification can test for VCP5 before February 29th without having to attend a VMware education class.

On Thursday I attended these two sessions:

  • Compliance and Security: A holistic approach from the bottom up
  • Up and Running with vSphere vCenter Server Appliance (VCSA)


The compliance session talked mainly about vShield and vCenter Configuration Manager with respect to ITIL and PCI-DSS requirements.  The VCSA session was enlightening, because I do not have much experience with the vCenter appliance  and was not fully aware of all the limitations with the current version.  There is some elegance in the simplicity of an appliance-based vCenter but for a lot of environments it will not be the right choice, at least in the currently offered version.

This week while working on a VM migration project I was initially confounded by an extra port group showing up on VMs that only had a single NIC listed in their configuration.  We had been migrating VMs from ESX 4.0 hosts to new ESXi 5.0 hosts, upgrading VMware Tools and VM hardware, and migrating the VM networking from standard vSwitch to a new distributed vSwitch.

We tried deleting the NIC from the VM and adding it back, but still two networks appeared on the summary page:





Next we tried deleting the NIC and leaving it off, but the VM still showed the standard vSwitch network.  The dvSwitch network had been removed but not the standard one.  Next we removed the VM from inventory and added it back using the datastore browser, but the standard port group still appeared.

After puzzling over this for a while I realized the common factor among the VMs that were showing an extra standard port group.  They had all had a snapshot taken (prior to VM hardware upgrade) before their network port group was migrated from the standard vSwitch to the dvSwitch.  The VMs that did not have an extra network had been migrated to the dvSwitch before we did the VMware Tools and VM hardware upgrade.

After we deleted the existing snapshots on some of the VMs that were showing an extra network, they displayed just the single distributed port group on the summary page.  I’m sure this behavior is by design, so that a VMware administrator won’t accidentally delete a port group that may still be needed if a VM is rolled back to a previous snapshot.  If we had migrated all the VMs to the dvSwitch before starting the other upgrades, we would not have seen any VMs with extra port groups.

A warning to those who are running a Cisco UCS blade system and using a maintenance policy that requires user acknowledgement:

Normally the maintenance policy will prompt for user confirmation before any configuration change is applied that requires a blade power cycle, such as changing BIOS or boot policies, adding vNICs or vHBAs, etc.  This is especially important when you are using an updating service profile template that is bound to multiple service profiles.  Any change at the template level will propagate to the bound profiles.

In my work with UCS internally and at customer sites, I have found 2 scenarios (so far) that will not prompt for any user confirmation and will just immediately change the power state of the service profile:

  1. Modifying the “desired power state” of the service profile template.  If you change this setting from “on” to “off”, the associated blade will just shut off with no warning.  Also, if you have an updating template with a desired state of “off”, then any time you make changes to the template, all of the bound service profiles will revert their power state to off without warning.  My recommendation is to always leave this set to “on” so that you don’t have any unexpected shutdowns.
  2. Adding new vNICs or vHBAs, saving the change, and then changing the vNIC/vHBA placement order.  If you only add new adapters and don’t modify the order, then UCS Manager prompts for confirmation as you would expect.  If you add the adapters and then modify the placement order without clicking the Save button first, then you are also prompted for confirmation.  However, if you add the adapters, save changes, and then change the placement order, UCS Manager will just reset the blades to apply the changes without warning.


This can be a very bad thing if you are changing a template that is bound, for example, to all of your VMware ESXi blades or all of your Citrix Xenserver blades.

In the earlier days of UCS, there was no maintenance policy option and some customers inadvertently reset all of their blades at the same time without realizing the impact.  Back then the standard procedure was to unbind all of the service profiles from the updating template, modify the template, and then one-by-one you could bind the service profiles back to the template after putting the particular host into maintenance mode and/or shutting it down.

This week I had to set up Windows 2008 R2 Enterprise Edition (with SP1) on a Cisco UCS B200 M2 blade, and ran into a problem with the Windows installation.  This blade was being installed in a boot-from-SAN environment, even though it had 2 internal disk drives.  I have set up a blade like this before, but it was several months ago.

I reviewed the UCS Windows Installation Guide before starting, to make sure I didn’t forget any steps.  Basically, you must present only 1 path to the OS installation LUN during the installation, and you must load the Cisco VIC storage driver from the UCS B-series driver ISO before Windows will be able to discover the SAN LUNs.  Once the installation is done, you can present the remaining paths and enable Windows MPIO (or install a 3rd-party multipathing driver like PowerPath).

In this case, I was able to get the blade to discover the boot LUN, but Windows setup would not let me use it for installation.  I was able to create and format partitions, but could not get the setup program to allow the partition to be used for Windows.

In the process of troubleshooting, I tried a few things to get the installation to work:

  1. Removed all LUNs from the storage group that were not needed for the Windows OS.  In this case there was 1 additional LUN that was removed from the group.
  2. Changed the host initiator registration on the EMC VNX array from failover mode 4 (ALUA enabled) to failover mode 1 (legacy active/passive mode).
  3. Verified that the OS boot LUN was owned by the correct storage processor based on the single-path initiator registration.
  4. Tried using the previous version of the Cisco B-series driver ISO (1.4.2 instead of 2.0.1).
  5. Changed the local storage policy so that the 2 internal drives were configured for RAID-1 instead of standalone drives (Windows installation only saw 1 logical disk instead of 2 disks).


Finally, I got the installation to work by using an older driver ISO download (1.4.1g).  I don’t know why the installation program did not like the driver provided with the 1.4.2 or 2.0 release ISO; perhaps it was because the Cisco M81KR adapter in this blade was still running a 1.4.1 firmware release.

Once Windows was installed, I installed the Unisphere host agent which automatically registered the other 3 initiator paths.  I then enabled the Windows feature for native MPIO, and enabled management of the “DGC VRAID” device class.  I ran into a problem though: MPIO was only showing 3 paths instead of the expected 4 paths.  I tried doing some manual SP and HBA failover tests to see if it would “pick up” the missing path, but all I managed to do was crash the operating system.  Obviously MPIO was not happy.
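
If you prefer the command line over the MPIO control panel, the device class can be claimed with mpclaim from an elevated prompt.  The vendor/product string is space-padded (the vendor ID is padded to 8 characters), so confirm the exact string against the EMC host connectivity guide rather than copying my example:

mpclaim -r -i -d "DGC     VRAID"
mpclaim -s -d

The -r flag reboots the server so the claim takes effect, and mpclaim -s -d lists the claimed MPIO disks afterward.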

I reviewed EMC’s host connectivity guide for Windows (available under Support -> Technical Documentation and Advisories -> Host Connectivity/HBAs -> Installation/Configuration) to make sure I wasn’t missing any steps.  The VNX array was already running block OE (Flare) 31 or later, and when I checked the connectivity status screen for the host it showed failover mode 4 (ALUA), which is required for Server 2008 native MPIO.  However, I suspected that not all 4 paths were using failover mode 4, since I had previously changed the initial path (for the Windows OS installation) to failover mode 1.  That would explain why MPIO was only showing 3 paths instead of 4 and giving some unpredictable results.  With the blade powered off, I changed the host registration to failover mode 1, then back to failover mode 4 so that all 4 paths would be configured the same way.

After that change, I powered the blade back up and verified that MPIO was seeing all 4 paths.  I was then able to successfully test SP failover by trespassing the LUN, making sure the OS was still functional, and then trespassing the LUN back to the original owner.  I also tested HBA/fabric failover by removing the WWPN zone configuration for that blade from the fabric A SAN switch and making sure MPIO was communicating using fabric B.  I then restored the original configuration and repeated the test for the fabric B SAN switch.