
Last week I had the great opportunity to attend VMware Partner Exchange 2012 in Las Vegas.  I was joined by a number of other Varrow employees from management, pre-sales, and post-sales.

I signed up for sessions before the conference but did not have any scheduled activities or boot camps on Monday the 13th, so I used the day to take advantage of VMware’s hands-on labs.  They had 28 different labs to choose from, covering the whole range of VMware products and solutions.  I tried to focus on labs where I had little previous knowledge or exposure, and got a lot of value out of some quality hands-on time with real-life scenarios presented by VMware.

The five labs I attended on Monday were:

  • HOL09 – Improve Troubleshooting and Performance Tuning for Your Virtual Environment
  • HOL10 – Advanced Troubleshooting and Performance Tuning for Your Virtual Environment
  • HOL01 – Building Your Hybrid Cloud
  • HOL07 – Using Virtual Distributed Switches and Network I/O Control in Your Network
  • HOL25 – Cisco – Deploying vCloud Director with Nexus 1000V

 

I also attended 2 more labs on Thursday:

  • HOL28 – Simplifying Patch and IT Tasks on Your Physical and Virtual Machines
  • HOL05 – Datacenter Migration and Disaster Recovery Protection for Your Virtual Environment

 

The labs were my first hands-on experience with vCloud Director and vCenter Protect.  My knowledge of SRM was also very limited, so I appreciated getting to run through some example scenarios with the latest SRM 5.0.  vCenter Orchestrator and vCenter Chargeback were also briefly featured in the HOL01 lab.  Getting to see NetFlow in action on a vSphere Distributed Switch 5.0 was very cool, and I can see why a lot of network administrators will enjoy having that in environments that aren’t using the Nexus 1000V.  Most of the labs were timed very well; working at a steady pace, I usually had only about 10 minutes left over.

Most of the rest of my time at Partner Exchange was spent in the break-out sessions.  These were available on Tuesday, Wednesday, and Thursday.  There were general sessions on Tuesday and Wednesday for all attendees that were both informative and motivating.  On Tuesday I attended these sessions:

  • SRM 5 Demo – New Features in Action and Q&A
  • Metering and Billing in Cloud with vCenter Chargeback
  • Selling vSphere Storage Appliance (VSA) Successfully
  • Oracle Databases on vSphere5 Best Practices
  • Everything Back-up – VMware vSphere, vCloud, and View

 

I learned the most in the Chargeback and VSA sessions, since I had never used either of these products before in a lab or production environment.  As more companies are looking to deliver IT as a service, Chargeback really is a great way to account for those costs based on usage.  VSA is a very new product for VMware and as such has some limitations, but the new 1.5 version will address the biggest ones and hopefully result in more deployments in the SMB market where a full-fledged networked storage array is not required.

On Wednesday I attended the following sessions:

  • Design, Deploy, Optimized SQL Server on VMware vSphere 5
  • Virtualizing Unified Communications Systems with vSphere and View
  • Design, Deploy and Optimize Exchange 2010 on vSphere

 

All three of these sessions focused on Tier 1 applications, which is one of the areas VMware hopes to grow in 2012.  A lot of companies still rely on dedicated physical servers for their Tier 1 mission-critical applications.  In these sessions VMware wanted to show its partners that it has extensively tested and benchmarked Tier 1 applications on vSphere 5.0 and is confident the platform can handle their performance and uptime demands.

On Wednesday I also passed my VCP-510 exam with Pearson VUE, which offered testing right there at the conference.  Those who currently hold a VCP4 certification can take the VCP5 exam before February 29th without having to attend a VMware education class.

On Thursday I attended these two sessions:

  • Compliance and Security: A holistic approach from the bottom up
  • Up and Running with vSphere vCenter Server Appliance (VCSA)

 

The compliance session talked mainly about vShield and vCenter Configuration Manager with respect to ITIL and PCI-DSS requirements.  The VCSA session was enlightening, because I do not have much experience with the vCenter appliance and was not fully aware of all the limitations of the current version.  There is some elegance in the simplicity of an appliance-based vCenter, but for a lot of environments it will not be the right choice, at least in the currently offered version.

This week while working on a VM migration project I was initially confounded by an extra port group showing up on VMs that only had a single NIC listed in their configuration.  We had been migrating VMs from ESX 4.0 hosts to new ESXi 5.0 hosts, upgrading VMware Tools and VM hardware, and migrating the VM networking from standard vSwitch to a new distributed vSwitch.

We tried deleting the NIC from the VM and adding it back, but two networks still appeared on the summary page.

Next we tried deleting the NIC and leaving it off, but the VM still showed the standard vSwitch network.  The dvSwitch network had been removed but not the standard one.  Next we removed the VM from inventory and added it back using the datastore browser, but the standard port group still appeared.

After puzzling over this for a while I realized the common factor among the VMs that were showing an extra standard port group.  They had all had a snapshot taken (prior to VM hardware upgrade) before their network port group was migrated from the standard vSwitch to the dvSwitch.  The VMs that did not have an extra network had been migrated to the dvSwitch before we did the VMware Tools and VM hardware upgrade.

After we deleted the existing snapshots on some of the VMs that were showing an extra network, they displayed just the single distributed port group on the summary page.  I’m sure this behavior is by design, so that a VMware administrator won’t accidentally delete a port group that may still be needed if a VM is rolled back to a previous snapshot.  If we had migrated all the VMs to the dvSwitch before starting the other upgrades, we would not have seen any VMs with extra port groups.

A warning to those who are running a Cisco UCS blade system and using a maintenance policy that requires user acknowledgement:

Normally the maintenance policy will prompt for user confirmation before applying any configuration change that requires a blade power cycle, such as changing BIOS or boot policies or adding vNICs or vHBAs.  This is especially important when you are using an updating service profile template that is bound to multiple service profiles, because any change at the template level will propagate to the bound profiles.

In my work with UCS internally and at customer sites, I have found 2 scenarios (so far) that will not prompt for any user confirmation and will just immediately change the power state of the service profile:

  1. Modifying the “desired power state” of the service profile template.  If you change this setting from “on” to “off”, the associated blade will just shut off with no warning.  Also, if you have an updating template with desired state “off”, any time you make changes to the template all of the bound service profiles will revert their power state to off without warning.  My recommendation is to always leave this set to “on” so that you don’t have any unexpected shutdowns.
  2. Adding new vNICs or vHBAs, saving the change, and then changing the vNIC/vHBA placement order.  If you only add new adapters and don’t modify the order, then UCS Manager prompts for confirmation as you would expect.  If you add the adapters and then modify the placement order without clicking the Save button first, then you are also prompted for confirmation.  However, if you add the adapters, save changes, and then change the placement order, UCS Manager will just reset the blades to apply the changes without warning.

 

This can be a very bad thing if you are changing a template that is bound, for example, to all of your VMware ESXi blades or all of your Citrix XenServer blades.
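Because the two exceptions depend on the order of operations, it can help to write them down as a decision table. A small Python model, assuming the edit is disruptive (it requires a blade power cycle); it encodes only the behavior described above, the class and function names are hypothetical, and nothing here talks to UCS Manager:

```python
from dataclasses import dataclass


@dataclass
class TemplateEdit:
    """One edit to an updating service profile template (hypothetical model)."""
    power_state_changed: bool = False   # "desired power state" modified
    adapters_added: bool = False        # new vNICs or vHBAs added
    saved_after_adding: bool = False    # Save clicked before touching placement order
    placement_reordered: bool = False   # vNIC/vHBA placement order changed


def asks_for_acknowledgement(edit: TemplateEdit) -> bool:
    """True if the maintenance policy holds the change for user acknowledgement.

    Assumes the edit requires a blade power cycle. Mirrors the behavior
    described in the post; it is not derived from any UCS Manager API.
    """
    if edit.power_state_changed:
        return False  # scenario 1: bound blades change power state immediately
    if edit.adapters_added and edit.saved_after_adding and edit.placement_reordered:
        return False  # scenario 2: add, Save, then reorder -> reset without warning
    return True       # other disruptive edits were held for acknowledgement
```

Reading the table this way makes the safe habit obvious: if you need to reorder adapters, do it in the same edit as adding them, before clicking Save.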

In the earlier days of UCS, there was no maintenance policy option and some customers inadvertently reset all of their blades at the same time without realizing the impact.  Back then the standard procedure was to unbind all of the service profiles from the updating template, modify the template, and then one-by-one you could bind the service profiles back to the template after putting the particular host into maintenance mode and/or shutting it down.

This week I had to set up Windows 2008 R2 Enterprise Edition (with SP1) on a Cisco UCS B200 M2 blade, and ran into a problem with the Windows installation.  This blade was being installed in a boot-from-SAN environment, even though it had 2 internal disk drives.  I have set up a blade like this before, but it was several months ago.

I reviewed the UCS Windows Installation Guide before starting, to make sure I didn’t forget any steps.  Basically, you must present only 1 path to the OS installation drive during the installation, and you must load the Cisco VIC storage driver from the UCS B-series driver ISO before Windows will be able to discover the SAN LUNs.  Once the installation is done you can present the remaining paths and enable Windows MPIO (or install a 3rd party multipathing driver like PowerPath).

In this case, I was able to get the blade to discover the boot LUN, but Windows setup would not let me use it for installation.  I was able to create and format partitions, but could not get the setup program to allow the partition to be used for Windows.

In the process of troubleshooting, I tried the following to get the installation to work:

  1. Removed all LUNs from the storage group that were not needed for the Windows OS.  In this case there was 1 additional LUN that was removed from the group.
  2. Changed the host initiator registration on the EMC VNX array from failover mode 4 (ALUA enabled) to failover mode 1 (legacy active/passive mode).
  3. Verified that the OS boot LUN was owned by the correct storage processor based on the single-path initiator registration.
  4. Tried using the previous version of the Cisco B-series driver ISO (1.4.2 instead of 2.0.1).
  5. Changed the local storage policy so that the 2 internal drives were configured for RAID-1 instead of standalone drives (Windows installation only saw 1 logical disk instead of 2 disks).

 

Finally, I got the installation to work by using an older driver ISO download (1.4.1g).  I don’t know why the installation program did not like the driver provided with the 1.4.2 or 2.0.1 release ISO.  Perhaps it was because the Cisco M81KR adapter in this blade was still running a 1.4.1 firmware release.

Once Windows was installed, I installed the Unisphere host agent which automatically registered the other 3 initiator paths.  I then enabled the Windows feature for native MPIO, and enabled management of the “DGC VRAID” device class.  I ran into a problem though: MPIO was only showing 3 paths instead of the expected 4 paths.  I tried doing some manual SP and HBA failover tests to see if it would “pick up” the missing path, but all I managed to do was crash the operating system.  Obviously MPIO was not happy.

I reviewed EMC’s host connectivity guide for Windows (available on powerlink.emc.com under Support -> Technical Documentation and Advisories -> Host Connectivity/HBAs -> Installation/Configuration) to make sure I wasn’t missing any steps.  The VNX array was already running block OE Flare 31 or later, and when I checked the connectivity status screen for the host it showed failover mode 4 (ALUA), which is required for Server 2008 native MPIO.  However, I suspected that not all 4 paths were using failover mode 4, since I had previously changed the initial path (used for the Windows OS installation) to failover mode 1.  That would explain why MPIO was showing only 3 paths instead of 4, and why failover behaved unpredictably.  With the blade powered off, I changed the host registration to failover mode 1 and then back to failover mode 4, so that all 4 paths would be configured the same way.
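The suspected root cause was a per-path mismatch that a single host-level view can hide. A minimal Python sketch of the consistency check, assuming you have already collected each initiator path’s failover mode (for example by reading it off the Unisphere connectivity status) into a dictionary; the path labels are hypothetical:

```python
REQUIRED_MODE = 4  # failover mode 4 (ALUA), needed for Server 2008 native MPIO


def inconsistent_paths(registrations: dict[str, int],
                       required: int = REQUIRED_MODE) -> list[str]:
    """Return the initiator paths whose failover mode differs from `required`."""
    return sorted(path for path, mode in registrations.items() if mode != required)


if __name__ == "__main__":
    # Hypothetical labels, one per fabric/SP combination; the first path is the
    # one that was switched to mode 1 for the single-path OS installation.
    paths = {"fabA-SPA": 1, "fabA-SPB": 4, "fabB-SPA": 4, "fabB-SPB": 4}
    print(inconsistent_paths(paths))  # ['fabA-SPA']
```

An empty list is the state you want before enabling MPIO and running failover tests.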

After that change, I powered the blade back up and verified that MPIO was seeing all 4 paths.  I was then able to successfully test SP failover by trespassing the LUN, making sure the OS was still functional, and then trespassing the LUN back to the original owner.  I also tested HBA/fabric failover by removing the WWPN zone configuration for that blade from the fabric A SAN switch and making sure MPIO was communicating using fabric B.  I then restored the original configuration and repeated the test for the fabric B SAN switch.

This week I did an upgrade from vSphere 4.1 to vSphere 5.0 and encountered a few issues that were fortunately quite easy to fix, but I wanted to mention them here in case anyone else runs into the same situation.

There’s a component of the vCenter installation media that has to be installed for License Reporting to work, called vSphere Web Client (Server). You have to install Adobe Flash on the server that runs the Web Client Server (doesn’t have to be the same machine as vCenter).  Once that is installed, you have to register the vCenter Server with the Web Client Server (even if they coexist on the same host).

See http://virtualisedreality.com/2011/07/15/vsphere-5-new-vsphere-web-client/ for screenshots of the registration process.

Another thing that I ran into this week after upgrading from vCenter 4.1 to vCenter 5 is that the vCenter health status showed a red error for converter (even after uninstalling the vCenter Converter 4.1 from the system).  I had to follow the steps in this blog post to remove the error: http://blog.alanrocks.com/?p=171.  There’s no vCenter Converter 5 plug-in (only standalone) so you can’t just install a new version to fix the service status condition.

Setting up syslog collector (can change the host destination through GUI, CLI or host profile)
http://blogs.vmware.com/esxi/2011/07/setting-up-the-esxi-syslog-collector.html
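Before pointing hosts at the collector, one way to confirm it is listening is to send it a single test message. A minimal Python sketch, assuming the collector’s default UDP port 514 and a BSD-style (RFC 3164) message; the host name, tag, and collector address are placeholders:

```python
import socket
from datetime import datetime

FACILITY_USER = 1   # syslog "user" facility
SEVERITY_INFO = 6   # syslog "informational" severity


def build_syslog_message(hostname: str, tag: str, text: str, ts: datetime,
                         facility: int = FACILITY_USER,
                         severity: int = SEVERITY_INFO) -> bytes:
    """Format an RFC 3164 style line: <PRI>TIMESTAMP HOSTNAME TAG: MSG."""
    pri = facility * 8 + severity
    # RFC 3164 pads single-digit days with a space, e.g. "Feb  3".
    stamp = f"{ts.strftime('%b')} {ts.day:2d} {ts.strftime('%H:%M:%S')}"
    return f"<{pri}>{stamp} {hostname} {tag}: {text}".encode("ascii")


def send_test_message(collector: str, message: bytes, port: int = 514) -> None:
    """Send one UDP datagram; UDP gives no delivery confirmation."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, (collector, port))
```

For example, `send_test_message("192.0.2.10", build_syslog_message("labhost", "test", "collector check", datetime.now()))`. If the collector accepted it, a log for the sending machine’s address should appear under the collector’s data directory.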

Setting up ESXi coredump collector (can change the host destination through CLI or host profile)
http://blogs.vmware.com/esxi/2011/07/setting-up-the-esxi-50-dump-collector.html

I found it helpful to create desktop shortcuts to the data directories, since both of them by default are inside the hidden ProgramData directory (on 2008 R2 at least).  The folder for syslogs is displayed when you click on the syslog screen in vCenter, but the dump collector location is not readily found after installation.

Lastly, for the ESXi hosts that were upgraded from 4.1 to 5.0, I had to set the Advanced Setting UserVars.SuppressShellWarning = 1 to hide the yellow warning about ESXi Shell and SSH being enabled.  These hosts previously had the local shell and SSH access enabled, and in vCenter 4.1 the warning would automatically disappear when the host was rebooted.  In vCenter 5.0 that warning is persistent unless you change the setting mentioned above.

I worked in the lab today trying out the new Auto Deploy feature of vSphere 5.  I used the official VMware documentation as a reference and also followed a couple of great blog posts that walk through the setup process.  Gabe’s post also has screenshots, which were excellent.

http://www.gabesvirtualworld.com/vsphere-5-how-to-run-esxi-stateless-with-vsphere-auto-deploy/

http://www.yellow-bricks.com/2011/08/25/using-vsphere-5-auto-deploy-in-your-home-lab/

The lab equipment I was working with included a Cisco 3750 switch as DHCP server and gateway, and a pair of Cisco UCS B200 M1 blades connected to an EMC Fibre Channel array.  Both of the blades were running ESXi 4.1 before I got started, using boot from SAN, and we already had a vCenter 5.0 VM running on one of the blades.

The first thing I did was make sure no VMs were running on the blade that I intended to switch from SAN boot to PXE Auto Deploy boot.  Then I shut down the blade and modified the boot policy in UCS Manager so that the service profile was configured to boot from LAN instead of the EMC SAN targets.

The next few steps were easy: installing the VMware vSphere Auto Deploy software on our vCenter 5.0 server and setting up a free SolarWinds TFTP server.  After I extracted the TFTP boot files into the C:\TFTP_Root directory, I did a quick test from my laptop with a tftp GET command to make sure the server was running and accessible.
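The same quick check can be scripted. A minimal Python sketch of a TFTP probe following RFC 1350: it sends one read request and reports whether the server answered with a DATA or an ERROR packet. The file name `undionly.kpxe.vmw-hardwired` is, as far as I know, the boot file in the Auto Deploy TFTP bundle; treat it as an assumption and substitute whatever file you extracted.

```python
import socket
import struct

OP_RRQ, OP_DATA, OP_ERROR = 1, 3, 5  # RFC 1350 opcodes


def build_rrq(filename: str, mode: str = "octet") -> bytes:
    """Read request: 2-byte opcode, filename, NUL, transfer mode, NUL."""
    return (struct.pack("!H", OP_RRQ) + filename.encode("ascii") + b"\0"
            + mode.encode("ascii") + b"\0")


def parse_reply(packet: bytes) -> tuple[int, int]:
    """Return (opcode, block number or error code) from a reply packet."""
    opcode, value = struct.unpack("!HH", packet[:4])
    return opcode, value


def tftp_probe(server: str, filename: str, timeout: float = 3.0) -> str:
    """Send one RRQ to port 69 and describe the first reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_rrq(filename), (server, 69))
        try:
            packet, _ = sock.recvfrom(516)  # 4-byte header + up to 512 data bytes
        except socket.timeout:
            return "no reply (server down, firewalled, or wrong address)"
    opcode, value = parse_reply(packet)
    if opcode == OP_DATA:
        return f"server is serving {filename!r} (got block {value})"
    if opcode == OP_ERROR:
        return f"server answered with TFTP error code {value}"
    return f"unexpected opcode {opcode}"
```

The probe stops after the first packet and never sends an ACK, so the server will retransmit a few times and then abandon the transfer, which is harmless for a reachability check. An ERROR reply still proves the service is up; error code 1 means the file was not found.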

After that, the Cisco 3750 configuration had to be modified to point the DHCP clients to the correct IP address and boot file for TFTP.  In IOS, the TFTP server address is specified with “next-server” and the file name with “bootfile”.  If your DHCP server is in a different subnet than the TFTP server and ESXi management NICs, you will need an ip helper-address configured.  For this test I did not reserve an IP address for the host, but that is typically what would be done (by MAC address association) so that the ESXi server gets the same IP address from the DHCP server every time.
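A sketch of the IOS side, generated from Python so the pieces are easy to see. It assumes a single management subnet, uses the `next-server` and `bootfile` commands mentioned above, and adds one reservation pool per host keyed on the `01`-prefixed Ethernet client identifier that IOS commonly expects. Pool names, addresses, and the MAC address are hypothetical, and whether your blade’s PXE client presents that identifier is an assumption to verify.

```python
import ipaddress


def mac_to_client_id(mac: str) -> str:
    """Turn aa:bb:cc:dd:ee:ff into the IOS client-identifier form 01aa.bbcc.ddee.ff.

    The leading 01 is the DHCP hardware type for Ethernet.
    """
    digits = "01" + "".join(c for c in mac.lower() if c not in ":-.")
    int(digits, 16)  # raises ValueError on non-hex input
    if len(digits) != 14:
        raise ValueError(f"not a 48-bit MAC address: {mac}")
    return ".".join(digits[i:i + 4] for i in range(0, len(digits), 4))


def auto_deploy_dhcp(pool: str, subnet: str, gateway: str, tftp_server: str,
                     bootfile: str, reservations: dict[str, str]) -> str:
    """Render IOS DHCP pools that send PXE clients to a TFTP server.

    `reservations` maps a MAC address to the fixed IP that host should get.
    The PXE lines are repeated in each host pool rather than relying on
    pool inheritance.
    """
    net = ipaddress.ip_network(subnet)
    pxe = [f" next-server {tftp_server}", f" bootfile {bootfile}"]
    lines = [f"ip dhcp pool {pool}",
             f" network {net.network_address} {net.netmask}",
             f" default-router {gateway}", *pxe]
    for i, (mac, ip) in enumerate(sorted(reservations.items()), start=1):
        if ipaddress.ip_address(ip) not in net:
            raise ValueError(f"{ip} is outside {subnet}")
        lines += [f"ip dhcp pool {pool}-HOST{i}",
                  f" host {ip} {net.netmask}",
                  f" client-identifier {mac_to_client_id(mac)}",
                  f" default-router {gateway}", *pxe]
    return "\n".join(lines)
```

The output would be pasted in global configuration mode. If a reservation does not take effect, `show ip dhcp binding` shows which client identifier the blade actually presented. If the DHCP server lives on another subnet, the ESXi management VLAN interface also needs `ip helper-address` pointing at it.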

At that point I was able to get the UCS blade to run the PXE bootloader, but ESXi 5.0 will not automatically load until the other configuration steps are completed.  I went ahead and installed the latest PowerCLI for vSphere 5 onto the vCenter server, added the software depot, and configured an image profile as described in the documentation and the blog posts above.  After that I expected the blade to boot right up into ESXi 5.0, but that was not the case.  I was getting the same screen that I had before there was any image profile.

After some frustrated troubleshooting I decided to try deleting the host from vCenter.  Since it had previously been loaded with ESXi 4.1 and added to vCenter, it showed up as a disconnected host in the inventory.  Once I had deleted it from vCenter, the next time the blade attempted a PXE boot it made it through the ESXi 5.0 boot process.

Once ESXi 5 was running, the host was automatically added to the vCenter inventory.  I only had 1 datacenter configured so the host was placed there, but not as part of any cluster.  I used the KVM console to set the root password and enable ESXi console and SSH service.  Then I configured the usual settings from vCenter like NFS datastores, vSwitch networking, vMotion, NTP, and DNS.

In order to provide network syslog and coredump support, I installed those two components on the vCenter server from the vCenter installation ISO using the default ports.  I configured the reference ESXi 5.0 host to use the vCenter IP address for coredumps and syslog.

The next step was to create a host profile, attach it, apply it, and finally check compliance.  Then I had to create a new deploy rule that included the host profile name as well as the cluster name to which I wanted the hosts to belong. The previous deploy rule was deactivated and the new one activated with a pattern that specified IP addresses in the DHCP range for ESXi management.
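Deploy rules are created with the PowerCLI `New-DeployRule` cmdlet, using a pattern such as `ipv4=10.10.10.50-10.10.10.60`, and activated with `Add-DeployRule`. To illustrate how an IP-range pattern picks the image profile, host profile, and cluster for a booting host, here is a simplified Python model. It assumes that, for each item type, the first matching rule in the active rule set wins; that is a simplification, so check the Auto Deploy documentation for the exact precedence. All rule and item names are hypothetical.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class DeployRule:
    name: str
    pattern: str                                          # e.g. "ipv4=10.10.10.50-10.10.10.60"
    items: dict[str, str] = field(default_factory=dict)   # item type -> item name


def parse_ipv4_range(pattern: str) -> tuple[ipaddress.IPv4Address, ipaddress.IPv4Address]:
    """Parse "ipv4=a.b.c.d-e.f.g.h" (or a single address) into inclusive bounds."""
    key, _, value = pattern.partition("=")
    if key != "ipv4":
        raise ValueError(f"only ipv4= patterns are modeled here: {pattern}")
    start, _, end = value.partition("-")
    return ipaddress.IPv4Address(start), ipaddress.IPv4Address(end or start)


def resolve(active_rules: list[DeployRule], host_ip: str) -> dict[str, str]:
    """For each item type, take it from the first active rule covering host_ip."""
    ip = ipaddress.IPv4Address(host_ip)
    chosen: dict[str, str] = {}
    for rule in active_rules:
        low, high = parse_ipv4_range(rule.pattern)
        if low <= ip <= high:
            for kind, name in rule.items.items():
                chosen.setdefault(kind, name)  # earlier rules take precedence
    return chosen
```

In this model a host whose DHCP address falls outside every active range resolves to nothing, which is one reason the deploy rule pattern should match the DHCP range used for ESXi management.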

After rebooting the reference host I found there were a few things that had to be tweaked.  My host profile didn’t have a hard-coded root password so the host defaulted back to a blank root password.  Also the SSH service was disabled again even though I had enabled it before creating the profile.  Both of those were corrected by editing the host profile directly.

When I get more time I want to test out the pooled licensing, PowerPath VE 5.7, and the latest Nexus 1000v that supports stateless ESXi 5.

I have one of those USB to serial adapters with the (very common) Prolific PL-2303 chip inside, and it has worked great in Windows 7 for serial connections to Cisco devices.  I’ve used it for MDS, Catalyst, and UCS equipment.  However, while working on a Nexus 7010 switch today I was having some problems with it.  For some reason while copying files from the onboard Nexus USB port to bootflash, my PC would “blue screen” and Windows crashed.

I decided to try using the Prolific adapter on my Macbook instead of the Thinkpad, and had to install the Prolific OS X drivers (listed for Snow Leopard, but they worked on Lion).  After rebooting, I had a new /dev/tty.usbserial device :)

I did some quick searches on possible serial terminal programs to use on OS X, since there isn’t a binary version of PuTTY for Mac.  It turns out there is a simple program included with Lion called “screen” that you run from a normal Terminal session.  In this case I used the command “screen /dev/tty.usbserial 9600” to connect at 9600 baud.

After the switch configuration was done, I closed out of the serial session by pressing Control+A and then Control+\ as per the instructions I found online and in the man page.  The screen program passes every keystroke directly through except for Control+A, which is how you interact with the program itself rather than the device connected by serial cable.  If you have to send a break signal through the serial connection (for password recovery, Solaris systems, etc.) you press Control+A and then B.

So I came home from the Apple store today with one of the new 13″ Macbook Air laptops that were released last week.  I bought the highest available configuration, with the 1.8 GHz Core i7 processor and a 256GB SSD.  Go large or go home, especially on a system that has zero upgradable components (including RAM). I lucked out and got one of the units with a Samsung AXM09A1Q drive, which benchmarks faster than the equivalent Toshiba SSD.  I got 248 MB/sec write and 265 MB/sec read on a fresh system using the DiskSpeedTest app (available on the OS X Lion App Store).

I am not completely new to the Mac universe; about 2 years ago I had a Macbook Pro 17″ model that was configured to dual boot with Windows Vista (and later Windows 7).  I sold that laptop because I wasn’t using OS X all that much and the laptop was overkill for my needs anyway. I learned enough about OS X to customize my system the way I wanted it, but never got to an “expert” level of skill.

My personal laptop for the last ~1.5 years has been the ultraportable Dell Adamo, which was originally released in 2009 as a Macbook Air competitor but initially had a very high sticker price for a Dell branded system. I waited and bought one when they dipped below $1000, and it has been a solid machine with no functional problems running Windows 7 64-bit. The 128GB Samsung SSD in that system makes it feel a lot faster than you would expect of a Core 2 Duo 1.2 GHz system with only 2GB of RAM.

Technology has advanced a lot since 2009, and the mid-2011 Macbook Air refresh provides Intel Sandy Bridge architecture with the Core i5/i7 hyperthreaded, turbo boost enabled CPUs and 10 Gb/sec Thunderbolt. There have been a lot of benchmark articles already showing the incredible performance of these ultra-low volt processors. The integrated graphics controller is still a bit limited, but the target audience for this system is not gamers. I’m still curious to see whether Thunderbolt ends up becoming a popular connection bus, or if it dies out after a couple years when Apple moves on to the next great technology. The new 27″ Apple display that uses Thunderbolt is quite tempting, because it essentially combines docking station and LCD into one package with minimal cabling required. The price was better than I expected given the additional electronics required inside to support that.

I’ve only installed a few applications today on OS X Lion, but a couple of the first on my list were the Microsoft Remote Desktop client and the Live Mesh client. I couldn’t believe it, but the Live Mesh client for Mac is better for syncing files than the Windows version of the client! You can actually specify what destination folder to use, instead of Live Mesh just assuming that you want to place it into the My Documents folder. You also get some nice progress bar indicators while the folders are syncing from your other PCs (or from the Live cloud SkyDrive). Unfortunately, you can’t use the Remote Connect feature from OS X, so you can’t remotely manage systems running Windows 7 Home or Home Premium. The regular RDP client will work for the other editions of Windows.

Paul Thurrott’s SuperSite for Windows also has a nice blog post about Live Mesh on Mac OS X Lion.

This week I was on-site at a customer that uses Cisco Fabric Manager quite extensively for monitoring their SAN environment.  We were adding a new UCS cluster to the SAN and they wanted to be able to use Fabric Manager to monitor the status of the UCS equipment.

Since Fabric Manager uses SNMP to communicate with the UCS fabric interconnects, you have to enable SNMP, configure the SNMP community string in UCS Manager, and create an SNMP user account (“admin” is not permitted to be an SNMP user, so you must pick a different name).  That same SNMP account has to be created as a locally authenticated account with aaa and admin roles for full functionality.  If the existing MDS switches use the “admin” account for SNMP, you will have to configure them with the same SNMP user that was created in UCS Manager.  We also specified the Fabric Manager server as an SNMP trap receiver, although that step was probably not necessary.

Cisco SNMP configuration example.

You may have to delete the existing fabric (from Fabric Manager, not from the MDS switches) and rediscover in order for the new credentials to be used.  If performance monitoring is enabled make sure to disable that before attempting to delete the fabric.

We encountered a warning message in Fabric Manager for both of the UCS fabric interconnects that said “No Traps” highlighted in yellow.  That message went away automatically once the UCS hardware had generated some SNMP traps.  Also keep in mind that UCS only allows read-only operations through SNMP, so if you attempt to change a port configuration it will throw an error message.

We were using Fabric Manager 5.0(4) and UCS firmware 1.4(3l).

One of my previous Computer Science professors at NCSSM recently posted on Facebook about this web site: www.khanacademy.org.  It is an amazing resource; when I visited their site a few days ago I ended up watching a dozen videos in a row.  Salman Khan is the educator behind the voice in the videos.  Several years ago he quit his job working in hedge funds, and he now runs Khan Academy as a non-profit organization.  He was on the Colbert Report in June to talk about his site and how lots of school-age kids are now using it as a Youtube-based lecture source.

So far I have focused on the videos about the Credit Crisis and banking/finance, because I find it personally intriguing and because I just recently finished reading Michael Lewis’s book called “The Big Short” that talks about the people who hedged their bets against the failure of the subprime mortgage market back in 2007 – 2008.  There are lots of other videos (~2400 total) that cover topics from Algebra to Biology to Computer Science and so on.

This site is just one example of the way Youtube can be used for educational purposes.  I also enjoy watching useful “how-to” videos on the site, for topics ranging from technology to home improvement.