Last week I had some time to set up VMware VSA (vSphere Storage Appliance) in my home lab. VSA allows you to use local host storage (SATA or SAS drives) as shared storage with a built-in NFS server and automatic replication between cluster hosts.
To be officially supported by VMware, the storage controller in your servers has to be on the hardware compatibility list. There are many controllers from LSI, Dell, HP, and others listed. From what I could determine, software-based onboard RAID is not supported. The hosts must have a minimum of 4 gigabit NICs and 6GB of RAM, and a maximum of 72GB of RAM (higher amounts of RAM should work but are not tested by VMware). A 2.0GHz dual-core CPU is the lowest supported configuration.
For my lab environment, I already have 2 HP servers configured with shared iSCSI storage running ESXi 5.1, but they do not have local RAID controllers nor 4 NICs each (only 2 NICs). I decided to create a “nested ESXi” install so that I could create 3 nodes for a VSA cluster. I deployed 3 new VMs with 2 vCPU sockets, 6GB RAM, 4 x E1000 NICs, and 250GB of disk space (thin). I created a new VLAN on my lab switch and Nexus 1000v distributed switch to use for the VSA back-end network. I also created a new port profile on the Nexus 1000v that was configured as a VLAN trunk instead of an access port. That way my nested ESXi guest VMs can access all of the VLANs with dot1q tagging.
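For reference, a trunking vEthernet port profile on the Nexus 1000v looks something like this (the profile name and allowed VLAN list below are placeholders; substitute your own VLANs):

    ! Hypothetical example of a vEthernet port profile configured as a dot1q trunk
    port-profile type vethernet NestedESXi-Trunk
      vmware port-group
      switchport mode trunk
      switchport trunk allowed vlan 10,20,30
      no shutdown
      state enabled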
To allow nested ESXi to work with 64-bit guests, I had to enable hardware virtualization in the VM settings through the vSphere Web Client.
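If you would rather set this outside of the Web Client, the same option corresponds to a single entry in the VM’s .vmx file (this is my understanding for vSphere 5.1 with hardware version 9 VMs; older releases used the host-wide vhv.allow setting instead). The VM has to be powered off to make the change:

    vhv.enable = "TRUE"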
I then installed ESXi 5.1 on each of the 3 VMs, using the local 250GB storage. I assigned static management IP addresses and hostnames, and added DNS entries for the new servers. Once the 3 servers were online, I added the first host to my existing vCenter server temporarily. That way I could use my existing Server 2008 R2 template to deploy a new VM to use for the VSA vCenter. You don’t need to set up a new vCenter just for VSA; you can use an existing one as long as it is version 5.1. I wanted to create a new one to simulate a typical customer deployment where there is no existing vCenter. If you don’t have an existing template, you can skip adding the host to an existing vCenter and just create a new empty VM directly on the server using the vSphere Client. Install Server 2008 R2 and follow standard procedures for preparing a server for vCenter 5.1. Make sure you join the VM to the domain.
I installed a new vCenter (version 5.1.0b) onto the 2008 R2 VM using the simple install method, since this was just a test environment. For a production install you may want to use the traditional component-based install method. After vCenter and its relevant components were installed, I installed the VSA Manager 5.1 software on the server.
Before I could proceed with creating the new VSA cluster, I had to make a change in the C:\Program Files\VMware\Infrastructure\tomcat\webapps\VSAManager\WEB-INF\classes\dev.properties file on my VSA vCenter server. Since this was a nested ESXi environment, EVC mode would not work. By default VSA requires EVC to be enabled, but you can override it in the properties file. I had to change the evc.config value from “true” to “false”. For brownfield deployments (where VMs are already running on the VSA hosts), you have to change the evc.config.baseline value to “highest” instead of “lowest”. This includes environments where vCenter has already been installed onto one of the VSA hosts, like in my lab example here. Since I was disabling EVC altogether, I didn’t have to change that value. Greenfield deployments do not require any changes, but that means you can’t have any VMs running on the hosts that are to be configured for VSA (vCenter must be physical or hosted on another vSphere host outside of VSA).
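For reference, here is roughly what the relevant entries in dev.properties looked like after my edit (only evc.config needed to change in my case; evc.config.baseline is shown at its default, and the rest of the file was left alone):

    # Disable the EVC requirement (needed for nested ESXi)
    evc.config=false
    # Change to "highest" for brownfield deployments that keep EVC enabled
    evc.config.baseline=lowest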
After changing the dev.properties file, I had to restart the VirtualCenter Management Webservices and VirtualCenter Server services. Then I added my 3 hosts to vCenter in a new Datacenter object. I did not create a cluster yet; I just left the hosts as standalone. There was one more setting that I needed to change before I could run the VSA wizard: the VMFS heap size on the host that was running my vCenter 5.1 VM. This must be done for any host you are adding to the VSA cluster that is already running VMs. If the host does not have any running VMs, you can let the wizard change it automatically for you, and it will reboot the host. To change the heap size, go to the Advanced System Settings for the host, filter for VMFS, and change the heap size parameter to 256.
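If you prefer the command line, the same change can be made from the ESXi shell. I believe the parameter that the VMFS filter turns up is VMFS3.MaxHeapSizeMB (treat that name as my assumption and verify it against what you see in Advanced System Settings):

    # Set the VMFS heap size to 256MB (VMFS3.MaxHeapSizeMB assumed to be the setting the wizard checks)
    esxcli system settings advanced set -o /VMFS3/MaxHeapSizeMB -i 256
    # Confirm the new value
    esxcli system settings advanced list -o /VMFS3/MaxHeapSizeMB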
Now I was ready to create the VSA cluster. To get to the VSA installer wizard, open the regular vSphere Client and connect to vCenter. Select the datacenter that contains the 3 hosts (or 2 hosts if you are making a 2-node cluster) for VSA, then click on the VSA Manager tab at the far right. If you don’t see the tab, the plugin may not be enabled yet; check the Manage Plugins window to see if it is disabled. The installer will ask you to select your hosts and will let you know if there is a configuration problem. For brownfield deployments where you have existing VM networks, vMotion, fault tolerance, etc. enabled, you will have to make sure the network configuration is prepared ahead of time. In my case, I had not changed the default ESXi network configuration (Management Network and VM Network only) on vmnic0, so I let the wizard do the work for me.
If you get a pop-up warning about VSA deleting local data on the hosts, this does not refer to the local VMFS datastore. It only refers to previously configured VSA storage that may exist on the host. See the VMware KB article for more information: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2048059
The network configuration for a 3-node VSA cluster requires 10 static IP addresses, or 7 static IP addresses plus 3 DHCP addresses. I did not use DHCP for my lab build. This does not include the static IPs for vCenter and for ESXi host management (14 total if you include those). For each host you will need a VSA management IP, a vSphere Feature (vMotion) IP, and a back-end network IP. The back-end network should be on a separate VLAN with a different subnet if possible. You also need one VSA cluster network IP address.
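To show where those numbers come from: 3 VSA management IPs + 3 vMotion IPs + 3 back-end IPs + 1 VSA cluster IP = 10 addresses for the cluster itself; add the 3 ESXi management IPs and 1 vCenter IP and you arrive at 14.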
You will have to decide how much storage to allocate to VSA as part of the installer wizard. If there are no local VMs on the hosts, you will be able to use most of the storage for VSA (I had 13GB free on each local datastore when VSA was maxed out). Since I had a 40GB vCenter VM on host 1, I could not initially use the full amount for VSA. The size can be increased later, although during that process there will be a performance impact on any VMs running on the VSA datastores.
After deciding on a cluster size, the installer automatically takes care of the rest of the process. It will make the heap size change on any hosts that still need it and reboot them, then the process picks up once the hosts are connected back to vCenter. It automatically sets up the vSwitches, VM networks, and vMotion. You will end up with a new cluster called VSA HA Cluster that has HA enabled (but DRS is not automatically enabled). EVC mode will be enabled unless you changed the properties file to disable it. The wizard sets HA admission control to 33% reserved CPU and memory for a 3-node cluster (the equivalent of tolerating one host failure), and it also changes the HA restart priority for the VSA appliances to High. It also enables VM Monitoring with a failure interval of 60 seconds, a minimum uptime of 120 seconds, and a maximum of 3 resets per VM in a 72-hour window. VM Monitoring is not something that I normally enable unless there are VMs that “lock up” on a regular basis, so it was interesting to see that VMware turns it on by default here.
During the installer process, I saw a triggered alarm in the Web Client that said “vSphere HA virtual machine failover failed”. I’m not sure why I got this alarm, as there was no apparent host failure. If there is a critical error during the install process, the installer will back out the changes made to the hosts (with the exception of the heap size change; in my case vSwitch0 was also left with vmnic2 as the uplink instead of vmnic0). I went through this process once, because I initially forgot to add the back-end network VLAN to my lab switch, so the VSA appliances couldn’t talk to each other over the back-end network. There is no progress indicator for the back-out process; you just have to wait about 10 minutes while VSA finishes cleaning everything up. The second time I ran the installer, after adding the VLAN, everything went smoothly.
The VSA appliances run SUSE Linux Enterprise Server 11 SP2 and they have VMware tools installed. Each appliance exports one NFS shared datastore to the cluster, which is half the size of the local VSA space on the host. The other half of the space is used as a mirror destination for another VSA host. If one appliance fails, the replica for its NFS share is mounted read/write on the surviving appliance.
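Since the exports are ordinary NFS mounts from the hosts’ point of view, you can confirm them from the ESXi shell on any of the cluster hosts (datastore names and appliance addresses will of course be specific to your environment):

    # List the NFS datastores mounted on this host; the VSA exports show up
    # here with the appliance's NFS server address as the remote host
    esxcli storage nfs list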
After my VSA cluster was online, I used Storage vMotion to migrate the vCenter VM to one of the new datastores. This allows the VSA cluster size to be increased to use the local VMFS space that vCenter was previously consuming, and it also provides vMotion and HA capabilities for vCenter. However, there are some downsides to moving vCenter to the VSA datastore. Should the VSA datastore that stores the vCenter VM files fail to come online, there will be no way to access VSA Manager, and any troubleshooting would have to be done with the command line directly on the VSA appliance. I did test a full power-down of the VSA cluster after moving vCenter to the datastore. I shut down the vCenter VM first, then shut down one VSA appliance at a time. With the VSA appliances offline, all VMs that are using those NFS datastores will show as inaccessible. I then powered on the VSA appliances, waited for the NFS datastores to become available, and powered up vCenter.
Another issue with moving vCenter to the VSA datastore is that you cannot change the networking configuration of the cluster without moving vCenter off to a different location. All VMs except for the VSA appliances have to be powered off on the VSA datastores before you can change the network configuration.
In order to do maintenance work on one of the VSA cluster hosts, you first have to put the VSA appliance on that particular host into appliance maintenance mode. This is selected from VSA Manager in the appliances view. You can only put one appliance into maintenance mode at a time; this is true for both 2-node and 3-node clusters. When the appliance enters maintenance mode, it will shut down and the datastore replica will become active on the paired appliance that is still running. You can then move any remaining VMs off the host (or shut them down), and put the host into maintenance mode the usual way. The datastores that are affected by the powered-off appliance will show degraded operation in VSA Manager.
Degraded datastores are still accessible by hosts, but they are not being replicated, so there is a risk of data loss while the VSA appliance remains in maintenance mode. When the maintenance on the host is completed, first take the host out of maintenance mode. Then you will have to power on the VSA appliance that was previously shut down. Wait for the appliance to boot, which takes a few minutes. Then you can go to the VSA Manager interface and take the appliance out of maintenance mode.
Shortly after the appliance exits maintenance mode, you will see a data synchronization task appear in the recent tasks pane. VSA has to sync all of the data block changes that occurred to the datastore during the maintenance window. Until this synchronization is completed, the datastore will still be exported by the appliance that was online during maintenance mode.
Although it may be tempting to put the next appliance into maintenance mode while the synchronization is happening, I would not recommend trying this. In my lab environment I had one of the NFS datastores go offline when I put another appliance into maintenance mode before the sync had completed. Make sure you allow plenty of time for each host’s maintenance, especially if there is a high change rate on the datastores, which means synchronization will take longer.