nsxchecker: Verify the health of your NSX network


Recently I got to work with the NSX API and write a tool to do a quick health check of NSX networks.

nsxchecker is a valuable operational tool for quickly reporting an NSX network’s health. One of the promises of SDN is automated tooling for operational teams, and with the NSX API I was able to deliver quickly.


nsxchecker accepts an NSX lswitch UUID or a neutron_net_id. Rackspace’s Neutron plugin, quark, tags the lports it creates with a neutron_net_id. nsxchecker requires administrative access to the NSX controllers.

Neutron itself supports probes, but they have a couple of drawbacks:

  1. It doesn’t work with all implementations
  2. For a large network, it’s slow

There are more details in the README on GitHub.

Operating OpenStack: Monitoring RabbitMQ

At the OpenStack Operators meetup, someone asked about monitoring issues related to RabbitMQ. Many OpenStack components use a message broker, and the one most commonly used among operators is RabbitMQ. For this post I’m going to concentrate on Nova and a couple of scenarios I’ve seen in production.


It’s important to understand the flow of messages amongst the various components and break things down into a couple of categories:

  • Services which publish messages to queues (arrow pointing toward the queue in the diagram)
  • Services which consume messages from queues (arrow pointing out from the queue in the diagram)

It’s also good to understand what actually happens when a message is consumed. In most cases, the consumer of the queue is writing to a database.

For example, for an instance reboot, nova-api publishes a message to the compute node’s queue. The compute service running on that node polls for messages, receives the reboot request, sends the reboot to the virtualization layer, and updates the instance’s state to rebooting.

Queue-related issues tend to manifest in a few scenarios:

  1. Everything’s broken – easy enough, rebuild or repair the RabbitMQ server. This post does not focus on this scenario because there is a considerable amount of material around hardening RabbitMQ in the OpenStack documentation.
  2. Everything is slow and getting slower – this often points to a queue being published to at a greater rate than it can be consumed. This scenario is more nuanced and requires an operator to know a couple of things: which queues are shared among many services, and what the publish/consume rates are during normal operations.
  3. Some things are slow/not happening – some instance reboot requests go through, some do not. Generally speaking these operations are ‘last mile’ operations that involve a change on the instance itself. This scenario is generally restricted to a single compute node, or possibly a cabinet of compute nodes.

Baselines of RabbitMQ queue size and consumption rate are very valuable in scenarios 2 and 3 for comparison against normal operations. Without a baseline, it’s difficult to know whether the behavior is outside normal operating conditions.
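A minimal sketch of a baseline comparison, written as the core of a Nagios-style check. It assumes you already have a queue’s current depth and publish/deliver rates (for example from the management plugin) plus a baseline recorded during normal operations; the `Baseline` fields and the 2x/5x thresholds are hypothetical and need tuning for your environment.

```python
from dataclasses import dataclass

OK, WARNING, CRITICAL = 0, 1, 2  # Nagios plugin exit codes


@dataclass
class Baseline:
    """Hypothetical per-queue baseline captured during normal operations."""
    depth: float         # typical number of messages waiting in the queue
    deliver_rate: float  # typical messages/sec consumed


def check_queue(depth, publish_rate, deliver_rate, baseline,
                warn_factor=2.0, crit_factor=5.0):
    """Compare current queue stats to a baseline; return (status, message)."""
    net_rate = publish_rate - deliver_rate  # > 0 means the queue is growing (scenario 2)
    limit = max(baseline.depth, 1.0)        # keep a near-empty baseline from alerting on every message
    growing = net_rate > 0
    slow_consumer = deliver_rate < 0.5 * baseline.deliver_rate  # e.g. a stuck compute node (scenario 3)
    msg = "depth=%d net_rate=%+.1f/s baseline_depth=%s" % (depth, net_rate, baseline.depth)
    if depth >= crit_factor * limit:
        return CRITICAL, "CRITICAL - " + msg
    if depth >= warn_factor * limit or (depth > limit and (growing or slow_consumer)):
        return WARNING, "WARNING - " + msg
    return OK, "OK - " + msg
```

A Nagios wrapper would print the message and call `sys.exit(status)`.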

There are a couple of tools that can help you out:

  • Diamond RabbitMQ collector (code, docs) – sends useful metrics from RabbitMQ to Graphite; requires the RabbitMQ management plugin
  • RabbitMQ HTTP API – This enables operators to retrieve specific queue statistics instead of a view into an entire RabbitMQ server.
  • Nagios Rabbit Compute Queues – This is a script used with Nagios to check specified compute queues, which helps determine whether operations to a specific compute node may get stuck. This helps with what I referred to earlier as scenario 3. Usually bouncing the nova-compute service clears these up. The script looks for a local config file which allows access to the RabbitMQ management plugin. An example config file is in the gist.
  • For granular, near-real-time insight, run the following command on the RabbitMQ server:
    •   watch -n 0.5 'rabbitmqctl -p nova list_queues | sort -rnk2 | head'
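The RabbitMQ HTTP API mentioned above exposes per-queue statistics at `/api/queues/<vhost>/<queue>`. The sketch below fetches one queue and pulls out depth, consumer count, and publish/deliver rates. Assumptions: the management plugin listens on port 15672 (older RabbitMQ releases used 55672), the vhost is `nova` as in the `rabbitmqctl` command above, the host, credentials, and a queue name like `compute.<hostname>` are placeholders, and `message_stats` may be absent on an idle queue.

```python
import base64
import json
import urllib.parse
import urllib.request


def queue_url(host, vhost, queue, port=15672):
    # vhost and queue names must be URL-encoded (the default vhost "/" becomes %2F)
    return "http://%s:%d/api/queues/%s/%s" % (
        host, port,
        urllib.parse.quote(vhost, safe=""),
        urllib.parse.quote(queue, safe=""))


def summarize(stats):
    """Extract the fields useful for baselining from a queue's JSON object."""
    msg_stats = stats.get("message_stats", {})  # missing until traffic has flowed
    return {
        "messages": stats.get("messages", 0),
        "consumers": stats.get("consumers", 0),
        "publish_rate": msg_stats.get("publish_details", {}).get("rate", 0.0),
        "deliver_rate": msg_stats.get("deliver_get_details", {}).get("rate", 0.0),
    }


def fetch_queue(host, vhost, queue, user, password, port=15672, timeout=5):
    """GET one queue from the management API using HTTP basic auth."""
    req = urllib.request.Request(queue_url(host, vhost, queue, port))
    creds = base64.b64encode(("%s:%s" % (user, password)).encode()).decode()
    req.add_header("Authorization", "Basic " + creds)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return summarize(json.load(resp))
```

For example, `fetch_queue("rabbit01", "nova", "compute.hypervisor01", "monitor", "secret")` (all hypothetical values) returns a small dict that can feed a baseline comparison or a Graphite metric.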

Here is an example chart that can be produced with the RabbitMQ diamond collector which can be integrated into an operations dashboard:

Baseline monitoring of the RabbitMQ servers themselves isn’t enough. I recommend an approach that combines the following:

  • Using the RabbitMQ management plugin (required)
  • Nagios checks on specific queues (optional)
  • Diamond RabbitMQ collector to send data to Graphite
  • Dashboard combining statistics from all RabbitMQ installations

Monitoring Edge Node Network Configuration

Over the last few months I’ve done a bit of work around monitoring, Open vSwitch, and XenServer. This post lists some of the networking/Open vSwitch specific items to monitor on hypervisors.

  • Link Status: Nagios SNMP Interfaces plugin works well for reporting a failed link as well as error rates and inbound/outbound bandwidth.
  • Open vSwitch Manager and Controller Status: Transport Node Status is a quick and dirty python script which can be used with extended SNMP to alert when OVS loses a connection to a manager/controller. Beware of an influx of false alarms after upgrading Open vSwitch.
  • Open vSwitch Kernel Modules:  OVS KMOD (XenServer specific) is another quick bash script which can be used to monitor potential OVS kernel mismatch issues detailed in Upgrading Open vSwitch.
  • SDN Integration processes: Nagios SNMP process check. With XenServer, the ovs-xapi-sync process must be running for proper integration between SDN controllers and ofport/vif objects on the hypervisor.
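The list above uses the Nagios SNMP process check for this; as a swapped-in alternative, here is a sketch of a local check that scans /proc for the process instead. Because ovs-xapi-sync is a Python script, argv[0] may be the interpreter rather than the script, so the sketch looks at the first two arguments; that detail is an assumption worth confirming on your hosts.

```python
import os

OK, CRITICAL = 0, 2  # Nagios plugin exit codes


def process_running(name, proc_root="/proc"):
    """Return True if any process has `name` as the basename of argv[0] or argv[1]."""
    for pid in os.listdir(proc_root):
        if not pid.isdigit():
            continue  # skip /proc/self, /proc/net, etc.
        try:
            with open(os.path.join(proc_root, pid, "cmdline"), "rb") as f:
                argv = f.read().split(b"\0")
        except OSError:
            continue  # the process exited while we were scanning
        # Check the first two args so interpreted scripts (python /path/to/script) match too
        if any(os.path.basename(arg.decode(errors="replace")) == name for arg in argv[:2]):
            return True
    return False


def check(name="ovs-xapi-sync", proc_root="/proc"):
    """Return (exit_code, message) in Nagios plugin style."""
    if process_running(name, proc_root):
        return OK, "OK - %s is running" % name
    return CRITICAL, "CRITICAL - %s is not running" % name
```

Wrapped as a plugin, this would print the message and `sys.exit()` with the code, so it can be hooked into NRPE or extended SNMP.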

Are there other network-specific things you monitor for hypervisors running OpenStack? Leave ‘em in the comments.

Interested in Open vSwitch? Check the Open vSwitch category for a few more posts.
Interested in Monitoring? Check Managing Nagios Configurations.

On Failure

A couple of interesting research papers around failure, found in The Datacenter as a Computer.

Failure Trends in a Large Disk Drive Population (2007)

Out of all failed drives, over 56% of them have no count in any of the four strong SMART signals, namely scan errors, reallocation count, offline reallocation, and probational count. In other words, models based only on those signals can never predict more than half of the failed drives.

Temperature Management in Data Centers: Why Some (Might) Like It Hot (2012)

Based on our study of data spanning more than a dozen data centers at three different organizations, and covering a broad range of reliability issues, we find that the effect of high data center temperatures on system reliability are smaller than often assumed.

On Working Remote


In late March I relocated from San Antonio, TX to Lexington, KY. Same awesome job, just with a twist… REMOTE WORK!

I am mainly collaborating via IRC, tmux, 1:1/M:M TeamSpeak, and 1:1/M:M video conferencing.

My takeaways after the first month:

  • Obvious, but it took me a while: When you’re talking, LOOK AT THE CAMERA – multiple monitors and a MacBook make this a little awkward
  • Quality matters: appropriate microphones & video, particularly in conference rooms, make a huge difference. I recommend the Yeti for conference rooms and the Samson Meteor for individuals
  • tmux is great, but not for everything… for long, massive jobs, redirect the output to a file and have everyone tail -f the file
  • Google Hangouts Keyboard Shortcuts

 

Upgrading Open vSwitch

Operating Open vSwitch brings a new set of challenges.

One of those challenges is managing Open vSwitch itself and making sure you’re up to date with performance and stability fixes. For example, in late 2013 there were significant performance improvements with the release of 1.11 (flow wildcarding!) and in the 2.x series there are even more improvements coming.

This means everyone running those old versions of OVS (I’m looking at you, <=1.6) should upgrade and get these huge performance gains.

There are a few things to be aware of when upgrading OVS:

  1. Reloading the kernel module is a data-plane-impacting event, but the impact is minimal. Most won’t notice, and those who do only see a quick blip. The duration of the interruption is a function of the number of ports and number of flows before the upgrade.
  2. Along those lines, if you orchestrate OVS kernel module reloads with parallel-ssh or Ansible or really any other tool, be mindful of the connection timeouts. All traffic on the host will be momentarily dropped, including your SSH connection! Set your SSH timeouts appropriately or bad things happen!
  3. Pay very close attention to kernel upgrades and OVS kernel module upgrades. Failure to do so could mean your host networking does not survive a reboot!
  4. OVS-related changes you’ve made outside of OVS/OVSDB to objects that OVS manages (e.g., manual setup of tc buckets) will be destroyed.
  5. If you use XenServer, by upgrading OVS beyond what’s delivered from Citrix directly, you’re likely unsupported.

Here is a rough outline of the OVS upgrade process for an individual hypervisor:

  • Obtain Open vSwitch packages
  • Install Open vSwitch userspace components, kernel module(s) (see #3 and “Where things can really go awry”)
  • Load new Open vSwitch kernel module (/etc/init.d/openvswitch force-kmod-reload)
  • Simplified Ansible Playbook: https://gist.github.com/andyhky/9983421
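Item 2 above is easy to trip over when orchestrating reloads over SSH. One approach, sketched below, is to launch the reload detached with `nohup` so it keeps running when the brief traffic interruption drops the SSH session, and to give ssh explicit timeouts so the orchestrator doesn’t hang. The option values and log path are assumptions to adjust; `force-kmod-reload` is the init script command from the outline above.

```python
import shlex
import subprocess

RELOAD_CMD = "/etc/init.d/openvswitch force-kmod-reload"


def build_ssh_command(host, log_path="/tmp/ovs-kmod-reload.log"):
    """Build an ssh argv that starts the OVS kernel module reload detached on `host`."""
    # nohup + background + redirected stdio lets the reload survive the SSH
    # connection being dropped while host traffic is briefly interrupted.
    remote = "nohup %s > %s 2>&1 < /dev/null &" % (RELOAD_CMD, shlex.quote(log_path))
    return [
        "ssh",
        "-o", "ConnectTimeout=10",      # fail fast if the host is unreachable
        "-o", "ServerAliveInterval=5",  # probe the session every 5 seconds...
        "-o", "ServerAliveCountMax=6",  # ...and give up after ~30 seconds of silence
        host,
        remote,
    ]


def reload_host(host):
    # A zero return code only means the reload was launched; verify
    # connectivity and the remote log file afterwards.
    return subprocess.run(build_ssh_command(host), timeout=120).returncode
```

Reloading hosts one at a time (or in small batches) and checking each before moving on keeps a bad package from taking out a whole cabinet.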

The INSTALL file provides more detailed upgrade instructions. In the old days, upgrading Open vSwitch meant you had to either reboot your host or rebuild all of your flows because of the kernel module reload. Since force-kmod-reload was introduced, the upgrade process is more durable and less disruptive.

Where things can really go awry

If your OS has a new kernel pending, e.g., after a XenServer service pack, you will want to install the OVS kernel module packages for both your running kernel and the kernel that will be running after reboot. Failing to do so can result in losing connectivity to your machine.


A mismatch between the Open vSwitch kernel module and the Xen kernel doesn’t guarantee a loss of networking, but it is a best practice to keep them in lock-step. The cases I’ve seen usually involve significant version changes, e.g., 1.6 -> 1.11.

You can check if you’re likely to have a problem by running this code (XenServer only, apologies for quick & dirty bash):

#!/usr/bin/env bash
RUNNING_XEN_KERNEL=`uname -r | sed s/xen//`
PENDING_XEN_KERNEL=`readlink /boot/vmlinuz-2.6-xen  | sed s/xen// | sed s/vmlinuz-//`
OVS_BUILD=`/etc/init.d/openvswitch version | grep ovs-vswitchd | awk '{print $NF}'`
rpm -q openvswitch-modules-xen-$RUNNING_XEN_KERNEL-$OVS_BUILD > /dev/null
if [[ $? == 0 ]]
then
    echo "Current kernel and OVS modules match"
else
    CURRENT_MISMATCH=1
    echo "Current kernel and OVS modules do not match"
fi

rpm -q openvswitch-modules-xen-$PENDING_XEN_KERNEL-$OVS_BUILD > /dev/null
if [[ $? == 0 ]]
then
    echo "Pending kernel and OVS modules match"
else
    PENDING_MISMATCH=1
    echo "Pending kernel and OVS will not match after reboot. This can cause system instability."
fi

if [[ $CURRENT_MISMATCH == 1 || $PENDING_MISMATCH == 1 ]]
then
    exit 1
fi

Luckily, this can be rolled back. Access the host via DRAC/iLO and roll back the vmlinuz-2.6-xen symlink in /boot to one that matches your installed openvswitch-modules RPM. I made a quick and dirty bash script which can roll back, but it won’t be too useful unless you put the script on the server beforehand. Here it is (again, XenServer only):

#!/usr/bin/env bash
# Not guaranteed to work. YMMV and all that.
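# Kernel versions that have an installed openvswitch-modules-xen RPM
OVS_KERNEL_MODULES=$(rpm -qa 'openvswitch-modules-xen*' | sed s/openvswitch-modules-xen-// | cut -d "-" -f1,2 | sort -u)
# Kernel versions with a real (non-symlink) Xen kernel image in /boot
XEN_KERNELS=$(find /boot -name "vmlinuz*xen" \! -type l -exec ls -ld {} + | awk '{print $NF}' | cut -d "-" -f2,3 | sed s/xen// | sort -u)
# Versions present in both lists; if more than one matches, take the lexically last
COMMON_KERNEL_VERSION=$(echo $XEN_KERNELS $OVS_KERNEL_MODULES | tr " " "\n" | sort | uniq -d | tail -n 1)
if [[ -n "$COMMON_KERNEL_VERSION" && -e "/boot/vmlinuz-${COMMON_KERNEL_VERSION}xen" ]]
then
    # Repoint the default Xen kernel symlink at the matching kernel
    ln -sf "/boot/vmlinuz-${COMMON_KERNEL_VERSION}xen" /boot/vmlinuz-2.6-xen
else
    echo "Unable to find kernel version to roll back to! :(:(:(:("
fi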

StatsD and multiple metrics


Measure all the things! Graphite & StatsD are my weapons of choice. One set of metrics we particularly wanted to measure was the various TCP stats, including the TCP retransmit rate. We wrote a Python script to send all of the metrics in a single UDP packet and hit a weird scenario.

The Python script was all ready to roll, except that StatsD was only logging one metric. The packets were arriving at the StatsD instance, but only one metric from each was being processed.

It turns out that support for multiple newline-separated metrics in a single packet wasn’t always built into StatsD. It was added in 0.4.0, so upgrading StatsD fixes this problem.
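For reference, here is a sketch of the single-packet approach: read the Tcp counters from /proc/net/snmp (Linux-specific) and send them to StatsD as newline-separated gauges in one UDP datagram. The `tcp.` prefix, the gauge type, and localhost:8125 (StatsD’s default port) are assumptions. These counters are cumulative, so a rate such as retransmits per second would come from applying a derivative in Graphite.

```python
import socket


def parse_tcp_stats(text):
    """Parse the 'Tcp:' header/value line pair from /proc/net/snmp into a dict."""
    tcp_lines = [line.split()[1:] for line in text.splitlines() if line.startswith("Tcp:")]
    header, values = tcp_lines[0], tcp_lines[1]  # first line holds names, second holds values
    return {name: int(value) for name, value in zip(header, values)}


def read_tcp_stats(path="/proc/net/snmp"):
    with open(path) as f:
        return parse_tcp_stats(f.read())


def build_payload(stats, prefix="tcp"):
    # One metric per line; StatsD >= 0.4.0 splits a packet on newlines.
    # A leading '-' on a gauge means "decrement", so skip negatives (e.g. MaxConn = -1).
    lines = ["%s.%s:%d|g" % (prefix, name, value)
             for name, value in sorted(stats.items()) if value >= 0]
    return "\n".join(lines)


def send(payload, host="127.0.0.1", port=8125):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(payload.encode(), (host, port))
    finally:
        sock.close()
```

Usage: `send(build_payload(read_tcp_stats()))`. Keep the payload under your network’s MTU so the datagram isn’t fragmented.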

Deep Dive: OpenStack Retrieving Nova Instance Console URLs with XVP and XenAPI/XenServer

This post is a deep dive into what happens in Nova (and where in the code) when a console URL is retrieved via the Nova API for a Nova configuration backed by XVP and XenServer/XenAPI. Hopefully the methods used in Nova’s code will not change over time, and this guide will remain a good starting point.

Example nova client call:

nova get-vnc-console [uuid] xvpvnc

And the call returns:

+--------+--------------------------------------+
| Type   | Url                                  |
+--------+--------------------------------------+
| xvpvnc | https://URL:PORT/console?token=TOKEN |
+--------+--------------------------------------+

One thing I particularly enjoy about the console URL call in Nova is that it is synchronous and has to reach all the way down to the VM level. Most calls in Nova are asynchronous, so the console call is a wonderful test of your cloud’s plumbing. If the call takes longer than rpc_response_timeout/rpc_cast_timeout (60/30 seconds respectively), a 500 will bubble up to the user.
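A toy model of why a slow synchronous call surfaces as a 500: the API layer blocks waiting for a reply with a deadline, and a timeout turns into an error response. This is not Nova’s RPC code; `RPCTimeout`, `rpc_call`, and `get_console_handler` are hypothetical names, and the 60-second default just mirrors the timeout quoted above.

```python
import concurrent.futures

RPC_RESPONSE_TIMEOUT = 60  # seconds, mirroring the rpc_response_timeout quoted above


class RPCTimeout(Exception):
    """Raised when no reply arrives before the deadline."""


def rpc_call(func, timeout=RPC_RESPONSE_TIMEOUT):
    """Run a blocking 'remote' call, giving up after `timeout` seconds."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        return pool.submit(func).result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        raise RPCTimeout("no reply within %ss" % timeout) from None
    finally:
        pool.shutdown(wait=False)  # don't block the API worker on the stuck call


def get_console_handler(fetch_console, timeout=RPC_RESPONSE_TIMEOUT):
    """API-layer view: a synchronous call that times out becomes an HTTP 500."""
    try:
        return 200, rpc_call(fetch_console, timeout)
    except RPCTimeout as exc:
        return 500, str(exc)
```

Because every hop (API, compute, hypervisor) sits inside that one deadline, a slow link anywhere in the chain shows up as a failed console request.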

It helps to understand how XenServer consoles work in general.

  • XVP is an open source project which serves as a proxy to hypervisor console sessions. Interestingly enough, XVP is no longer used in Nova. The underpinnings of Console were changed in vnc-console-cleanup but the code is still around (console/xvp.py).
  • A XenServer VM has a console attribute associated with it. Console is an object in XenAPI.

This Deep Dive has two major sections:

  1. Generation of the console URL
  2. Accessing the console URL

How is the console URL generated?


1) nova-api receives and validates the console request, and then makes a request to the compute API.

  • api/openstack/compute/contrib/consoles.py
  • def get_vnc_console

2) The compute API receives the request and does two things: (2a) calls the compute RPC API to gather connection information and (2b) calls the console authentication service.

  • compute/api.py
  • def get_vnc_console

2a) The compute manager receives the RPC call from (2). An authentication token is generated. For XVP consoles, a URL is generated from FLAGS.xvpvncproxy_base_url and the generated token. driver.get_vnc_console is called.

  • compute/manager.py
  • def get_vnc_console

2a1) driver is an abstraction over the configured virt library, xenapi in this case. It simply calls vmops.get_vnc_console, which retrieves XenAPI information about the instance. The Xen console URL local to the hypervisor is generated and returned.

  • virt/xenapi/driver.py
  • def get_vnc_console
  • virt/xenapi/vmops.py
  • def get_vnc_console

2b) Taking the details from (2a1), the consoleauth RPC API is called. The token generated in (2a) is added to memcache with a TTL of CONF.console_token_ttl.

  • consoleauth/manager.py
  • def authorize_console
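Steps 2a and 2b can be sketched as: mint a random token, build the console URL from the proxy base URL, and store the connection details under that token with a TTL. This is an illustration, not Nova’s code: a plain dict stands in for memcache, the UUID4 token and `?token=` parameter follow the sample output above, and the 600-second TTL and base URL are placeholders.

```python
import time
import uuid
from urllib.parse import urlencode

CONSOLE_TOKEN_TTL = 600  # seconds; stand-in for console_token_ttl (value is an assumption)


def get_console_url(base_url):
    """Step 2a: mint an unguessable token and append it to the proxy base URL."""
    token = str(uuid.uuid4())
    return token, "%s?%s" % (base_url, urlencode({"token": token}))


def authorize_console(tokens, token, instance_uuid, host, port,
                      ttl=CONSOLE_TOKEN_TTL, now=None):
    """Step 2b: remember connection details under the token until the TTL expires.

    `tokens` is a plain dict standing in for memcache.
    """
    now = time.time() if now is None else now
    tokens[token] = {"instance_uuid": instance_uuid, "host": host,
                     "port": port, "expires": now + ttl}
```

The token is the only secret in the URL, which is why it is random and short-lived.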

What happens when the console URL is accessed?


1) The request reaches nova-xvpvncproxy and a call to validate the token is made on the Console Auth RPC API

  • vnc/xvp_proxy.py
  • def __call__

2) The token in the request is checked against the token from the previous section (2b). Compute’s RPC API is called to validate the console’s port against the token’s port.

  • consoleauth/manager.py
  • def check_token
  • def _validate_token
  • compute/manager.py
  • def validate_console_port

3) nova-xvpvncproxy proxies a connection to the console on the hypervisor.

  • vnc/xvp_proxy.py
  • def proxy_connection
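The access path can be sketched the same way: look up the token, reject it if it has expired or if the console port no longer validates, then shuttle bytes between the client and the hypervisor console. Again this is a simplified illustration rather than nova-xvpvncproxy itself; the token-dict layout and the `validate_port` callback are hypothetical, and the relay is a generic select()-based loop.

```python
import select
import time


def check_token(tokens, token, validate_port, now=None):
    """Access step 2: return connection info if the token exists, is unexpired,
    and the instance's console port still matches (cf. validate_console_port)."""
    now = time.time() if now is None else now
    info = tokens.get(token)
    if info is None or now >= info["expires"]:
        tokens.pop(token, None)  # drop expired entries
        return None
    if not validate_port(info["instance_uuid"], info["port"]):
        return None
    return info


def relay(client, backend, bufsize=4096):
    """Access step 3: copy bytes both ways between two sockets until either side closes."""
    peers = {client: backend, backend: client}
    while True:
        readable, _, _ = select.select(list(peers), [], [])
        for sock in readable:
            data = sock.recv(bufsize)
            if not data:
                return  # one side hung up; stop proxying
            peers[sock].sendall(data)
```

In a real proxy, `backend` would be something like `socket.create_connection((info["host"], info["port"]))`.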

The Host Network Stack

This post is a collection of useful articles/videos that I’ve collected about networking on XenServer and Linux.

XenServer

Linux

As you can see, there are a multitude of elements to consider when looking into host networking issues for a Linux VM running on XenServer (which is Linux underneath the covers anyway).

Managing Nagios Configurations

There’s a good talk given by Gabe Westmaas at the Hong Kong OpenStack Summit.

The talk describes what Rackspace monitors in its public cloud OpenStack deployment, how responses are handled, and some of the integration points that are used. I recommend watching it for OpenStack-specific monitoring and a little context around this post.

In this post I am going to discuss how the sausage gets made – how the underlying Nagios configuration is managed.

Some background: We have 3 classes of Nagios servers.

  1. Global – monitors global control plane nodes (e.g., glance-api, nova-api, nova-cells, cell nagios)
  2. Cell – monitors cell control plane nodes, and individual clusters of data plane nodes (e.g., compute nodes/hypervisors)
  3. Mixed – smaller environments – these are a combined cell/global

With Puppet, the Nagios node’s class is determined from its hostname, and then the Nagios install/config Puppet module is applied.

The Nagios puppet setup is pretty simple. It performs basic installation and configuration of Nagios along with pulling in a git repository of Nagios config files. The puppet modules/manifests change rarely, but the Nagios configuration itself has to change relatively frequently.

Types of changes to the Nagios configuration:

  1. Systems Lifecycle – normal bulk add/remove of service/host definitions. These are generated with some automation, currently a combination of Ansible and Python scripts which reach into other inventory systems.
  2. Gap Filling – as a result of RCAs or other efforts, gaps in the current monitoring configuration are identified. After the gap is identified, we need to ensure it is fully remediated in all existing datacenters and all new spin ups.
  3. Cosmetics/Tweaking – we perform analytics on our monitoring to prioritize/identify opportunities to automate remediation and/or deep dive into root causes. We have a logster parser running on each Nagios node which sends what/when/where on alerts to StatsD/Graphite. Toward the analytics effort, we sometimes make changes to give all services more machine-readable names. We also tune monitoring thresholds for services that are too chatty or not chatty enough.

Changes #2 and #3 drove us to put the Nagios configuration files into a single repository. Without a single repository, en masse changes were cumbersome and didn’t get made. The configuration repository is laid out like this:

  • Shared configurations are stored in a common folder, which has a subfolder for each Nagios node class.
  • Service/Host definitions are stored in folders relative to their environments
  • All datacenters/environments are stored within the environments folder

The entire repository is cloned onto the Nagios node, and parts of it are copied and/or symlinked into /etc/nagios3/conf.d/ based on the Nagios node class and the environment.

For example:

  • nagios01.c0001.test.com: nagios class is cell (c0001 in the hostname), environment is test/c0001
  • /etc/nagios3/conf.d/ gets cfg files from the common/cell folder in the config repo
  • environments/test/c0001 is symlinked to /etc/nagios3/conf.d/c0001/
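The mapping in that example can be sketched as a small function. Only the cell rule (a cNNNN label, as in nagios01.c0001.test.com) comes from the example above; treating every other host as global, ignoring the mixed class, and the /opt/nagios-config clone location are hypothetical.

```python
import re

REPO = "/opt/nagios-config"  # hypothetical clone location of the config repo
CONF_D = "/etc/nagios3/conf.d"


def nagios_plan(hostname, repo=REPO, conf_d=CONF_D):
    """Work out node class, shared config folder, and environment symlink for a host."""
    # e.g. nagios01.c0001.test.com -> site "c0001", environment "test"
    _, site, environment = hostname.split(".")[:3]
    # Cell rule from the example; everything else as "global" is an assumption
    node_class = "cell" if re.fullmatch(r"c\d+", site) else "global"
    shared = "%s/common/%s" % (repo, node_class)  # cfg files copied into conf.d
    links = {"%s/environments/%s/%s" % (repo, environment, site): "%s/%s/" % (conf_d, site)}
    return node_class, shared, links
```

Keeping this logic in one place means the Puppet module only has to call it, while the repository layout carries the per-environment differences.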

This setup has been working well for us in production. It’s enabling first responders and engineers to make more meaningful changes faster to the monitoring stack at Rackspace.

Determining Enabled VLANs from SNMP with Python

Similar to this thread, I wanted to see what VLANs were allowed for a trunked port as reported by SNMP with Python.

With the help of a couple of colleagues, I made some progress.

>>> vlan_value = '000000000020000000000000000000000000200000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000'
>>> bits = format(int(vlan_value, 16), 'b').zfill(len(vlan_value) * 4)
>>> for vlan, bit in enumerate(bits):
...     if bit == '1':
...         print(vlan)
...
42
146

  • Parse the returned string as a hexadecimal integer
  • Convert that to binary
  • Pad with leading 0s to the full length (4 bits per hex character) so that each bit’s offset matches its VLAN ID
  • Loop through the result; each character that is a 1 is an enabled VLAN on the port

In conjunction with LLDP, I’m able to query the switch/port each interface is connected to and determine whether the VLANs are set properly on that port.
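The same logic wrapped as reusable functions, plus a comparison against the VLANs you expect on the port. The expected set is hypothetical input (in practice it would come from whatever the LLDP neighbor data says the port should carry), and stripping whitespace assumes your SNMP tool may print the octet string with spaces.

```python
def enabled_vlans(hex_string):
    """Return the VLAN IDs whose bits are set in a hex bitmap (bit 0 = VLAN 0)."""
    hex_string = "".join(hex_string.split())  # tolerate "00 20 00" style output
    bits = format(int(hex_string, 16), "b").zfill(len(hex_string) * 4)
    return [vlan for vlan, bit in enumerate(bits) if bit == "1"]


def vlan_drift(hex_string, expected):
    """Compare the port's enabled VLANs to the set you expect; return (missing, extra)."""
    actual = set(enabled_vlans(hex_string))
    expected = set(expected)
    return sorted(expected - actual), sorted(actual - expected)
```

An empty `(missing, extra)` pair means the trunk matches; anything else is worth a ticket.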

Personal Backups with Duply



A month or two ago I finally went through all the old hard drives I’ve accumulated over the past decade. I mounted each of the disks and moved a bunch of files onto my desktop’s drive. There were lots of photos from the drives that I don’t want to lose so I decided to get a little more serious about backups.

I decided to give Duply a go. Duply is a wrapper for duplicity, which under the hood relies on librsync, the library implementing the tried and trusted rsync algorithm.

  • Multiple Locations - I have duply configured to send various data to a USB drive, Swift (Rackspace Cloud Files), and another server. Each destination is easy to configure as its own profile under ~/.duply/.
  • Encrypted - duply works with GPG encryption
  • Customizable - duply has pre/post hooks which I leverage for notifications on backup success/failures 
  • Efficient - duply is capable of doing incremental backups and using compression

I’ve also been really happy with how test restores have gone with duply.

An example process that I have setup is as follows:

When my desktop system resumes from sleep, run an incremental backup to Swift, with notifications when the backup starts and finishes.

It required a little bit of Python and Bash to accomplish this, but I’m happy with the end result. The scripts I used are published to GitHub under andyhky/duply-scripts. Getting-started and installation instructions are in the README.
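The actual scripts are in the repository above; as a rough, hypothetical sketch of the same idea, here is a Python wrapper that runs `duply <profile> incr` and sends desktop notifications via `notify-send`. The profile name is a placeholder, `notify-send` is assumed to be installed, and hooking the wrapper to resume is left out.

```python
import subprocess


def notify(summary, body=""):
    # notify-send ships with libnotify on most Linux desktops (assumed installed)
    subprocess.run(["notify-send", summary, body], check=False)


def run_backup(profile, runner=subprocess.run, notifier=notify):
    """Run an incremental duply backup for `profile`, notifying on start and finish."""
    notifier("Backup started", profile)
    result = runner(["duply", profile, "incr"], capture_output=True, text=True)
    if result.returncode == 0:
        notifier("Backup finished", profile)
    else:
        notifier("Backup FAILED", "%s (exit %d)" % (profile, result.returncode))
    return result.returncode
```

`runner` and `notifier` are injectable so the wrapper can be exercised without touching a real backup target.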


 

Network wiring with XenServer and Open vSwitch

In the physical world when you power on a server it’s already cabled (hopefully).

With VMs things are a bit different. Here’s the sequence of events when a VM is started in Nova and what happens on XenServer to wire it up with Open vSwitch.


  1. nova-compute starts the VM via XenAPI
  2. XenAPI VM.start creates a domain and creates the VM’s vifs on the hypervisor
  3. The Linux userspace device manager (udev) receives this event, and scripts within /etc/udev/rules.d are fired in lexical order
  4. Xen’s vif plug script is fired, which at a minimum creates a port on the relevant virtual switch
    • Newer versions (XS 6.1+) of this plug script also have a setup-vif-rules script which creates several entries in the OpenFlow table (just grabbed from the code comments):
      • Allow DHCP traffic (outgoing UDP on port 67)
      • Filter ARP requests
      • Filter ARP responses
      • Allow traffic from specified ipv4 addresses
      • Neighbour solicitation
      • Neighbour advertisement
      • Allow traffic from specified ipv6 addresses
      • Drop all other neighbour discovery
      • Drop other specific ICMPv6 types
      • Router advertisement
      • Redirect gateway
      • Mobile prefix solicitation
      • Mobile prefix advertisement
      • Multicast router advertisement
      • Multicast router solicitation
      • Multicast router termination
      • Drop everything else
  5. Creation of the port on the virtual switch also adds entries into OVSDB, the database which backs Open vSwitch.
  6. ovs-xapi-sync, which starts on XenAPI/Open vSwitch startup, keeps a local copy of the system’s state in memory. It checks for changes in the Bridge/Interface tables and pulls XenServer-specific data into other columns in those tables.
  7. On many events within OVSDB, including create/update of the tables touched in these OVSDB operations, the OVS controller is notified via JSON-RPC. Thanks to Scott Lowe for clarification on this part.

After all of that happens, the VM boots and the guest OS sets up its network stack.

Measuring Virtual Networking Overhead

While discussing the [ovs-discuss] thread “ovs performance on ‘worst case scenario’ with ovs-vswitchd up to 100%”, one of my colleagues had a good idea: tcpdump the physical interface and the vif at the same time. The difference between when a packet reaches the vif and when it reaches the physical device can help measure the time spent in a userspace->kernelspace transit. Of course, virtual switches aren’t the only culprit in virtual networking overhead; virtual networking is a very complex topic.

I created a new tool to help measure this overhead for certain traffic patterns: netweaver. There’s lots of info in the README, so head on by!

NetWeaver does the following:

  • Retrieve the vif details from the hypervisor
  • Start a traffic generating command on source instance(s)
  • Gather packet capture from destination instance’s hypervisor
  • Analyze the packet captures from the vif and eth devices
  • Perform some basic statistical analysis (average, max, min, stdev) on the result set
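The analysis step can be sketched as follows. It assumes the two captures have already been reduced to a mapping from a per-packet key (such as the IP ID) to a capture timestamp; since both captures come from the same hypervisor, they share a clock. This illustrates the statistics only and is not NetWeaver’s code.

```python
import statistics


def transit_deltas(eth_times, vif_times):
    """Per-packet transit time from the physical NIC to the vif (inbound traffic).

    Both arguments map a packet key to its capture timestamp (any consistent unit);
    only packets seen on both interfaces are counted.
    """
    common = eth_times.keys() & vif_times.keys()
    return [vif_times[k] - eth_times[k] for k in sorted(common)]


def summarize(deltas):
    """Basic statistics over the per-packet deltas."""
    return {
        "count": len(deltas),
        "avg": statistics.mean(deltas),
        "min": min(deltas),
        "max": max(deltas),
        "stdev": statistics.stdev(deltas) if len(deltas) > 1 else 0.0,
    }
```

Packets that appear in only one capture (drops, or capture start/stop skew) are silently excluded, so it’s worth comparing `count` against the number of packets generated.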

I intend to use this for analyzing various configurations with Xen, guest OSes, and Open vSwitch.

Deep Dive: HTB Rate Limiting (QoS) with Open vSwitch and XenServer

DISCLAIMER: I’m still getting my feet wet with Open vSwitch. This post is just a cleaned up version of my scratchpad.

Open vSwitch has a few ways of providing rate limiting – this deep dive goes into the internals of reverse engineering an existing virtual interface’s egress rate limits applied with tc-htb. Hierarchy Token Bucket (HTB) is a standard Linux packet scheduling implementation. You can read more about HTB on its author’s site; I found the implementation and theory pretty interesting.
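HTB arranges token buckets into a class hierarchy where each class has a guaranteed rate and a ceil it can borrow up to. A single token bucket, sketched below, is a simplification that shows the core mechanic; the byte units and parameter values are illustrative.

```python
class TokenBucket:
    """Single token bucket: tokens accrue at `rate` bytes/sec up to `burst` bytes."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = burst  # start with a full bucket
        self.last = 0.0      # timestamp of the last update, in seconds

    def allow(self, size, now):
        """Return True if a packet of `size` bytes may be sent at time `now`."""
        # Refill for the time elapsed since the last check, capped at the burst size
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= size:
            self.tokens -= size
            return True
        return False  # over the rate: a shaper like HTB delays the packet instead of sending it now
```

In the HTB case below, the OVS Queue’s max-rate corresponds to the tc class’s ceil.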

This is current as of Open vSwitch 1.9.

The information needed to retrieve htb rate limits mostly lives in the ovsdb:

Open vSwitch Schema (http://openvswitch.org/ovs-vswitchd.conf.db.5.pdf)

Things can get complex depending on how your vifs plug into your physical interfaces. In my case, OpenStack Quantum requires an integration bridge which I’ve attempted to diagram:


  1. On instance boot, vifs are plugged into xapi0. xapi0’s controller nodes pull down information including flows and logical queues.
  2. The flows pulled from (1) set the destination queue on all traffic for the source IP address for the interface.
  3. The queue which the traffic gets sent to goes to a linux-htb ring where the packets are scheduled.

Let’s take a look at an example. I want to retrieve the rate limit according to the hypervisor for vif2.1 which connects to xapi0, xenbr1, and the physical interface eth1. The IP address is 10.0.0.37.

Steps:

  • Find the QoS used by the physical interface:
    # ovs-vsctl find Port name=eth1 | grep qos
    qos : 678567ed-9f71-432b-99a2-2f28efced79c

  • Determine which queue is being used for your virtual interface. The value after set_queue is our queue_id.
    # ovs-ofctl dump-flows xapi0 | grep 10.0.0.37 | grep "set_queue"
    ... ,nw_src=10.0.0.37 actions=set_queue:13947, ...
  • List the QoS from the first step and its type. NOTE: This command outputs every single OpenFlow queue_id/OVS Queue UUID for the physical interface. The queue_id from the previous step will be the key we’re interested in and the value is our Queue’s UUID
    # ovs-vsctl list Qos 678567ed-9f71-432b-99a2-2f28efced79c | egrep 'queues|type'
    queues : { ... 13947=787b609b-417c-459f-b9df-9fb5b362e815,... }
    type : linux-htb
  • Use the Queue UUID from the previous step to list the Queue:
    # ovs-vsctl list Queue 787b609b-417c-459f-b9df-9fb5b362e815 | grep other_config
    other_config : {... max-rate="614400000" ...}
  • In order to tie it back to tc-htb we have to convert the OpenFlow queue_id+1 to hexadecimal (367c). I think it’s happening here in the OVS code, but I’d love to have a definitive answer.
    # tc -s -d class show dev eth1 | grep 367c | grep ceil # Queue ID + 1 in Hex
    class htb 1:367c ... ceil 614400Kbit
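The conversions in the last two steps are easy to get wrong by hand, so here they are as small helpers. The queue_id + 1 mapping is the empirical one observed above rather than a documented rule, and `htb_ceil` simply scans `tc -s -d class show` output for the matching class line.

```python
import re


def tc_class_id(queue_id, major=1):
    """OpenFlow queue_id -> tc-htb class id via the queue_id + 1 (hex) mapping seen above."""
    return "%x:%x" % (major, queue_id + 1)


def max_rate_to_kbit(max_rate):
    """Queue other_config max-rate (bits/sec, stored as a string) -> tc's Kbit (1000 bit/s)."""
    return int(max_rate) // 1000


def htb_ceil(tc_output, class_id):
    """Return the ceil for `class_id` from tc class output, or None if not found."""
    for line in tc_output.splitlines():
        if line.startswith("class htb %s " % class_id):
            match = re.search(r"\bceil (\S+)", line)
            return match.group(1) if match else None
    return None
```

With the values above, `tc_class_id(13947)` gives `1:367c` and `max_rate_to_kbit("614400000")` gives 614400, matching the `ceil 614400Kbit` in the tc output.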

Using Swift and logrotate

Ever have an exchange like this?

Q: What happened <insert very very long time ago> on this service?
A: We can’t keep logs on the server past 2 months.  Those logs are gone.

Just about every IaaS out there has an object store. Amazon offers S3 and OpenStack providers have Swift. Why not just point logrotate at one of those object stores?

That’s just what I’ve done with Swiftrotate. It’s a simple shell script to use with logrotate. Config samples and more are in the project’s README.

NOTE: It doesn’t make much sense to use Swiftrotate without dateext in logrotate. Many setups don’t use dateext, so there’s a utility script to rename all of your existing files to the dateext format.
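A sketch of what such a rename could look like, assuming numbered rotations like app.log.1 or app.log.3.gz and logrotate’s default dateext format (-%Y%m%d). Using the file’s modification time as the date is an approximation, and this is not the utility script that ships with Swiftrotate.

```python
import datetime
import os
import re

# Numbered rotations such as app.log.1 or app.log.3.gz
ROTATED = re.compile(r"^(?P<base>.+)\.(?P<num>\d+)(?P<ext>\.gz)?$")


def dateext_name(filename, date):
    """Return the dateext-style name for a numbered rotation, or None if it isn't one."""
    match = ROTATED.match(filename)
    if not match:
        return None
    # logrotate's default dateformat is -%Y%m%d
    return match.group("base") + date.strftime("-%Y%m%d") + (match.group("ext") or "")


def rename_all(directory):
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Approximate the rotation date with the file's last modification time (local time)
        date = datetime.date.fromtimestamp(os.path.getmtime(path))
        new_name = dateext_name(name, date)
        target = os.path.join(directory, new_name) if new_name else None
        if target and not os.path.exists(target):  # never clobber an existing file
            os.rename(path, target)
```

Two rotations from the same day would map to the same name; the sketch skips the second rather than overwriting it.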

Home Lab setup

Hardware:

  • Dell XPS 8500
  • Intel i5
  • RAM + SSD upgrades from Crucial
  • Local Storage (1T)

Software:

  • Fedora 18 (base OS)
  • VirtualBox
  • Vagrant
  • DevStack
  • XenServer 6

I’m setting up a home lab to do some light coding on OpenStack and for testing implementations of next generation software/hardware deployment tools like BOSH and Razor.

the grep is a lie

grep is a wonderful tool for digging through logs on specific issues, but people sometimes misuse it and claim the logs don’t have the answer simply because grep didn’t turn one up.

Here’s an example of Rails application logging from Ruby on Rails Guides:

Processing PostsController#create (for 127.0.0.1 at 2008-09-08 11:52:54) [POST]
  Session ID: BAh7BzoMY3NyZl9pZCIlMDY5MWU1M2I1ZDRjODBlMzkyMWI1OTg2NWQyNzViZjYiCmZsYXNoSUM6J0FjdGl
vbkNvbnRyb2xsZXI6OkZsYXNoOjpGbGFzaEhhc2h7AAY6CkB1c2VkewA=--b18cd92fba90eacf8137e5f6b3b06c4d724596a4
  Parameters: {"commit"=>"Create", "post"=>{"title"=>"Debugging Rails",
 "body"=>"I'm learning how to print in logs!!!", "published"=>"0"},
 "authenticity_token"=>"2059c1286e93402e389127b1153204e0d1e275dd", "action"=>"create", "controller"=>"posts"}
New post: {"updated_at"=>nil, "title"=>"Debugging Rails", "body"=>"I'm learning how to print in logs!!!",
 "published"=>false, "created_at"=>nil}
Post should be valid: true
  Post Create (0.000443)   INSERT INTO "posts" ("updated_at", "title", "body", "published",
 "created_at") VALUES('2008-09-08 14:52:54', 'Debugging Rails',
 'I''m learning how to print in logs!!!', 'f', '2008-09-08 14:52:54')
The post was saved and now the user is going to be redirected...
Redirected to #
Completed in 0.01224 (81 reqs/sec) | DB: 0.00044 (3%) | 302 Found [http://localhost/posts]

Grepping for “learning” will give us just a peek but there’s much more information to be found in the full request.

# grep learning applog.log

 "body"=>"I'm learning how to print in logs!!!", "published"=>"0"},
New post: {"updated_at"=>nil, "title"=>"Debugging Rails", "body"=>"I'm learning how to print in logs!!!",
 'I''m learning how to print in logs!!!', 'f', '2008-09-08 14:52:54')

If you know the exact format of your application’s log messages, you can use grep’s output context flags (-A, -B, and -C). However, the number of context lines needed is often unknown, and a stack trace can vary in length.

Rails applications aren’t the only ones: the logging module within Nova suffers from the same issue. Common Log Format seems to get around the problem, but many modern applications, or ones running in debug mode, use multiline or transaction-ID logging, which makes relying solely on grep a bad decision.
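When the first line of each log record is recognizable, one alternative is to grep whole records rather than single lines. Below is a minimal Python sketch, assuming (as in the Rails 2.x sample above) that every request starts with a line beginning `Processing `; the `RECORD_START` pattern and the function names are illustrative, not part of grep or Rails.

```python
import re
from typing import Iterable, Iterator, List

# Assumption: each record starts with a Rails 2.x "Processing ..." header line.
RECORD_START = re.compile(r"^Processing ")


def records(lines: Iterable[str], start: re.Pattern = RECORD_START) -> Iterator[List[str]]:
    """Group raw log lines into multi-line records."""
    current: List[str] = []
    for line in lines:
        # A new header line closes the previous record.
        if start.match(line) and current:
            yield current
            current = []
        current.append(line.rstrip("\n"))
    if current:
        yield current


def grep_records(lines: Iterable[str], pattern: str,
                 start: re.Pattern = RECORD_START) -> List[List[str]]:
    """Return every whole record that has at least one line matching pattern."""
    rx = re.compile(pattern)
    return [rec for rec in records(lines, start)
            if any(rx.search(line) for line in rec)]
```

For example, `grep_records(open("applog.log"), "learning")` would return the whole PostsController#create request, including the `Redirected` and `Completed` lines that a plain grep drops. Because it accepts any iterable of lines, compressed logs can be read the same way through `gzip.open(path, "rt")`, in the spirit of zgrep.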

My preferred technique: use grep to determine which log file to open in less. Then, within less, search for the same pattern I grepped for and look at the clues provided by the surrounding context. Sometimes it’s as simple as a SIGTERM two lines later, but you wouldn’t have grepped for SIGTERM.

Another tip with less and pattern matching: if you have a large file and you know your search string is near the end, press G to jump to the bottom of the file, run your /pattern search, and then press N to step backward to the previous match.

One last thing: if you haven’t given zless/zgrep a try on compressed files, they’re worth their weight in gold.

Deep Dive: OpenStack Nova Snapshot Image Creation with XenAPI/XenServer and Glance

Based on currently available code (nova: a77c0c50166aac04f0707af25946557fbd43ad44, 2012-11-02; python-glanceclient: 16aafa728e4b8309b16bcc120b10bc20372883f4, 2012-11-07; glance: 9dae32d60fc285d03fdb5586e3368d229485fdb4)

This is a deep dive into what happens (and where in the code) during image creation with a Nova/Glance configuration that is backed by XenServer/XenAPI. Hopefully the methods used in Glance/Nova’s code will not change over time, and this guide will remain a good starting point.

Disclaimer: I am _not_ a developer, and these are really just best guesses. Corrections are welcome.

 

1) nova-api receives an imaging request. The request is validated, checking for a name and making sure the request is within quotas. Instance data is retrieved, as well as block device mappings. If the instance is volume backed, a separate compute API call is made to snapshot (self.compute_api.snapshot_volume_backed). For this deep dive, we’ll assume there is no block device mapping. self.compute_api.snapshot is called. The newly created image UUID is returned.

  • nova/api/openstack/compute/servers.py
  • def _action_create_image

2) The compute API gets the request and calls _create_image. The instance’s task state is set to IMAGE_SNAPSHOT, and notifications of the state change are created. Several properties are collected about the image, including the minimum RAM, customer, and base image ref. The non-inheritable instance_system metadata is also collected. (2a, 2b, 2c) self.image_service.create and (3) self.compute_rpcapi.snapshot_instance are called.

  • nova/compute/api.py
  • def snapshot
  • def _create_image

2a) The collected metadata from 2 is put into a glance-friendly format, and sent to glance. The glance client’s create is called.

  • nova/image/glance.py
  • def create

2b) Glance (client) sends a POST to the glance server at /v1/images with the image metadata gathered in (2).

  • glanceclient/v1/images.py
  • def create

2c) Glance (server) receives the POST. Per the code comments:

Upon a successful save of the image data and metadata, a response
containing metadata about the image is returned, including its
opaque identifier.

  • glance/api/v1
  • def create
  • def _handle_source

3) Compute RPC API casts a message to the queue for the instance’s compute node.

  • nova/compute/rpcapi.py
  • def snapshot_instance

4) The instance’s power state is read and updated. (4a) The XenAPI driver’s snapshot() is called. Notifications are created for the snapshot’s start and end.

  • nova/compute/manager.py
  • def snapshot_instance

4a) The vmops snapshot is called (4a1).

  • nova/virt/xenapi/driver.py
  • def snapshot

4a1) The snapshot is created in XenServer via (4a1i) vm_utils, and (4a1ii) uploaded to glance. The code’s comments say this:

Steps involved in a XenServer snapshot:

1. XAPI-Snapshot: Snapshotting the instance using XenAPI. This
creates: Snapshot (Template) VM, Snapshot VBD, Snapshot VDI,
Snapshot VHD
2. Wait-for-coalesce: The Snapshot VDI and Instance VDI both point to
a ‘base-copy’ VDI. The base_copy is immutable and may be chained
with other base_copies. If chained, the base_copies
coalesce together, so, we must wait for this coalescing to occur to
get a stable representation of the data on disk.
3. Push-to-glance: Once coalesced, we call a plugin on the XenServer
that will bundle the VHDs together and then push the bundle into
Glance.

  • nova/virt/xenapi/vmops.py
  • def snapshot

4a1i) The instance’s root disk is recorded and its VHD parent is also recorded. The SR is recorded. The instance’s root VDI is snapshotted. Operations are blocked until a coalesce completes in _wait_for_vhd_coalesce (4a1i-1).

  • nova/virt/xenapi/vm_utils.py
  • def snapshot_attached_here

4a1i-1) The end result of this process is outlined in the code comments:

Before coalesce:

* original_parent_vhd
    * parent_vhd
        snapshot

After coalesce:

* parent_vhd
    snapshot

In (4a1i) the original VDI UUID was recorded. In a nutshell, the code ensures that the layout above is reached before allowing the snapshot to continue. It polls up to CONF.xenapi_vhd_coalesce_max_attempts times, sleeping CONF.xenapi_vhd_coalesce_poll_interval between attempts. On each attempt the SR is scanned and the original_parent_uuid is compared to the parent_uuid; if they don’t match, the code waits and checks again for the coalescing to complete.

  • nova/virt/xenapi/vm_utils.py
  • def _wait_for_vhd_coalesce
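The polling loop can be sketched in Python. This is not Nova’s code: `scan_sr` and `get_parent_uuid` are hypothetical stand-ins for the XenAPI SR scan and the VHD-parent lookup, the default values are made up, and the sleep function is injectable so the loop can be exercised without a hypervisor.

```python
import time
from typing import Callable, Optional


def wait_for_vhd_coalesce(scan_sr: Callable[[], None],
                          get_parent_uuid: Callable[[], Optional[str]],
                          original_parent_uuid: Optional[str],
                          max_attempts: int = 20,
                          poll_interval: float = 5.0,
                          sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Block until the VDI's parent UUID matches the recorded original parent.

    max_attempts and poll_interval play the roles of
    CONF.xenapi_vhd_coalesce_max_attempts and
    CONF.xenapi_vhd_coalesce_poll_interval (defaults here are hypothetical).
    """
    for attempt in range(max_attempts):
        scan_sr()  # rescan so the SR's view of the VHD chain is current
        parent_uuid = get_parent_uuid()
        if original_parent_uuid is None or parent_uuid == original_parent_uuid:
            return parent_uuid  # chain has the expected layout; snapshot may continue
        if attempt < max_attempts - 1:
            sleep(poll_interval)  # not coalesced yet: wait, then check again
    # Giving up with an exception is this sketch's choice.
    raise TimeoutError("VHD coalesce not finished after %d attempts" % max_attempts)
```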

4a1ii) The glance API servers are retrieved from configuration. The glance upload_vhd XenAPI plugin is called.

  • nova/virt/xenapi/vm_utils.py
  • def upload_image

4a2) A staging area is created, prepared, and _upload_tarball is called.

  • plugins/xenserver/xenapi/etc/xapi.d/plugins/glance
  • def upload_vhd

4a3) The staging area is prepared. This hard-links the snapshot VHDs into a temporary folder in the SR.

  • plugins/xenserver/xenapi/etc/xapi.d/plugins/utils.py
  • def prepare_staging_area
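A minimal Python sketch of what preparing a staging area amounts to, assuming the VHDs are named `<uuid>.vhd` in the SR directory and are linked into sequentially numbered files (`0.vhd`, `1.vhd`, ...). The naming scheme and function are illustrative assumptions, not the plugin’s code. Hard links only work within one filesystem, which is why the temporary folder is created inside the SR path.

```python
import os
import tempfile
from typing import List


def prepare_staging_area(sr_path: str, vdi_uuids: List[str]) -> str:
    """Hard-link each VHD from the SR into a fresh staging directory.

    The staging directory is created inside sr_path so that os.link()
    stays on one filesystem (hard links cannot cross filesystems).
    Returns the staging directory path.
    """
    staging_path = tempfile.mkdtemp(dir=sr_path)
    for seq_num, vdi_uuid in enumerate(vdi_uuids):
        source = os.path.join(sr_path, "%s.vhd" % vdi_uuid)
        link_name = os.path.join(staging_path, "%d.vhd" % seq_num)
        os.link(source, link_name)  # no data is copied; both names share one inode
    return staging_path
```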

4a4) The comments say it best:

Create a tarball of the image and then stream that into Glance
using chunked-transfer-encoded HTTP.

A URL is constructed and a connection is opened to it. The image meta properties (like status) are collected and added as HTTP headers. The tarball is created and streamed to glance in CHUNK_SIZE increments. The HTTP stream is then terminated, and the response is checked for an OK from glance and reported accordingly.

  • plugins/xenserver/xenapi/etc/xapi.d/plugins/glance
  • def _upload_tarball
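The chunked-transfer idea can be sketched in Python: a file-like wrapper frames every write as an HTTP/1.1 chunk (hex length, CRLF, data, CRLF), and a zero-length chunk ends the body. This is an illustration under assumptions, not the plugin’s code: `ChunkedWriter` and `stream_tarball` are hypothetical names, and `sink` is any object with a `write(bytes)` method, for example a connection whose request headers (including `Transfer-Encoding: chunked`) have already been sent.

```python
import os
import tarfile


class ChunkedWriter:
    """File-like object that frames each write as an HTTP/1.1 chunk."""

    def __init__(self, sink):
        self.sink = sink  # anything with write(bytes)
        self.bytes_written = 0

    def write(self, data: bytes) -> int:
        if data:  # a zero-length chunk would end the body early
            self.sink.write(b"%x\r\n" % len(data) + data + b"\r\n")
            self.bytes_written += len(data)
        return len(data)

    def finish(self) -> None:
        self.sink.write(b"0\r\n\r\n")  # terminating zero-length chunk


def stream_tarball(staging_path: str, sink) -> int:
    """Tar every file in staging_path straight into sink as chunks."""
    writer = ChunkedWriter(sink)
    # "w|" is tarfile's streaming mode: no seeking, output is written as it is built.
    with tarfile.open(fileobj=writer, mode="w|") as tar:
        for name in sorted(os.listdir(staging_path)):
            tar.add(os.path.join(staging_path, name), arcname=name)
    writer.finish()
    return writer.bytes_written
```

Streaming mode matters here: the whole tarball never has to exist on disk or in memory at once, which is the point of chunked transfer for multi-gigabyte VHDs.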

(Glance Server)

5) I’ve removed some of the obvious URL routing functions in glance to get down to the meat of this process. Basically, the PUT request goes to the glance API. The API interacts with the registry again, but this time there is data to be uploaded. The image’s metadata is validated for activation, and then _upload_and_activate is called, which calls _upload and, if that succeeds, activates the image. _upload checks whether the image data is being copied from another location (here it isn’t). It also checks that the request’s Content-Type is application/octet-stream. Then a backend store such as Swift is chosen, either inferred from the request or taken from the glance configuration (self.get_store_or_400). Finally, the image is added to the store, its checksum is verified, and the glance registry is updated. Notifications are also sent for image.upload.

  • glance/api/v1/images.py
  • def update
  • def _handle_source
  • def _upload_and_activate
  • def _upload
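The “add to the store and verify the checksum” step boils down to hashing the data as it streams past. A Python sketch, assuming an MD5 hex digest (what Glance v1 reported as an image’s `checksum`) and using a hypothetical `store_write` callable in place of a real backend store; the chunk size and exception class are also illustrative.

```python
import hashlib
from typing import BinaryIO, Callable, Optional, Tuple

CHUNK_SIZE = 64 * 1024  # assumption: any reasonable read size works


class ChecksumMismatch(Exception):
    """Raised when the computed digest differs from the supplied one."""


def upload_and_verify(data: BinaryIO,
                      store_write: Callable[[bytes], None],
                      expected_checksum: Optional[str] = None) -> Tuple[int, str]:
    """Stream data into a store while computing its MD5; verify if a checksum was supplied."""
    md5 = hashlib.md5()
    size = 0
    for chunk in iter(lambda: data.read(CHUNK_SIZE), b""):
        md5.update(chunk)      # hash as we go, so the data is read only once
        size += len(chunk)
        store_write(chunk)
    checksum = md5.hexdigest()
    if expected_checksum is not None and checksum != expected_checksum:
        raise ChecksumMismatch("expected %s, got %s" % (expected_checksum, checksum))
    return size, checksum
```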

Deep Dive: OpenStack Nova Rescue Mode with XenAPI / XenServer

Based on currently available code (a77c0c50166aac04f0707af25946557fbd43ad44 2012-11-02)

This is a deep dive into what happens (and where in the code) during a rescue/unrescue scenario with a Nova configuration that is backed by XenServer/XenAPI. Hopefully the methods used in Nova’s code will not change over time, and this guide will remain a good starting point.

Rescue

1) nova-api receives a rescue request. A new admin password is generated via utils.generate_password, with its length set by FLAGS.password_length. The API calls rescue on the compute API.

  • nova/api/openstack/compute/contrib/rescue.py
  • def _rescue
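A common way to build such a password is to draw at least one character from each of several symbol groups and fill the rest from all of them. The following Python sketch does that with the standard-library `secrets` module; it is an illustration, not Nova’s implementation, and the symbol groups and default length are assumptions (in Nova the length comes from FLAGS.password_length).

```python
import secrets
import string
from typing import Sequence

# Hypothetical symbol groups: uppercase, lowercase, digits.
SYMBOL_GROUPS = (string.ascii_uppercase, string.ascii_lowercase, string.digits)


def generate_password(length: int = 12, symbolgroups: Sequence[str] = SYMBOL_GROUPS) -> str:
    """Random password containing at least one symbol from every group."""
    if length < len(symbolgroups):
        raise ValueError("length must be at least the number of symbol groups")
    rng = secrets.SystemRandom()  # OS-backed randomness, suitable for secrets
    # One guaranteed pick per group...
    chars = [rng.choice(group) for group in symbolgroups]
    # ...then fill the remainder from the union of all groups.
    alphabet = "".join(symbolgroups)
    chars += [rng.choice(alphabet) for _ in range(length - len(chars))]
    rng.shuffle(chars)  # don't leave the guaranteed picks in fixed positions
    return "".join(chars)
```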

2) The compute API updates the vm_state to RESCUING, and calls the compute rpcapi rescue_instance with the same details.

  • nova/compute/api.py
  • def rescue

3) The RPC API casts a rescue_instance message to the compute node’s message queue.

  • nova/compute/rpcapi.py
  • def rescue_instance

4) nova-compute consumes the message containing the rescue request from the queue. The admin password is retrieved; if none was passed along this far, one is generated via utils.generate_password with the same flags as in step 1. Details about the source instance, like networking and image details, are then recorded. The compute driver’s rescue function is called. After steps 4a-4c complete, the instance’s vm_state is updated to rescued.

  • nova/compute/manager.py
  • def rescue_instance

4a) This abstraction was skipped over in the last two deep dives, but for the sake of completeness: Driver.rescue is called. It just calls _vmops.rescue, where the real work happens.

  • nova/virt/xenapi/driver.py
  • def rescue

4b) Checks are performed to ensure the instance isn’t already in rescue mode. The original instance is shut down via XenAPI. The original instance is bootlocked. A new instance is spawned with -rescue in the name-label.

  • nova/virt/xenapi/vmops.py
  • def rescue

4c) A new VM is created just like any other VM, using the source VM’s metadata. The root volume from the instance being rescued is attached as a secondary disk. The instance’s networking is the same; however, the new hostname is RESCUE-hostname.

  • nova/virt/xenapi/vmops.py
  • def spawn -> attach_disks_step rescue condition
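The shape of the rescue VM from 4b and 4c can be summarized with a small Python model. Everything here is hypothetical: `VMSpec`, its fields, the device numbering, and detecting “already rescued” by the name-label suffix are illustrative, not XenAPI or Nova structures. It only captures the relationships described above: same metadata and networking, a `-rescue` name label, a `RESCUE-` hostname, a fresh root disk, and the original root disk attached as a secondary disk.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List


@dataclass(frozen=True)
class VMSpec:
    name_label: str
    hostname: str
    networks: List[str]
    # Hypothetical layout: device position -> VDI id; "0" is the boot disk.
    disks: Dict[str, str] = field(default_factory=dict)
    metadata: Dict[str, str] = field(default_factory=dict)


def rescue_spec(original: VMSpec, rescue_root_vdi: str) -> VMSpec:
    """Derive the rescue VM from the (already shut down, bootlocked) original."""
    # Hypothetical check: treat a "-rescue" suffix as "already in rescue mode".
    if original.name_label.endswith("-rescue"):
        raise ValueError("instance is already in rescue mode")
    return replace(
        original,  # same networks and metadata
        name_label=original.name_label + "-rescue",
        hostname="RESCUE-" + original.hostname,
        # Boot from a fresh root disk; expose the original root as a second disk.
        disks={"0": rescue_root_vdi, "1": original.disks["0"]},
    )
```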

Unrescue

1) nova-api receives an unrescue request.

  • nova/api/openstack/compute/contrib/rescue.py
  • def _unrescue

2) The compute API updates the vm_state to UNRESCUING, and calls the compute rpcapi unrescue_instance with the same details.

  • nova/compute/api.py
  • def unrescue

3) The RPC API casts an unrescue_instance message to the compute node’s message queue.

  • nova/compute/rpcapi.py
  • def unrescue_instance

4) The compute manager receives the unrescue_instance message and calls the driver’s unrescue method.

  • nova/compute/manager.py
  • def unrescue_instance

4a) Driver.unrescue is called. This just calls _vmops.unrescue, where the real work happens.

  • nova/virt/xenapi/driver.py
  • def unrescue

4b) The rescue VM is found. Checks are done to ensure the VM is in rescue mode. The original VM is found. The rescue instance has _destroy_rescue_instance performed (4b1). After that completes, the source VM’s bootlock is released and the VM is started.

  • nova/virt/xenapi/vmops.py
  • def unrescue

4b1) A hard shutdown is issued on the rescue instance. Via XenAPI, the root disk of the original instance is found. All VDIs attached to the rescue instance are destroyed, except the original instance’s root disk. The rescue VM is destroyed.

  • nova/virt/xenapi/vmops.py
  • def _destroy_rescue_instance
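The ordering in 4b and 4b1 is what keeps the instance’s data safe: the original root disk is excluded from destruction, and the original VM is only unlocked and started after the rescue VM is gone. A Python sketch of that ordering; `RecordingSession` and its method names are hypothetical stand-ins that merely record calls, not XenAPI methods.

```python
from typing import Dict, Iterable, List, Tuple


def vdis_to_destroy(rescue_vdis: Iterable[str], original_root_vdi: str) -> List[str]:
    """Every VDI on the rescue VM except the original instance's root disk."""
    return [vdi for vdi in rescue_vdis if vdi != original_root_vdi]


class RecordingSession:
    """Hypothetical hypervisor session that only records the calls made on it."""

    def __init__(self, attached: Dict[str, List[str]]):
        self.attached = attached  # vm -> attached VDI ids
        self.calls: List[Tuple[str, str]] = []

    def vdis_of(self, vm: str) -> List[str]:
        return list(self.attached[vm])

    def hard_shutdown(self, vm: str) -> None:
        self.calls.append(("hard_shutdown", vm))

    def destroy_vdi(self, vdi: str) -> None:
        self.calls.append(("destroy_vdi", vdi))

    def destroy_vm(self, vm: str) -> None:
        self.calls.append(("destroy_vm", vm))

    def release_bootlock(self, vm: str) -> None:
        self.calls.append(("release_bootlock", vm))

    def start(self, vm: str) -> None:
        self.calls.append(("start", vm))


def unrescue(session, rescue_vm: str, original_vm: str, original_root_vdi: str) -> None:
    """Order of operations from steps 4b and 4b1."""
    session.hard_shutdown(rescue_vm)
    for vdi in vdis_to_destroy(session.vdis_of(rescue_vm), original_root_vdi):
        session.destroy_vdi(vdi)
    session.destroy_vm(rescue_vm)
    # Only now is it safe to let the original VM boot again.
    session.release_bootlock(original_vm)
    session.start(original_vm)
```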

 

Deep dive: OpenStack Nova Resize Down with XenAPI/XenServer

Based on the currently available code (commit 114109dbf4094ae6b6333d41c84bebf6f85c4e48 – 2012-09-13)

This is a deep dive into what happens (and where in the code) during a resize down (e.g., flavor 4 to flavor 2) with a Nova configuration that is backed by XenServer/XenAPI. Hopefully the methods used in Nova’s code will not change over time, and this guide will remain a good starting point.

Steps 1-6a are identical to those in my previous entry, “Deep dive: OpenStack Nova Resize Up with XenAPI/XenServer”. This deep dive examines where resize up and resize down diverge in Nova, as there are a few key differences.

6b) The instance’s resize progress is updated. The VM is shut down via XenAPI.

  • ./nova/virt/xenapi/vmops.py
  • def _migrate_disk_resizing_down

6c) The source VDI is copied on the hypervisor via XenAPI VDI.copy. Then a new, separate VDI is created, along with a VBD that plugs it into the compute node. The partition and filesystem of the new disk are resized via _resize_part_and_fs, using e2fsck, tune2fs, parted, and tune2fs. The source VDI copy is also attached to nova-compute. Via _sparse_copy (configurable, enabled by default), nova-compute temporarily takes ownership of both devices (reading the source, writing the destination) and performs a block-level copy that skips zeroed blocks.

  • ./nova/virt/xenapi/vm_utils.py
  • def _resize_part_and_fs
  • def _sparse_copy
  • nova/utils.py
  • def temporary_chown
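The sparse-copy idea from 6c can be sketched in Python: read fixed-size blocks, and when a block is entirely zeros, seek past it in the destination instead of writing it. This is an illustrative version, not Nova’s `_sparse_copy`; the block size and function name are assumptions, and it targets ordinary files (the final `truncate` fixes up the size when the file ends in a hole), so it can be tried without a hypervisor.

```python
import os

BLOCK_SIZE = 4096  # assumption: a typical filesystem block size


def sparse_copy(src_path: str, dst_path: str, block_size: int = BLOCK_SIZE) -> int:
    """Copy src to dst, skipping all-zero blocks. Returns the number of bytes skipped."""
    empty_block = b"\0" * block_size
    skipped = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            data = src.read(block_size)
            if not data:
                break
            if data == empty_block:
                # Leave a hole: the destination reads back as zeros here.
                dst.seek(len(data), os.SEEK_CUR)
                skipped += len(data)
            else:
                dst.write(data)
        # If the source ends with zero blocks, extend dst so the sizes match.
        dst.truncate(dst.tell())
    return skipped
```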

6d) Progress is again updated. The devices that were attached are unplugged, and the VHDs are transferred and set up the same way as in steps 6a1i-6b2 of the resize-up deep dive, except for 6b2 (the root disk resize).

Deep Dive: OpenStack Nova Resize Up with XenAPI/XenServer

Nova is the Compute engine of the OpenStack project.

Based on the currently available code (commit 114109dbf4094ae6b6333d41c84bebf6f85c4e48 – 2012-09-13)

This is a deep dive into what happens (and where in the code) during a resize up (e.g., flavor 2 to flavor 4) with a Nova configuration that is backed by XenServer/XenAPI. Hopefully the methods used in Nova’s code will not change over time, and this guide will remain a good starting point.

Some abstractions such as go-between RPC calls and basic XenAPI calls have been deliberately ignored.

Disclaimer: I am _not_ a developer, and this is just my best guess through an overly-caffeinated code dive. Corrections are welcome.

1) API receives a resize request.

2) Request validations are performed.

3) Quota verifications are performed.

  • ./nova/compute/api.py
  • def resize

4) The scheduler is notified to prepare the resize request. A target host is selected for the resize and notified.

  • ./nova/scheduler/filter_scheduler.py
  • def schedule_prep_resize

5) Usage notifications are sent as the resize is preparing. A migration entry is created in the nova database with the status pre-migrating.  resize_instance (6) is fired.

  • ./nova/compute/manager.py
  • def prep_resize

6) The migration record is updated to migrating. The instance’s task_state is updated from resize preparation to resize migrating. Usages receive notification that the resize has started. The instance is renamed on the source to have a suffix of -orig. (6a) migrate_disk_and_power_off is invoked. The migration record is updated to post migrating. The instance’s task_state is updated to resize migrated. (6b) Finish resize is called.

  • ./nova/compute/manager.py
  • def resize_instance
  • def migrate_disk_and_power_off
  • ./nova/virt/xenapi/vmops.py

6a) migrate_disk_and_power_off is where the work begins. Progress is zeroed out, and the code detects whether this is a resize up or down. We’re taking the resize-up code path in 6a1.

  • ./nova/virt/xenapi/driver.py -> ./nova/virt/xenapi/vmops.py
  • def migrate_disk_and_power_off

6a1) Snapshot the instance. Via migrate_vhd, transfer the immutable VHDs: the base copy, or parent VHD, belonging to the instance. Update the instance’s resize progress. Power down the instance. Then, again via migrate_vhd (steps 6a1i-6a1v), transfer the COW VHD, i.e. the changes that have occurred since the snapshot was taken.

  • ./nova/virt/xenapi/vmops.py
  • def _migrate_disk_resizing_up

6a1i) Call transfer_vhd in the XenAPI plugin on the hypervisor.

  • ./nova/virt/xenapi/vmops.py
  • def _migrate_vhd

6a1ii) Make a staging area on the source server to prepare the VHD transfer.

  • ./plugins/xenserver/xenapi/etc/xapi.d/plugins/migration
  • def transfer_vhd

6a1iii) Hard-link the VHD(s) being transferred into the staging area.

  • ./plugins/xenserver/xenapi/etc/xapi.d/plugins/utils.py
  • def prepare_staging_area

6a1iv) rsync the VHDs to the destination. The destination path is /images/instance-instance_uuid.

  • ./plugins/xenserver/xenapi/etc/xapi.d/plugins/migration
  • def _rsync_vhds

6a1v) Clean up the staging area that was created in 6a1ii.

  • ./plugins/xenserver/xenapi/etc/xapi.d/plugins/utils.py
  • def cleanup_staging_area

6b) (6b1) Set up the newly transferred disk and turn on the instance on the destination host. Make the tenant’s quota usage reflect the newly resized instance.

  • ./nova/compute/manager.py
  • def finish_resize

6b1) The instance record is updated with the new instance_type (RAM, CPU, disk, etc.). Networking is set up on the destination (a topic for another day; my best guess is that nova-network, quantum, and the configured quantum plugin(s) are notified of the networking changes). The instance’s task_state is set to resize finish. Usages are notified of the beginning of the end of the resize process. (6b1i) Finish migration is invoked. The instance record is updated to resized. The migration record is set to finished. Usages are notified that the resize has completed.

  • ./nova/compute/manager.py
  • def _finish_resize

6b1i) (6b1ii) move_disks is called. (6b2) _resize_instance is called. The destination VM is created and started via XenAPI. The resize’s progress is updated.

  • ./nova/virt/xenapi/driver.py -> ./nova/virt/xenapi/vmops.py
  • def finish_migration

6b1ii) (6b1iii) The XenAPI plugin is called to move_vhds_into_sr. The SR is scanned. The Root VDI’s name-label and name-description are set to reflect the instance’s details.

  • ./nova/virt/xenapi/vm_utils.py
  • def move_disks

6b1iii) Remember the VHD destination from step 6a1iv? I thought so! =) (6b1iv) Call import_vhds on the VHDs in that destination. Clean up this staging area, just like 6a1v.

  • ./plugins/xenserver/xenapi/etc/xapi.d/plugins/migration
  • def move_vhds_into_sr

6b1iv) Move the VHDs from the staging area to the SR. Staging areas are used because the SR could delete VHDs it doesn’t know about, and our data with them. This function determines the order of our VHD chain and assigns a UUID to each VHD. The VHDs are then renamed from #.vhd to UUID.vhd and linked to each other, starting from the parent and moving down the tree, via calls to vhd-util modify. The VDI chain is validated. Swap files are given a similar treatment and appended to the list of files to move into the SR. All of these files are renamed (Python os.rename) from the staging area into the host’s SR.

  • ./plugins/xenserver/xenapi/etc/xapi.d/plugins/utils.py
  • def import_vhds
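The renaming and relinking can be sketched in Python as a plan that is computed but not executed. Assumptions, labeled as such: the staged files are `0.vhd`, `1.vhd`, ... with `0.vhd` being the leaf (the newest copy-on-write layer) and each higher number its parent; the link step uses the `vhd-util modify -n <vhd> -p <parent>` form; swap files and the final move into the SR are omitted. The function name is hypothetical.

```python
import os
import uuid
from typing import Callable, List, Tuple


def plan_vhd_import(staging_path: str, count: int,
                    new_uuid: Callable[[], str] = lambda: str(uuid.uuid4()),
                    ) -> Tuple[List[Tuple[str, str]], List[List[str]], str]:
    """Plan renames and parent links for a staged chain 0.vhd .. (count-1).vhd.

    Returns (renames, link_commands, leaf_path); nothing is executed.
    """
    if count < 1:
        raise ValueError("need at least one staged VHD")
    # Rename (0, 1, .. N).vhd -> <uuid>.vhd; 0.vhd is assumed to be the leaf.
    renames = [(os.path.join(staging_path, "%d.vhd" % seq),
                os.path.join(staging_path, "%s.vhd" % new_uuid()))
               for seq in range(count)]
    new_paths = [new for _, new in renames]

    # Walk from the base copy (highest number) down to the leaf,
    # pointing each VHD at the one above it in the chain.
    link_commands: List[List[str]] = []
    parent = None
    for path in reversed(new_paths):
        if parent is not None:
            link_commands.append(["vhd-util", "modify", "-n", path, "-p", parent])
        parent = path
    return renames, link_commands, new_paths[0]
```

Returning argument lists rather than shell strings keeps the plan easy to inspect and, if executed, avoids shell quoting issues.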

6b2) The instance’s root disk is resized. The current size of the root disk is retrieved from XenAPI (virtual_size). XenAPI VDI.resize_online (XenServer > 5) is called to resize the disk to its new size as defined by the instance_type.
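The size arithmetic in 6b2 is simple: the flavor’s root disk size is given in GB, so it is converted to bytes and compared with the VDI’s current `virtual_size`. A Python sketch, assuming binary gigabytes (1 GB = 1024³ bytes) and a hypothetical `resize` callable standing in for the XenAPI resize call; it only grows the disk, since this is the resize-up path.

```python
from typing import Callable

GiB = 1024 ** 3


def resize_root_disk(virtual_size: int, root_gb: int,
                     resize: Callable[[int], None]) -> bool:
    """Grow the root disk to root_gb if it is currently smaller.

    Returns True if a resize was requested.
    """
    new_size = root_gb * GiB
    # Assumption: a root_gb of 0 means "leave the disk as-is".
    if new_size == 0 or virtual_size >= new_size:
        return False
    resize(new_size)
    return True
```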

UPDATE: (2012-10-28) Added to step 6 the renaming to the instance-uuid-orig on the source.

Cloud and Virtual Infrastructure

Just some observations over time about the difference between applications in the Cloud and apps that run on Virtual Infrastructure. There seem to be some gaps in understanding between the two.

Cloud Apps:

  • Load Balancers a must
  • Segmented application data a must, via a persistent object store for storage and a CDN for distribution
  • Green field data
  • Agile development cycle
  • Continuous/Push Button Deployment with small incremental changes

Virtual Infrastructure Apps:

  • Waterfall development lifecycle (Off the shelf or behemoth internal app)
  • Maintenance windows with long downtime
  • Move to virtualization driven by consolidation/savings
  • Brown field data
  • Large changes in releases, resulting in My-Mom-Cleaned-My-Room syndrome also found in Cloud/SaaS

Both:

  • 100% Uptime expected
  • The unknown architecture is blamed first when there is a performance/availability issue. Most TTD is spent ruling out the unknown components, when the problem is usually within the application.

There are breakdowns when organizations attempt to run Cloud Apps on VI, or worse… when organizations attempt to run VI apps on Cloud without the proper considerations.

Getting started with Vagrant

What is Vagrant?

From the Vagrant website:

Vagrant is a tool for building and distributing virtualized development environments.

Why would a sysadmin care about building a development environment? Vagrant is a good way to get started with Puppet/Chef without a ton of overhead in setting up a server.

How does using Vagrant mean I can do more with less?

Imagine being able to provide developers with a virtual environment to do testing on their first day.  Changes could be deployed to the virtual environment via source code pulls. Large open source organizations like Mozilla are already doing this.

Your Workstation

You can do most of this work on Windows, but most of the available literature references work done on Unix-like systems. Since we are diving into Vagrant, which leverages virtualization, one of the following should be your native environment:

  • Mac OS X
  • Ubuntu 10.04 or later

Oftentimes, workstation setup is what turns admins away from a lot of these new tools. Be patient, ask questions, and be patient. =)

  • Install RVM – not a requirement for Vagrant, but it will make your life easier in the long run.
  • After RVM is set up:
    rvm install 1.9.2
  • Download and Install VirtualBox.
  • Now install Vagrant:
    gem install vagrant

Here are a couple of cool projects you can get going with now that you’re set up with Vagrant:

Next Generation Knowledge Resources

In 2009 on version 1 of this site I made a post entitled “VMware Knowledge Resources” outlining some of the strategies I used to find knowledge for becoming a successful VMware Administrator.

It’s only appropriate to roll a similar post for version 2. DevOps and Cloud Computing can be very overwhelming for the uninitiated. Where do you go for help? How do you get help? How do you even begin to phrase questions appropriately?

  1. DevOps Weekly – excellent weekly newsletter which culls through a lot of the social media noise and posts quality articles and tools.
  2. Rackspace CloudU – conceptual overview of terms like IaaS, PaaS, and how they fit together in the cloud.
  3. Chef Guides / Puppet Documentation – configuration management is a big topic in the DevOps movement; Chef and Puppet are the leaders in this space.
  4. Social Media and GitHub – social coding
  5. Rails Tutorial / Ruby Koans – next generation tools are written with next generation languages. CloudFoundry, Chef, and Puppet are written in Ruby; if you are not familiar with Ruby (or Python), it is well worth your time to dive into a next generation language.

New Direction

About a month ago I moved to Texas to take on a new exciting opportunity and work on cloud computing with Rackspace. While I will no longer be working on VMware products as a part of my day to day, the virtualization concepts are largely the same. With such a large personal and professional change, a rebooting of Virtual Andy is necessary.

Virtual Andy’s new direction can be summarized with a paraphrased interview with Paul Maritz:

Asked guys how many system admins per server Google used, came back to 1 to 1000…VMware ratio was 1:20 or 1:50…have to get to point where enterprises, regardless of where they are consuming IT, have to do it at similar level of efficiency to free up operations spend… If they can’t free up resources they won’t be able to address the legacy code bases

So what’s all of this mean for the site? 

It means exploring new technologies and new tools which will enable sysadmins to do more with less. It means helping sysadmins pitch these tools internally to benefit the business. It means arming sysadmins with the tools to move their organizations to the next generation of technology.

What’s your VM chargeback model?

Over the past few years our organization has been trying to adopt a pricing model for providing virtual machines. We are a small college with several semi-independent units to support. We need to be able to quickly quote, chargeback, and fulfill our incoming requests.

This is a very big update to the post I made in December 2009 entitled VMware Expert System with PowerCLI and Excel. Problem 1: Solved. Problem 2: In progress…

The VMware Expert System has changed significantly. Here’s a summary of the changes:

  • Added deployment via PowerCLI and the OSCustomizationSpec/OSCustomizationNicMapping in PowerCLI 4.0 Update 1
  • Completely rewrote its capacity-gathering functionality using SQL queries instead of PowerCLI
  • Added chargeback functionality
  • Made a pretty invoice system

It’s not quite ready for posting… but here’s a sneak preview:

1 – Entering the system’s basic requirements (similar to the expert system screenshots). Users can only select guest OSes with a corresponding template that has been created and maintained by our team.

2 – Viewing the quote/invoice. After this step, the document is printed as a PDF and sent to the requesting department.

3 – After departmental approval, the deployment details are opened and more specifics are added

It is important to note that this sheet is pretty flexible. It supports up to 4 vNICs and 60 disks across any datastore available to vCenter. Capacity for each disk is checked before a deployment is allowed. The templates we use have only one vNIC and one virtual hard disk, so any additional ones are added after deployment. DEPLOY VIRTUAL MACHINE fires off a PowerCLI script with the appropriate parameters; the script validates them and deploys the VM.

4 – Cost data. We had to decide how many vCPUs we would be able to get per CPU. After that, it became a cost accounting exercise. Another important note: we built this around the full capacity of the two UCS chassis purchased, even though our initial outlay was 8 B200s spread over two chassis. We broke our costs down per GB RAM, GB SAN storage, and vCPU, and applied fixed VMware licensing/support and UCS support costs.
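The cost breakdown above can be expressed as a small calculation: per-unit costs for RAM, SAN storage, and vCPUs, plus a share of the fixed licensing/support costs. A Python sketch; every rate, the “cost per CPU divided by vCPUs per CPU” rule, and the even split of fixed costs across the VMs the full chassis build-out can hold are hypothetical placeholders, not our actual figures or exact model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    per_gb_ram: float    # cost of 1 GB of RAM
    per_gb_san: float    # cost of 1 GB of SAN storage
    per_vcpu: float      # cost of one vCPU
    fixed_per_vm: float  # share of licensing/support carried by each VM


def vcpu_rate(cost_per_cpu: float, vcpus_per_cpu: int) -> float:
    """Spread one physical CPU's cost over the vCPUs we decided to sell on it."""
    return cost_per_cpu / vcpus_per_cpu


def fixed_share(fixed_costs: float, capacity_vms: int) -> float:
    """Spread fixed costs over the full planned capacity, not the initial build-out."""
    return fixed_costs / capacity_vms


def quote(rates: Rates, ram_gb: int, san_gb: int, vcpus: int) -> float:
    """Price one VM request from its basic requirements."""
    total = (ram_gb * rates.per_gb_ram
             + san_gb * rates.per_gb_san
             + vcpus * rates.per_vcpu
             + rates.fixed_per_vm)
    return round(total, 2)


# Hypothetical numbers, for illustration only.
rates = Rates(per_gb_ram=10.0, per_gb_san=0.5,
              per_vcpu=vcpu_rate(100.0, 4), fixed_per_vm=fixed_share(12000.0, 400))
print(quote(rates, ram_gb=4, san_gb=40, vcpus=2))  # prints 140.0
```

Spreading fixed costs over the full planned capacity keeps early customers from paying for hardware that was bought for later growth.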

I plan to release the spreadsheet, its underlying VBA source code and the PowerCLI in the coming weeks. I’m particularly interested in other chargeback models for similarly sized and positioned organizations.

Windows 2000 P2V in vCenter Converter 4: No more BSoD

I’ve run into the issue lately where P2Ving a Windows 2000 machine will result in a blue screen when the new virtual machine boots. A google search shows many others with this problem.  The root cause of the issue is related to Update 1 of Service Pack 4.

Luckily, with vCenter Converter, not VMware Converter Standalone, VMware delivers a nice fix for this issue.

Follow the normal steps in vCenter Converter: import a machine and set up your source. As the Converter Agent is installed, you’ll be prompted for the location of the VMware SCSI Disk Driver (scsiport.sys, not scsiport.flp), which must be accessible by vCenter Server. After that, the P2V works and your Windows 2000 machine will not blue screen.

Higher Ed: Virtualizing Banner Unified Digital Campus with VMware

Banner Unified Digital Campus (Banner UDC) is the most widely used administrative suite of applications specific to higher education. A specially crafted Google search shows almost 30,000 different universities running Banner UDC!

If you are wondering if Banner UDC can be virtualized, the answer is YES. VMware’s website has very little information about virtualizing Banner UDC… there’s just info about virtualizing Banner Xtender, the integrated document imaging solution. Many sysadmins will have difficulty finding a definitive answer to the “Are we supported on VMware?” question.

Get an account with the Sungard Commons and Sungard Connect and see for yourself. You’ll find Sungard has this to say in FAQ 1-44BOB8:

Banner UDC clients can realize significant cost benefits through the use of virtualization technologies.  Virtualization allows for server consolidation, with potential savings in hardware acquisition and maintenance, operating costs, and administrative time. Virtualization can also play an important role in High Availability and Disaster Recovery strategies. Banner UDC applications are supported with two software solutions, Sun Solaris Containers and VMware ESX Server. Also supported are hardware partitions offered by Solaris Logical Domains (LDoms), HP-UX Virtual Partitions (vPars),  and AIX Logical Partitions (LPARs).  Virtualization is a very dynamic field and additional solutions will be supported in accordance with market demand.

The SunGard Higher Education Support Center will not require clients to replicate every Banner UDC issue in a native supported environment. If we have reason to believe that virtualization is part of the problem, we may ask clients to independently verify that the problem exists in a native environment.  We will also work with virtualization vendors to jointly diagnose and resolve issues. The Support Center is not able to accept virtual images from clients to use as debugging aids. Such images may contain software for which SunGard Higher Education is not licensed.

The note specifically says the Banner Database server is not recommended on vSphere, citing Metalink 249212.1. Google it for yourself: we have found that Oracle’s support position is just perpetuating fear, uncertainty, and doubt. Aside from Google, another good place to research is Sungard’s BORACLE mailing list (customers only): you’ll find several Banner UDC customers, large and small, running their full suite (including the database) on VMware vSphere with no issues.

The support note also mentions ‘Other Third-Party Products’. Many Banner UDC clients use Cognos for reporting/BI. IBM offers Cognos support on VMware. There’s also Luminis, the most popular web portal with Banner UDC, which is also supported as a virtual machine. For more information, see the documents titled “Luminis Platform 5” and “Luminis Platform Virtualization”.

As for our organization, we have been running a couple of parts of the Banner UDC install in virtual machines from the beginning: eVisions Intellecheck and Banner ePrint have given zero problems with regard to virtualization.

Our journey to full suite virtualization started with a large Banner UDC upgrade (Banner 7 to Banner 8) on vSphere. The testing of the upgrade was successful (and so was the real upgrade), AND we obtained Oracle DBA buy-in to running Banner UDC on virtual machines.

Since then, the staging environment’s application and database servers are on virtual machines and no one’s noticed.  Once the staging environment pilot is done, the next step is to work toward moving our production servers to vSphere.

PowerCLI: nSeries Guest OS Timeout Scripts

First post in months. Hopefully I’ll have more to post in the next few weeks. On to more pressing issues…

“The N Series VMware ESX Host Utilities” have a couple of ISOs that have scripts for tweaking the guest OS disk timeouts.

From the IBM Redbook IBM System Storage N series and VMware vSphere Storage Best Practices:

This is a job for PowerCLI, inspired by Jase’s script.

Considerations for Linux: use an account that can do passwordless sudo (or root; you don’t need root SSH enabled for Invoke-VMScript).

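# Requires VMware Tools running in each guest (Invoke-VMScript depends on it).
$winisoName = "[datastore] path/to/windows_gos_timeout.iso"
$linisoName = "[datastore] path/to/linux_gos_timeout.iso"
$driveName = "CD/DVD Drive 1"
# Assumes the .reg file is reachable from the script's working directory;
# prefix it with the guest's CD drive letter if it is not.
$winscript = "regedit /s windows_gos_timeout.reg"
$linscript = "sudo /media/cdrom/linux_gos_timeout-install.sh"
# Get Windows Guest Credentials
Write-Host "Enter Windows Credentials"
$wincred = Get-Credential
# Get Linux Guest Credentials
Write-Host "Enter Linux Credentials"
$lincred = Get-Credential
# Get ESX Host Creds
Write-Host "Enter ESX Credentials"
$esxcred = Get-Credential
# Get all powered-on VMs (guest scripts cannot run in powered-off VMs)
$vms = Get-VM | Where-Object { $_.PowerState -eq "PoweredOn" }
foreach ($vm in $vms) {
	# Pick the ISO, script, and credentials that match the guest OS
	if ($vm.Guest.OSFullName -match "Microsoft") {
		$isoName = $winisoName; $guestScript = $winscript
		$guestCred = $wincred; $scriptType = "Bat"
	}
	elseif ($vm.Guest.OSFullName -match "Linux") {
		$isoName = $linisoName; $guestScript = $linscript
		$guestCred = $lincred; $scriptType = "Bash"
	}
	else { continue }

	# Keep $vm intact for Invoke-VMScript; work on the API view separately
	$view = $vm | Get-View
	$dev = $view.Config.Hardware.Device | Where-Object { $_.DeviceInfo.Label -eq $driveName }
	if (-not $dev) { continue }

	# Point the CD drive at the ISO and connect it
	$spec = New-Object VMware.Vim.VirtualMachineConfigSpec
	$spec.deviceChange = New-Object VMware.Vim.VirtualDeviceConfigSpec[] (1)
	$spec.deviceChange[0] = New-Object VMware.Vim.VirtualDeviceConfigSpec
	$spec.deviceChange[0].operation = "edit"
	$spec.deviceChange[0].device = $dev
	$spec.deviceChange[0].device.backing = New-Object VMware.Vim.VirtualCdromIsoBackingInfo
	$spec.deviceChange[0].device.backing.fileName = $isoName
	$spec.deviceChange[0].device.connectable.connected = $true

	# ReconfigVM waits for the task to finish, so the ISO is mounted before the script runs
	$view.ReconfigVM($spec)

	Invoke-VMScript -VM $vm -ScriptText $guestScript -ScriptType $scriptType -HostCredential $esxcred -GuestCredential $guestCred
}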

Back to work.

Gem in the VMware vCenter Converter 4.2 for vCenter Server Release Notes

It’s been a while since I’ve had time to post, but I couldn’t keep this to myself:

From the VMware vCenter Converter 4.2 for vCenter Server 4.1 Release Notes:

What’s New
The VMware vCenter Converter 4.2 is a substantial upgrade from vCenter Converter 4.1 and includes the following new functionality (previously found only in vCenter Converter Standalone 4.0.x):
Physical to virtual machine conversion support for Linux sources including:
  • Red Hat Enterprise Linux 2.1, 3.0, 4.0, and 5.0
  • SUSE Linux Enterprise Server 8.0, 9.0, 10.0, and 11.0
  • Ubuntu 5.x, 6.x, 7.x, and 8.x

This sounds too good to be true. I upgraded my lab environment and confirmed: you can now do scheduled Linux P2V.

The lab setup was ESX 4.1, vCenter 4.1, and vCenter Converter 4.2 for vCenter Server, with a RHEL 4.7 AS server imported.

VMware Expert System with PowerCLI and Excel

This post is a writeup of a project for a master’s class in Decision Support Systems at Murray State. This is my first dive into VMware PowerCLI aside from some one-off scripts. All feedback is welcome.

Our Problems

Problem 1: Servers are not being virtualized due to a decentralized procurement process

A decentralized server procurement process presents many problems to an organization. There are many gains from standardizing OS/hardware platforms.

Problem 2: Servers are not being virtualized because knowledge is required to make the “Virtualize/Don’t Virtualize” decision

The benefits of server virtualization are easy to explain and are a part of our culture. However, the organization has not adopted a “virtualize first” mentality. There is still a lack of stakeholder understanding with regard to virtualization.

Due to lack of knowledge, ROI is not maximized. This knowledge exists in two places: the virtual infrastructure itself, and the tacit knowledge of the VMware administrator.

PROBLEM ANALYSIS

Problem 1: Servers are not being virtualized due to a decentralized procurement process
This problem is outside of the scope of the CIS645 class. We’re working on it.

Problem 2: Servers are not being virtualized because knowledge is required to make the “Virtualize/Don’t Virtualize” decision

Problem 2 has two major parts.

CAPACITY – CAN OUR VIRTUAL INFRASTRUCTURE SUPPORT THIS APPLICATION?

This question has historically been answered heuristically with ballpark figures, because manually gathering current storage and RAM capacity data is too time-consuming.

CANDIDACY – BASED ON SYSTEM REQUIREMENTS AND INDUSTRY KNOWLEDGE, IS VIRTUALIZATION SUITABLE FOR THIS APPLICATION?

This is the harder question. Typically you’ll hear consultants say “it depends”. Answering this question usually involves a phone call with the VMware administrator, in which the administrator asks the stakeholder a series of questions.

RECOMMENDATION

When the two questions have been answered, a recommendation of Virtualize/Don’t Virtualize is made. If a Virtualize decision is made, the VMware administrator must find the optimal storage unit to deploy to and coordinate the deployment with the stakeholder.

SOLUTION DESIGN

USER INTERFACE

The users of this system are already familiar with Excel and would prefer to use its familiar interface and What-If scenario planning.

What if we added another 2TB of storage?
What if we upgraded our RAM?
What if we didn’t have to have the license dongle?

Excel quickly enables these questions to be answered. A normal ‘GUI’ application would take more time to develop and would not invite queries of an ad-hoc nature.

CAPACITY

Capacity data resides at several levels: the virtual machine itself, the host, and the data store. The data is put into Excel using VMware’s PowerCLI. PowerCLI is a Windows PowerShell snap-in that integrates with any VMware Virtual Infrastructure. Windows PowerShell also integrates nicely with Excel.
Here are the steps to capacity gathering with the VMware Expert System:

  • Open the Excel Spreadsheet
  • Clear previously gathered data
  • Connect to a vCenter Server
  • Gather datastore information
  • Gather host information
  • Gather virtual machine information
  • Write values to ‘Capacity’ Worksheet
  • Write values to ‘New Virtual Machine’ Worksheet
  • Save Excel Spreadsheet
  • Clean up and quit Excel

CANDIDACY

The user of the VMware Expert System will answer a series of questions to determine system candidacy. Through knowledge capture, the conversation with the VMware Administrator does not need to take place. The knowledge is generally accepted by a community of VMware experts.

RECOMMENDATION

After answering the capacity and candidacy questions, the user receives a final recommendation. The recommendation is only “Virtualize” if capacity is available and candidacy is met.

The interface also displays reasons why a machine is not suitable for virtualization to enable What-If analysis.
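The recommendation rule above is simple enough to sketch outside Excel. It says "Virtualize" only when capacity is available and candidacy is met, and otherwise reports the reasons for What-If analysis. This is not the spreadsheet's own logic, only a bash illustration. The function name, the argument order, and the name=TRUE/FALSE answer format are all assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the final recommendation: "Virtualize!" only when
# capacity is available AND candidacy is met; otherwise list the reasons.
# Args: RAM available (MB), RAM requested (MB), storage available (GB),
#       storage requested (GB), then candidacy answers as name=TRUE/FALSE.
recommend() {
  local ram_avail=$1 ram_req=$2 disk_avail=$3 disk_req=$4
  shift 4
  local -a reasons=()
  local answer

  # Capacity checks
  (( ram_req <= ram_avail ))   || reasons+=("not enough RAM available")
  (( disk_req <= disk_avail )) || reasons+=("not enough storage available")

  # Candidacy checks: any special hardware blocks virtualization,
  # and so does a vendor that will not support the application on VMware
  for answer in "$@"; do
    case $answer in
      vendor_support=FALSE) reasons+=("no vendor support on VMware") ;;
      vendor_support=*)     ;;
      *=TRUE)               reasons+=("requires ${answer%=TRUE}") ;;
    esac
  done

  if (( ${#reasons[@]} == 0 )); then
    echo "Virtualize!"
  else
    echo "Don't Virtualize"
    printf '  - %s\n' "${reasons[@]}"
  fi
}

recommend 8192 1024 300 20 modems=FALSE license_dongles=FALSE vendor_support=TRUE
```

Keeping every failed check in the list, rather than stopping at the first one, lets the user see all the blockers at once.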

DECISION TREE

Modified from VI:OPS P2V Decision Tree

RUNNING THE VMWARE EXPERT SYSTEM

PREREQUISITES

STEPS TO RUN THE VMWARE EXPERT SYSTEM

  • Download and extract vmware-expert-system.zip
  • Rename launch.tab to launch.bat
  • Edit launch.bat, line 2
    • Substitute your path to updatespreadsheet.ps1 where you see “C:\users\%username%\Documents\cis645\Project\vmware_expert_system\updatespreadsheet.ps1”, and make sure the path is in quotation marks
  • Edit updatespreadsheet.ps1, line 11
    • Substitute your path to vmware_expert_system.xlsm where you see “C:\users\%username%\Documents\cis645\Project\vmware_expert_system\vmware_expert_system.xlsm”, and make sure the path is in quotation marks
  • Double-click ‘launch.bat’ to run it
  • A screen similar to this will appear:
  • Launch the spreadsheet “vmware_expert_system.xlsm” and enable macros
  • Enter system requirements
  • Press “Send Work Order”

EXAMPLE SYSTEM: NEW WEB SERVER

  • Enter the hostname: newwebserver
    • The hostname must not already exist and must be a valid hostname (“The Internet Engineering Task Force (IETF)”)
  • Enter a functional contact: Andy Hill
  • Enter a staff contact: Andy Hill
  • Select an Operating System: Windows Server 2003
  • Enter a storage requirement: 20 GB
    • The storage requirement must be greater than 8 GB and less than the maximum size of a single disk
  • Enter a RAM requirement: 1024 MB
    • The RAM requirement must be at least 256 MB, less than one host’s memory, and still tolerant of a host failure
  • Number of Processors: 1
    • Must be numeric, greater than or equal to 1, less than or equal to 4
  • Number of NICs: 1
    • Must be numeric, greater than or equal to 1, less than or equal to 4
  • Average CPU utilization: 5%
    • Must be numeric, between 0 and 1, if 4 processors are used average utilization cannot exceed 50%
  • Average RAM utilization: 256 MB
    • Must not exceed 8GB
  • Average NIC utilization: 1 MBps
    • Must not exceed 100MBps
  • Maximum Disk IO: 10 MBps
    • Must not exceed 100MBps
  • Answer TRUE/FALSE to the following hardware components
    • Modems: FALSE
    • Fax Cards: FALSE
    • License Dongles: FALSE
    • Security Dongles: FALSE
    • Hardware Encryption: FALSE
  • Answer TRUE/FALSE to Vendor Support: TRUE
  • Recommendation: Virtualize!
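The field rules in the example above can be expressed as a handful of bounds checks. This is a bash sketch. It assumes CPU utilization is entered as a whole-number percent. It leaves out two kinds of check: the hostname-uniqueness test, and the limits that depend on gathered capacity (largest single disk, one host's RAM). The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the input rules listed above. Not checked here:
# whether the hostname already exists, and capacity-dependent limits.
# CPU utilization is a whole-number percent (5 means 5%).
validate_request() {
  local host=$1 storage_gb=$2 ram_mb=$3 procs=$4 nics=$5 cpu_pct=$6
  local avg_ram_mb=$7 nic_mbps=$8 disk_mbps=$9
  # A single DNS label: letters, digits, hyphens, no leading/trailing hyphen
  local label_re='^[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?$'
  local -a errors=()

  [[ $host =~ $label_re ]]             || errors+=("invalid hostname")
  (( storage_gb > 8 ))                 || errors+=("storage must be more than 8 GB")
  (( ram_mb >= 256 ))                  || errors+=("RAM must be at least 256 MB")
  (( procs >= 1 && procs <= 4 ))       || errors+=("processors must be 1 to 4")
  (( nics >= 1 && nics <= 4 ))         || errors+=("NICs must be 1 to 4")
  (( cpu_pct >= 0 && cpu_pct <= 100 )) || errors+=("CPU utilization must be 0-100%")
  if (( procs == 4 && cpu_pct > 50 )); then
    errors+=("with 4 processors, average CPU cannot exceed 50%")
  fi
  (( avg_ram_mb <= 8192 ))             || errors+=("average RAM must not exceed 8 GB")
  (( nic_mbps <= 100 ))                || errors+=("NIC utilization must not exceed 100 MBps")
  (( disk_mbps <= 100 ))               || errors+=("disk IO must not exceed 100 MBps")

  if (( ${#errors[@]} == 0 )); then
    echo "valid"
  else
    printf '%s\n' "${errors[@]}"
    return 1
  fi
}

# The example system above: newwebserver, 20 GB, 1024 MB, 1 CPU, 1 NIC,
# 5% CPU, 256 MB average RAM, 1 MBps NIC, 10 MBps disk
validate_request newwebserver 20 1024 1 1 5 256 1 10
```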

ADD NEW SUPPORTED GUEST OS

VMware’s Guest OS Compatibility Guide (“VMware, Inc.”) is exhaustive and does not line up with Murray State University’s environment. The drop-down list is populated from a hidden worksheet within Excel. For our environment, we limited this drop-down to guest OSes that have regularly maintained templates.

To add, delete, or change an entry in the operating system list follow these steps:

  1. Toward the bottom of Excel, right-click the current worksheet tab
  2. From the context menu, select “Unhide…”
  3. From the Unhide Window, Select ‘Supported Guest Operating Systems’ and press OK
  4. Navigate to the ‘Supported Guest Operating Systems’ worksheet. Make changes in Column A; only changes in Column A will be reflected in the spreadsheet. Save your changes.

Future Considerations

Future versions of this project will include:
  • Support for advanced disk layouts
  • Get-Template feeding the ‘Supported Guest OS’ worksheet
  • 1 click ‘deploy from template’
  • Support for tiered storage
  • Graphs of compute resources by host and virtual machine

# VMware Expert System Capacity Gathering
# v0.2
# by Andy Hill
# http://virtualandy.wordpress.com

# gathering data for VMware capacity
$viserver = Read-Host "Enter a vCenter server";
Write-Host "Gathering Excel data...1/8"

$excel = new-object -comobject Excel.Application
# Edit this value to the location of your vmware_expert_system.xlsm
$excelfile = $excel.workbooks.open("C:\Users\andy.hill\Documents\cis645\Project\vmware_expert_system\vmware_expert_system.xlsm")
$worksheet = $excelfile.worksheets.item(3) # Select Capacity Worksheet

Write-Host "Clearing existing capacity data...2/8"
# Clear existing data
$worksheet.Range("A5:N65000").Clear() | out-null
$worksheet.cells.item(1,2) = $viserver

Write-Host "Connecting to $viserver, this may take a moment...3/8"
connect-viserver $viserver -erroraction stop -WarningAction SilentlyContinue | out-null

# datastore information
Write-Host "Gathering disk information...4/8"
$i = 5
$disks = get-datastore
foreach($disk in $disks) {
	$worksheet.cells.item($i, 1) = $disk.name;
	$worksheet.cells.item($i, 2) = $disk.freespaceMB;
	$worksheet.cells.item($i, 3) = $disk.capacityMB;
	$i++;
}
$disk_count = $i;

$i = 5
Write-Host "Gathering host information...5/8"
# host information
Get-VMHost | %{Get-View $_.ID} | %{
	$esx = "" | select Name, NumCpuPackages, NumCpuCores, Hz, Memory
	$esx.NumCpuPackages = $_.Hardware.CpuInfo.NumCpuPackages
	$esx.NumCpuCores = $_.Hardware.CpuInfo.NumCpuCores
	$esx.Hz = $_.Hardware.CpuInfo.Hz
	$esx.Memory = $_.Hardware.MemorySize
	$esx.Name = $_.Name

	$worksheet.cells.item($i, 6) = $esx.Name
	$worksheet.cells.item($i, 7) = $esx.NumCpuPackages
	$worksheet.cells.item($i, 8) = $esx.NumCpuCores
	$worksheet.cells.item($i, 9) = $esx.hz / 1000 / 1000
	$worksheet.cells.item($i, 10) = $esx.memory / 1024 / 1024;
	$i++;
}
$host_count = $i;

# vm information
$i = 5
Write-Host "Gathering virtual machine information...6/8"

get-vm | % {
	$worksheet.cells.item($i, 13) = $_.Name
	$worksheet.cells.item($i, 14) = $_.MemoryMB
	$i++;
}

# Create the totals and amount utilized
# VM rows run from 5 to ($i - 1); the total goes two rows below the last VM
$worksheet.cells.item(($i+1),13) = "Total"
$worksheet.cells.item(($i+1),14) = "=sum(N5:N" + ($i - 1) + ")"
$vm_count = $i;

Write-Host "Writing values to Excel Spreadsheet...7/8"
#add some formatting
$worksheet.cells.item(($disk_count + 2), 1) = "Datastore with most free space";
$worksheet.cells.item(($disk_count + 3), 1) = "Memory (MB) Available";
$worksheet.cells.item(($disk_count + 4), 1) = "Memory Utilization %";
$worksheet.cells.item(($disk_count + 5), 1) = "Storage Available (GB)";
$worksheet.cells.item(($disk_count + 6), 1) = "Storage Utilization %";
$worksheet.cells.item(($disk_count + 7), 1) = "Most Storage Available on a datastore (GB)";

# add the formulas
$worksheet.cells.item(($disk_count + 2), 2) = "=INDEX((A5:A" + $disk_count + "),MATCH(MAX(B5:B" + $disk_count + "),B5:B" + $disk_count + ",0))";
$worksheet.cells.item(($disk_count + 3), 2) = "=SUM(J5:J" + $host_count + ") - N" + ($vm_count+1);
$worksheet.cells.item(($disk_count + 4), 2) = "=N" + ($vm_count+1) + "/(SUM(J5:J" + ($host_count-1) + ")-MAX(J5:J" + ($host_count-1) + "))"; # n-1 hosts for HA failover: leave the largest host out
$worksheet.cells.item(($disk_count + 5), 2) = "=SUM(B5:B" + $disk_count + ")/1024";
$worksheet.cells.item(($disk_count + 6), 2) = "=1-SUM(B5:B" + $disk_count + ")/SUM(C5:C" + $disk_count + ")";
$worksheet.cells.item(($disk_count + 7), 2) = "=INDEX((B5:B" + $disk_count + "),MATCH(MAX(B5:B" + $disk_count + "),B5:B" + $disk_count + ",0))/1024";

Write-Host "Saving Excel Spreadsheet...8/8";
# Select main worksheet
$worksheet = $excelfile.worksheets.item(1);
# Update the 'new virtual machine' worksheet with capacity data
$worksheet.cells.item(8,4) = "=Capacity!B" + ($disk_count + 5) + "-'New Virtual Machine'!B8";
$worksheet.cells.item(8,7) = "=MAX(Capacity!B5:" + "B" + ($disk_count - 1) + ")/1024";
$worksheet.cells.item(9,4) = "=(Capacity!B" + ($disk_count +3) + ")/1024";
$worksheet.cells.item(29,2) = "=Capacity!B" + ($disk_count + 2);
$excel.activeworkbook.save();
$excel.quit();
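The memory formulas in the script reduce to simple arithmetic. One value is total host RAM minus VM RAM. The other is utilization measured against n-1 hosts, so the VMs still fit after a host failure. Below is a bash sketch of that arithmetic. It assumes the host you plan to lose is the largest one, a conservative reading of n-1. Values are in MB, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: memory headroom with and without one host failure.
# Args: total VM memory (MB), then each host's memory (MB).
ha_memory_report() {
  local vm_mb=$1
  shift
  local total=0 largest=0 h
  for h in "$@"; do
    total=$(( total + h ))
    if (( h > largest )); then largest=$h; fi
  done
  local usable=$(( total - largest ))  # what is left if the largest host fails
  echo "available_mb=$(( total - vm_mb ))"
  echo "available_after_failure_mb=$(( usable - vm_mb ))"
  if (( usable > 0 )); then
    awk -v v="$vm_mb" -v u="$usable" 'BEGIN { printf "utilization_n_minus_1=%.1f%%\n", 100 * v / u }'
  else
    echo "utilization_n_minus_1=n/a (needs at least two hosts)"
  fi
}

# Three 32 GB hosts running 24 GB of VMs
ha_memory_report 24576 32768 32768 32768
```

With three 32 GB hosts and 24 GB of VMs, this reports 73728 MB available, 40960 MB left after a host failure, and 37.5% utilization on the remaining hosts.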

ESX 3.5 U4 Kickstart for IBM xSeries and QLA4050

This was our shop’s first real dive into kickstarts. The material I read in Visible Ops really emphasized trackable, repeatable processes for setting up systems. One great way to achieve that is with kickstart scripts and a version control system. We used Subversion.

I’ve edited a few parts out of this, but I spent a while finding several kickstart scripts that accomplished parts of what we needed. I highly customized one for our environment.

What it does:

  • Configures licensing for the host using a license server
  • Configures NTP
  • Adds users, expires their accounts and configures a sudo group
  • MOTD
  • Configures NICs and VMware ESX Networking
  • Creates a script to download and install IBM iSCSI Host Utilities Kit
  • Creates a script to download and install QLA4050C BIOS and firmware updates

Thanks to Leo’s ESX 3.5 Kickstart script – part 3.

You will need to download IBM iSCSI Host Utilities Kit from IBM and the QLA4050C BIOS and Firmware from QLogic to a server with scp capabilities.

# make sure this file is UNIX formatted so the line breaks can be handled.
install
lang en_US.UTF-8
langsupport --default en_US.UTF-8
keyboard us
mouse genericwheelps/2 --device psaux
skipx
network --device eth0 --bootproto static --ip <ip> --netmask <netmask> --gateway <gw> --nameserver <dns1>,<dns2> --hostname <hostname> --addvmportgroup=0 --vlanid=0
# Encrypted root password
rootpw --iscrypted <password>
firewall --enabled
authconfig --enableshadow --enablemd5
timezone America/Chicago
bootloader --location=mbr
# The following is the partition information you requested
# Note that any partitions you deleted are not expressed
# here so unless you clear all partitions first, this is
# not guaranteed to work
vmaccepteula
# test license server
vmlicense --mode=server --server=27000@<vc> --edition=esxFull --features=vsmp,backup
reboot
firewall --enable
clearpart --exceptvmfs --drives=sda
part /boot --fstype ext3 --size=100 --ondisk=sda
part / --fstype ext3 --size=1800 --grow --maxsize=5000 --ondisk=sda
part swap --size=544 --grow --maxsize=544 --ondisk=sda
part /var/log --fstype ext3 --size=100 --grow --ondisk=sda

%packages
grub
@base

%post
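# Quoting 'EOF' writes everything below literally, without expanding backticks or variables at install time
cat > /etc/rc.d/rc3.d/S11servercfg << 'EOF'
#!/bin/bash
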
#Configure NTP
echo "Configuring NTP"
chkconfig --level 345 ntpd on
echo "restrict kod nomodify notrap noquery nopeer" > /etc/ntp.conf
echo "restrict 127.0.0.1" >> /etc/ntp.conf
echo "server <ntp>" >> /etc/ntp.conf
echo "driftfile /var/lib/ntp/drift" >> /etc/ntp.conf
echo "<ntp>" > /etc/ntp/step-tickers
service ntpd start

#Adding users with default password "changeme" generated with `openssl passwd changeme`

echo "Adding users"
adduser <user1> -p MKgX23V6snwoc
chage -d 0 -M 99999 <user1>
adduser <user2> -p MKgX23V6snwoc
chage -d 0 -M 99999 <user2>
adduser <user3> -p MKgX23V6snwoc
chage -d 0 -M 99999 <user3>
usermod -G wheel <user1>
usermod -G wheel <user2>
usermod -G wheel <user3>
echo "Done adding users"

echo "Configuring sudoers"
cat > /etc/sudoers << SUDO
# sudoers file.
#
# This file MUST be edited with the 'visudo' command as root.
#
# See the sudoers man page for the details on how to write a sudoers file.
#
# Host alias specification
# User alias specification
# Cmnd alias specification
# Defaults specification
Defaults syslog=local2
# User privilege specification
root ALL=(ALL) ALL
# Uncomment to allow people in group wheel to run all commands
%wheel ALL=(ALL) ALL
# Same thing without a password
# %wheel ALL=(ALL) NOPASSWD: ALL
# Samples
# %users ALL=/sbin/mount /cdrom,/sbin/umount /cdrom
# %users localhost=/sbin/shutdown -h now
SUDO
echo "Done configuring sudoers"

echo "Configuring MOTD"
echo "MOTD HERE" > /etc/motd
echo "Done configuring MOTD"

echo "Configuring hosts file"
echo "ip hostname.fqdn hostname" >> /etc/hosts
echo "Done configuring hosts file"

# we have 6 nics
echo "Configuring NIC duplex/speeds"
/usr/sbin/esxcfg-nics -s 1000 -d full vmnic0
/usr/sbin/esxcfg-nics -s 1000 -d full vmnic1
/usr/sbin/esxcfg-nics -s 1000 -d full vmnic2
/usr/sbin/esxcfg-nics -s 1000 -d full vmnic3
/usr/sbin/esxcfg-nics -s 1000 -d full vmnic4
/usr/sbin/esxcfg-nics -s 1000 -d full vmnic5
echo "Done configuring NIC duplex/speeds"

echo "Configuring networking"
# VMNetwork
/usr/sbin/esxcfg-vswitch -a vSwitch1
# Blind Switch
/usr/sbin/esxcfg-vswitch -a vSwitch2
# VMkernel
/usr/sbin/esxcfg-vswitch -a vSwitch3
# Add NIC 1 and 3 to vSwitch1 (VMNetwork)
/usr/sbin/esxcfg-vswitch -L vmnic1 vSwitch1
/usr/sbin/esxcfg-vswitch -L vmnic3 vSwitch1
# Add NIC 2 to vSwitch0 (Service Console, already contains NIC 0)
/usr/sbin/esxcfg-vswitch -L vmnic2 vSwitch0
# Add NIC 4 and 5 to vSwitch3 (VMkernel)
/usr/sbin/esxcfg-vswitch -L vmnic4 vSwitch3
/usr/sbin/esxcfg-vswitch -L vmnic5 vSwitch3
# Give appropriate port group labels to vSwitches
/usr/sbin/esxcfg-vswitch -A "Blind Switch" vSwitch2
/usr/sbin/esxcfg-vswitch -A "VMkernel" vSwitch3
/usr/sbin/esxcfg-vswitch -A "VMNetwork" vSwitch1
# Configure IP addresses for service console and VMkernel
/usr/sbin/esxcfg-vswif -i <ip> -n 255.255.255.0 vswif0
/usr/sbin/esxcfg-vmknic -a -i <vmotion address> -n 255.255.255.0 VMkernel
/usr/sbin/esxcfg-vswif -E
# Enable SSH Client through firewall
/usr/sbin/esxcfg-firewall -e sshClient
echo "Done configuring networking"

# generate script to download/install HUK, make it executable
echo "Generating host utilities download/install script"
cat > /root/huk-install.sh << HUK
cd /home/user/
scp user@host:/home/user/ibm_iscsi_esx_host_utilities_3_1.tar.gz .
tar -zxf ibm_iscsi_esx_host_utilities_3_1.tar.gz
cd ibm_iscsi_esx_host_utilities_3_1
./install
HUK
chmod a+x /root/huk-install.sh
echo "Done generating host utilities download/install script"

# generate script to download/install iscli and firmware/BIOS updates, make it executable
echo "Generating iscli and firmware update script"
cat > /root/iscli-script.sh << ISCLI
cd /home/user/
scp user@host:/home/user/iscli-1.2.00-15_linux_i386.install.tar.gz user@host:/home/user/ql4022rm.BIN user@host:/home/user/VER4032_03_00_01_53.zip .
tar -xvzf iscli-1.2.00-15_linux_i386.install.tar.gz
unzip VER4032_03_00_01_53.zip
chmod +x iscli.dkms.install.sh
./iscli.dkms.install.sh install
# HBA 0
/usr/local/bin/iscli -f 0 /home/user/qla4022.dl
sleep 5
/usr/local/bin/iscli -bootcode 0 /home/user/ql4022rm.BIN
sleep 5
# HBA 1
/usr/local/bin/iscli -f 1 /home/user/qla4022.dl
sleep 5
/usr/local/bin/iscli -bootcode 1 /home/user/ql4022rm.BIN
sleep 5
reboot
ISCLI
chmod a+x /root/iscli-script.sh
echo "Done generating iscli and firmware script"

# Moves this file so it will not be called on next host boot
mv /etc/rc.d/rc3.d/S11servercfg /root/unsw-setup.sh
rm -f /root/system-info
EOF
/bin/chmod a+x /etc/rc.d/rc3.d/S11servercfg
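The %post section repeats the same three commands for every user. The sketch below generates those lines from a list of names, so adding an administrator becomes a one-word change. The function name and the user names in the usage line are hypothetical. The hash is whatever you would otherwise pass to adduser -p. chage -d 0 sets the last password change to day zero, which forces a password change at first login.

```shell
#!/usr/bin/env bash
# Hypothetical generator for the per-user lines in the kickstart %post.
# Arg 1: crypt() hash for adduser -p; remaining args: user names.
gen_user_cmds() {
  local hash=$1
  shift
  local u
  for u in "$@"; do
    printf 'adduser %s -p %s\n' "$u" "$hash"
    printf 'chage -d 0 -M 99999 %s\n' "$u"   # force a new password at first login
    printf 'usermod -G wheel %s\n' "$u"      # wheel members get sudo via /etc/sudoers
  done
}

gen_user_cmds MKgX23V6snwoc alice bob
```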

Upgrading ESX 3.5 to ESX 3.5 U4 and Virtual Center 2.5 to vCenter 2.5 U4

Here’s the ‘script’ we read from while doing our ESX upgrades:

In general:

  • Do lots of up front work with kickstarts and analysis

Each ESX Host

  • Put host in maintenance mode
  • Shut Down
  • File a request with the storage administrator to make only the boot LUN visible to the host, since we are about to do some potentially damaging operations
  • Put in new HBA (QLA4050)
  • Boot to floppy diskette with QLA 4050 BIOS firmware updates
  • Upgrade HBA BIOS
  • iFlash
  • If the system detects a QLx40xx controller, it displays the following message:
  • QLx40xx Adapter found at I/O address: xxxxxxxx
  • You will need to enter the adapter address
  • Select “FB” to flash the BIOS. The iFlash program will write flash to the adapter using ql4022rm.BIN found in the same directory.
  • Reboot. Press CTRL+Q on the second (new) HBA to manage boot settings
  • Configure Host Adapter according to IP / initiator name
  • Configure iSCSI Target
    • You will need:
    • iSCSI name
    • IP Address
    • Subnet Mask
    • Default Gateway
    • iSCSI Target
    • IP Address:port
    • Target Name
    • Host Boot Settings = MANUAL
    • Exit and Reboot
  • Insert ESX 3.5 U4 CD (We don’t have PXE boot available yet)
  • Reboot system to boot from ESX 3.5 U4 CD
  • Install ESX 3.5 U4
  • Type ‘esx ks=<url to kickstart file> ksdevice=eth0 method=cdrom’
  • More on the kickstart file is here
  • Press enter. This installs ESX with all appropriate settings. Ask someone for the root password.
  • Log in as root
  • sh iscli-script.sh (from the kickstart)
  • sh huk-install.sh (from the kickstart)
  • Launch VirtualCenter
  • Disconnect the host from VirtualCenter (Right click, disconnect)
  • Reconnect the host to VirtualCenter (Right click, connect)
  • Enter maintenance mode (so no VMs are vMotioned on)
  • VMotion doesn’t get set up correctly via kickstart because the host does not have shared storage. Contact the SAN Administrator to make the other ESX LUNs visible and rescan.
  • Delete the VMKernel Switch
  • Add the VMkernel switch (nic4 and nic5) with VMotion enabled: <IP address>, subnet <subnet>, no default gateway since the network is not routed
  • Configuration -> Memory -> Increase Service Console RAM to 800MB
  • Configure Storage Paths in Active/Passive
  • Reboot Host (to enact Service Console RAM changes)
  • Exit Maintenance Mode

vCenter Database Server

  • Manually backup VMware database
BACKUP DATABASE [VMWare] TO DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Backup\VMWare\VMWare_backup_preupgrade.bak' WITH NOFORMAT, NOINIT, NAME = N'VMWare-Full Database Backup', SKIP, NOREWIND, NOUNLOAD, STATS = 10
GO
  • Manually backup UpdateManager
BACKUP DATABASE [UpdateManager] TO DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Backup\UpdateManager\UpdateManager_backup_preupgrade.bak' WITH NOFORMAT, NOINIT, NAME = N'UpdateManager-Full Database Backup', SKIP, NOREWIND, NOUNLOAD, STATS = 10
GO
  • Grant MSDB owner permissions for SQL user
USE [msdb]
GO

EXEC sp_addrolemember N'db_owner', N'USER'

GO

vCenter Server

  • Log in as local administrator
  • Back up the License File
    • copy "C:\Program Files\VMware\VMware License Server\Licenses\vmware.lic" \\server\share\vmware-license-backup.lic
  • Mount vCenter DVD ISO
  • Back up sysprep files for templates
    • xcopy "C:\Documents and Settings\All Users\Application Data\VMware\VMware VirtualCenter\sysprep" \\server\share /E
  • Run vCenter Install
  • Reboot Server
  • Notify users of upgrades
  • Schedule times for VMware Tools Upgrades

vCenter Database Server

  • Revoke MSDB owner permissions for SQL user
USE [msdb]
GO
EXEC sp_droprolemember N'db_owner', N'USER'
GO

iSCSI SAN performance woes with VMware ESX 3.5

We filed support requests with IBM and VMware and went through a very lengthy process without any results.

Each of our hosts had the following iSCSI HBAs:

  • QLA4010
  • QLA4050C

A while ago we found out QLA4010 is not on the ESX 3.5 HCL even though it runs with a legacy driver.

As our virtual environment grew, we noticed storage performance lagging. This was particularly evident with our Oracle 10g database server running our staging instance of Banner Operational Data Store, where disk writes ran at 1.1 MB/sec or slower.

We opened a case with VMware support and later with IBM support. We provided lots of data to VMware, and no one there mentioned the unsupported HBA. No one at IBM mentioned it either. VMware support referred us to KB# 1006821 to test virtual machine storage I/O performance.

We ran HD Speed in a new VM mimicking the setup, using an RDM and a dedicated LUN, and got similar results.
We ran HD Speed against the same RDM from a physical machine and got 45 MB/sec.

All of our hosts had entries like this in their logs (found with grep -i abort /var/log/vmkernel* | less):

vmkernel.36:Mon DD HH:ii:ss vmkernel: 29:02:31:16.863 cpu3:1061)LinSCSI: 3201: Abort failed for cmd with serial=541442, status=bad0001, retval=bad0001

There were hundreds, if not thousands, of these iSCSI aborts in the log files. We punted to IBM, and they recommended running the Host Utilities Kit, which optimizes HBA settings specifically for IBM storage systems.
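The sketch below counts "Abort failed" lines in each vmkernel log and lists the worst file first. That shows how widespread the aborts are and whether they drop off after a change such as the HBA swap. It assumes the ESX 3.5 service console log location used in the grep above. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count "Abort failed" lines in each vmkernel log,
# highest count first. Defaults to /var/log/vmkernel* as in the grep above.
count_aborts() {
  (( $# )) || set -- /var/log/vmkernel*
  local f n
  for f in "$@"; do
    [ -f "$f" ] || continue
    n=$(grep -ci 'abort failed' "$f" || true)   # grep -c prints 0 but exits 1 on no match
    printf '%s %s\n' "$n" "$f"
  done | sort -rn
}

# Usage on the service console: count_aborts | head
```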

My recommendation ended up being twofold: upgrade the ESX hosts, because we were on an old build (95xxx), and replace the QLA4010 with a QLA4050C on each host.

Now that our ESX upgrade is complete we are seeing much better performance from our iSCSI storage.

ESXi Snapshots not showing in VI Client

Yesterday I made a mistake. We have a CentOS virtual machine set up to test Spacewalk.

It has a virtual disk for the OS on datastore1 and a virtual disk for the data on datastore2. datastore1 had 11 GB free and datastore2 had 300 GB free. I snapshotted the VM, we did some work, and I committed the snapshot. Except it didn’t work. Now the machine won’t stay booted. I remembered reading something from Yellow-Bricks about disk space and snapshots. Oops. Since this VM was on an ESXi host, there were no service console commands to commit the snapshot.

This error popped up, and the VM would power down:

There is no more space for the redo log of VMNAME-000001.vmdk.

I freed up some space on datastore1, but I couldn’t find how to commit the snapshot. There were several -delta.vmdk files in the virtual machine’s folder on datastore1.

Solution: After freeing up some disk space, I created another snapshot from the VI Client, then immediately went to “Delete All”. This got rid of the orphaned snapshot as well as the newly created one.

AutoPager now works with VMTN and NetApp Technology Network sites

AutoPager is a Firefox extension which follows the “Next” links on lots of pages and loads them inline. If you’re already using the extension, go to AutoPager -> Update Setting -> Update Setting Online.

The authors just added VMTN forums and NetApp Technology Network to their supported sites. This means if you’re reading a long thread you don’t have to click next. You can just keep scrolling — the next page is loaded inline.


It also works on thread lists.


This is a screenshot of the “Loading” indicator in the bottom left. Once you scroll far enough, it automatically shows up and fetches the next page.


Restoring VMware Virtual Machines from NetApp Snapshots

In our organization, the storage administrator role is completely separate from the VI Administrator role, so this process requires some coordination with the storage administrator. Here is our process for restoring a VM from our SAN snapshots. A lot of this information was gleaned from Scott Lowe’s posts on FlexClones.

Unfortunately, we do not have SMVI (the jaw dropping video demo is here) at this moment. It appears NetApp has made this process trivial with that application. This is how we’re making it work on a limited budget.

Step 0 – Determine Snapshot to clone from

Working with the VMware admin, determine which Snapshot to clone from based on timestamp and LUN

Step 1 – Create LUN Clone

  • Telnet to the filer
  • Run this command to create LUN clone – lun clone create /vol/volume_name/lun_clone_name -o noreserve -b /vol/volume_name/original_lun_name parent_snapshot_name
  • Verify new LUN is created using FilerView in a browser

Step 2 – Map clone LUN

  • Log into FilerView for the filer
  • In left column click on LUNS, then Manage
  • Click on the name of the new LUN clone
  • Click on Map LUN near the top
  • Click on Add Groups to Map, and add to appropriate group
  • Type a number (we typically use 99) into the box labeled LUN ID and click Apply
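Steps 1 and 2 can also be done entirely from the filer console. The dry-run sketch below only prints the commands for review. It assumes Data ONTAP 7-mode syntax and treats `lun map` as the console equivalent of FilerView's Map LUN. All volume, LUN, snapshot, and igroup names in the usage line are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: prints Data ONTAP 7-mode console commands for
# Steps 1 and 2; it does not connect to the filer.
# Args: volume, original LUN, parent snapshot, clone name, igroup, [LUN ID]
filer_clone_cmds() {
  local vol=$1 parent=$2 snap=$3 clone=$4 igroup=$5 lun_id=${6:-99}
  echo "lun clone create /vol/$vol/$clone -o noreserve -b /vol/$vol/$parent $snap"
  echo "lun show /vol/$vol/$clone"                  # verify the clone exists
  echo "lun map /vol/$vol/$clone $igroup $lun_id"   # assumed CLI equivalent of Map LUN
}

filer_clone_cmds vmfs_vol esx_lun1 nightly.0 esx_lun1_clone esx_hosts
```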

Step 3 – Enable Volume Resignature

  • Launch VirtualCenter
  • From VC, select a host
  • Select the configuration tab
  • Select advanced
  • Navigate to LVM
  • Change the value of LVM.EnableResignature to 1 (on, the default value is 0)

Step 4 – Rescan for the new LUN

  • From the Configuration tab on a selected host, Navigate to Storage Adapters
  • Select “Rescan”
  • The recovered VMFS datastore will appear with a name similar to “snap_*”
  • From here, there are two options:
    • Add the virtual machine to inventory and run from the recovered LUN
    • Copy the virtual machine’s folder to another LUN, then add to inventory
  • It is recommended that you copy the virtual machine’s folder to another LUN (non snap_*), and then add the recovered virtual machine to inventory.
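Steps 3 and 4, and the reset in Step 5, can also be done from the ESX service console instead of the VI Client. The dry-run sketch below prints the commands for review; pipe the output to sh on the host to run them. It assumes the ESX 3.x esxcfg-advcfg and esxcfg-rescan tools. The adapter name vmhba1 is only a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: prints service console commands, runs nothing.
# Arg 1: on (Steps 3-4) or off (Step 5); arg 2: storage adapter to rescan.
resignature_cmds() {
  local state=$1 adapter=${2:-vmhba1}
  if [ "$state" = on ]; then
    echo "esxcfg-advcfg -s 1 /LVM/EnableResignature"
    echo "esxcfg-rescan $adapter"   # the snap-* datastore should appear afterwards
  else
    echo "esxcfg-advcfg -s 0 /LVM/EnableResignature"
  fi
}

resignature_cmds on vmhba1
```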

Step 5 – Clean up

  • Disable LVM.EnableResignature: repeat Step 3 of this document, but change the value back to 0.
  • Ensure all VMs running on the recovery LUN are powered off
  • From VC, select a host
  • Select the configuration tab
  • Select Storage
  • Select the recovery LUN and click Remove
  • Delete the LUN clone after the VMware admin has finished removing the datastore

The Virtual Machine will be brought up as if it went down from a “dirty” shutdown. In a lot of cases, this is okay. For write-intensive applications (like databases) you may have to go a few steps further in restoring functionality.

PlanetV12n: My VMware RSS Feed Wish List

Here’s my PlanetV12n Wish List (in no particular order):

  1. Provide feed customization. Strategy/Administration/Business Case/etc. Virtualization has turned into an extremely broad topic. Too much noise in the feed reader is a loss of value to PlanetV12n.
  2. Provide more virtualization related feeds from vendors like EMC, NetApp, Dell, and IBM.
  3. Require full articles. If there is resistance on this, just politely remind publishers that advertising is available via RSS.
  4. Give us the option of having OPML output of PlanetV12n. Personally, I would prefer OPML-only, it gives users more control over what feeds they want to see. OPML can be imported into almost any feed reader. Lots of the bloggers on PlanetV12n are very interested in their subscriber statistics. Being published on PlanetV12n drives those numbers down.

My ideal setup for PlanetV12n: a form that generates an OPML file I can add to Google Reader. VMware’s site is full of these forms, so adding another can’t be that bad, right? ;-)

Select your role within IT: (checkboxes) Business / Strategy / Administration / Performance / Disaster Recovery / Evangelist / etc.

Tell us about your VMware Products: (checkboxes) ESX / ESXi / Workstation / Fusion / etc

Tell us about your vendors: IBM / Dell / NetApp / EMC / etc

… the list goes on. This could be useful for VMware’s marketers as well as end users.
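The OPML file such a form would produce is small. It has a head with a title and one outline element per feed. The bash sketch below builds one from name|url pairs and escapes XML special characters. The feed names and example.com URLs are placeholders. The type="rss" attribute with xmlUrl is the common convention for subscription lists.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: build an OPML subscription list from "name|url" pairs.
xml_escape() {
  printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g' -e 's/"/\&quot;/g'
}

make_opml() {
  local title=$1
  shift
  local entry name url
  printf '<?xml version="1.0" encoding="UTF-8"?>\n'
  printf '<opml version="1.0">\n'
  printf '  <head><title>%s</title></head>\n' "$(xml_escape "$title")"
  printf '  <body>\n'
  for entry in "$@"; do
    name=${entry%%"|"*}   # text before the first |
    url=${entry#*"|"}     # text after the first |
    printf '    <outline text="%s" type="rss" xmlUrl="%s"/>\n' \
      "$(xml_escape "$name")" "$(xml_escape "$url")"
  done
  printf '  </body>\n</opml>\n'
}

make_opml "My V12n feeds" \
  "Storage blog|http://example.com/storage/feed" \
  "Administration blog|http://example.com/admin/feed"
```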

VMware Knowledge Resources for the Beginner VI Administrator

I have no problem making it clear I’m relatively new to the virtual world. That doesn’t mean you can’t learn fast.

Here are a few tools I’ve used to become a better VI Administrator:

  1. Training. Pros: Certified knowledge from the source. We hosted a VMware Jumpstart, and that training is without a doubt my catalyst into the rest of the virtual world. Training teaches you how to talk the talk so that other sources of knowledge are useful. Cons: Cost (not just upfront $$, but time cost).
  2. Web Sites to Search. Once you take on the new role, you need to do a considerable amount of reading. Pros: Low cost (aside from time) and can have a particularly high benefit. Cons: Lots of noise. Trouble distinguishing between good and bad sources.
    • VMTN
    • Google Reader and Planet V12n – I may have to write a separate post about my thoughts on V12n, but for the most part it is useful
    • VMware Knowledge Base
    • Google and FoxItReader – VMware’s website can be tedious to use, so Google’s search operators make it a little more bearable, for instance site:vmware.com filetype:pdf. FoxItReader makes those PDFs tolerable compared to Adobe Reader, and it has tabs!
    • Free VMworld Videos from 2007 which are still applicable today
  3. “Social” Media. I’m not including VMTN because I rarely post to it. The items below have been useful from an interactive standpoint — not just one sided conversations. There are similar pros and cons to this as websites — e.g., low cost vs. information overload and finding a reputable source.
    • #vmware IRC channel on freenode
    • Twitter – most of the bloggers from PlanetV12n also have Twitter accounts. They post when products are released and can also provide quick @replies to your questions.
  4. VMware Gold Support. Pros: Very thorough, certified support. I am very happy with the support we’ve received from VMware. After I’ve exhausted Google, Social Media, etc., VMware Support has come through for us several times. Cons: Cost, time spent on hold, and turnaround times.

RHEL P2V: Old Way and New Way

Most of this was taken from this site: http://conshell.net/wiki/index.php/Linux_P2V

Up front work

Determine exactly what you’re doing, and the resources you’ll need on the VMware side.

as root:

sfdisk -s
/dev/hda: 39070080
total: 39070080 blocks

`sfdisk -s` reports sizes in 1 KB (1024-byte) blocks, so to find the size in GB, divide by 1024 twice:
39070080/1024/1024 = 37.260 GB
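This arithmetic is easy to script if you are checking several disks. Below is a minimal Python sketch, assuming the `sfdisk -s` output format shown above (one `device: blocks` line per disk plus a `total:` line). The function names are hypothetical helpers, not part of any tool:

```python
def parse_sfdisk_s(output):
    """Parse `sfdisk -s` output such as '/dev/hda: 39070080' into
    {device: size_in_1k_blocks}, skipping the 'total:' summary line."""
    sizes = {}
    for line in output.splitlines():
        name, sep, value = line.partition(":")
        if not sep or name.strip() == "total":
            continue  # blank line or the summary line
        sizes[name.strip()] = int(value.split()[0])
    return sizes


def sfdisk_blocks_to_gib(blocks):
    """1 KB blocks -> GB: divide by 1024 once for MB, again for GB."""
    return blocks / 1024 / 1024


if __name__ == "__main__":
    sample = "/dev/hda: 39070080\ntotal: 39070080 blocks\n"
    for dev, blocks in parse_sfdisk_s(sample).items():
        print(dev, round(sfdisk_blocks_to_gib(blocks), 3), "GB")
```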

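Partition layout – know exactly the partitions, sizes and FS types. This can be gleaned from the output of `fdisk -l` on the source disk (e.g. `/dev/hda` in the example above, or `/dev/sda`) and the content of /etc/fstab. Substitute your source device name wherever /dev/sda appears below.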

Disk types – IDE? SATA?

Downtime – Unfortunately, your source system must be down for the duration of the P2V process.

Have a copy of the System Rescue CD ready. You will boot both systems to it for the disk cloning step.

On the source system (booted normally, since `uname -r` must report the installed kernel):
Back up the kernel’s ramdisk:
cp /boot/initrd-`uname -r`.img /root/`uname -r`.bak
Make a new ramdisk with the VMware-friendly RHEL SCSI driver:
mkinitrd -v -f --with=mptscsih /boot/initrd-`uname -r`.img `uname -r`

This command makes the mptscsih driver, which VMware’s LSI Logic virtual SCSI controller needs, available to RHEL at boot time. This should not affect the source system.

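md5sum /dev/sda – record the last six characters of the output. This generates a fingerprint used to verify integrity at the end. Because the target disk is larger than the source, compare the fingerprint against only the first source-sized portion of the target disk, not the whole device.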
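An md5sum of the whole target device will not match the source fingerprint, because the target disk is slightly larger. Only the first source-sized stretch of the target should be hashed. Below is a minimal Python sketch of that comparison, assuming you know the source size in bytes (the `sfdisk -s` block count times 1024). `md5_prefix` is a hypothetical helper:

```python
import hashlib


def md5_prefix(path, length=None, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of the first `length` bytes of `path`
    (the whole file or device if length is None).

    Reading in fixed-size chunks keeps memory use flat even for a
    multi-GB disk."""
    digest = hashlib.md5()
    remaining = length
    with open(path, "rb") as dev:
        while remaining is None or remaining > 0:
            size = chunk_size if remaining is None else min(chunk_size, remaining)
            chunk = dev.read(size)
            if not chunk:
                break  # reached the end of the file/device
            digest.update(chunk)
            if remaining is not None:
                remaining -= len(chunk)
    return digest.hexdigest()
```

On the target, as root, `md5_prefix("/dev/sda", 39070080 * 1024)[-6:]` would be compared with the six characters recorded on the source (39070080 is the example disk’s block count). From the shell, `dd if=/dev/sda bs=1024 count=39070080 | md5sum` does the same thing.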

On the target system:

Create a new VM:
  • OS: Linux (RHEL 4/5)
  • Disk: slightly larger than the source system’s
  • NIC: upgrade to VMware Tools
  • CD-ROM: System Rescue CD ISO
Boot the system, make sure the disks are recognized (sfdisk -s). Verify network is up with ifconfig eth0.

Disk Cloning

This part takes a while. Boot both systems to the System Rescue CD, and run a benchmark first.

Make a 1 GB file on the source system, and set the target to listen for the incoming transfer:

Source: dd if=/dev/zero of=bigfile bs=1024 count=1048576
Target: nc -l -p 9001 | dd of=/dev/sda
Write down the start time.
Source: dd if=bigfile | nc <target-ip> 9001 (where <target-ip> is the target’s address from ifconfig eth0)
Write down the finish time.
Estimate accordingly (e.g., a 20 GB disk would take at least 20 times as long).
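The estimate is a simple linear scaling of the benchmark. Below is a small Python sketch, assuming you wrote the start and finish times down as HH:MM:SS. `estimate_copy_time` is a hypothetical helper, and, as noted above, the result should be treated as a minimum:

```python
from datetime import datetime, timedelta


def estimate_copy_time(start, finish, disk_gb, bench_gb=1.0):
    """Scale the benchmark duration (start/finish written down as
    "HH:MM:SS") linearly up to a disk of `disk_gb` GB."""
    fmt = "%H:%M:%S"
    elapsed = datetime.strptime(finish, fmt) - datetime.strptime(start, fmt)
    if elapsed < timedelta(0):
        elapsed += timedelta(days=1)  # the benchmark ran past midnight
    # Linear scaling: a 20 GB disk takes ~20x the 1 GB benchmark time.
    return elapsed * (disk_gb / bench_gb)
```

For example, `estimate_copy_time("10:00:00", "10:01:30", 37.26)` suggests a bit under an hour for the 37.26 GB disk from the earlier example.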
For the “real” copy, remember you are copying a device to a device.

Target:

nc -l -p 9001 | dd of=/dev/sda
Source:

dd if=/dev/sda | nc <target-ip> 9001
There may be differing builds of nc, so your mileage may vary regarding the switches for ports. Use nc --help to see which options the version on your rescue CD supports. To watch progress during the copy, you may want to try pipe viewer (pv).

On the source machine, if you need it to boot again, you may need to restore the original ramdisk:
mv /root/`uname -r`.bak /boot/initrd-`uname -r`.img

New Way
VMware vCenter Converter 4.0 supports RHEL P2V. Win.

Get-VMStat and Resource Allocation

We have a problem that a lot of VI administrators (especially us young ones) run into – VM Sprawl.

While trying to reduce the chaos and get a much better understanding of our virtual infrastructure, I’ve run into a very helpful function built on the VI Toolkit’s Get-Stat cmdlet.

This tutorial is intended for the complete novice.

Step 1:

Get your system set up for VI Toolkit.

Download PowerShell.
Download the VI Toolkit.

Step 2:

Preparing your PowerShell environment.

Go to Start -> Run and type powershell. Press enter.

echo $profile
# C:\Documents and Settings\ME\My Documents\WindowsPowerShell\Microsoft.PowerShell_profile.ps1

See Windows PowerShell Profiles to set up a profile.

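Run the following to create the profile file (note that -force overwrites an existing profile, so skip this step if you already have one):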

new-item -path $profile -itemtype file -force

If you’re familiar with bash, think of this as your ~/.bash_profile. Any aliases or functions you define here are available in every PowerShell session.

Open the profile in Notepad:

notepad $profile

Now, paste the Get-VMStat function into your profile.

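function Get-VMStat
{
 # $VM: one or more VM objects (e.g. from Get-VM); $Hours: how far back to look
param( $VM,[Int32]$Hours = 1 )
 # For each VM (sorted by name), average the 5-minute cpu.usage.average and
 # mem.usage.average samples over the last $Hours hours (12 samples per hour),
 # rounded to two decimal places.
 $VM | Sort Name |
 Select Name, @{N="CPU";E={[Math]::Round((($_ |
 Get-Stat -Stat cpu.usage.average -Start (Get-Date).AddHours(-$Hours) -IntervalMins 5 -MaxSamples ($Hours*12) |
 Measure-Object Value -Average).Average),2)}}, @{N="MEM";E={[Math]::Round((($_ |
 Get-Stat -Stat mem.usage.average -Start (Get-Date).AddHours(-$Hours) -IntervalMins 5 -MaxSamples ($Hours*12) |
 Measure-Object Value -Average).Average),2)}}
}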

Save your profile.

Step 3: Use the cmdlet.

Launch the VI Toolkit (Start > Programs > VMware > VMware VI Toolkit)

Then, run the following:

Get-VC # you will be asked for your VC server
Get-VMStat (Get-VM) -Hours 2160

2160 hours (90 days) gives the average CPU and RAM usage for all of my VMs over the last 90 days. This gives us a wonderful baseline of information for making resource allocation decisions.
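To make the pipeline inside Get-VMStat easier to follow, here is the same calculation sketched in Python. It sorts VMs by name, keeps the newest Hours × 12 five-minute samples, averages them, and rounds to two decimals. The input layout (a dict mapping each VM name to its CPU and memory sample lists) is a hypothetical stand-in for what Get-Stat returns, not its actual object model:

```python
from statistics import mean

SAMPLES_PER_HOUR = 60 // 5  # one sample every 5 minutes (-IntervalMins 5)


def vm_stat(samples_by_vm, hours=1):
    """Mimic Get-VMStat: one (name, cpu, mem) row per VM, sorted by name.

    `samples_by_vm` maps a VM name to (cpu_samples, mem_samples), oldest
    first, each list holding at least one value (hypothetical layout)."""
    max_samples = hours * SAMPLES_PER_HOUR  # same as -MaxSamples ($Hours*12)
    rows = []
    for name in sorted(samples_by_vm):
        cpu, mem = samples_by_vm[name]
        rows.append((name,
                     round(mean(cpu[-max_samples:]), 2),  # newest samples only
                     round(mean(mem[-max_samples:]), 2)))
    return rows
```

With hours=2160, as in the run above, the window is 2160 × 12 = 25,920 samples per counter.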

VMware’s VCP Certification, my $0.02

It started with Eric Siebert’s open letter to VMware making this suggestion:

6. Relax the VMware Certified Professional (VCP) certification requirements. I shouldn’t have to take a class to become a VCP, if I have the knowledge and experience to pass the VCP exam that should be enough. Many qualified people can’t afford to take a class just so they can take the test.

Amen. Then Dave Lawrence, the VMguy, responded, making a few points about the certification requirements.

One of my readers wrote in (Thanks Jay!) and disagreed with this one and I do as well. My reader, Jay, reminded me of the integrity of the exam and how that must be maintained. He told me how VMware should keep it the way it is to avoid “good test takers”. He also reminded me of the girl who got her MCP at age 9. Sheesh, really!?! (and I failed my first one, that makes me feel good.)

Did any company hire the 9 year old girl who got her MCP? Probably not.

Is the MCP still a worthwhile certification? Do employers still value it? Yes & yes.

He summarizes (emphasis his):

Your certification should be a challenge in the form of effort, know how and experience. I think that VMware wants to be certain that the test takers have actually seen and used the product at least once. Perhaps the MCP should do the same so 4th graders are not getting their certifications.

Training != Real Experience. How is this enhancing the VCP certification’s value?

I don’t have the money to pay for the class myself right now. It’s difficult to justify the cost to my employer since we already purchased a jumpstart where we installed and configured VI3. My employer can comp the exam costs, but I can’t take it without additional training that I can’t justify.

If the experience has to be there to create value in the VCP program, why does it have to be such a specific kind?

My correspondence with VMware:

Hello,
I have attended a training session entitled “VMware Jumpstart” – does this qualify me to take the VCP exam?

VMware’s response:

Hi Andy,

The jumpstart class is not a VCP qualifying class. Only the classes listed below meet the requirement.

1. VI 3: Install and Configure

2. VI 3: Deploy Secure and Analyze

3. VI 3: Fast Track

Without completion of one of these classes you will not be certified. Class must be taken with VMware or one of VMware’s authorized training partners to be counted. All authorized class can be found at http://www.vmware.com/education.

Boo, VMware. Boo. That’s my $0.02 since I don’t have at least $3000 available to qualify for the VCP.

False Alarms with Virtual Center 2.5

Since I enabled alarms in VirtualCenter on 10/07/2008, we have encountered 14 separate false alarms regarding host connectivity.

Here’s the alarm:

Target: hostname.goes.here

Old Status: Green
New Status: Red

Current value:
Host connection state – (State = Not responding)

Alarm: Host connection state
([Red State Is Equal To notResponding])

Description:
Alarm Host connection state on hostname.goes.here changed from Green to Red

Here’s what we went through with support.

  • Sending Diagnostics from VC
  • Found out we are running an unsupported HBA (QLA4010s are supported in ESX 3 but not in ESX 3.5). This was frustrating because we have seen other VMware sources indicating that they work with ESX 3.5
  • Advised to increase the Service Console RAM from 272 MB to 800 MB

We haven’t been seeing false alarms since.

Our Storage Problem

We had some storage issues. We still have some storage issues, but it’s getting better. Here’s what we’ve fixed:

  • Overbooked storage
  • Storage Switch Failure Tolerance
  • Adapter Failure Tolerance

Overbooked Storage Units

The most immediate issue that could be addressed was the storage bloat. This did not require additional hardware. Previously, our storage allocated for VMware was as follows:

  • VMFSLun1 (600 GB)
  • VMFSLun2 (900 GB)

All 30 virtual machines the university ran (46 individual virtual hard disks) were running on two LUNs. Through collaboration with the SAN administrator, VMware’s Storage vMotion technology, and the SVMotion plugin [3], the LUNs were balanced as much as possible without adding new hardware. The new storage is laid out as follows, per NetApp and VMware recommendations [1] [2]:

  • VMFSLun1 (300 GB – reallocated from old VMFSLun1)
  • VMFSLun2 (300 GB – reallocated from old VMFSLun1)
  • VMFSLun3 (300 GB – reallocated from old VMFSLun2)
  • VMFSLun4 (300 GB – reallocated from old VMFSLun2)
  • VMFSLun5 (300 GB – reallocated from old VMFSLun2)
  • Templates_and_ISOs (50 GB – new)
  • VMFSLun6 (300 GB – new)

We reorganized the existing allocated storage (1500 GB) into a more optimized layout. Additionally, a 50 GB LUN (Templates_and_ISOs) was added for organizational purposes. The need for additional storage capacity was identified, and VMFSLun6 was created with existing iSCSI storage.
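The following is a quick sanity check that the new layout accounts for every GB of the old LUNs. The sizes come from the two lists above; the Python structure and function names are just an illustration:

```python
# Sizes in GB, taken from the old and new LUN lists above.
OLD_LAYOUT = {"VMFSLun1": 600, "VMFSLun2": 900}

# (new LUN, size, where its space came from); "new" means added capacity.
NEW_LAYOUT = [
    ("VMFSLun1", 300, "VMFSLun1"),
    ("VMFSLun2", 300, "VMFSLun1"),
    ("VMFSLun3", 300, "VMFSLun2"),
    ("VMFSLun4", 300, "VMFSLun2"),
    ("VMFSLun5", 300, "VMFSLun2"),
    ("Templates_and_ISOs", 50, "new"),
    ("VMFSLun6", 300, "new"),
]


def tally_by_source(new_layout):
    """Sum the new LUN sizes by the old LUN (or "new") they came from."""
    totals = {}
    for _lun, size_gb, source in new_layout:
        totals[source] = totals.get(source, 0) + size_gb
    return totals


def unaccounted(old_layout, new_layout):
    """Return {old LUN: GB not reallocated}; empty means nothing was lost."""
    totals = tally_by_source(new_layout)
    return {lun: size - totals.get(lun, 0)
            for lun, size in old_layout.items()
            if totals.get(lun, 0) != size}


if __name__ == "__main__":
    print(tally_by_source(NEW_LAYOUT))           # per-source totals
    print(unaccounted(OLD_LAYOUT, NEW_LAYOUT))   # {} if every GB is reused
```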

Switch and Adapter Fault Tolerance

While the new hardware was being procured, Information Systems staff created a plan for implementing it.

The existing storage setup had 3 hosts and 30 virtual machines (46 virtual disks) attached to two LUNs, with one host bus adapter (HBA) and one path per host. The filers are redundant in the sense that they are clustered for IP takeover, but there were two additional points of failure:

  • If a host’s HBA failed, the VMs on that host would be unavailable and data would likely be lost.
  • If the switch that the cluster’s HBAs are connected to failed, all virtual machine disks would be unavailable and all virtual machines would likely incur data loss.

Figure 1.1 – Before the upgrade, HBA fault tolerance diagram

The new storage adapter fault tolerance plan had two major goals: tolerance of a switch failure and tolerance of a storage adapter failure. Planning to tolerate switch failure was straightforward: attach the additional HBA to another switch. Planning to tolerate HBA failure for a given LUN was handled with VMware’s Virtual Infrastructure Client.

Managing Paths within VI Client.

After we received the new HBAs, each host was taken down in turn without downtime for the virtual machines. The new HBAs were placed in the hosts and connected to the additional switches. The hosts were brought back up, and primary paths were set per LUN in the VI Client.

Diagram of our Virtual Infrastructure showing HBA and switch fault tolerance

Storage Performance

Multiple paths to each LUN also help us lessen LUN contention and increase I/O performance. Now that there are two available paths for each LUN, the “hot” paths can be split evenly between the HBAs, so no single HBA carries all of the load. Previously, all LUN traffic traveled through a single point (HBA 1), which had to carry the I/O for every virtual disk on every LUN.

Our previous LUN / path layout

With the additional path and the better-balanced LUNs (see “Overbooked Storage”), the new structure has six LUNs, but only three active on each HBA. This layout means less total traffic through each HBA.
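The balancing idea can be illustrated with a simple greedy assignment. Take the LUNs busiest first, and give each one a preferred path on whichever HBA currently carries the least load. This only sketches the reasoning: the paths in this post were set by hand in the VI Client, and the load figures and HBA names (`vmhba1`, `vmhba2`) below are hypothetical:

```python
def assign_preferred_paths(lun_load, hbas):
    """Greedy balancing: return ({lun: preferred HBA}, {hba: total load}).

    `lun_load` maps each LUN to a relative I/O load (hypothetical numbers);
    every LUN still has a standby path through the other HBA."""
    carried = {hba: 0 for hba in hbas}
    preferred = {}
    # Busiest LUNs first; ties broken by name so the result is deterministic.
    for lun in sorted(lun_load, key=lambda name: (-lun_load[name], name)):
        hba = min(hbas, key=lambda h: carried[h])  # first listed HBA wins ties
        preferred[lun] = hba
        carried[hba] += lun_load[lun]
    return preferred, carried


if __name__ == "__main__":
    loads = {f"VMFSLun{i}": 1 for i in range(1, 7)}  # equal, hypothetical loads
    paths, totals = assign_preferred_paths(loads, ["vmhba1", "vmhba2"])
    print(paths)
    print(totals)
```

With equal loads the greedy rule simply alternates, which gives three LUNs per HBA across six LUNs. That matches the layout described above.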

Our Current Storage Layout, demonstrating multiple paths and lower LUN contention

[1] “SAN System Design and Deployment Guide”, VMware
http://www.vmware.com/resources/techresources/772

[2] “NetApp and VMware Virtual Infrastructure 3 Storage Best Practices”, NetApp; page 11
http://media.netapp.com/documents/tr-3428.pdf

[3] “VI Plug-in – SVMotion”, Schley Andrew Kutz
http://sourceforge.net/project/showfiles.php?group_id=228535