VMware Knowledge Resources for the Beginner VI Administrator

I have no problem making it clear I’m relatively new to the virtual world. That doesn’t mean you can’t learn fast.

Here are a few tools I’ve used to become a better VI Administrator:

  1. Training. Pros: Certified knowledge from the source. We hosted a VMware Jumpstart, and that training was without a doubt the catalyst for my entry into the rest of the virtual world. Training teaches you how to talk the talk so that other sources of knowledge become useful. Cons: Cost (not just upfront $$, but time cost).
  2. Web Sites to Search. Once you take on the new role, you need to do a considerable amount of reading. Pros: Low cost (aside from time) and can have a particularly high benefit. Cons: Lots of noise. Trouble distinguishing between good and bad sources.
    • VMTN
    • Google Reader and Planet V12n – I may have to write a separate post about my thoughts on V12n, but for the most part it is useful
    • VMware Knowledge Base
    • Google and Foxit Reader – VMware’s website can be tedious to use, so using some search operators in Google makes it a little more bearable, for instance: site:vmware.com filetype:pdf. Foxit Reader makes those PDFs tolerable compared to Adobe Reader – and it has tabs!
    • Free VMworld Videos from 2007 which are still applicable today
  3. “Social” Media. I’m not including VMTN here because I rarely post to it. The items below have been useful from an interactive standpoint, not just one-sided conversations. The pros and cons are similar to those of websites: low cost vs. information overload and the difficulty of finding a reputable source.
    • #vmware IRC channel on freenode
    • Twitter – most of the bloggers from Planet V12n also have Twitter accounts; they post when products are released and can provide quick @replies to your questions
  4. VMware Gold Support. Pros: Very thorough, certified support. I am very happy with the support we’ve received from VMware. After I’ve exhausted Google, social media, etc., VMware Support has come through for us several times. Cons: Cost, time spent on hold, and turnaround times.

Chargeback: The Value Added Pitch

Our chargeback policy for virtual machines was not clearly defined. To encourage adoption, the provisioning fee for a virtual machine was $500 regardless of system requirements. With a new host being added and other considerable investments being made in hardware, the chargeback policy needed to be revised. Virtual machines within our matured virtual infrastructure cost more to provide, but they also add a considerable amount of value.

After a bit of reading at vmMBA and watching BM15 Managing Chargeback with VMware Infrastructure 3 from VMworld 2007, we decided to go with a tiered chargeback method. The method has three tiers: low, middle, and high. The lowest tier is designed to be the cheapest by far, to encourage adoption on a larger scale. Our goal is to instill the confidence we have in our virtual infrastructure in our users.
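To make the tiers concrete, here is a minimal sketch of how a VM request could be bucketed and priced. All of the resource thresholds and dollar figures below are made-up placeholders, not our actual rates; I will post our real numbers later.

```python
# Illustrative chargeback tiering sketch. The thresholds and prices
# are placeholders, not our production rates.
def chargeback_tier(vcpus, ram_gb, disk_gb):
    """Return (tier, provisioning fee) for a requested VM configuration."""
    if vcpus <= 1 and ram_gb <= 2 and disk_gb <= 40:
        return ("low", 500)      # kept cheap to encourage adoption
    if vcpus <= 2 and ram_gb <= 4 and disk_gb <= 100:
        return ("middle", 2000)  # placeholder price
    return ("high", 4500)        # heavy CPU/RAM/disk consumers

print(chargeback_tier(1, 2, 30))   # ('low', 500)
print(chargeback_tier(4, 8, 200))  # ('high', 4500)
```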

| Feature | Physical System | Virtual Machine |
|---|---|---|
| Redundant Power | Standard in higher-tier servers; ~$100 extra on lower-tier servers | Standard on hosts |
| Redundant CPU | Duplicate hardware required | Through multiple hosts and high availability |
| Redundant RAM | Duplicate hardware required | Through multiple hosts and high availability |
| Redundant Storage | Not required; additional costs for RAID levels or multiple storage adapters (HBAs) | Standard through multiple HBAs, SAN switch redundancy, and SAN RAID levels |
| Redundant Network | Standard in higher-tier servers; switch redundancy not always implemented | Through multiple hosts, multiple NICs, and multiple switches |
| Warranty | Optional; increases total cost by $350-$1,500 | Standard on hosts |
| System Lifetime | Server MTBF, then reordered and moved (with an outage) | N/A |
| Testability | Risky; requires downtime or “test” hardware | Live servers can be placed into snapshot mode; local copies can also be downloaded for extensive tests |
| Hardware Maintenance | Outage/maintenance windows required | Transparent to users via vMotion |
| Recoverability | File level only; requires OS and application reloads | File level and image level |

Illustration of added value with virtual machines

Let’s take this a step further with a Dell server. Quotes were generated using the Dell Higher Education website.

The server I will be using is the PowerEdge 2970. Base cost: $5,330.

| Feature | Physical System | Virtual Machine |
|---|---|---|
| Redundant Power | Added cost: $0 | Included |
| Redundant CPU | Duplicate hardware required; added cost: $5,330 | Included |
| Redundant RAM | Included in the above duplicate hardware; requires downtime to replace | Included |
| Redundant Storage | RAID 5 standard; added cost: $0 | Included |
| Redundant Network | Standard; must instruct staff to use different switches; added cost: $0 | Included |
| Warranty | 3-year warranty standard; added cost: $0 | Included |
| Total | ($5,330 + $5,330) = $10,660 | |

This is probably a mid- to high-tier VM, depending on disk I/O rates and load. I could see charging anywhere from $3,000-$4,500 for this VM. I will eventually post the details on how that number was derived.

Total value added: ($10,660 – $4,500) = $6,160, and that is using the high estimate of what would be charged. That’s not even including the depreciation the physical server would incur each year. Or the value added from faster recovery times with VMware Consolidated Backup or SAN-based snapshots. Or the value added from being able to deploy a virtual machine much faster than acquiring physical equipment. Or the zero downtime VMware hosts can have during hardware maintenance via vMotion. Or the downtime avoided because virtual machines can be placed in snapshot mode, protecting against potentially disruptive changes.
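For clarity, here is the arithmetic behind that figure, using the high end of the estimated charge:

```python
# Value-added arithmetic from the Dell comparison above.
physical_base = 5330        # PowerEdge 2970 base quote
physical_duplicate = 5330   # second box for CPU/RAM redundancy
physical_total = physical_base + physical_duplicate

vm_charge_high = 4500       # high end of the estimated VM chargeback
print(physical_total)                   # 10660
print(physical_total - vm_charge_high)  # 6160 of value added
```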

Of course, this is all just talk unless you have an SLA to back up the value-added claims. In your SLA it is important to note that high availability with CPU/RAM is not fault tolerance. That’s coming soon, but VMware Fault Tolerance has not hit the market yet. While the SLA cannot guarantee zero downtime, an SLA backed by VMware HA can guarantee much less downtime than a non-clustered physical system.
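As a rough way to frame SLA targets, here is a small sketch that converts an availability percentage into allowable downtime per year. The percentages are examples only, not figures from our SLA.

```python
# Convert an availability percentage into allowed downtime per year.
# The sample percentages are illustrative, not actual SLA targets.
HOURS_PER_YEAR = 365 * 24

def downtime_hours_per_year(availability_pct):
    return HOURS_PER_YEAR * (1 - availability_pct / 100.0)

for pct in (99.0, 99.5, 99.9):
    print(pct, "% uptime ->", round(downtime_hours_per_year(pct), 1), "hours down/year")
```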

How We Found Our Virtual Networking Mojo

Switch and Network Adapter Fault Tolerance

Each of our VMware ESX hosts was equipped with two network adapters (NICs). On a typical physical server, two NICs can provide fault tolerance. For an ESX host, however, two NICs are not enough to be fault tolerant, because VMware ESX has three major types of traffic:

  1. VMkernel – used for vMotion, which allows host downtime without an interruption of service
  2. Service Console – initiates vMotion, serves as the primary venue of managing Virtual Machines
  3. VM Traffic – each individual virtual machine’s traffic, e.g., a web server’s incoming/outgoing requests

Our previous setup had significant points of failure:

  • If one NIC failed, there would be a complete service interruption.
    • If VMNIC0 failed, the interruption would be slightly more controllable, as it could be scheduled, even though the virtual machines would not be manageable.
    • If VMNIC1 failed, all virtual machine traffic would be unexpectedly interrupted.
  • If the physical switch failed there would be an unscheduled complete service interruption.

Our old VMware Networking setup, no NIC/switch redundancy

 

The plan was to purchase an additional quad-port NIC to remedy the NIC redundancy issue. With the additional four ports, each of the three types of traffic could have two dedicated ports, making the setup fault tolerant at the NIC level for every traffic type. It’s taking the Cisco ESX Hosts with 4 NICs (page 62) to the next level. To fix the switch redundancy problem, the NICs for each traffic type are split evenly across two physical switches.

Networking setup demonstrating switch/NIC redundancy
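For reference, here is the same layout expressed as plain data, along with a quick check that each traffic type spans both switches. The vmnic and switch names are hypothetical labels for our six ports, not taken from an actual host configuration.

```python
# Hypothetical mapping of six NIC ports to traffic types and physical
# switches; the names are illustrative, not from a real host config.
nic_layout = {
    "Service Console": [("vmnic0", "switch-A"), ("vmnic2", "switch-B")],
    "VMkernel":        [("vmnic3", "switch-A"), ("vmnic1", "switch-B")],
    "VM Traffic":      [("vmnic4", "switch-A"), ("vmnic5", "switch-B")],
}

# Sanity check: every traffic type survives the loss of any single NIC
# or physical switch, because its two ports sit on different switches.
for traffic, ports in nic_layout.items():
    switches = {switch for _, switch in ports}
    assert len(switches) == 2, traffic + " is not switch-redundant"
```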

 

Networking Performance

With the previous setup, an average of 7.5 VMs per host sent their traffic through a single NIC. That NIC was being overused, and users were complaining of slow network performance. We implemented NIC teaming, which spread the load while also providing failover, with favorable results.


Cacti Diagram showing before and after on an ESX host

In the graph above, a drastic decrease in utilization is visible after week 32, demonstrating the reduced stress on the NIC that had been overutilized.
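A back-of-the-envelope view of why the graph drops: the same VM traffic is simply divided over two teamed ports instead of one. The 7.5 figure is our per-host average from above; the rest is plain division.

```python
# Rough per-NIC VM load before and after teaming VM traffic across two ports.
vms_per_host = 7.5       # average VMs whose traffic rode a single NIC before
print(vms_per_host / 1)  # 7.5 VMs per NIC with one port carrying VM traffic
print(vms_per_host / 2)  # 3.75 VMs per NIC once teamed across two ports
```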

VI3 Value Added: System Deployment

One of the things I have to do as a Virtual Infrastructure Administrator is evangelize the value added by virtual machines. What are my stakeholders going to get out of this? Where’s the value added? Why should we use virtual machines instead of physical machines? Here’s my explanation of some advantages and value added using VMware Virtual Infrastructure 3 from a system deployment standpoint.

This post will show a “before and after” view of system deployment. We had a pretty ‘vanilla’ setup of ESX on a few servers, but a few simple changes to your environment (partition alignment, sysprep, and a template patching policy) can really add to the value of your virtual infrastructure.

Here’s a quick comparison of physical system deployment and virtual machine deployment (with an ‘out of the box’ ESX setup):

| Physical System | Virtual Machine |
|---|---|
| Discover system requirements | Discover system requirements |
| Contact vendor and receive system quote | Contact system stakeholders, provide a quote |
| Receive quote approval from system stakeholders | Receive quote approval from system stakeholders |
| Complete the procurement process | Template Deployment |
| System ships to our location | Document Virtual Machine |
| Unbox, Rack, Cable, and Document server | Configure OS* |
| OS installation and hardware configuration | Software Installation |
| Software Installation | |

* Windows guests had to be configured, patched, and joined to the domain

Add in a few enhancements (mainly sysprep for Windows guests) and here’s the same comparison, showing only the relevant costs and omitting costs that are the same for both alternatives:

| Physical System | Virtual Machine |
|---|---|
| Complete the procurement process | Template Deployment |
| System ships to our location | |
| Unbox, Rack, and Cable server | |
| Hardware configuration | |

Physical system deployment compared to Virtual Machine Deployment, omitting sunk costs

Changes we have made regarding template redesign have further shortened the time to deploy a new virtual machine, thus shortening the turnaround time for system requests.

Template Redesign

Our Windows and Linux virtual machine templates received necessary patches and updates, as well as several enhancements, to further realize ROI with VI3.

  • Higher I/O performance, reduced storage latency with partition alignment
    • Elimination of manual configuration time for Windows guests with sysprep
  • Inclusion of a configured copy of VMware Tools in all templates
  • Creation of policy mandating quarterly template updates

Partition Alignment

The majority of our existing virtual machines running Windows were deployed from templates that were created with improper partition alignment, which resulted in less-than-optimal virtual disk performance. New Windows templates have now been created with proper partition alignment, giving higher I/O performance and reduced latency [1] [2].
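As a quick illustration of what “aligned” means here, this sketch checks whether a partition’s starting sector lands on a 64 KB boundary, the boundary used in the guidance cited in [1] and [2]; the sample start sectors are illustrative.

```python
# Check whether a partition start sector is aligned to a 64 KB boundary.
ALIGNMENT_BYTES = 64 * 1024   # 64 KB boundary from the cited guidance
SECTOR_BYTES = 512            # bytes per sector

def is_aligned(start_sector):
    return (start_sector * SECTOR_BYTES) % ALIGNMENT_BYTES == 0

print(is_aligned(63))    # False: the old default Windows start sector
print(is_aligned(128))   # True: 128 * 512 bytes = 64 KB, properly aligned
```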

Sysprep

VMware recommends using sysprep to decrease Windows guest deployment times [3]. When previously deployed without sysprep, Windows guests required several configuration steps:

  • Running NewSID to ensure no Active Directory collisions and renaming the computer
    • Large OS and anti-virus updating overhead due to lack of regular updates
  • Joining the domain

All three of these previously mandatory steps have been completely replaced by sysprep answer files for Windows Server 2003 and Windows XP guests, significantly reducing the time spent in OS configuration. We have also followed most of the recommendations made by Leo Raikhman (Linux, Windows) aside from the thin disk provisioning.

Template Patching Policy

The virtual machine templates had not been subject to an upgrade policy due to scarcity of staff resources. We have identified system deployment time as a priority and taken the necessary actions to ensure it remains manageable and efficient. We now have a policy in place to patch our template guest OSes quarterly.
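As a minimal sketch of what that policy means in practice, this computes the next template patch date assuming standard calendar quarters; the quarter boundaries are our assumption, not spelled out in the policy text above.

```python
# Compute the next quarterly template-patch date, assuming calendar quarters.
import datetime

def next_patch_date(today):
    for month in (1, 4, 7, 10):
        start = datetime.date(today.year, month, 1)
        if start > today:
            return start
    return datetime.date(today.year + 1, 1, 1)

print(next_patch_date(datetime.date(2008, 8, 15)))  # 2008-10-01
```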

[1] Storage Block Alignment with VMware Virtual Infrastructure, NetApp; page 7
http://media.netapp.com/documents/tr_3593.pdf

[2] Recommendations for Aligning VMFS Partitions, VMware; pages 8-10
http://www.vmware.com/pdf/esx3_partition_align.pdf

[3] Basic System Administration, VMware; page 345
http://www.vmware.com/pdf/vi3_301_201_admin_guide.pdf