PowerCLI: nSeries Guest OS Timeout Scripts

First post in months. Hopefully I’ll have more to post about in the next few weeks. On to more pressing issues…

“The N Series VMware ESX Host Utilities” include a couple of ISOs with scripts for tweaking guest OS disk timeouts.

From the IBM Redbook IBM System Storage N series and VMware vSphere Storage Best Practices:

This is a job for PowerCLI. Inspired by Jase’s script.

Considerations for Linux: use an account that can run sudo without a password (or root; you don’t need root SSH enabled for Invoke-VMScript).

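# Requires VMware Tools to be installed and running on each guest.
# Assumes an existing Connect-VIServer session.
$winisoName = "[datastore] path/to/windows_gos_timeout.iso"
$linisoName = "[datastore] path/to/linux_gos_timeout.iso"
$driveName = "CD/DVD Drive 1"

# Commands run inside the guest once the ISO is connected.
# Assumption: Windows guests see the CD as D: and Linux guests expose it as /dev/cdrom; adjust for your guests.
$winscript = "regedit /s D:\windows_gos_timeout.reg"
$linscript = "sudo mkdir -p /media/cdrom; sudo mount -o ro /dev/cdrom /media/cdrom; sudo /media/cdrom/linux_gos_timeout-install.sh"

# Get Windows Guest Credentials
Write-Host "Enter Windows Credentials"
$wincred = Get-Credential
# Get Linux Guest Credentials
Write-Host "Enter Linux Credentials"
$lincred = Get-Credential
# Get ESX Host Creds
Write-Host "Enter ESX Credentials"
$esxcred = Get-Credential

# Get all powered-on VMs (Invoke-VMScript needs a running guest)
$vms = Get-VM | Where-Object { $_.PowerState -eq "PoweredOn" }
foreach ($vm in $vms) {
	# Pick ISO, script, credentials, and script type from the guest OS
	$osName = $vm.Guest.OSFullName
	if ($osName -match "Microsoft") {
		$isoName = $winisoName; $scriptText = $winscript; $guestCred = $wincred; $scriptType = "Bat"
	}
	elseif ($osName -match "Linux") {
		$isoName = $linisoName; $scriptText = $linscript; $guestCred = $lincred; $scriptType = "Bash"
	}
	else {
		Write-Host "Skipping $($vm.Name): unrecognized guest OS '$osName'"
		continue
	}

	# Work on the view object separately so $vm stays usable for Invoke-VMScript
	$vmView = $vm | Get-View
	$dev = $vmView.Config.Hardware.Device | Where-Object { $_.DeviceInfo.Label -eq $driveName }
	if (-not $dev) { Write-Host "Skipping $($vm.Name): no $driveName"; continue }

	# Point the existing CD/DVD drive at the ISO and connect it
	$dev.Backing = New-Object VMware.Vim.VirtualCdromIsoBackingInfo
	$dev.Backing.FileName = $isoName
	$dev.Connectable.Connected = $true

	$spec = New-Object VMware.Vim.VirtualMachineConfigSpec
	$spec.DeviceChange = New-Object VMware.Vim.VirtualDeviceConfigSpec[] (1)
	$spec.DeviceChange[0] = New-Object VMware.Vim.VirtualDeviceConfigSpec
	$spec.DeviceChange[0].Operation = "edit"
	$spec.DeviceChange[0].Device = $dev

	# ReconfigVM (unlike ReconfigVM_Task) waits for the change before the script runs
	$vmView.ReconfigVM($spec)

	Invoke-VMScript -VM $vm -HostCredential $esxcred -GuestCredential $guestCred -ScriptType $scriptType -ScriptText $scriptText
}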
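One way to spot-check a Linux guest after the script runs is to read the per-disk SCSI timeout that Linux exposes under /sys/block/sdX/device/timeout. This is a minimal bash sketch, assuming the ISO’s install script works by raising that value (I haven’t confirmed exactly what it changes). The function name show_disk_timeouts and its optional root-directory argument are hypothetical; the argument exists only so the function can be tested against a fake directory.

```bash
#!/usr/bin/env bash
# show_disk_timeouts [SYSFS_ROOT]
# Print "disk seconds" for every sd* disk that exposes a SCSI command timeout.
# SYSFS_ROOT defaults to /sys.
show_disk_timeouts() {
    local root="${1:-/sys}"
    local dev
    for dev in "$root"/block/sd*; do
        # Skip unmatched globs and devices without a timeout attribute
        [ -r "$dev/device/timeout" ] || continue
        printf '%s %s\n' "$(basename "$dev")" "$(cat "$dev/device/timeout")"
    done
}
```

On the guest, running show_disk_timeouts with no argument reads the real /sys; comparing its output before and after the ISO script shows whether the timeouts changed.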

Back to work.


Gem in the VMware vCenter Converter 4.2 for vCenter Server Release Notes

It’s been a while since I’ve had time to post, but I couldn’t keep this to myself:

From the VMware vCenter Converter 4.2 for vCenter Server 4.1 Release Notes:

What’s New
The VMware vCenter Converter 4.2 is a substantial upgrade from vCenter Converter 4.1 and includes the following new functionality (previously found only in vCenter Converter Standalone 4.0.x):
Physical to virtual machine conversion support for Linux sources including:
Red Hat Enterprise Linux 2.1, 3.0, 4.0, and 5.0
SUSE Linux Enterprise Server 8.0, 9.0, 10.0, and 11.0
Ubuntu 5.x, 6.x, 7.x, and 8.x

This sounds too good to be true. I upgraded my lab environment and confirmed: you can now do scheduled Linux P2V.

The lab setup was ESX 4.1, vCenter 4.1, and vCenter Converter 4.2 for vCenter Server, with a RHEL 4.7 AS server imported.

VMware Expert System with PowerCLI and Excel

This post is a write-up of a project for a master’s class in Decision Support Systems at Murray State. This is my first dive into VMware PowerCLI aside from a few one-off scripts. All feedback is welcome.

Our Problems

Problem 1: Servers are not being virtualized due to a decentralized procurement process

A decentralized server procurement process presents many problems to an organization. Standardizing OS/hardware platforms brings many gains.

Problem 2: Servers are not being virtualized because knowledge is required to make “Virtualize/Don’t Virtualize” decision

The benefits of server virtualization are easy to explain and are a part of our culture. However, the organization has not adopted a “virtualize first” mentality. There is still a lack of stakeholder understanding with regard to virtualization.

Due to lack of knowledge, ROI is not maximized. This knowledge exists in two places: the virtual infrastructure itself and the tacit knowledge of the VMware administrator.

PROBLEM ANALYSIS

Problem 1: Servers are not being virtualized due to a decentralized procurement process
This problem is outside of the scope of the CIS645 class. We’re working on it.

Problem 2: Servers are not being virtualized because knowledge is required to make “Virtualize/Don’t Virtualize” decision

Problem 2 has two major parts.

CAPACITY – CAN OUR VIRTUAL INFRASTRUCTURE SUPPORT THIS APPLICATION?

This question has historically been answered heuristically with ballpark figures. Manually gathering current storage and RAM capacity data is too time-consuming.

CANDIDACY – BASED ON SYSTEM REQUIREMENTS AND INDUSTRY KNOWLEDGE, IS VIRTUALIZATION SUITABLE FOR THIS APPLICATION?

This is the harder question. Typically you’ll hear consultants say “it depends”. Answering this question usually involves a phone call with the VMware administrator. The conversation is a series of questions from the administrator to the stakeholder.

RECOMMENDATION

When the two questions have been answered, a recommendation of Virtualize/Don’t Virtualize is made. If a Virtualize decision is made, the VMware administrator must find the optimal storage unit to deploy to and coordinate the deployment with the stakeholder.

SOLUTION DESIGN

USER INTERFACE

The users of this system are already familiar with Excel and would prefer to utilize Excel’s familiarity and What-If scenario planning.

What if we added another 2TB of storage?
What if we upgraded our RAM?
What if we didn’t have to have the license dongle?

Excel quickly enables these questions to be answered. A normal ‘GUI’ application would take more time to develop and would not invite queries of an ad-hoc nature.

CAPACITY

Capacity data resides at several levels: the virtual machine itself, the host, and the data store. The data is put into Excel using VMware’s PowerCLI. PowerCLI is a Windows PowerShell snap-in that integrates with any VMware Virtual Infrastructure. Windows PowerShell also integrates nicely with Excel.
Here are the steps to capacity gathering with the VMware Expert System:

  • Open the Excel Spreadsheet
  • Clear previously gathered data
  • Connect to a vCenter Server
  • Gather datastore information
  • Gather host information
  • Gather virtual machine information
  • Write values to ‘Capacity’ Worksheet
  • Write values to ‘New Virtual Machine’ Worksheet
  • Save Excel Spreadsheet
  • Clean up and quit Excel
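The memory side of the capacity check boils down to a little arithmetic. Below is a bash/awk sketch of it, assuming you already have each host’s RAM (Hardware.MemorySize converted to MB) and each VM’s MemoryMB from the gathering steps. It reserves the largest host’s RAM so the cluster can still absorb one host failure (the n-1 idea behind the spreadsheet’s HA comment); it is not a copy of the spreadsheet’s exact formulas, and capacity_summary is a hypothetical name.

```bash
#!/usr/bin/env bash
# capacity_summary "HOST_MB ..." "VM_MB ..."
# Both arguments are space-separated lists of RAM sizes in MB.
capacity_summary() {
    awk -v hosts="$1" -v vms="$2" 'BEGIN {
        nh = split(hosts, h, " ")
        nv = split(vms, v, " ")
        total = 0; largest = 0; used = 0
        for (i = 1; i <= nh; i++) { total += h[i]; if (h[i] > largest) largest = h[i] }
        for (i = 1; i <= nv; i++) used += v[i]
        # n-1: leave out the largest host so one host failure can be tolerated
        usable = total - largest
        printf "available_mb=%d\n", usable - used
        printf "utilization_pct=%.1f\n", (usable > 0 ? 100 * used / usable : 0)
    }'
}
```

For example, capacity_summary "32768 32768 32768" "8192 8192 16384" reports 32768 MB available at 50.0% utilization: three 32 GB hosts leave 64 GB usable after reserving one, and 32 GB of that is already allocated.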

CANDIDACY

The user of the VMware Expert System will answer a series of questions to determine system candidacy. Through knowledge capture, the conversation with the VMware Administrator does not need to take place. The knowledge is generally accepted by a community of VMware experts.

RECOMMENDATION

After answering the capacity and candidacy questions, the user receives a final recommendation. The recommendation is only “Virtualize” if capacity is available and candidacy is met.

The interface also displays reasons why a machine is not suitable for virtualization to enable What-If analysis.

DECISION TREE

Modified from VI:OPS P2V Decision Tree

RUNNING THE VMWARE EXPERT SYSTEM

PREREQUISITES

STEPS TO RUN THE VMWARE EXPERT SYSTEM

  • Download and extract vmware-expert-system.zip
  • Rename launch.tab to launch.bat
  • Edit launch.bat, line 2
    • Substitute your path to updatespreadsheet.ps1 where you see “C:\users\%username%\Documents\cis645\Project\vmware_expert_system\updatespreadsheet.ps1”, make sure the path is in quotation marks
  • Edit updatespreadsheet.ps1, line 11
    • Substitute your path to vmware_expert_system.xlsm where you see “C:\users\%username%\Documents\cis645\Project\vmware_expert_system\vmware_expert_system.xlsm”, make sure the path is in quotation marks
  • Double-click ‘launch.bat’
  • A screen similar to this will appear:
  • Launch the spreadsheet “vmware_expert_system.xlsm” and enable macros
  • Enter system requirements
  • Press “Send Work Order”

EXAMPLE SYSTEM: NEW WEB SERVER

  • Enter the hostname: newwebserver
    • The hostname must not already exist and must be a valid hostname (“The Internet Engineering Task Force (IETF)”)
  • Enter a functional contact: Andy Hill
  • Enter a staff contact: Andy Hill
  • Select an Operating System: Windows Server 2003
  • Enter a storage requirement: 20 GB
    • The storage requirement must be greater than 8 GB and less than the maximum size of a single disk
  • Enter a RAM requirement: 1024 MB
    • The RAM requirement must be at least 256 MB and less than the RAM of one host, while keeping the cluster tolerant of a host failure
  • Number of Processors: 1
    • Must be numeric, greater than or equal to 1, less than or equal to 4
  • Number of NICs: 1
    • Must be numeric, greater than or equal to 1, less than or equal to 4
  • Average CPU utilization: 5%
    • Must be numeric and between 0 and 1; if 4 processors are used, average utilization cannot exceed 50%
  • Average RAM utilization: 256 MB
    • Must not exceed 8GB
  • Average NIC utilization: 1 MBps
    • Must not exceed 100MBps
  • Maximum Disk IO: 10 MBps
    • Must not exceed 100MBps
  • Answer TRUE/FALSE to the following hardware components
    • Modems: FALSE
    • Fax Cards: FALSE
    • License Dongles: FALSE
    • Security Dongles: FALSE
    • Hardware Encryption: FALSE
  • Answer TRUE/FALSE to Vendor Support: TRUE
  • Recommendation: Virtualize!
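The rule-based part of the candidacy check can be sketched outside Excel as well. This bash function is a hypothetical re-expression of the input rules listed above, not the spreadsheet’s macro. It takes whole numbers only (storage in GB, RAM in MB, CPU utilization as a percentage from 0 to 100 rather than a fraction), folds the five hardware questions into one TRUE/FALSE flag, and skips the checks that need live capacity data (maximum single disk, host RAM).

```bash
#!/usr/bin/env bash
# check_candidacy STORAGE_GB RAM_MB CPUS NICS CPU_PCT AVG_RAM_MB NIC_MBPS DISK_MBPS SPECIAL_HW VENDOR
# SPECIAL_HW is TRUE if a modem, fax card, license/security dongle, or hardware
# encryption is needed. Prints each failed rule, then the recommendation.
check_candidacy() {
    local storage=$1 ram=$2 cpus=$3 nics=$4 cpu=$5 avg_ram=$6 nic=$7 disk=$8 hw=$9 vendor=${10}
    local reasons=()
    (( storage > 8 ))                || reasons+=("storage must be greater than 8 GB")
    (( ram >= 256 ))                 || reasons+=("RAM must be at least 256 MB")
    (( cpus >= 1 && cpus <= 4 ))     || reasons+=("processors must be between 1 and 4")
    (( nics >= 1 && nics <= 4 ))     || reasons+=("NICs must be between 1 and 4")
    (( cpu >= 0 && cpu <= 100 ))     || reasons+=("CPU utilization must be between 0% and 100%")
    (( cpus < 4 || cpu <= 50 ))      || reasons+=("4 processors require average CPU utilization of 50% or less")
    (( avg_ram <= 8192 ))            || reasons+=("average RAM utilization must not exceed 8 GB")
    (( nic <= 100 ))                 || reasons+=("average NIC utilization must not exceed 100 MBps")
    (( disk <= 100 ))                || reasons+=("maximum disk IO must not exceed 100 MBps")
    [[ $hw == FALSE ]]               || reasons+=("special hardware (modem, fax, dongle, encryption) is required")
    [[ $vendor == TRUE ]]            || reasons+=("vendor does not support virtualization")
    if (( ${#reasons[@]} == 0 )); then
        echo "Virtualize!"
    else
        printf '%s\n' "${reasons[@]}"
        echo "Don't Virtualize"
    fi
}
```

With the example system’s answers, check_candidacy 20 1024 1 1 5 256 1 10 FALSE TRUE prints Virtualize!. Any failed rule is listed as a reason before Don't Virtualize, which mirrors how the interface explains a rejection.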

ADD NEW SUPPORTED GUEST OS

VMware’s Guest OS Compatibility Guide (“VMware, Inc.”) is exhaustive and does not line up with Murray State University’s environment. The drop-down list is populated from a hidden worksheet within Excel. For our environment, we limited this drop-down to guest OSes that have regularly maintained templates.

To add, delete, or change an entry in the operating system list follow these steps:

  1. Toward the bottom of Excel, right click the current worksheet
  2. From the context menu, select “Unhide…”
  3. From the Unhide Window, Select ‘Supported Guest Operating Systems’ and press OK
  4. Navigate to the ‘Supported Guest Operating Systems’ worksheet. Make changes in Column A; only changes in Column A are reflected in the spreadsheet. Save your changes.

Future Considerations

Future versions of this project will include:
  • Support for advanced disk layouts
  • Get-Template feeding the ‘Supported Guest OS’ worksheet
  • 1 click ‘deploy from template’
  • Support for tiered storage
  • Graphs of compute resources by host and virtual machine

# VMware Expert System Capacity Gathering
# v0.2
# by Andy Hill
# https://virtualandy.wordpress.com

# gathering data for VMware capacity
$viserver = Read-Host "Enter a vCenter server";
Write-Host "Gathering Excel data...1/8"

$excel = new-object -comobject Excel.Application
# Edit this value to the location of your vmware_expert_system.xlsm
$excelfile = $excel.workbooks.open("C:\Users\andy.hill\Documents\cis645\Project\vmware_expert_system\vmware_expert_system.xlsm")
$worksheet = $excelfile.worksheets.item(3) # Select Capacity Worksheet

Write-Host "Clearing existing capacity data...2/8"
# Clear existing data
$worksheet.Range("A5:N65000").Clear() | out-null
$worksheet.cells.item(1,2) = $viserver

Write-Host "Connecting to $viserver, this may take a moment...3/8"
connect-viserver $viserver -erroraction stop -WarningAction SilentlyContinue | out-null

# datastore information
Write-Host "Gathering disk information...4/8"
$i = 5
$disks = get-datastore
foreach($disk in $disks) {
	$worksheet.cells.item($i, 1) = $disk.name;
	$worksheet.cells.item($i, 2) = $disk.freespaceMB;
	$worksheet.cells.item($i, 3) = $disk.capacityMB;
	$i++;
}
$disk_count = $i;

$i = 5
Write-Host "Gathering host information...5/8"
# host information
Get-VMHost | %{Get-View $_.ID} | %{
	$esx = "" | select Name, NumCpuPackages, NumCpuCores, Hz, Memory
	$esx.NumCpuPackages = $_.Hardware.CpuInfo.NumCpuPackages
	$esx.NumCpuCores = $_.Hardware.CpuInfo.NumCpuCores
	$esx.Hz = $_.Hardware.CpuInfo.Hz
	$esx.Memory = $_.Hardware.MemorySize
	$esx.Name = $_.Name

	$worksheet.cells.item($i, 6) = $esx.Name
	$worksheet.cells.item($i, 7) = $esx.NumCpuPackages
	$worksheet.cells.item($i, 8) = $esx.NumCpuCores
	$worksheet.cells.item($i, 9) = $esx.hz / 1000 / 1000
	$worksheet.cells.item($i, 10) = $esx.memory / 1024 / 1024;
	$i++;
}
$host_count = $i;

# vm information
$i = 5
Write-Host "Gathering virtual machine information...6/8"

get-vm | % {
	$worksheet.cells.item($i, 13) = $_.Name
	$worksheet.cells.item($i, 14) = $_.MemoryMB
	$i++;
}

# Create the totals and amount utilized
$worksheet.cells.item(($i+1),13) = "Total"
$worksheet.cells.item(($i+1),14) = "=sum(N5:N" + ($i-1) + ")" # VM rows run from 5 to $i-1
$vm_count = $i;

Write-Host "Writing values to Excel Spreadsheet...7/8"
#add some formatting
$worksheet.cells.item(($disk_count + 2), 1) = "Datastore with most free space";
$worksheet.cells.item(($disk_count + 3), 1) = "Memory (MB) Available";
$worksheet.cells.item(($disk_count + 4), 1) = "Memory Utilization %";
$worksheet.cells.item(($disk_count + 5), 1) = "Storage Available (GB)";
$worksheet.cells.item(($disk_count + 6), 1) = "Storage Utilization %";
$worksheet.cells.item(($disk_count + 7), 1) = "Most Storage Available on a datastore (GB)";

# add the formulas
$worksheet.cells.item(($disk_count + 2), 2) = "=INDEX((A5:A" + $disk_count + "),MATCH(MAX(B5:B" + $disk_count + "),B5:B" + $disk_count + ",0))";
$worksheet.cells.item(($disk_count + 3), 2) = "=SUM(J5:J" + $host_count + ") - N" + ($vm_count+1);
$worksheet.cells.item(($disk_count + 4), 2) = "=N" + ($vm_count+1) + "/(SUM(J5:J" + ($host_count-1) + ")-MAX(J5:J" + ($host_count-1) + "))"; # n-1 hosts for HA failover: leave out the largest host's RAM
$worksheet.cells.item(($disk_count + 5), 2) = "=SUM(B5:B" + $disk_count + ")/1024";
$worksheet.cells.item(($disk_count + 6), 2) = "=1-SUM(B5:B" + $disk_count + ")/SUM(C5:C" + $disk_count + ")";
$worksheet.cells.item(($disk_count + 7), 2) = "=INDEX((B5:B" + $disk_count + "),MATCH(MAX(B5:B" + $disk_count + "),B5:B" + $disk_count + ",0))/1024";

Write-Host "Saving Excel Spreadsheet...8/8";
# Select main worksheet
$worksheet = $excelfile.worksheets.item(1);
# Update the 'new virtual machine' worksheet with capacity data
$worksheet.cells.item(8,4) = "=Capacity!B" + ($disk_count + 5) + "-'New Virtual Machine'!B8";
$worksheet.cells.item(8,7) = "=MAX(Capacity!B5:" + "B" + ($disk_count - 1) + ")/1024";
$worksheet.cells.item(9,4) = "=(Capacity!B" + ($disk_count +3) + ")/1024";
$worksheet.cells.item(29,2) = "=Capacity!B" + ($disk_count + 2);
$excel.activeworkbook.save();
$excel.quit();

ESX 3.5 U4 Kickstart for IBM xSeries and QLA4050

This was our shop’s first real dive into kickstarts. The material I read in Visible Ops really emphasized trackable, repeatable processes for setting up systems. One great way to do that is through kickstart scripts and some kind of version control system. We used Subversion.

I’ve edited a few parts out of this, but I spent a while finding several kickstart scripts that accomplished parts of what we needed. I highly customized one for our environment.

What it does:

  • Configures licensing for the host using a license server
  • Configures NTP
  • Adds users, expires their accounts and configures a sudo group
  • MOTD
  • Configures NICs and VMware ESX Networking
  • Creates a script to download and install IBM iSCSI Host Utilities Kit
  • Creates a script to download and install QLA4050C BIOS and firmware updates

Thanks to Leo’s ESX 3.5 Kickstart script – part 3.

You will need to download IBM iSCSI Host Utilities Kit from IBM and the QLA4050C BIOS and Firmware from QLogic to a server with scp capabilities.

# make sure this file is UNIX formatted so the line breaks can be handled.
install
lang en_US.UTF-8
langsupport --default en_US.UTF-8
keyboard us
mouse genericwheelps/2 --device psaux
skipx
network --device eth0 --bootproto static --ip --netmask --gateway --nameserver , --hostname --addvmportgroup=0 --vlanid=0
# Encrypted root password
rootpw --iscrypted
firewall --enabled
authconfig --enableshadow --enablemd5
timezone America/Chicago
bootloader --location=mbr
# The following is the partition information you requested
# Note that any partitions you deleted are not expressed
# here so unless you clear all partitions first, this is
# not guaranteed to work
vmaccepteula
# test license server
vmlicense --mode=server --server=27000@ --edition=esxFull --features=vsmp,backup
reboot
firewall --enable
clearpart --exceptvmfs --drives=sda
part /boot --fstype ext3 --size=100 --ondisk=sda
part / --fstype ext3 --size=1800 --grow --maxsize=5000 --ondisk=sda
part swap --size=544 --grow --maxsize=544 --ondisk=sda
part /var/log --fstype ext3 --size=100 --grow --ondisk=sda

%packages
grub
@base

%post
cat > /etc/rc.d/rc3.d/S11servercfg << EOF
#Configure NTP
echo "Configuring NTP"
chkconfig --level 345 ntpd on
echo "restrict kod nomodify notrap noquery nopeer" > /etc/ntp.conf
echo "restrict 127.0.0.1" >> /etc/ntp.conf
echo "server " >> /etc/ntp.conf
echo "driftfile /var/lib/ntp/drift" >> /etc/ntp.conf
echo "" > /etc/ntp/step-tickers
service ntpd start

#Adding users with default password "changeme" generated with `openssl passwd changeme`

echo "Adding users"
adduser -p MKgX23V6snwoc
chage -d 0 -M 99999
adduser -p MKgX23V6snwoc
chage -d 0 -M 99999
adduser -p MKgX23V6snwoc
chage -d 0 -M 99999
usermod -G wheel user
usermod -G wheel user2
usermod -G wheel user3
echo "Done adding users"

echo "Configuring sudoers"
cat > /etc/sudoers << SUDO
# sudoers file.
#
# This file MUST be edited with the 'visudo' command as root.
#
# See the sudoers man page for the details on how to write a sudoers file.
#
# Host alias specification
# User alias specification
# Cmnd alias specification
# Defaults specification
Defaults syslog=local2
# User privilege specification
root ALL=(ALL) ALL
# Uncomment to allow people in group wheel to run all commands
%wheel ALL=(ALL) ALL
# Same thing without a password
# %wheel ALL=(ALL) NOPASSWD: ALL
# Samples
# %users ALL=/sbin/mount /cdrom,/sbin/umount /cdrom
# %users localhost=/sbin/shutdown -h now
SUDO
echo "Done configuring sudoers"
echo "Configuring MOTD"
echo "MOTD HERE" > /etc/motd
echo "Done configuring MOTD"

echo "Configuring hosts file"
echo "ip hostname.fqdn hostname" >> /etc/hosts
echo "Done configuring hosts file"

# we have 6 nics
echo "Configuring NIC duplex/speeds"
/usr/sbin/esxcfg-nics -s 1000 -d full vmnic0
/usr/sbin/esxcfg-nics -s 1000 -d full vmnic1
/usr/sbin/esxcfg-nics -s 1000 -d full vmnic2
/usr/sbin/esxcfg-nics -s 1000 -d full vmnic3
/usr/sbin/esxcfg-nics -s 1000 -d full vmnic4
/usr/sbin/esxcfg-nics -s 1000 -d full vmnic5
echo "Done configuring NIC duplex/speeds"

echo "Configuring networking"
# VMNetwork
/usr/sbin/esxcfg-vswitch -a vSwitch1
# Blind Switch
/usr/sbin/esxcfg-vswitch -a vSwitch2
# VMkernel
/usr/sbin/esxcfg-vswitch -a vSwitch3
# Add NIC 1 and 3 to vSwitch1 (VMNetwork)
/usr/sbin/esxcfg-vswitch -L vmnic1 vSwitch1
/usr/sbin/esxcfg-vswitch -L vmnic3 vSwitch1
# Add NIC 2 to vSwitch0 (Service Console, already contains NIC 0)
/usr/sbin/esxcfg-vswitch -L vmnic2 vSwitch0
# Add NIC 4 and 5 to vSwitch3 (VMkernel)
/usr/sbin/esxcfg-vswitch -L vmnic4 vSwitch3
/usr/sbin/esxcfg-vswitch -L vmnic5 vSwitch3
# Give appropriate port group labels to vSwitches
/usr/sbin/esxcfg-vswitch -A "Blind Switch" vSwitch2
/usr/sbin/esxcfg-vswitch -A "VMkernel" vSwitch3
/usr/sbin/esxcfg-vswitch -A "VMNetwork" vSwitch1
# Configure IP addresses for service console and VMkernel
/usr/sbin/esxcfg-vswif -i -n 255.255.255.0 vswif0
# The VMkernel NIC must go on an existing port group: the "VMkernel" group created above
/usr/sbin/esxcfg-vmknic -a -i -n 255.255.255.0 "VMkernel"
/usr/sbin/esxcfg-vswif -E
# Enable SSH Client through firewall
/usr/sbin/esxcfg-firewall -e sshClient
echo "Done configuring networking"

# generate script to download/install HUK, make it executable
echo "Generating host utilities download/install script"
cat > /root/huk-install.sh << HUK
cd /home/user/
scp user@host:/home/user/ibm_iscsi_esx_host_utilities_3_1.tar.gz .
tar -zxf ibm_iscsi_esx_host_utilities_3_1.tar.gz
cd ibm_iscsi_esx_host_utilities_3_1
./install
HUK
chmod a+x /root/huk-install.sh
echo "Done generating host utilities download/install script"

# generate script to download/install iscli and firmware/BIOS updates, make it executable
echo "Generating iscli and firmware update script"
cat > /root/iscli-script.sh << ISCLI
cd /home/user/
scp user@host:/home/user/iscli-1.2.00-15_linux_i386.install.tar.gz user@host:/home/user/ql4022rm.BIN user@host:/home/user/VER4032_03_00_01_53.zip .
tar -xvzf iscli-1.2.00-15_linux_i386.install.tar.gz
unzip VER4032_03_00_01_53.zip
chmod +x iscli.dkms.install.sh
./iscli.dkms.install.sh install
# HBA 0
/usr/local/bin/iscli -f 0 /home/user/qla4022.dl
sleep 5
/usr/local/bin/iscli -bootcode 0 /home/user/ql4022rm.BIN
sleep 5
# HBA 1
/usr/local/bin/iscli -f 1 /home/user/qla4022.dl
sleep 5
/usr/local/bin/iscli -bootcode 1 /home/user/ql4022rm.BIN
sleep 5
reboot
ISCLI
chmod a+x /root/iscli-script.sh
echo "Done generating iscli and firmware script"

# Moves this file so it will not be called on next host boot
mv /etc/rc.d/rc3.d/S11servercfg /root/unsw-setup.sh
rm -f /root/system-info
EOF
/bin/chmod a+x /etc/rc.d/rc3.d/S11servercfg
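The comment at the top of the kickstart says the file must be UNIX formatted. A file saved by a Windows editor usually carries CRLF line endings, and a quick check catches that before the install does. A minimal bash sketch; check_unix_format and its --fix flag are hypothetical, and the fix simply deletes every carriage return with tr.

```bash
#!/usr/bin/env bash
# check_unix_format FILE [--fix]
# Returns 0 if FILE has no CRLF line endings (or they were fixed), 1 otherwise.
check_unix_format() {
    local file=$1 mode=${2:-}
    local crlf
    [ -f "$file" ] || { echo "$file: not found" >&2; return 2; }
    # Count lines that end in a carriage return
    crlf=$(grep -c $'\r$' "$file")
    if (( crlf == 0 )); then
        echo "$file: UNIX line endings"
        return 0
    fi
    if [[ $mode == --fix ]]; then
        # Delete every carriage return, then replace the original file
        tr -d '\r' < "$file" > "$file.tmp" && mv "$file.tmp" "$file"
        echo "$file: converted $crlf CRLF lines"
        return 0
    fi
    echo "$file: $crlf CRLF lines (rerun with --fix)"
    return 1
}
```

Running check_unix_format ks.cfg before uploading the kickstart (ks.cfg being whatever you named yours) reports the problem; adding --fix converts the file in place.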

Upgrading ESX 3.5 to ESX 3.5 U4 and Virtual Center 2.5 to vCenter 2.5 U4

Here’s the ‘script’ we read from while doing our ESX upgrades:

In general:

  • Do lots of up front work with kickstarts and analysis

Each ESX Host

  • Put host in maintenance mode
  • Shut Down
  • File a request with the storage administrator to make only the boot LUN visible to the host, as we are about to do some potentially damaging operations
  • Put in new HBA (QLA4050)
  • Boot to floppy diskette with QLA 4050 BIOS firmware updates
  • Upgrade HBA BIOS
  • iFlash
  • If the system detects a QLx40xx controller, it displays the following message:
  • QLx40xx Adapter found at I/O address: xxxxxxxx
  • You will need to enter the adapter address
  • Select “FB” to flash the BIOS. The iFlash program will write flash to the adapter using ql4022rm.BIN found in the same directory.
  • Reboot. Press CTRL+Q on the second (new) HBA to manage boot settings
  • Configure Host Adapter according to IP / initiator name
  • Configure iSCSI Target
    • You will need:
    • iSCSI name
    • IP Address
    • Subnet Mask
    • Default Gateway
    • iSCSI Target
    • IP Address:port
    • Target Name
    • Host Boot Settings = MANUAL
    • Exit and Reboot
  • Insert ESX 3.5 U4 CD (We don’t have PXE boot available yet)
  • Reboot system to boot from ESX 3.5 U4 CD
  • Install ESX 3.5 U4
  • Type ‘esx ks=<url to kickstart file> ksdevice=eth0 method=cdrom’
  • More on the kickstart file is here
  • Press enter. This installs ESX with all appropriate settings. Ask someone for the root password.
  • Log in as root
  • sh iscli-script.sh (from the kickstart)
  • sh huk-install.sh (from the kickstart)
  • Launch VirtualCenter
  • Disconnect the host from VirtualCenter (Right click, disconnect)
  • Reconnect the host to VirtualCenter (Right click, connect)
  • Enter maintenance mode (so no VMs are vMotioned on)
  • VMotion doesn’t get set up correctly via kickstart because the host does not have shared storage. Contact the SAN Administrator to make the other ESX LUNs visible and rescan.
  • Delete the VMkernel switch
  • Add the VMkernel switch (nic4 and nic5) with VMotion enabled: <IP address>, subnet <subnet>, and no default gateway since the network is not routed
  • Configuration -> Memory -> Increase Service Console RAM to 800MB
  • Configure Storage Paths in Active/Passive
  • Reboot Host (to enact Service Console RAM changes)
  • Exit Maintenance Mode

vCenter Database Server

  • Manually back up the VMware database
BACKUP DATABASE [VMWare] TO DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Backup\VMWare\VMWare_backup_preupgrade.bak' WITH NOFORMAT, NOINIT, NAME = N'VMWare-Full Database Backup', SKIP, NOREWIND, NOUNLOAD, STATS = 10
GO
  • Manually back up the UpdateManager database
BACKUP DATABASE [UpdateManager] TO DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Backup\UpdateManager\UpdateManager_backup_preupgrade.bak' WITH NOFORMAT, NOINIT, NAME = N'UpdateManager-Full Database Backup', SKIP, NOREWIND, NOUNLOAD, STATS = 10
GO
  • Grant MSDB owner permissions for SQL user
USE [msdb]
GO
EXEC sp_addrolemember N'db_owner', N'USER'
GO

vCenter Server

  • Log in as local administrator
  • Back up the License File
    • copy "C:\Program Files\VMware\VMware License Server\Licenses\vmware.lic" \\server\share\vmware-license-backup.lic
  • Mount vCenter DVD ISO
  • Back up sysprep files for templates
    • xcopy /E /I "C:\Documents and Settings\All Users\Application Data\VMware\VMware VirtualCenter\sysprep" \\server\share\sysprep
  • Run vCenter Install
  • Reboot Server
  • Notify users of upgrades
  • Schedule times for VMware Tools Upgrades

vCenter Database Server

  • Revoke MSDB owner permissions for SQL user
USE [msdb]
GO
EXEC sp_droprolemember N'db_owner', N'USER'
GO

iSCSI SAN performance woes with VMware ESX 3.5

We filed support requests with IBM and VMware and went through a very lengthy process without any results.

Each of our hosts had the following iSCSI HBAs:

  • QLA4010
  • QLA4050C

A while ago we found out QLA4010 is not on the ESX 3.5 HCL even though it runs with a legacy driver.

As our virtual environment grew we noticed storage performance lagging. This was particularly evident with our Oracle 10G Database server running our staging instance of Banner Operational Data Store. We were seeing 1.1 MB/sec and slower for disk writes.

We opened a case with VMware support and later with IBM support. We provided lots of data to both, yet no one at VMware or IBM mentioned the unsupported HBA. VMware support referred us to KB# 1006821 to test virtual machine storage I/O performance.

We ran HD Speed in a new VM mimicking the setup, using an RDM and a dedicated LUN, with similar results.
We ran HD Speed on the same RDM on a physical machine and got 45 MB/sec.

All of our hosts had an entry like this in the logs (grep -i abort /var/log/vmkernel* | less)

vmkernel.36:Mon DD HH:ii:ss vmkernel: 29:02:31:16.863 cpu3:1061)LinSCSI: 3201: Abort failed for cmd with serial=541442, status=bad0001, retval=bad0001

There were hundreds, if not thousands, of these iSCSI aborts in the log files. We punted to IBM, and they recommended running the Host Utilities Kit, which optimizes HBA settings specifically for IBM storage systems.
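The grep above lists every abort; a per-file tally shows how bad each rotated log is. A small bash sketch, assuming the ESX 3.5 service console layout where the current and rotated logs are plain-text /var/log/vmkernel* files (as in the vmkernel.36 example). count_iscsi_aborts is a hypothetical name, and the directory argument exists so it can be tested elsewhere.

```bash
#!/usr/bin/env bash
# count_iscsi_aborts [LOG_DIR]
# Print "file count" for every vmkernel log containing "Abort failed" lines.
count_iscsi_aborts() {
    local dir="${1:-/var/log}"
    local f n
    for f in "$dir"/vmkernel*; do
        [ -f "$f" ] || continue
        # Case-insensitive count, matching the grep -i used above
        n=$(grep -ci "abort failed" "$f")
        if (( n > 0 )); then
            printf '%s %s\n' "$(basename "$f")" "$n"
        fi
    done
    return 0
}
```

Running it before and after a change (HBA swap, ESX upgrade) gives a rough before/after comparison instead of paging through the grep output.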

My recommendation ended up being twofold: upgrade the ESX hosts because we were on an old build (95xxx), and replace the QLA4010 with a QLA4050C on each host.

Now that our ESX upgrade is complete we are seeing much better performance from our iSCSI storage.

ESXi Snapshots not showing in VI Client

Yesterday I made a mistake. We have a virtual machine set up to test Spacewalk which runs CentOS.

It has a virtual disk for the OS on datastore1 and a virtual disk for the data on datastore2. datastore1 had 11 GB free and datastore2 had 300 GB free. I snapshotted the VM, we did some work, and I committed the snapshot. Except it didn’t work: now the machine won’t stay booted. I remembered reading something from Yellow-Bricks about disk space and snapshots. Oops. Since this VM was on an ESXi host, there were no service console commands to commit the snapshot.

This error popped up, and the VM would power down:

There is no more space for the redo log of VMNAME-000001.vmdk.

I freed up some space on datastore1, but I couldn’t find how to commit the snapshot. There were several -delta.vmdk files in the virtual machine’s folder on datastore1.

Solution: After freeing up some disk space, I created another snapshot from the VI Client. Then I immediately went to “Delete All”. This got rid of the orphaned snapshot as well as the newly created one.

AutoPager now works with VMTN and NetApp Technology Network sites

AutoPager is a Firefox extension which follows the “Next” links on lots of pages and loads them inline. If you’re already using the extension, go to AutoPager -> Update Setting -> Update Setting Online.

The authors just added VMTN forums and NetApp Technology Network to their supported sites. This means if you’re reading a long thread you don’t have to click next. You can just keep scrolling — the next page is loaded inline.


It also works on thread lists.


This is a screenshot of the “Loading” indicator in the bottom left. Once you scroll so far, it automatically shows up, then fetches the next page.


Restoring VMware Virtual Machines from NetApp Snapshots

In our organization, the storage administrator is completely separate from the VI Administrator. This process requires some coordination with the storage administrator. Here is our process for restoring a VM from our SAN snapshots. A lot of this information was gleaned from Scott Lowe’s posts on FlexClones.

Unfortunately, we do not have SMVI (the jaw dropping video demo is here) at this moment. It appears NetApp has made this process trivial with that application. This is how we’re making it work on a limited budget.

Step 0 – Determine Snapshot to clone from

Working with the VMware admin, determine which Snapshot to clone from based on timestamp and LUN

Step 1 – Create LUN Clone

  • Telnet to the filer
  • Run this command to create LUN clone – lun clone create /vol/volume_name/lun_clone_name -o noreserve -b /vol/volume_name/original_lun_name parent_snapshot_name
  • Verify new LUN is created using FilerView in a browser

Step 2 – Map clone LUN

  • Log into FilerView for the filer
  • In left column click on LUNS, then Manage
  • Click on the name of the new LUN clone
  • Click on Map LUN near the top
  • Click on Add Groups to Map, and add to appropriate group
  • Type a number (we typically use 99) into the box labeled LUN ID and click Apply

Step 3 – Enable Volume Resignature

  • Launch VirtualCenter
  • From VC, select a host
  • Select the configuration tab
  • Select advanced
  • Navigate to LVM
  • Change the value of LVM.EnableResignature to 1 (on, the default value is 0)

Step 4 – Rescan for the new LUN

  • From the Configuration tab on a selected host, Navigate to Storage Adapters
  • Select “Rescan”
  • The recovered VMFS datastore will appear with a name similar to “snap_*”
  • From here, there are two options:
    • Add the virtual machine to inventory and run from the recovered LUN
    • Copy the virtual machine’s folder to another LUN, then add to inventory
  • It is recommended that you copy the virtual machine’s folder to another LUN (non snap_*), and then add the recovered virtual machine to inventory.

Step 5 – Clean up

  • Disable LVM.EnableResignature: repeat Step 3 of this document, but change the value back to 0.
  • Ensure all VMs running on the recovery LUN are powered off
  • From VC, select a host
  • Select the configuration tab
  • Select Storage
  • Select the recovery LUN and click Remove
  • Delete the LUN clone after the VMware admin has finished removing the datastore
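On classic ESX 3.x the host-side parts of Steps 3 through 5 also have service console equivalents: esxcfg-advcfg sets LVM.EnableResignature and esxcfg-rescan rescans an adapter. The sketch below wraps them in a dry-run helper that only prints the commands unless DRY_RUN=0. The function names and the adapter name (vmhba1 in the example) are assumptions, so check your adapter name on the Storage Adapters page before running it for real. ESXi has no service console, so there you would stay with the VI Client steps.

```bash
#!/usr/bin/env bash
# Print a command (DRY_RUN=1, the default) or run it (DRY_RUN=0).
run() {
    if [[ ${DRY_RUN:-1} == 1 ]]; then
        echo "$*"
    else
        "$@"
    fi
}

# restore_steps on ADAPTER   enable resignaturing, then rescan (Steps 3 and 4)
# restore_steps off          disable resignaturing again (Step 5)
restore_steps() {
    case ${1:-} in
        on)
            [[ -n ${2:-} ]] || { echo "usage: restore_steps on ADAPTER" >&2; return 2; }
            run esxcfg-advcfg -s 1 /LVM/EnableResignature
            run esxcfg-rescan "$2"
            ;;
        off)
            run esxcfg-advcfg -s 0 /LVM/EnableResignature
            ;;
        *)
            echo "usage: restore_steps on ADAPTER | restore_steps off" >&2
            return 2
            ;;
    esac
}
```

For example, restore_steps on vmhba1 prints the resignature and rescan commands, and restore_steps off prints the command that turns resignaturing back off; the filer-side lun clone commands still go through the storage administrator.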


The Virtual Machine will be brought up as if it went down from a “dirty” shutdown. In a lot of cases, this is okay. For write-intensive applications (like databases) you may have to go a few steps further to restore functionality.

PlanetV12n: My VMware RSS Feed Wish List

Here’s my PlanetV12n Wish List (in no particular order):

  1. Provide feed customization. Strategy/Administration/Business Case/etc. Virtualization has turned into an extremely broad topic. Too much noise in the feed reader is a loss of value to PlanetV12n.
  2. Provide more virtualization related feeds from vendors like EMC, NetApp, Dell, and IBM.
  3. Require full articles. If there is resistance on this, just politely remind publishers that advertising is available via RSS
  4. Give us the option of having OPML output of PlanetV12n. Personally, I would prefer OPML-only, it gives users more control over what feeds they want to see. OPML can be imported into almost any feed reader. Lots of the bloggers on PlanetV12n are very interested in their subscriber statistics. Being published on PlanetV12n drives those numbers down.

My ideal setup for PlanetV12n, a form to generate an OPML file I can add to Google Reader. VMware’s site is full of these forms, so adding another can’t be that bad right? ;-)

Select your role within IT: (checkboxes) Business / Strategy / Administration / Performance / Disaster Recovery / Evangelist / etc.

Tell us about your VMware Products: (checkboxes) ESX / ESXi / Workstation / Fusion / etc

Tell us about your vendors: IBM / Dell / NetApp / EMC / etc

… the list goes on. This could be useful for VMware’s marketers as well as end users.

VMware Knowledge Resources for the Beginner VI Administrator

I have no problem making it clear I’m relatively new to the virtual world. That doesn’t mean you can’t learn fast.

Here are a few tools I’ve used to become a better VI Administrator:

  1. Training. Pros: Certified knowledge from the source. We hosted a VMware Jumpstart, and that training is without a doubt my catalyst into the rest of the virtual world. Training teaches you how to talk the talk so that other sources of knowledge are useful. Cons: Cost (not just upfront $$, but time cost).
  2. Web Sites to Search. Once you take on the new role, you need to do a considerable amount of reading. Pros: Low cost (aside from time) and can have a particularly high benefit. Cons: Lots of noise. Trouble distinguishing between good and bad sources.
    • VMTN
    • Google Reader and Planet V12n – I may have to write a separate post about my thoughts on V12n, but for the most part it is useful
    • VMware Knowledge Base
    • Google and Foxit Reader – VMware’s website can be tedious to use, so a few Google search operators make it a little more bearable, for instance: site:vmware.com filetype:pdf. Foxit Reader makes those PDFs tolerable compared to Adobe Reader, and it has tabs!
    • Free VMworld Videos from 2007 which are still applicable today
  3. “Social” Media. I’m not including VMTN because I rarely post to it. The items below have been useful from an interactive standpoint — not just one sided conversations. There are similar pros and cons to this as websites — e.g., low cost vs. information overload and finding a reputable source.
    • #vmware IRC channel on freenode
    • Twitter – most of the bloggers on PlanetV12n also have Twitter accounts; they post when products are released and can provide quick @replies to your questions
  4. VMware Gold Support. Pros: Very thorough, certified support. I am very happy with the support we’ve received from VMware; after I’ve exhausted Google, social media, etc., VMware Support has come through for us several times. Cons: Cost, time on hold, and turnaround times.

RHEL P2V: Old Way and New Way

Most of this was taken from this site: http://conshell.net/wiki/index.php/Linux_P2V

Up front work

Determine exactly what you’re doing, and the resources you’ll need on the VMware side.

as root:

sfdisk -s
/dev/hda: 39070080
total: 39070080 blocks

To find the size in GB, divide by 1024 twice.
39070080/1024/1024 = 37.260 GB
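sfdisk -s reports sizes in 1 KiB blocks, so dividing by 1024 twice gives GiB (what the text above calls GB). Here is a small Python sketch of the same arithmetic. It assumes the "device: blocks" output format shown above, and the function name is just for illustration:

```python
def sfdisk_blocks_to_gib(line: str) -> float:
    """Convert one line of `sfdisk -s` output (1 KiB blocks) to GiB."""
    # Lines look like "/dev/hda: 39070080" or "total: 39070080 blocks"
    value = line.split(":", 1)[1].split()[0]
    return int(value) / 1024 / 1024

for line in ["/dev/hda: 39070080", "total: 39070080 blocks"]:
    print(line, "->", round(sfdisk_blocks_to_gib(line), 3), "GiB")
```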

Partition layout – know exactly the partitions, sizes and FS types. This can be gleaned from the output of `fdisk -l /dev/sda` and the content of /etc/fstab.

Disk types – IDE? SATA?

Downtime – Unfortunately, your source system must be down for the duration of the P2V process.

Have a copy of the System Rescue CD ready; you will boot both systems to it for the disk cloning step.

On the source system (while it is still running RHEL):
Back up the kernel’s ramdisk:
cp /boot/initrd-`uname -r`.img /root/`uname -r`.bak
Make a new ramdisk that includes the VMware-friendly SCSI driver:
mkinitrd -v -f --with=mptscsih /boot/initrd-`uname -r`.img `uname -r`

This builds the mptscsih (LSI Logic) SCSI driver, which VMware’s virtual SCSI controller needs, into RHEL’s ramdisk so it is available at boot time. This should not affect the source system.

Run md5sum /dev/sda and record the last six characters of the output. This fingerprint is used to verify the copy’s integrity at the end.
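The target disk is slightly larger than the source, so running md5sum /dev/sda on the target also hashes the extra space and won't match. One workaround is to hash only the first source-size bytes on the target. This is my suggestion, not part of the original procedure. From the shell, `head -c <bytes> /dev/sda | md5sum` should do the same if the rescue CD's head supports -c. A Python sketch, with hypothetical names:

```python
import hashlib

def md5_prefix(path: str, length: int, chunk_size: int = 1024 * 1024) -> str:
    """MD5 of the first `length` bytes of a file or block device."""
    digest = hashlib.md5()
    remaining = length
    with open(path, "rb") as dev:
        while remaining > 0:
            chunk = dev.read(min(chunk_size, remaining))
            if not chunk:
                raise ValueError(f"{path} is shorter than {length} bytes")
            digest.update(chunk)
            remaining -= len(chunk)
    return digest.hexdigest()

# Source size from `sfdisk -s` is in 1 KiB blocks (39070080 in the example above)
source_bytes = 39070080 * 1024
# On the target, as root: print(md5_prefix("/dev/sda", source_bytes)[-6:])
```

Reading in 1 MiB chunks keeps memory flat even on a 40 GB device.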

On the target system:

Create a new VM
OS: Linux (RHEL 4/5)
Disk slightly larger than source system
NIC: upgrade to VMware tools
CDROM: System Rescue CD ISO
Boot the system, make sure the disks are recognized (sfdisk -s). Verify network is up with ifconfig eth0.

Disk Cloning

This part takes a while. Boot both systems to the System Rescue CD, then start with a benchmark.

Make a 1 GB file on the source system, and set the target to listen for the incoming transfer:

Source: dd if=/dev/zero of=bigfile bs=1024 count=1048576
Target: nc -l -p 9001 | dd of=/dev/sda
Write down the start time.
Source: dd if=bigfile | nc <target-ip> 9001 (where <target-ip> is the target’s eth0 address)
Write down the finish time.
Extrapolate from there (e.g., a 20 GB disk will take at least 20 times as long).
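The extrapolation can be sketched in Python. The 60-second benchmark is a made-up figure, and the linear scaling is a simplifying assumption, so treat the result as a lower bound:

```python
GIB = 1024 ** 3  # the benchmark file is 1,048,576 blocks of 1 KiB = 1 GiB

def estimate_copy_seconds(bench_seconds: float, disk_kib_blocks: int,
                          bench_bytes: int = GIB) -> float:
    """Linearly extrapolate a full-disk copy time from the 1 GiB benchmark.

    disk_kib_blocks is the source size as reported by `sfdisk -s` (1 KiB blocks).
    """
    disk_bytes = disk_kib_blocks * 1024
    return bench_seconds * disk_bytes / bench_bytes

# Hypothetical: the 1 GiB benchmark took 60 seconds
minutes = estimate_copy_seconds(60, 39070080) / 60
print(f"Expect at least {minutes:.0f} minutes")
```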
For the “real” copy, remember you are copying a device to a device.

Target:

nc -l -p 9001 | dd of=/dev/sda
Source:

dd if=/dev/sda | nc <target-ip> 9001 (where <target-ip> is the target’s eth0 address)
There may be differing builds of nc, so your mileage may vary regarding the port switches; use nc --help to find out which syntax the rescue CD’s version expects. To gauge how long the copy will take, you may want to try pipe viewer (pv).

On the source machine, if you need it to boot again, you may need to restore the original ramdisk:
mv /root/`uname -r`.bak /boot/initrd-`uname -r`.img

New Way
VMware vCenter Converter 4.0 supports RHEL P2V. Win.

Get-VMStat and Resource Allocation

We have a problem that a lot of VI administrators (especially us young ones) run into – VM Sprawl.

In trying to reduce the chaos and get a much better understanding of our virtual infrastructure, I’ve come across a very helpful function, Get-VMStat, built on the VI Toolkit’s Get-Stat cmdlet.

This tutorial is intended for the complete novice.

Step 1:

Get your system set up for VI Toolkit.

Download PowerShell.
Download the VI Toolkit.

Step 2:

Preparing your PowerShell environment.

Go to Start -> Run and type powershell. Press enter.

echo $profile
# C:\Documents and Settings\ME\My Documents\WindowsPowerShell\Microsoft.PowerShell_profile.ps1

See Windows PowerShell Profiles to set up a profile.

If you don’t have a profile yet, create one (note that -force overwrites an existing profile):

new-item -path $profile -itemtype file -force

If you’re familiar with bash, think of this as ~/.bash_profile. Any aliases or functions you place in here are available throughout your PowerShell session.

Open the profile in Notepad:

notepad $profile

Now, paste the Get-VMStat function into your profile.

function Get-VMStat
{
# Average CPU and memory usage (%) per VM over the last $Hours hours,
# requested as 5-minute samples (12 per hour) and rounded to 2 decimals.
param( $VM,[Int32]$Hours = 1 )
 $VM | Sort Name |
 Select Name, @{N="CPU";E={[Math]::Round((($_ |
 Get-Stat -Stat cpu.usage.average -Start (Get-Date).AddHours(-$Hours) -IntervalMins 5 -MaxSamples ($Hours*12) |
 Measure-Object Value -Average).Average),2)}}, @{N="MEM";E={[Math]::Round((($_ |
 Get-Stat -Stat mem.usage.average -Start (Get-Date).AddHours(-$Hours) -IntervalMins 5 -MaxSamples ($Hours*12) |
 Measure-Object Value -Average).Average),2)}}
}

Save your profile.

Step 3: Use the cmdlet.

Launch the VI-Toolkit (Start > Programs > VMware > VMware VI Toolkit)

Then, run the following:

Get-VC # you will be asked for your VC server
Get-VMStat (Get-VM) -Hours 2160

2160 hours gives the average CPU and RAM usage for all of my VMs over the last 90 days. This gives us a wonderful baseline of information for making resource allocation decisions.
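To make the numbers concrete, here is a small Python sketch of what Get-VMStat requests and computes: Hours × 12 five-minute samples per VM, averaged and rounded to two decimals. The sample values are invented for illustration, and this only mirrors the arithmetic, not Get-Stat itself:

```python
from statistics import mean

SAMPLES_PER_HOUR = 60 // 5  # Get-VMStat asks Get-Stat for 5-minute samples

def max_samples(hours: int) -> int:
    """The -MaxSamples value Get-VMStat passes for a given -Hours."""
    return hours * SAMPLES_PER_HOUR

def baseline(samples: list[float]) -> float:
    """Average usage over the samples, rounded to 2 decimals."""
    return round(mean(samples), 2)

print(2160 // 24, "days,", max_samples(2160), "samples per VM")
print(baseline([10.0, 20.0, 30.5]))  # hypothetical cpu.usage.average values
```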

VMware’s VCP Certification, my $0.02

It started with Eric Siebert’s open letter to VMware making this suggestion:

6. Relax the VMware Certified Professional (VCP) certification requirements. I shouldn’t have to take a class to become a VCP, if I have the knowledge and experience to pass the VCP exam that should be enough. Many qualified people can’t afford to take a class just so they can take the test.

Amen. Then Dave Lawrence, the VMguy responded, making a few points about the certification requirements.

One of my readers wrote in (Thanks Jay!) and disagreed with this one and I do as well. My reader, Jay, reminded me of the integrity of the exam and how that must be maintained. He told me how VMware should keep it the way it is to avoid “good test takers”. He also reminded me of the girl who got her MCP at age 9. Sheesh, really!?! (and I failed my first one, that makes me feel good.)

Did any company hire the 9 year old girl who got her MCP? Probably not.

Is the MCP still a worthwhile certification? Do employers still value it? Yes & yes.

He summarizes (emphasis his):

Your certification should be a challenge in the form of effort, know how and experience. I think that VMware wants to be certain that the test takers have actually seen and used the product at least once. Perhaps the MCP should do the same so 4th graders are not getting their certifications.

Training != Real Experience. How is this enhancing the VCP certification’s value?

I don’t have the money to pay for the class myself right now. It’s difficult to justify the cost to my employer since we already purchased a jumpstart where we installed and configured VI3. My employer can comp the exam costs, but I can’t take it without additional training that I can’t justify.

If the experience has to be there to create value in the VCP program, why does it have to be such a specific kind?

My correspondence with VMware:

Hello,
I have attended a training session entitled “VMware Jumpstart” – does this qualify me to take the VCP exam?

VMware’s response:

Hi Andy,

The jumpstart class is not a VCP qualifying class. Only the classes listed below meet the requirement.

1. VI 3: Install and Configure

2. VI 3: Deploy Secure and Analyze

3. VI 3: Fast Track

Without completion of one of these classes you will not be certified. Class must be taken with VMware or one of VMware’s authorized training partners to be counted. All authorized class can be found at http://www.vmware.com/education.

Boo, VMware. Boo. That’s my $0.02 since I don’t have at least $3000 available to qualify for the VCP.

False Alarms with VirtualCenter 2.5

Since I enabled alarms in VirtualCenter on 10/07/2008, we have encountered 14 separate false alarms regarding host connectivity.

Here’s the alarm:

Target: hostname.goes.here

Old Status: Green
New Status: Red

Current value:
Host connection state – (State = Not responding)

Alarm: Host connection state
([Red State Is Equal To notResponding])

Description:
Alarm Host connection state on hostname.goes.here changed from Green to Red

Here’s what we went through with support.

  • Sending Diagnostics from VC
  • Found out we are running an unsupported HBA (QLA4010s are supported in ESX 3, but not in ESX 3.5). This was frustrating, because we have seen elsewhere from VMware that they work with ESX 3.5
  • Advised to increase the Service Console RAM from 272 MB to 800 MB

We haven’t seen any false alarms since.

Our Storage Problem

We had some storage issues. We still have some storage issues, but it’s getting better. Here’s what we’ve fixed:

  • Overbooked storage
  • Storage Switch Failure Tolerance
  • Adapter Failure Tolerance

 

Overbooked Storage Units

The most immediate issue that could be addressed was the storage bloat. This did not require additional hardware. Previously, our storage allocated for VMware was as follows:

  • VMFSLun1 (600 GB)
  • VMFSLun2 (900 GB)

All 30 virtual machines the university ran (46 individual virtual hard disks) were running on two LUNs. Through collaboration with the SAN administrator, VMware’s Storage vMotion technology, and the SVMotion Plugin [3], the LUNs were balanced as much as possible without adding new hardware. The new storage is laid out as follows, per NetApp and VMware recommendations [1] [2]:

  • VMFSLun1 (300 GB – reallocated from old VMFSLun1)
  • VMFSLun2 (300 GB – reallocated from old VMFSLun1)
  • VMFSLun3 (300 GB – reallocated from old VMFSLun2)
  • VMFSLun4 (300 GB – reallocated from old VMFSLun2)
  • VMFSLun5 (300 GB – reallocated from old VMFSLun2)
  • Templates_and_ISOs (50 GB – new)
  • VMFSLun6 (300 GB – new)

We reorganized the existing allocated storage (1500 GB) into a more optimized layout. Additionally, a 50 GB LUN (Templates_and_ISOs) was added for organizational purposes. We also identified the need for additional storage capacity, so VMFSLun6 was created from existing iSCSI storage.

Switch and Adapter Fault Tolerance

While the new hardware was being procured, Information Systems staff planned how to implement it.

The existing storage setup had 3 hosts and 30 virtual machines (46 virtual disks) attached to two LUNs, with one host bus adapter (HBA) and one path per host. The filers are redundant in the sense that they are clustered for IP takeover, but there were two additional points of failure:

  • If a host’s HBA failed, the VMs on that host would be unavailable and data could be lost.
  • If the switch that the cluster’s HBAs are connected to failed, all virtual machine disks would be unavailable and all virtual machines would likely incur data loss.

Figure 1.1 – Before the upgrade, HBA fault tolerance diagram

The new storage adapter fault tolerance plan had two major goals: tolerance of a switch failure and tolerance of a storage adapter failure. Planning to tolerate switch failure was straightforward: attach the additional HBA into another switch. Planning to tolerate HBA failure for a given LUN was handled with VMware’s Virtual Infrastructure Client.

Managing Paths within VI Client.

After we received the new HBAs, each host was taken down in turn without any downtime for the virtual machines. The new HBAs were installed in the hosts and connected to the additional switches. The hosts were brought back up, and a primary path was set for each LUN in the VI Client.

Diagram of our Virtual Infrastructure showing HBA and switch fault tolerance

Storage Performance

Multiple paths to each LUN also help us lessen LUN contention and increase I/O performance. Now that there are two available paths for each LUN, the “hot” paths can be split evenly between HBAs, increasing total throughput. Previously, all LUN traffic traveled through a single point (HBA 1), which had to carry the I/O for every virtual disk on every LUN.

Our previous LUN / path layout

With the additional path and the better-balanced LUNs (see “Overbooked Storage Units”), the new structure has six VMFS LUNs, with only three using each HBA as their primary path. This layout means less total traffic through each HBA.

 

Our Current Storage Layout, demonstrating multiple paths and lower LUN contention
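Here is a rough Python sketch of planning that split. The preferred path itself is still set per LUN in the VI Client, so this only works out a plan. The per-LUN loads and the HBA names (vmhba1, vmhba2) are hypothetical:

```python
def split_preferred_paths(lun_load, hbas):
    """Assign LUNs, hottest first, to HBAs in turn so each HBA gets an equal share."""
    plan = {hba: [] for hba in hbas}
    ranked = sorted(lun_load, key=lun_load.get, reverse=True)
    for i, lun in enumerate(ranked):
        plan[hbas[i % len(hbas)]].append(lun)
    return plan

# Hypothetical average IOPS per VMFS LUN
load = {"VMFSLun1": 400, "VMFSLun2": 350, "VMFSLun3": 300,
        "VMFSLun4": 250, "VMFSLun5": 200, "VMFSLun6": 150}
for hba, luns in split_preferred_paths(load, ["vmhba1", "vmhba2"]).items():
    print(hba, luns)
```

Alternating by rank keeps the count even (three LUNs per HBA) and keeps the two hottest LUNs off the same adapter. A greedy "least-loaded HBA first" assignment could balance load more tightly, but it may give uneven counts.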

 

[1] “SAN System Design and Deployment Guide”, VMware
http://www.vmware.com/resources/techresources/772

[2] “NetApp and VMware Virtual Infrastructure 3 Storage Best Practices”, NetApp; page 11
http://media.netapp.com/documents/tr-3428.pdf

[3] “VI Plug-in – SVMotion”, Schley Andrew Kutz
http://sourceforge.net/project/showfiles.php?group_id=228535

Chargeback: The Value Added Pitch

Our chargeback policy for virtual machines was not clearly defined. To encourage adoption, the provisioning fee for a virtual machine was $500 regardless of system requirements. In light of a new host being added and other considerable hardware investments, the chargeback policy needed to be revised. Virtual machines within our matured virtual infrastructure cost more to provide, but they add a considerable amount of value.

After a bit of reading at vmMBA and watching BM15 Managing Chargeback with VMware Infrastructure 3 from VMworld 2007, we decided to go with a tiered chargeback method with three tiers: low, middle, and high. The lowest tier is designed to be by far the cheapest, to encourage adoption on a larger scale. Our goal is to instill in our users the confidence we have in our virtual infrastructure.

Feature | Physical System | Virtual Machine
--- | --- | ---
Redundant Power | Standard in higher-tier servers; ~$100 for lower-tier servers | Standard on hosts
Redundant CPU | Duplicate hardware required | Through multiple hosts and high availability
Redundant RAM | Duplicate hardware required | Through multiple hosts and high availability
Redundant Storage | Not required; additional costs for RAID levels or multiple storage adapters (HBAs) | Standard through multiple HBAs, SAN switch redundancy, SAN RAID levels
Redundant Network | Standard in higher-tier servers; switch redundancy not always implemented | Through multiple hosts, multiple NICs, and multiple switches
Warranty | Optional; increases total cost by $350-$1,500 | Standard on hosts
System Lifetime | Until server MTBF, then reordered and moved (with an outage) | N/A
Testability | Risky; requires downtime or “test” hardware | Live servers can be placed into snapshot mode; local copies can also be downloaded for extensive tests
Hardware Maintenance | Outage/maintenance windows required | Transparent to users via vMotion
Recoverability | File level only; requires OS and application reloads | File level and image level

Illustration of added value with virtual machines

Let’s take this a step further with a Dell server. Quotes were generated using the Dell Higher Education website.

The server I will be using is the PowerEdge 2970. Base cost: $5,330.

Feature | Physical System | Virtual Machine
--- | --- | ---
Redundant Power | Added cost: $0 | Included
Redundant CPU | Duplicate hardware required. Added cost: $5,330 | Included
Redundant RAM | Included in the duplicate hardware above; requires downtime to replace | Included
Redundant Storage | RAID 5 standard. Added cost: $0 | Included
Redundant Network | Standard; must instruct staff to use different switches. Added cost: $0 | Included
Warranty | 3-year warranty standard. Added cost: $0 | Included
Total | ($5,330 + $5,330) = $10,660 |

This is probably a mid to high tier VM, which would depend on disk IO rates and load. I could see charging anywhere from $3,000-$4,500 for this VM. I will eventually post the details on how that number was derived.

Total value added: ($10,660 - $4,500) = $6,160. That uses the high end of what would be charged, so it is the conservative estimate. And that doesn’t even include the depreciation the server would incur each year. Or the value added from faster recovery times with VMware Consolidated Backup or SAN-based snapshots. Or the value added from being able to deploy a virtual machine much faster than acquiring physical equipment. Or the zero downtime VMware hosts can have during hardware maintenance via vMotion. Or the downtime avoided because virtual machines can be placed in snapshot mode, protecting against potentially disruptive changes.
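Here is the same arithmetic across the whole $3,000-$4,500 range I’d consider charging, as a quick Python sketch. The figures come from the Dell table and the range above. Like the text, it ignores depreciation and the other benefits listed:

```python
# Physical alternative from the Dell table: PowerEdge 2970 plus duplicate hardware
PHYSICAL_TOTAL = 5330 + 5330

def value_added(vm_charge: int) -> int:
    """Up-front cost avoided by provisioning the VM instead of the physical pair."""
    return PHYSICAL_TOTAL - vm_charge

for charge in (3000, 4500):
    print(f"VM charge ${charge:,}: value added ${value_added(charge):,}")
```

This prints $7,660 at the low end and $6,160 at the high end.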

Of course, this is all just talk unless you have an SLA to back up the value-added claims. In your SLA, it is important to note that high availability for CPU/RAM is not fault tolerance. Fault tolerance is coming soon, but VMware Fault Tolerance has not hit the market yet. While the SLA cannot guarantee zero downtime, an SLA backed by VMware HA can promise much less downtime than a non-clustered physical system.

How We Found Our Virtual Networking Mojo

Switch and Network Adapter Fault Tolerance

Each of our VMware ESX hosts was equipped with dual network adapters (NICs). On a typical physical server, two NICs can provide fault tolerance. On an ESX host, however, two NICs are not enough, because VMware ESX has three major types of traffic:

  1. VMkernel – used for vMotion, which allows host downtime without an interruption of service
  2. Service Console – initiates vMotion and serves as the primary means of managing virtual machines
  3. VM Traffic – each individual virtual machine’s traffic, e.g., a web server’s incoming/outgoing requests

Our previous setup had significant points of failure:

  • If one NIC failed there would be complete service interruption.
    • If VMNIC0 failed, the interruption is slightly more controllable as it could be scheduled even though the virtual machines would not be manageable.
    • If VMNIC1 failed, all virtual machine traffic would be unexpectedly interrupted.
  • If the physical switch failed there would be an unscheduled complete service interruption.

Our old VMware Networking setup, no NIC/switch redundancy

 

The plan was to purchase an additional quad-port NIC to remedy the NIC redundancy issue. With the additional four ports, each of the three traffic types can have two dedicated ports, which makes every traffic type NIC fault tolerant. It’s taking the Cisco ESX Hosts with 4 NICs (page 62) to the next level. To fix the switch redundancy problem, each traffic type’s ports are split evenly across two switches.

Networking setup demonstrating switch/NIC redundancy
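One way to sanity-check a layout like this is to simulate every single failure. The Python sketch below does that. The vmnic numbering, port-to-switch mapping, and team membership are hypothetical stand-ins for our real assignments, and the "old" layout is my reading of the description above:

```python
# Hypothetical layout: vmnic0-1 onboard, vmnic2-5 on the new quad-port card
PORT_SWITCH = {
    "vmnic0": "switchA", "vmnic1": "switchB",
    "vmnic2": "switchA", "vmnic3": "switchB",
    "vmnic4": "switchA", "vmnic5": "switchB",
}
# Two dedicated ports per traffic type, one on each switch
TEAMS = {
    "Service Console": ["vmnic0", "vmnic3"],
    "VMkernel": ["vmnic2", "vmnic1"],
    "VM Traffic": ["vmnic4", "vmnic5"],
}

def surviving(teams, port_switch, failed_port=None, failed_switch=None):
    """Traffic types that still have at least one working port."""
    return {
        traffic
        for traffic, ports in teams.items()
        if any(p != failed_port and port_switch[p] != failed_switch for p in ports)
    }

def tolerates_single_failures(teams, port_switch):
    """True if no single port failure or switch failure cuts off any traffic type."""
    everything = set(teams)
    port_ok = all(surviving(teams, port_switch, failed_port=p) == everything
                  for p in port_switch)
    switch_ok = all(surviving(teams, port_switch, failed_switch=s) == everything
                    for s in set(port_switch.values()))
    return port_ok and switch_ok

# Our reading of the old setup: one switch, one port per role
OLD_PORT_SWITCH = {"vmnic0": "switchA", "vmnic1": "switchA"}
OLD_TEAMS = {"Service Console": ["vmnic0"], "VMkernel": ["vmnic0"],
             "VM Traffic": ["vmnic1"]}

print(tolerates_single_failures(TEAMS, PORT_SWITCH))          # True
print(tolerates_single_failures(OLD_TEAMS, OLD_PORT_SWITCH))  # False
```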

 

Networking Performance

With the previous setup, an average of 7.5 VMs per host shared a single NIC. That NIC was overused, and users were complaining of slow network performance. We implemented NIC teaming, which also gave us failover, with favorable results.


Cacti Diagram showing before and after on an ESX host

In the graph above, utilization drops drastically after Week 32, showing the reduced load on the previously overutilized NIC.

VI3 Value Added: System Deployment

One of the things I have to do as a Virtual Infrastructure Administrator is evangelize the value added by virtual machines. What are my stakeholders going to get out of this? Where’s the value added? Why should we use virtual machines instead of physical machines? Here’s my explanation of some advantages and value added by VMware Virtual Infrastructure 3 from a system deployment standpoint.

This post shows a “before and after” view of system deployment. We had a pretty ‘vanilla’ setup of ESX on a few servers, but a few simple changes (partition alignment, sysprep, and a template patching policy) can really add to the value of your virtual infrastructure.

Here’s a quick comparison of physical system deployment and virtual machine deployment (with an ‘out of the box’ ESX setup):

Physical System | Virtual Machine
--- | ---
Discover system requirements | Discover system requirements
Contact vendor and receive system quote | Contact system stakeholders, provide a quote
Receive quote approval from system stakeholders | Receive quote approval from system stakeholders
Complete the procurement process | Template deployment
System ships to our location | Document virtual machine
Unbox, rack, cable, and document server | Configure OS*
OS installation and hardware configuration | Software installation
Software installation |

* Windows guests had to be configured, patched, and joined to the domain

Add in a few enhancements (mainly sysprep for Windows guests) and here’s the same comparison, only showing relevant costs while omitting costs which are the same among alternatives:

Physical System | Virtual Machine
--- | ---
Complete the procurement process | Template deployment
System ships to our location |
Unbox, rack, and cable server |
Hardware configuration |

Physical system deployment compared to Virtual Machine Deployment, omitting sunk costs

Changes we have made regarding template redesign have further shortened the time to deploy a new virtual machine, thus shortening the turnaround time for system requests.

Template Redesign

Our Windows and Linux virtual machine templates received necessary patches and updates, as well as several enhancements to further realize ROI with VI3.

  • Higher I/O performance, reduced storage latency with partition alignment
  • Elimination of manual configuration time for Windows guests with sysprep
  • Inclusion of a configured copy of VMware Tools in all templates
  • Creation of policy mandating quarterly template updates

Partition Alignment

The majority of our existing Virtual Machines running the Windows OS were deployed from templates that were created with improper partition alignment. This improper alignment resulted in less than optimal virtual disk performance. Now, new Windows templates have been created with proper partition alignment which have higher I/O performance and reduced latency [1] [2].
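By default, Windows Server 2003 and XP start the first partition at sector 63 (32,256 bytes), which isn’t a multiple of the storage block size. The Python sketch below checks a starting offset against a boundary. The 64 KB default boundary is my assumption; the 4 KB case is also shown. On the Windows side, diskpart’s `create partition primary align=64` (Windows Server 2003 SP1 and later) is one way to get an aligned partition.

```python
SECTOR_BYTES = 512

def is_aligned(start_sector: int, boundary_bytes: int = 64 * 1024) -> bool:
    """True if a partition starting at start_sector begins on the boundary."""
    return (start_sector * SECTOR_BYTES) % boundary_bytes == 0

print(is_aligned(63))                       # False: legacy default offset
print(is_aligned(128))                      # True: 128 * 512 = 65,536 bytes
print(is_aligned(63, boundary_bytes=4096))  # False: not even 4 KB aligned
```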

Sysprep

VMware recommends using sysprep to decrease Windows guest deployment times [3]. Without sysprep, newly deployed Windows guests required several configuration steps:

  • Running NewSID to avoid duplicate SIDs (Active Directory collisions) and renaming the computer
  • Large OS and Anti Virus updating overhead due to lack of regular updates
  • Joining the domain

All three of these previously mandatory steps have been completely replaced by sysprep answer files for Windows Server 2003 and Windows XP guests, significantly reducing the time spent in OS configuration. We have also followed most of the recommendations made by Leo Raikhman (Linux, Windows) aside from the thin disk provisioning.

Template Patching Policy

The virtual machine templates had not been subject to an update policy due to scarce staff resources. We have identified system deployment time as a priority and taken the necessary actions to keep deployment time manageable and efficient. We now have a policy in place to patch our template guest OSes quarterly.

[1] Storage Block Alignment with VMware Virtual Infrastructure, NetApp; page 7
http://media.netapp.com/documents/tr_3593.pdf

[2] Recommendations for Aligning VMFS Partitions, VMware; pages 8-10
http://www.vmware.com/pdf/esx3_partition_align.pdf

[3] “Basic System Administration”, VMware; page 345
http://www.vmware.com/pdf/vi3_301_201_admin_guide.pdf

Welcome

Welcome to Virtual Andy, a weblog that will discuss challenges with implementing virtualization. I’ll discuss the technical and business challenges that our organization has run into with virtualization. There will be a good mixture of ROI and Excel spreadsheets, with some technical diagrams and scripts.

I am relatively new to the virtual world, as I have only been working with VMware products since 05/2008. VCP4 in May 2010.

Guide to Backing up your Virtual Machines with VMware Consolidated Backup (VCB)

This has been a pretty fun project at work and I think it could benefit a lot of people, so I’ve decided to post a modified version of my documentation. It’s a DRAFT and not quite yet in production. I’m still working through a lot of the process. Here’s what I’ve learned…

There’s an awful lot of noise (and even worse, quite a bit of actual material) to wrap your mind around when doing a Disaster Recovery plan for virtual machines.

This guide is not a complete Disaster Recovery Plan. It’s intended to get you, the VMware administrator, to a point where you’re comfortable with passing the buck onto another backup solution (Tivoli, Veritas, USB Drive, whatever) so you can integrate it into your organization’s existing Disaster Recovery Plan.

The process, in a nutshell:

Windows, File-level

VCB talks to VirtualCenter and, using a user with limited permissions, creates a snapshot of each running Windows VM. VCB then mounts the snapshot’s virtual disks read-only under a folder on the VCB proxy, so they take up no room on the proxy. Tivoli, or any third-party backup software, can then read the files as if they were local to the VCB proxy. When the backup is finished, the VM is unmounted and the snapshot is committed.

All OSes, Image Level

VCB talks to VirtualCenter and, using a user with limited permissions, creates a snapshot of each running VM. VCB then copies the .vmdk, .vmx, and all other virtual machine files to the VCB proxy. Tivoli, or any third-party backup software, can then consume the files. When Tivoli is finished, the snapshot is committed and the files are removed.

Step 0: Get VI-Toolkit for Windows PowerShell. It’s a powerful tool that we’ll use to make your backups change-friendly.

Step 1: Get a physical machine with SAN connectivity. It’s going to be your VCB proxy. I know… physical machine. Yuck. It’s worth it. We’re using an IBM x346. The machine doesn’t have to be amazing, but it does need a rather large amount of storage if you’re doing backups of Linux guest VMs or full image-level backups.

Step 1b: Determine how much storage you will need for full VM backups. We chose to go with weekly fullVM backups of all VMs, nightly file-level backups of our Windows VMs, and nightly fullVM (image-level) backups of our Linux VMs. The beauty of VCB is that file-level backups do not consume any disk on the VCB proxy.

From PowerShell, determine how much storage the disks of your running non-Windows VMs use:

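# Sum of provisioned disk capacity (KB) for running non-Windows VMs.
# Each line ends with "|" because Windows PowerShell can't start a line with one.
Get-VM |
  where { $_.Guest.OSFullName -notmatch "Microsoft" -and $_.Guest.State -eq "Running" } |
  Get-HardDisk |
  Measure-Object -Property CapacityKB -Sum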
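Measure-Object reports the Sum in KB. In PowerShell itself, dividing that Sum by 1MB gives GB. Here is the same conversion as a tiny Python sketch with made-up disk sizes. Provisioned capacity is a conservative figure; actual image-level backups may be smaller.

```python
def capacity_kb_to_gb(capacities_kb):
    """Sum per-disk CapacityKB values and convert to GB (1 GB = 1024 * 1024 KB)."""
    return sum(capacities_kb) / (1024 * 1024)

# Hypothetical disks: one 8 GB and one 20 GB
print(capacity_kb_to_gb([8388608, 20971520]))  # 28.0
```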

Step 2: Install Windows Server 2003. It’s a requirement for VCB. Once it boots, run these commands:

cmd
diskpart
automount disable
automount scrub

This is CRUCIAL. Skipping it can cause big problems for your VMFS LUNs once they are presented to the proxy, because Windows may try to assign drive letters to them and write disk signatures.

(Search “Disabling Automatic Drive-Letter Assignment” for more information)

Step 3: Install .NET 2.0, VMware Consolidated Backup Framework, PowerShell, VI-Toolkit.

We need PowerShell and the VI Toolkit to make our VCB proxy change-friendly.

Since we’re being specific with the types of guests we’re backing up and the types of backups we’re running on those specific guests, we have two choices:

  1. Maintain a list of Windows VMs and a list of Linux VMs in a script, and hope we remember to add/remove VMs whenever a change occurs.
  2. Automate.

Step 4: From VirtualCenter, Create a Role called VCBuser and a user for VCB.

Here are the recommended permissions for the vcb role:

  • Virtual Machine
    • Configuration
      • Disk Lease
    • State
      • Create Snapshot
      • Remove Snapshot
    • Provisioning
      • Allow Virtual Machine Download
      • Allow Read-only Disk Access

It is crucial to add this user at the Hosts & Clusters level, otherwise you will run into permissions errors that are difficult to track down.

Step 5: Before taking backups, generate some scripts.

A lot of enterprise-grade backup software has options for disk quiescence, which basically means preparing the machine for a good, usable backup.

In this case, preparing the machine for good & usable backups means mounting our virtual machines to the VCB proxy.

Before a backup can take place:

  • Mount / Unmount scripts must be generated
  • Mount scripts must be run.

After the backup, you need to run the appropriate unmount scripts.

Here’s some PowerShell code that will create four .bat files:

  • file_level_mount_script.bat
  • file_level_unmount_script.bat
  • image_level_mount_script.bat
  • image_level_unmount_script.bat

Update – 9/05/2008: Added get-vc to grab a connection to VirtualCenter.

$vcbuser = "vcbuser"
$vcbpass = "password"
$vc = "virtual_center"
$mount_root = "c:\backup\" # include trailing \...

# Connect to VirtualCenter as the VCB user
get-vc -Server $vc -User $vcbuser -Password $vcbpass

$file_level_mount_script_name = "C:\file_level_mount_script.bat"
$file_level_unmount_script_name = "C:\file_level_unmount_script.bat"
$image_level_mount_script_name = "C:\image_level_mount_script.bat"
$image_level_unmount_script_name = "C:\image_level_unmount_script.bat"

# Windows PowerShell can't start a line with "|", so each pipeline line ends with it.
# Out-File writes UTF-16 by default, which cmd.exe can't run, so write ASCII with CRLFs.

# Begin Windows VM File Level Backup MOUNT
$file_level_mount_script = "cd `"\Program Files\VMware\VMware Consolidated Backup Framework\`""
Get-VM |
  where { $_.Guest.OSFullName -match "Windows" -and $_.Guest.State -eq "Running" } |
  % { $file_level_mount_script += "`r`nvcbmounter.exe -h " + $vc + " -u " + $vcbuser + " -p " + $vcbpass + " -r " + $mount_root + $_.Guest.Hostname + " -t `"file`" -m san -a ipaddr:" + $_.Guest.Hostname }
$file_level_mount_script | Out-File -FilePath $file_level_mount_script_name -Encoding ASCII
# End Windows VM File Level Backup MOUNT

# Begin Windows VM File Level Backup UNMOUNT
$file_level_unmount_script = "cd `"\Program Files\VMware\VMware Consolidated Backup Framework\`""
Get-VM |
  where { $_.Guest.OSFullName -match "Windows" -and $_.Guest.State -eq "Running" } |
  % { $file_level_unmount_script += "`r`nvcbmounter.exe -h " + $vc + " -u " + $vcbuser + " -p " + $vcbpass + " -U " + $mount_root + $_.Guest.Hostname }
$file_level_unmount_script | Out-File -FilePath $file_level_unmount_script_name -Encoding ASCII
# End Windows VM File Level Backup UNMOUNT

# Begin Full Image Level Mount All VMs
$image_level_mount_script = "cd `"\Program Files\VMware\VMware Consolidated Backup Framework\`""
Get-VM |
  where { $_.Guest.State -eq "Running" } |
  % { $image_level_mount_script += "`r`nvcbmounter.exe -h " + $vc + " -u " + $vcbuser + " -p " + $vcbpass + " -r " + $mount_root + $_.Guest.Hostname + " -t `"fullvm`" -m san -a ipaddr:" + $_.Guest.Hostname }
$image_level_mount_script | Out-File -FilePath $image_level_mount_script_name -Encoding ASCII
# End Full Image Level Mount All VMs

# Begin Full Image Level UNMOUNT All VMs
$image_level_unmount_script = "cd `"\Program Files\VMware\VMware Consolidated Backup Framework\`""
Get-VM |
  where { $_.Guest.State -eq "Running" } |
  % { $image_level_unmount_script += "`r`nvcbmounter.exe -h " + $vc + " -u " + $vcbuser + " -p " + $vcbpass + " -U " + $mount_root + $_.Guest.Hostname }
$image_level_unmount_script | Out-File -FilePath $image_level_unmount_script_name -Encoding ASCII
# End Full Image Level UNMOUNT All VMs
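The unmount script should run after every backup, even a failed one; otherwise snapshots or exported files can be left behind. Here is a minimal Python wrapper sketch that pairs the generated .bat files with a try/finally. It is not part of the original process. The backup command is a hypothetical placeholder (backup_tool.exe stands in for your Tivoli/Veritas client):

```python
import subprocess

def run_command(cmd):
    """Run a command, raising CalledProcessError if it exits non-zero."""
    subprocess.run(cmd, check=True)

def run_backup(mount_bat, unmount_bat, backup_cmd, run=run_command):
    """Mount the VMs, run the backup, and always run the unmount script.

    The mount step sits inside the try block too, so a mount script that
    fails partway through still gets cleaned up by the unmount script.
    """
    try:
        run(["cmd", "/c", mount_bat])
        run(list(backup_cmd))
    finally:
        run(["cmd", "/c", unmount_bat])

# Example (hypothetical backup command):
# run_backup(r"C:\file_level_mount_script.bat",
#            r"C:\file_level_unmount_script.bat",
#            [r"C:\path\to\backup_tool.exe", r"c:\backup"])
```

The `run` parameter exists so the sequencing can be tested without actually launching cmd.exe.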

Find total disk usage of snapshots on a given LUN

I was about to delete a ton (more on that later) of VMware ESX 3.5 snapshots in our maintenance window.

After some quick Googling, I found Delete all Snapshots, which made me hold off until I had more information about our snapshots. Snapshot delta files keep growing daily until they’re committed, so the space they take up keeps increasing.

So in order to plead a case with our SAN admin, I’ve got to have some information. How much space is being taken up on our LUN with regards to snapshots?

bash to the rescue!

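[root@host LUN]# ls -laR | grep delta | awk '{ SUM += $5 } END { print SUM/1024/1024 }'

The result is in MB (column 5 of ls -l is the file size in bytes). If you paste this, watch out for curly quotes: the awk program must be wrapped in straight single quotes.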
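For comparison, here is a Python sketch of the same total. It walks a datastore directory and sums snapshot delta files. One difference: it only counts names ending in "-delta.vmdk", while grep delta matches any ls line containing "delta". The datastore path in the comment is an example.

```python
import os

def snapshot_delta_mb(root: str) -> float:
    """Total size in MB of snapshot delta files (*-delta.vmdk) under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith("-delta.vmdk"):
                total += os.path.getsize(os.path.join(dirpath, name))
    return total / 1024 / 1024

# Example: print(snapshot_delta_mb("/vmfs/volumes/VMFSLun1"))
```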

Win.