SDN Development Environment

Recently, I began a deeper dive into SDN and OpenFlow. Overall I was very happy with the process and the quality of the material out there for newcomers.

However, I noticed a gap when I hit my first stumbling block. I set up a Mininet instance and noticed it was running Open vSwitch (OVS) v2.0. I needed a newer version of OVS, and I ended up turfing the Mininet instance while upgrading OVS. It quickly became apparent that I needed a repeatable development environment.

I created ansible-sdn-dev to help out with this problem.

ansible-sdn-dev includes Ansible roles to build, install, and configure the applications that make up the environment.

ansible-sdn-dev also includes a Vagrantfile so you can clone the repository, issue vagrant up and start hacking!
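If you want to give it a spin, the workflow is short. Here is a sketch, assuming the repository lives at github.com/andyhky/ansible-sdn-dev (adjust the URL to wherever your clone lives):

# Assumed repository location; substitute your own clone URL if it differs.
git clone https://github.com/andyhky/ansible-sdn-dev.git
cd ansible-sdn-dev
vagrant up     # brings up and provisions the VM via the included Vagrantfile
vagrant ssh    # log in and start hacking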

Upgrading Open vSwitch

Operating Open vSwitch brings a new set of challenges.

One of those challenges is managing Open vSwitch itself and making sure you’re up to date with performance and stability fixes. For example, in late 2013 there were significant performance improvements with the release of 1.11 (flow wildcarding!) and in the 2.x series there are even more improvements coming.

This means everyone running those old versions of OVS (I’m looking at you, <=1.6) should upgrade and get these huge performance gains.
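Not sure which version you're on? It only takes a second to check:

ovs-vsctl --version                # userspace tools
/etc/init.d/openvswitch version    # on XenServer, also reports the ovs-vswitchd build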

There are a few things to be aware of when upgrading OVS:

  1. Reloading the kernel module is a data plane impacting event. It’s minimal. Most won’t notice, and the ones that do only see a quick blip. The duration of the interruption is a function of the number of ports and number of flows before the upgrade.
  2. Along those lines, if you orchestrate OVS kernel module reloads with parallel-ssh, Ansible, or really any other tool, be mindful of connection timeouts. All traffic on the host will be momentarily dropped, including your SSH connection! Set your SSH timeouts appropriately or bad things happen (see the sketch after this list).
  3. Pay very close attention to kernel upgrades and OVS kernel module upgrades. Failure to do so could mean your host networking does not survive a reboot!
  4. Changes you’ve made outside of OVS/OVSDB to objects that OVS manages, e.g., manually configured tc buckets, will be destroyed.
  5. If you use XenServer, by upgrading OVS beyond what’s delivered from Citrix directly, you’re likely unsupported.
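On the SSH timeout point (#2), here is a minimal sketch of the idea with parallel-ssh; the hosts file name and the timeout value are placeholders for whatever fits your environment:

# hypervisors.txt and the 300 second timeout are placeholders.
# The generous timeout keeps the run from aborting when the kernel module
# reload briefly drops all traffic, including the SSH session pssh is using.
pssh -h hypervisors.txt -l root -t 300 -i \
    "/etc/init.d/openvswitch force-reload-kmod"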

Here is a rough outline of the OVS upgrade process for an individual hypervisor:

  • Obtain Open vSwitch packages
  • Install Open vSwitch userspace components and kernel module(s) (see #3 and “Where things can really go awry”)
  • Load the new Open vSwitch kernel module (/etc/init.d/openvswitch force-reload-kmod)
  • Simplified Ansible Playbook: https://gist.github.com/andyhky/9983421

The INSTALL file provides more detailed upgrade instructions. In the old days, upgrading Open vSwitch meant you had to either reboot your host or rebuild all of your flows by hand, because of the kernel module reload. Since the introduction of force-reload-kmod, the upgrade process is more durable and less disruptive.
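Condensed into shell form, the per-hypervisor steps look something like this; the package names are illustrative and will vary by distro and OVS version:

# Illustrative only: exact package names differ per distro and OVS version.
# Install userspace components plus kernel module packages for BOTH the
# running kernel and any kernel pending a reboot (see the next section).
rpm -Uvh openvswitch-*.rpm openvswitch-modules-xen-*.rpm
# The data plane blip happens here (caveat #1 above).
/etc/init.d/openvswitch force-reload-kmod
# Sanity check: bridges, ports and flows should still be present.
ovs-vsctl show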

Where things can really go awry

If your OS has a new kernel pending, e.g., after a XenServer service pack, you will want to install the packages for both your running kernel module and the one which will be running after reboot. Failing to do so can result in losing connectivity to your machine.


It is not a guaranteed loss of networking when the Open vSwitch kernel module doesn’t match the running Xen kernel, but it is a best practice to keep them in lock-step. The failures I’ve seen usually involve significant version changes, e.g., 1.6 -> 1.11.

You can check if you’re likely to have a problem by running this code (XenServer only, apologies for quick & dirty bash):

#!/usr/bin/env bash
# Check that the installed openvswitch-modules RPMs match both the running
# Xen kernel and the kernel that will be running after the next reboot.
RUNNING_XEN_KERNEL=`uname -r | sed s/xen//`
PENDING_XEN_KERNEL=`readlink /boot/vmlinuz-2.6-xen | sed s/xen// | sed s/vmlinuz-//`
OVS_BUILD=`/etc/init.d/openvswitch version | grep ovs-vswitchd | awk '{print $NF}'`

# Is there an OVS module package for the kernel we're running right now?
rpm -q openvswitch-modules-xen-$RUNNING_XEN_KERNEL-$OVS_BUILD > /dev/null
if [[ $? == 0 ]]
then
    echo "Current kernel and OVS modules match"
else
    CURRENT_MISMATCH=1
    echo "Current kernel and OVS modules do not match"
fi

# Is there one for the kernel that will boot next (the vmlinuz-2.6-xen symlink)?
rpm -q openvswitch-modules-xen-$PENDING_XEN_KERNEL-$OVS_BUILD > /dev/null
if [[ $? == 0 ]]
then
    echo "Pending kernel and OVS modules match"
else
    PENDING_MISMATCH=1
    echo "Pending kernel and OVS will not match after reboot. This can cause system instability."
fi

if [[ $CURRENT_MISMATCH == 1 || $PENDING_MISMATCH == 1 ]]
then
    exit 1
fi

Luckily, this can be rolled back. Access the host via DRAC/iLO and roll back the vmlinuz-2.6-xen symlink in /boot to one that matches your installed openvswitch-modules RPM. I made a quick and dirty bash script which can roll back, but it won’t be too useful unless you put the script on the server beforehand. Here it is (again, XenServer only):

#!/usr/bin/env bash
# Not guaranteed to work. YMMV and all that.
# Roll the /boot/vmlinuz-2.6-xen symlink back to a Xen kernel that matches an
# installed openvswitch-modules RPM (XenServer only).
OVS_KERNEL_MODULES=`rpm -qa 'openvswitch-modules-xen*' | sed s/openvswitch-modules-xen-// | cut -d "-" -f1,2`
XEN_KERNELS=`find /boot -name "vmlinuz*xen" \! -type l -exec ls -ld {} + | awk '{print $NF}' | cut -d "-" -f2,3 | sed s/xen//`
# A kernel version that appears in both lists is a safe rollback target.
COMMON_KERNEL_VERSION=`echo $XEN_KERNELS $OVS_KERNEL_MODULES | tr " " "\n" | sort | uniq -d`
stat /boot/vmlinuz-${COMMON_KERNEL_VERSION}xen > /dev/null
if [[ $? == 0 ]]
then
    # Re-point the symlink the bootloader uses at the matching kernel.
    rm /boot/vmlinuz-2.6-xen
    ln -s /boot/vmlinuz-${COMMON_KERNEL_VERSION}xen /boot/vmlinuz-2.6-xen
else
    echo "Unable to find kernel version to roll back to! :(:(:(:("
fi

Network wiring with XenServer and Open vSwitch

In the physical world when you power on a server it’s already cabled (hopefully).

With VMs things are a bit different. Here’s the sequence of events when a VM is started in Nova and what happens on XenServer to wire it up with Open vSwitch.


  1. nova-compute starts the VM via XenAPI
  2. XenAPI VM.start creates a domain and creates the VM’s vifs on the hypervisor
  3. The Linux userspace device manager (udev) receives this event, and the scripts within /etc/udev/rules.d are fired in lexical order
  4. Xen’s vif plug script is fired, which at a minimum creates a port on the relevant virtual switch
    • Newer versions (XS 6.1+) of this plug script also call a setup-vif-rules script which creates several entries in the OpenFlow table (list grabbed from the code comments):
      • Allow DHCP traffic (outgoing UDP on port 67)
      • Filter ARP requests
      • Filter ARP responses
      • Allow traffic from specified ipv4 addresses
      • Neighbour solicitation
      • Neighbour advertisement
      • Allow traffic from specified ipv6 addresses
      • Drop all other neighbour discovery
      • Drop other specific ICMPv6 types
      • Router advertisement
      • Redirect gateway
      • Mobile prefix solicitation
      • Mobile prefix advertisement
      • Multicast router advertisement
      • Multicast router solicitation
      • Multicast router termination
      • Drop everything else
  5. Creation of the port on the virtual switch also adds entries into OVSDB, the database which backs Open vSwitch.
  6. ovs-xapi-sync, which starts on XenAPI/Open vSwitch startup, keeps a local copy of the system’s state in memory. It checks for changes in the Bridge/Interface tables and pulls XenServer-specific data into other columns of those tables.
  7. On many events within OVSDB, including creates/updates of the tables touched by these operations, the OVS controller is notified via JSON-RPC. Thanks to Scott Lowe for clarification on this part.

After all of that happens, the VM boots and the guest OS sets up its network stack.
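If you want to poke at the end result on the hypervisor, a few read-only commands cover most of it. vif2.1 and xapi0 are the example names used later in this post; substitute your own:

ovs-vsctl list-ports xapi0           # ports on the bridge, including the freshly plugged vif
ovs-vsctl list Interface vif2.1      # the OVSDB row that ovs-xapi-sync annotates
ovs-ofctl dump-flows xapi0 | head    # flow table, including the setup-vif-rules entries (XS 6.1+)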

Deep Dive: HTB Rate Limiting (QoS) with Open vSwitch and XenServer

DISCLAIMER: I’m still getting my feet wet with Open vSwitch. This post is just a cleaned up version of my scratchpad.

Open vSwitch has a few ways of providing rate limiting. This deep dive goes into the internals of reverse engineering an existing virtual interface’s egress rate limits applied with tc-htb. Hierarchy Token Bucket (HTB) is a standard Linux packet scheduling implementation. More reading on HTB can be done on the author’s site; I found the implementation and theory pretty interesting.

This is current as of Open vSwitch 1.9.
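Before digging into OVSDB, you can peek at the HTB hierarchy OVS has built on a physical interface directly with tc (eth1 is just an example interface):

tc -d qdisc show dev eth1           # shows an htb qdisc when linux-htb QoS is in use
tc -d class show dev eth1 | head    # the per-queue htb classes with their rate/ceil settings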

The information needed to retrieve HTB rate limits mostly lives in OVSDB:

Open vSwitch Schema (http://openvswitch.org/ovs-vswitchd.conf.db.5.pdf)

Things can get complex depending on how your vifs plug into your physical interfaces. In my case, OpenStack Quantum requires an integration bridge which I’ve attempted to diagram:

[Diagram: Open vSwitch queues and the Quantum integration bridge]

  1. On instance boot, vifs are plugged into xapi0, and information including flows and logical queues is pulled down from xapi0’s controller nodes.
  2. The flows from (1) set the destination queue on all traffic from the interface’s source IP address.
  3. The queue the traffic is sent to maps to a linux-htb class, where the packets are scheduled.

Let’s take a look at an example. I want to retrieve the rate limit according to the hypervisor for vif2.1 which connects to xapi0, xenbr1, and the physical interface eth1. The IP address is 10.0.0.37.

Steps:

  • Find the QoS used by the physical interface:
    # ovs-vsctl find Port name=eth1 | grep qos
    qos : 678567ed-9f71-432b-99a2-2f28efced79c

  • Determine which queue is being used for your virtual interface. The value after set_queue is our queue_id.
    # ovs-ofctl dump-flows xapi0 | grep 10.0.0.37 | grep "set_queue"
    ... ,nw_src=10.0.0.37 actions=set_queue:13947, ...
  • List the QoS record from the first step along with its type. NOTE: This command outputs every OpenFlow queue_id/OVS Queue UUID pair for the physical interface. The queue_id from the previous step is the key we’re interested in, and the value is our Queue’s UUID.
    # ovs-vsctl list Qos 678567ed-9f71-432b-99a2-2f28efced79c | egrep 'queues|type'
    queues : { ... 13947=787b609b-417c-459f-b9df-9fb5b362e815,... }
    type : linux-htb
  • Use the Queue UUID from the previous step to list the Queue:
    # ovs-vsctl list Queue 787b609b-417c-459f-b9df-9fb5b362e815 | grep other_config
    other_config : {... max-rate="614400000" ...}
  • In order to tie it back to tc-htb we have to convert the OpenFlow queue_id+1 to hexadecimal (367c). I think it’s happening here in the OVS code, but I’d love to have a definitive answer.
    # tc -s -d class show dev eth1 | grep 367c | grep ceil # Queue ID + 1 in Hex
    class htb 1:367c ... ceil 614400Kbit
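If you find yourself doing this lookup a lot, the steps chain together into a small script. This is a rough sketch under the same assumptions as the walkthrough above (XenServer, linux-htb, one set_queue flow per source IP); error handling is left out:

#!/usr/bin/env bash
# Rough sketch: walk from a physical interface + instance IP to the htb class.
# Usage: ./queue-lookup.sh <physical interface> <integration bridge> <instance IP>
PHYS_IF=$1      # e.g. eth1
INT_BRIDGE=$2   # e.g. xapi0
VM_IP=$3        # e.g. 10.0.0.37

# QoS record attached to the physical interface's Port.
QOS_UUID=`ovs-vsctl find Port name=$PHYS_IF | awk '/^qos/ {print $NF}'`
# OpenFlow queue_id from the set_queue action on the instance's flows.
QUEUE_ID=`ovs-ofctl dump-flows $INT_BRIDGE | grep "nw_src=$VM_IP" \
    | grep -o 'set_queue:[0-9]*' | head -1 | cut -d: -f2`
# Map queue_id -> Queue UUID via the QoS record's queues column.
QUEUE_UUID=`ovs-vsctl list Qos $QOS_UUID | grep -o "$QUEUE_ID=[0-9a-f-]*" | cut -d= -f2`
echo "queue_id $QUEUE_ID -> Queue $QUEUE_UUID"
ovs-vsctl list Queue $QUEUE_UUID | grep other_config
# The tc class id is the OpenFlow queue_id + 1, in hex.
tc -s -d class show dev $PHYS_IF | grep "`printf '%x' $((QUEUE_ID + 1))`" | grep ceil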