Less known Solaris features - IP Multipathing (Part 2): Foundations 1

The fundamental concepts of both implementations are pretty much the same, so I will start with a short introduction to the nomenclature of IPMP. I sacrificed some precision to make it easier to understand, but for our purposes the precision is more than sufficient.

  • IP Link:

    An IP link is the logical connection to an IP network. Think of a router that has two legs: one to the Internet and one to the internal network. Such a router has two IP links, even when it has multiple physical connections to each of these networks.

  • Physical Interface:

    The physical interface is the foundation of all networking, but it isn't really the basic entity in IPMP. The basic entity is the IP interface that is bound to a physical interface. Or to simplify it: it's the IP address, not the cable, that is managed by IPMP. Of course, you need physical interfaces. Ideally two or more of them, because with one path you can't do multipathing (although I can think of rare cases where IPMP with one path can be useful).

  • IPMP Group:

    Now you have physical interfaces on some network interface cards connected to several IP links. How do you tell the system that certain interfaces are redundant connections into the same IP link?

    The concept of the IPMP group solves this problem. You put all interfaces that connect to the same IP link into one IPMP group. All interfaces in this group are considered redundant to each other, so IPMP can use any of them to receive and transmit the traffic for this network.
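    To make the grouping idea concrete, here is a minimal Python sketch that models IPMP groups as a plain mapping from interface to group name. It is only an illustration, not how Solaris stores this configuration: the group production0 and the interface e1000g0 appear in the ifconfig output later in this article, while e1000g1, e1000g2 and dmz0 are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical configuration: interface name -> IPMP group name.
# e1000g0 and production0 appear in the article; the rest is made up.
INTERFACE_GROUPS: Dict[str, str] = {
    "e1000g0": "production0",
    "e1000g1": "production0",
    "e1000g2": "dmz0",
}


def build_groups(interface_groups: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert the interface->group mapping into group->sorted member list."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for interface, group in interface_groups.items():
        groups[group].append(interface)
    return {group: sorted(members) for group, members in groups.items()}


def redundant_partners(interface: str,
                       interface_groups: Dict[str, str]) -> List[str]:
    """Return the other interfaces that can take over for `interface`.

    Only members of the same IPMP group (i.e. connected to the same
    IP link) count as redundant paths.
    """
    group = interface_groups.get(interface)
    if group is None:
        return []  # not in any IPMP group: no redundancy at all
    return sorted(other for other, other_group in interface_groups.items()
                  if other_group == group and other != interface)


if __name__ == "__main__":
    print(redundant_partners("e1000g0", INTERFACE_GROUPS))  # ['e1000g1']
    print(redundant_partners("e1000g2", INTERFACE_GROUPS))  # []
```

    The point of the model is that redundancy is only defined inside a group: e1000g2 is a perfectly good interface, but it cannot take over for e1000g0 because it connects to a different IP link.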

  • Failure:

    Okay, you may think this one is so obvious you don't have to talk about it. Well, not really. This is one of the most frequent errors in HA: buying or using an HA product without thinking about the failure modes that the mechanism actually addresses.

    You have to think about which failures are in scope of IPMP and which are out of scope. IPMP is called IP Multipathing for a reason: it's a tool for IP, not for other protocols. So it protects the availability of IP services against failures. For this task, however, it uses information from other layers of the stack, for example whether there is a link on the physical connection. Primarily it uses this information to speed up failover: there is no need to check upper layers if you already know that lower layers went away.

  • Failure Detection:

    You do IPMP for a reason. You want to protect your system from losing its network connection in case a networking component fails. One of the most important components of an automatic availability protection mechanism is its capability to detect the need to act, for example to switch the IP configuration to another physical interface. Without such a mechanism, IPMP would just be an easier interface for switching such configurations manually.

    That said, IPMP provides two mechanisms to detect failures:
    • Link based:

      As the name suggests, link-based failure detection checks whether the physical interface has an active and operational link to the physical network. When a physical interface loses its link, for example due to problems with the cabling or a powered-down switch, IPMP considers the interface failed and starts to fail over to an operational link.

      The monitoring mechanism for the link state is quite simple. It's done by monitoring the RUNNING flag of an IP interface. When you look at a functional interface with ifconfig you will recognize this flag:
      e1000g0: flags=209040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER,CoS> mtu 1500 index 11
              inet netmask ffffff00 broadcast
              groupname production0
              ether 8:0:27:11:34:43
      When you unplug the cable, the RUNNING flag is missing:
      e1000g0: flags=219040803<UP,BROADCAST,MULTICAST,DEPRECATED,IPv4,NOFAILOVER,FAILED,CoS> mtu 1500 index 11
              inet netmask ffffff00 broadcast
              groupname production0
              ether 8:0:27:11:34:43
      This method of monitoring the interfaces requires support from the network card driver: to be usable with link-based IPMP, the driver has to set and unset the RUNNING flag based on the link state. (hme, eri, ce, ge, bge, qfe, dmfe, e1000g, ixgb, nge, nxge, rge and xge definitely work; for other cards, ask the provider of the driver.)

    • Probe based:

      The probe-based mechanism is independent of the hardware: it checks the IP layer from the IP layer itself. The basic idea is: when you are able to communicate via IP with other systems, it's safe to assume that the IP layer is working.

      The probing itself is a simple mechanism. Probe-based failure detection sends ICMP messages to a number of systems. As long as those systems respond to the ICMP packets, the link is considered OK. When they don't respond within a certain time, the link is considered failed and the failover takes place.
    I will talk about the advantages and disadvantages of both mechanisms in a later section.
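    Both detection mechanisms boil down to a simple per-interface decision. The following Python sketch illustrates that decision; it is not the logic of the actual IPMP daemon. The link-based check looks for the RUNNING flag in an ifconfig flags line like the ones shown above. The probe-based check treats an interface as failed when none of its targets answered within a detection window. The 10-second window, the target names and the data structures are assumptions chosen for the example.

```python
import re
from typing import Dict, Optional, Set

# Flag names between the angle brackets of an ifconfig "flags=" line.
FLAGS_RE = re.compile(r"flags=[0-9a-fA-F]+<([^>]*)>")

# Hypothetical detection window in seconds (assumption of this sketch).
DETECTION_WINDOW = 10.0


def parse_flags(ifconfig_line: str) -> Set[str]:
    """Return the set of flag names from an ifconfig flags line."""
    match = FLAGS_RE.search(ifconfig_line)
    if match is None:
        raise ValueError("no flags found in: " + ifconfig_line)
    return set(match.group(1).split(","))


def link_failed(ifconfig_line: str) -> bool:
    """Link-based check: the link is gone if RUNNING is missing."""
    return "RUNNING" not in parse_flags(ifconfig_line)


def probes_failed(last_reply: Dict[str, Optional[float]], now: float,
                  window: float = DETECTION_WINDOW) -> bool:
    """Probe-based check: failed if no target replied within `window`.

    `last_reply` maps a target to the time of its last ICMP reply
    (None if it never replied). This data model is hypothetical.
    """
    if not last_reply:
        # Without targets, probing cannot decide anything; report no failure.
        return False
    return all(t is None or now - t > window for t in last_reply.values())


if __name__ == "__main__":
    up = ("e1000g0: flags=209040843<UP,BROADCAST,RUNNING,MULTICAST,"
          "DEPRECATED,IPv4,NOFAILOVER,CoS> mtu 1500 index 11")
    down = ("e1000g0: flags=219040803<UP,BROADCAST,MULTICAST,DEPRECATED,"
            "IPv4,NOFAILOVER,FAILED,CoS> mtu 1500 index 11")
    print(link_failed(up))    # False
    print(link_failed(down))  # True
    print(probes_failed({"router": 95.0, "host-a": 91.0}, now=100.0))  # False
    print(probes_failed({"router": 80.0, "host-a": None}, now=100.0))  # True
```

    The asymmetry is visible in the code: the link check needs nothing but local driver state, while the probe check needs targets and has to wait for a timeout before it can conclude anything.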

  • Data Address:

    In IPMP terminology, the data addresses are the addresses that are actually used for communication. An IPMP group can be used for multiple data addresses. However, all data addresses have to be in the same IP link.

  • Test Address:

    When you send ICMP messages to detect a failure, you need a source IP address for those messages. So each physical interface needs an address that is used only for testing purposes. This address is called the test address.
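    The difference between the two kinds of addresses shows up during a failover: data addresses move to a surviving interface of the group, while each test address stays on its own physical interface so that the interface can still be probed (the NOFAILOVER flag in the ifconfig output above marks such an address). The following Python sketch models only that distinction. The Interface class, the fail_over function, the interface e1000g1 and the 192.0.2.x documentation addresses are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Interface:
    name: str
    test_address: str  # stays on this interface (NOFAILOVER)
    data_addresses: List[str] = field(default_factory=list)
    failed: bool = False


def fail_over(group: List[Interface], failed_name: str) -> Optional[str]:
    """Mark `failed_name` as failed and move its data addresses to the
    first healthy interface of the group. Test addresses never move.
    Returns the name of the interface that took over, or None."""
    failed = next(i for i in group if i.name == failed_name)
    failed.failed = True
    survivors = [i for i in group if not i.failed]
    if not survivors:
        return None  # no redundancy left: nowhere to move the data addresses
    target = survivors[0]
    target.data_addresses.extend(failed.data_addresses)
    failed.data_addresses.clear()
    return target.name


if __name__ == "__main__":
    production0 = [
        Interface("e1000g0", "192.0.2.10", ["192.0.2.100"]),
        Interface("e1000g1", "192.0.2.11"),
    ]
    print(fail_over(production0, "e1000g0"))  # e1000g1
    for i in production0:
        print(i.name, i.test_address, i.data_addresses)
    # e1000g0 192.0.2.10 []
    # e1000g1 192.0.2.11 ['192.0.2.100']
```

    Keeping the test address pinned is what later makes repair detection possible: the failed interface still has a source address for probes, even though it no longer carries any data traffic.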

  • Repair and Repair detection:

    When you talk about failures, you have to talk about repairs as well. When an interface is functional again, for example after replacing a cable or moving to a different switch, you have to detect this situation and reactivate the interface. Without repairs and the detection of repairs, you would run out of interfaces pretty soon. Repair detection is just the other side of failure detection, except that you check for probes getting through again or for a link coming back up.
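    Repair detection uses the same two signals as failure detection, just in the other direction. Here is a minimal Python sketch of the resulting state change, assuming a simple two-state model (OK and FAILED) and the hypothetical boolean inputs link_up and probes_ok. Any timing rules the real implementation applies before declaring a repair are not modelled here.

```python
from enum import Enum


class State(Enum):
    OK = "ok"
    FAILED = "failed"


def next_state(state: State, link_up: bool, probes_ok: bool) -> State:
    """Two-state model of failure and repair detection.

    link_up:   the RUNNING flag is set (link-based signal)
    probes_ok: a probe target answered recently (probe-based signal);
               pass True when only link-based detection is used.
    """
    healthy = link_up and probes_ok
    if state is State.OK and not healthy:
        return State.FAILED  # failure detected
    if state is State.FAILED and healthy:
        return State.OK      # repair detected: interface usable again
    return state


if __name__ == "__main__":
    state = State.OK
    # Cable unplugged, plugged back in, then probes get through again.
    for link_up, probes_ok in [(True, True), (False, False),
                               (True, False), (True, True)]:
        state = next_state(state, link_up, probes_ok)
        print(state.name)  # OK, FAILED, FAILED, OK
```

    Note the third step: the link is back, but as long as the probes still fail, the sketch keeps the interface in the FAILED state.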

  • Target systems:

    A target system is the counterpart of the test address. When you want to check the availability of a network connection by sending probe messages via ICMP, you need a source as well as a target for this ICMP communication.

    In IPMP speak, a target system is a system that is used to test the availability of an IP interface. IPMP tries to ping the target system in order to evaluate whether the network interface is still fully functional. This is done for each interface by using its test address as the source address of the ICMP probe.

    Target systems are chosen by the IPMP mechanism. The mechanism to do so is quite simple:
    • Routers in an IP link are chosen as target systems automatically.
    • When there are no routers connected to the IP link, the IPMP mechanism tries to find hosts in the neighborhood. A ping is sent to the "all hosts" multicast address (specified by RFC 1112, http://tools.ietf.org/html/rfc1112):
      jmoekamp@hivemind:~$ ping -s
      PING 56 data bytes
      64 bytes from hivemind-prod ( icmp_seq=0. time=0.052 ms
      64 bytes from icmp_seq=0. time=0.284 ms
      64 bytes from icmp_seq=0. time=20.198 ms
      The first few systems replying to this ping are chosen as target systems.
    • The automatic mechanism doesn't always choose the optimal systems for this check, so you can specify target systems manually when you think a manual configuration better ensures that the targets represent a set of systems whose availability is a meaningful check of the network's availability. Manually defined hosts always take precedence over routers, so defining such systems manually can reduce the ICMP load on your router. However, in most cases the automatic mechanism yields reasonable and sufficient results.
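The target selection rules listed above can be summarized in a short Python sketch. It is a simplification under stated assumptions: the limit of five targets, the function choose_targets, its inputs and all 192.0.2.x addresses are hypothetical. The real mechanism's details may differ. Skipping the local host's own replies is also an assumption, motivated by the ping output above, where the local system answers as well.

```python
from typing import AbstractSet, List

# Hypothetical limit on how many targets are probed per interface.
MAX_TARGETS = 5


def choose_targets(manual: List[str], routers: List[str],
                   multicast_replies: List[str],
                   own_addresses: AbstractSet[str] = frozenset(),
                   max_targets: int = MAX_TARGETS) -> List[str]:
    """Pick probe targets following the rules described in the text.

    1. Manually configured hosts take precedence over everything else.
    2. Otherwise, the routers on the IP link are used.
    3. Without routers, the first hosts that answered the ping to the
       "all hosts" multicast address are used, in order of reply,
       skipping replies from the local system (assumption of this sketch).
    """
    if manual:
        candidates = list(manual)
    elif routers:
        candidates = list(routers)
    else:
        candidates = [host for host in multicast_replies
                      if host not in own_addresses]
    return candidates[:max_targets]


if __name__ == "__main__":
    print(choose_targets([], ["192.0.2.1"], ["192.0.2.20"]))
    # ['192.0.2.1']
    print(choose_targets([], [], ["192.0.2.10", "192.0.2.20", "192.0.2.21"],
                         own_addresses={"192.0.2.10"}))
    # ['192.0.2.20', '192.0.2.21']
    print(choose_targets(["192.0.2.50"], ["192.0.2.1"], []))
    # ['192.0.2.50']
```

The last call shows why manual targets reduce the load on the router: as soon as any manual target is configured, the router is no longer probed at all in this model.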