Written by J. Moellenkamp
How to show off live migration with a SPARC system?
Yesterday i had the opportunity to show Oracle VM for SPARC in action in front of customers. Not a single slide was used … everything was live ;). The following entry shows what i essentially did in this demo. Perhaps long-time users of LDoms (or Oracle VM for SPARC, as they are called today) have already seen all of this, however that wasn’t the intended audience of this walkthrough ;). In this example i configured the control domain and one guest domain, installed it with Solaris 11, and migrated it live (without service interruption) from one system to another.
Okay, i started with two unconfigured (okay, to be exact … deconfigured) systems of the type SPARC T3-4, so i had plenty of resources to play with. The first system was node1, listening on 10.128.0.72; the second system was node2, listening on 10.128.0.73.
Just to be sure, i checked the configuration.
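A quick ldm list shows the current layout (i’ll spare you the full output):

    node1# ldm list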
There was just a single logical domain, with all resources (512 virtual CPUs and 256 GB of memory) assigned to it. The situation on the second node was the same. No wonder: same hardware config, same software config.
Ensure that you have enabled the vntsd daemon (the virtual network terminal server providing the consoles) on both systems.
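vntsd is managed via SMF, so that is one command per node:

    node1# svcadm enable svc:/ldoms/vntsd:default
    node2# svcadm enable svc:/ldoms/vntsd:default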
Okay, the basics were the same, so now i had to start the basic config. The important point: those single large domains will act as so-called control domains, however they have to be made significantly smaller for that task. The already running Solaris 10 was kept unharmed and became the OS of the control domain.
First step was to configure the virtual console server:
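With the port range and names explained right below, the command looks like this:

    node1# ldm add-vcc port-range=5000-5100 primary-vcc0 primary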
With this command you configure a console server listening on ports 5000 to 5100, named primary-vcc0, in the domain primary. Okay, the next step was to configure the so-called virtual disk server. As long as you don’t configure any hardware directly into a domain (like a network card for iSCSI or an HBA for storage access), the virtual disk server is the daemon that provides storage devices to all guest domains.
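Again a one-liner:

    node1# ldm add-vds primary-vds0 primary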
With this command we have configured a virtual disk server called primary-vds0 in the domain primary. The next step is the configuration of the networking. For this task we configure a virtual switch.
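The switch is created like this (igb0 being the uplink, as explained next):

    node1# ldm add-vsw net-dev=igb0 primary-vsw0 primary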
The virtual switch called primary-vsw0 is running in the domain primary, and it connects to the real world via the device igb0. When you want to check all the services you have just configured, you can do this with a single command.
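That single command is:

    node1# ldm list-services primary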
At the moment this primary domain is using all the resources. In order to be able to configure some guests, we have to free up some room. So at first we reduce the number of assigned crypto units; i just want to give it one.
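On a T3 the crypto units are the MAUs, so this translates to:

    node1# ldm set-mau 1 primary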
In the next step we assign 8 processors to the domain.
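In ldm syntax:

    node1# ldm set-vcpu 8 primary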
Okay, let’s check the current configuration.
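Again via:

    node1# ldm list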
Okay, just 8 virtual CPUs now, however the domain still occupied all the memory in the system. We have to reduce that. Technically it’s possible to do this on the running system, but getting a running logical domain from 256 GB down to 8 GB is quite some work, so most often it is just faster to put the domain into deferred reconfiguration mode, do the configuration and reboot the system, as at this moment nothing runs on the system anyway. When doing deferred reconfiguration, the config change is accepted but it will only be executed with the next reboot:
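Deferred reconfiguration is started with:

    node1# ldm start-reconf primary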
Now we set the memory of the domain primary to 8 GB:
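That is:

    node1# ldm set-memory 8G primary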
Now we save the config to the ILOM and reboot the system.
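The name of the saved config is arbitrary; i’m using a made-up one here:

    node1# ldm add-config initial
    node1# shutdown -y -g0 -i6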
Okay, while the first system is rebooting, we just repeat the same configuration steps on the second system:
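For completeness, the whole sequence on node2 in one go:

    node2# ldm add-vcc port-range=5000-5100 primary-vcc0 primary
    node2# ldm add-vds primary-vds0 primary
    node2# ldm add-vsw net-dev=igb0 primary-vsw0 primary
    node2# ldm set-mau 1 primary
    node2# ldm set-vcpu 8 primary
    node2# ldm start-reconf primary
    node2# ldm set-memory 8G primary
    node2# ldm add-config initial
    node2# shutdown -y -g0 -i6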
We now check the config on both systems, again with ldm list. On the first system:
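    node1# ldm list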
On the second system:
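    node2# ldm list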
Okay … i have to explain a little bit … 10.10.1.37 is an S7000 filer i’ve used for central storage. In the directory /ldoms/isos i’ve put an ISO of the Solaris 11 11/11 text install image.
As i want to install the LDom i will create later on, i add this ISO to the virtual disk server as a device:
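A sketch, assuming the filer export is mounted at /ldoms/isos on the control domain and the ISO carries its standard name (adjust both to your environment):

    node1# ldm add-vdsdev options=ro /ldoms/isos/sol-11-1111-text-sparc.iso sol11iso@primary-vds0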
Okay, i want to demo live migration in this walkthrough, so i need some shared storage. It’s obvious why: it makes no sense to migrate a logical domain to a system that doesn’t have access to the same disk devices. So i configured my S7000 filer to offer some LUNs via iSCSI. However, i have to configure the primary domain in order to actually use these LUNs. That is pretty easy. At first we tell the iSCSI initiator of Solaris 10 that there are disks to find behind 10.10.1.37.
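Which translates to:

    node1# iscsiadm add discovery-address 10.10.1.37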
Now we tell Solaris to discover the LUNs behind this IP.
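By enabling sendtargets discovery:

    node1# iscsiadm modify discovery --sendtargets enable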
And now we populate the /dev tree with the necessary device nodes.
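A job for devfsadm:

    node1# devfsadm -i iscsi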
Okay, repeat this on the second system.
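Again the same three commands:

    node2# iscsiadm add discovery-address 10.10.1.37
    node2# iscsiadm modify discovery --sendtargets enable
    node2# devfsadm -i iscsi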
Okay, let’s have a look at what the system has found. From the configuration in the filer i knew that there must be something like 600144F0C56DC0FB00004F586FD60004 in the disk ID. As the disk is unlabeled at that point, the format command will offer to do this labeling for you. Do it … you need a labeled disk.
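Roughly like this:

    node1# format
    (select c0t600144F0C56DC0FB00004F586FD60004d0 from the menu, answer "y" when format offers to label the disk, then quit)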
Okay, now check the availability of the disk on the other server.
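Same game on node2:

    node2# format
    (the same disk should show up in the device list)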
Check … it’s there. The next step is the last one in this tour that we have to execute on both systems. With this command we assign the disk /dev/dsk/c0t600144F0C56DC0FB00004F586FD60004d0s2 on both nodes as lmtest001iscsibootdisk to the virtual disk server called primary-vds0.
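On both nodes:

    node1# ldm add-vdsdev /dev/dsk/c0t600144F0C56DC0FB00004F586FD60004d0s2 lmtest001iscsibootdisk@primary-vds0
    node2# ldm add-vdsdev /dev/dsk/c0t600144F0C56DC0FB00004F586FD60004d0s2 lmtest001iscsibootdisk@primary-vds0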
Okay, now we configure our first guest domain.
At first we just create the domain.
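That is simply:

    node1# ldm add-domain lmtest001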
Now we add 8 virtual CPUs to the domain.
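Analogous to the primary domain earlier:

    node1# ldm add-vcpu 8 lmtest001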
Of course a domain needs memory, so i give it 16 GB.
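Which is:

    node1# ldm add-memory 16G lmtest001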
Now i’m creating a network interface for the domain lmtest001, connected to the virtual switch primary-vsw0, and naming it vnet1.
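The command for that:

    node1# ldm add-vnet vnet1 primary-vsw0 lmtest001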
Okay, as my iSCSI disk is totally empty, i have to provide an installation image (i could do this via Jumpstart or AI, however that would be out of scope of this short article). So i assign the virtual disk sol11iso on the virtual disk server primary-vds0 (remember, we configured it earlier) to lmtest001. To the domain it’s named vdisk_iso.
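In one line:

    node1# ldm add-vdisk vdisk_iso sol11iso@primary-vds0 lmtest001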
Now i have to assign the iSCSI boot disk to the domain. The command is quite similar.
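With bootdisk as the name inside the domain (you will see that name again at the boot prompt):

    node1# ldm add-vdisk bootdisk lmtest001iscsibootdisk@primary-vds0 lmtest001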
The next step is to declare the boot device and to tell the system to boot automatically from it.
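Both are OBP variables, set from the control domain (note the escaped question mark in the shell):

    node1# ldm set-var auto-boot\?=true lmtest001
    node1# ldm set-var boot-device=bootdisk lmtest001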
However, the domain is still inactive and no resources have been bound to it.
So we bind the resources with a single command:
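Namely:

    node1# ldm bind-domain lmtest001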
When you look up the status again, you see a state transition: the domain isn’t “inactive” any longer, it’s now “bound”.
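The lookup is again just:

    node1# ldm list lmtest001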
Now it’s time to start up the domain.
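Also a one-liner:

    node1# ldm start-domain lmtest001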
When you look back at the last output of ldm ls, you will see 5000 in the column “CONS” (short for console) for lmtest001. This 5000 is now important to get access to the console of the lmtest001 domain.
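vntsd listens on localhost by default, so from the control domain you can simply telnet to that port:

    node1# telnet localhost 5000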
As you see, there is a boot prompt just like on a native SPARC machine. As there is no operating system on the device we’ve called bootdisk, the system doesn’t come up but stays at that prompt.
Now let’s boot from the ISO image:
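The vdisk name doubles as a device alias in the guest’s OBP, so at the ok prompt:

    {0} ok boot vdisk_iso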
Okay, at first the system comes up from the ISO, as you can see.
Okay, this will now take a while, and i won’t write about it. It’s a standard Solaris 11 install. You know the drill.
After the reboot initiated by the installation procedure, the system comes up with an installed OS. As you will recognize by the string of the boot device, you have now booted from the iSCSI boot disk.
Okay, let’s play a little bit with the domain. Log into the shell of the system. When you execute a prtdiag, you will see 16 GB of memory and 8 virtual CPUs.
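For example:

    lmtest001# prtdiag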
Okay, let’s assume we’ve changed our mind and want a domain with 8 additional virtual CPUs. You can do this while the domain is running:
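Back on the control domain:

    node1# ldm add-vcpu 8 lmtest001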
When you do another prtdiag in the still running domain, you will see 16 virtual CPUs.
Okay, 8 additional gigabytes of memory may be a nice idea as well. So let’s add them to the running domain.
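Same pattern as before:

    node1# ldm add-memory 8G lmtest001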
Do another prtdiag, and we see 24 gigs of memory.
However, we aren’t really in a decisive mood today and think that our first config was nice, so we revert to the old values. First we remove the 8 gigs of memory again from the domain lmtest001.
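The counterpart to add-memory:

    node1# ldm remove-memory 8G lmtest001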
And again our domain has just 16 GB.
Now we just have to remove the 8 additional vCPUs.
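And analogous for the vCPUs:

    node1# ldm remove-vcpu 8 lmtest001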
Okay, one last time we will execute prtdiag.
Again, back to 8 virtual CPUs in the domain.
Okay, but now back to our demonstration of live migration. As i would like to demonstrate the live migration with some network traffic, i need an IP address. So i configure one on the virtual network interface i’ve created earlier when configuring the domain. Log into the domain as root or assume a role that allows you to configure networking.
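A sketch with ipadm; i’m assuming the interface shows up as net0 inside the Solaris 11 guest and that an address like 10.128.0.100/24 is free in your network:

    lmtest001# ipadm create-ip net0
    lmtest001# ipadm create-addr -T static -a 10.128.0.100/24 net0/v4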
Just a short test from my local workstation. Just a short remark … the ping times are that bad because my servers were in Scotland and i was sitting in the Düsseldorf FTL lounge, connected via VPN over a UMTS line …
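With the hypothetical address from above, the test is just a continuous ping:

    workstation$ ping 10.128.0.100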
Okay, let’s kick off the live migration. With this command i order the Logical Domains Manager to migrate the domain lmtest001 to the server running a control domain on 10.128.0.73. The password asked for was, in my case, the root password of that control domain.
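The migration itself is a one-liner (it will prompt for the password of the target control domain):

    node1# ldm migrate-domain lmtest001 10.128.0.73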
This will take a while, however you will just get your prompt back in a very unspectacular way as soon as the migration has completed.
What has happened in the meantime?
Well, a single ping has been lost. However, what’s more interesting: there isn’t a domain lmtest001 on my server any longer.
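A final check on both control domains shows it:

    node1# ldm list
    (lmtest001 is gone here)
    node2# ldm list
    (… and it shows up on node2, still active)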