Ganeti

Overview of a Ganeti setup

This document is now maintained in a new location. This remains, for now, for hysterical raisins.

NOTE: This document is currently being updated for a newer ganeti installation process on Debian Jessie, starting with a single node and then adding an additional node and drbd storage. It is not advised to follow this until the transition is complete.

introduction

Ganeti is software designed to facilitate the management of virtual machines (KVM or Xen). It helps you move virtual machine instances from one node to another, create an instance with DRBD replication on another node and do the live migration from one to another, etc.

Ganeti does provide HA features, but it is not a complete HA solution. It is more like Amazon EC2: you can create an instance from a template, and perform backups and restores. It improves on EC2 in that you can move an instance from one node to another without downtime, but there is no automatic failover system included.

There is a watcher that runs out of cron, which checks whether an instance is down or in an error state and, if so, attempts to start it.
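
On Debian the watcher is typically wired up via a cron snippet along these lines (a sketch; the exact path and interval may vary with the package version):

*/5 * * * * root [ -x /usr/sbin/ganeti-watcher ] && /usr/sbin/ganeti-watcher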

This document must be read together with the Ganeti installation documentation. It describes how to set up Ganeti 2.12.4~bpo8+1 on Debian Jessie for KVM, starting with a single machine and getting an instance running there. Then we’ll move on to setting up a second node, and drbd between them.

It is important to have a homogeneous setup, with the machines set up the same way. The base setup and the Ganeti installation must be performed on all the nodes. Each node can become the master, so it needs to be set up as if it were one.

terminology

  • node, server: The non-virtualized server.
  • instance, or VM: The virtualized operating system.

prepare machines for ganeti

These steps apply to all nodes, including cluster masters.

setup bridged networking

/etc/network/interfaces should look like this, with the IPs changed of course:

auto br-public
iface br-public inet static
        address 198.252.153.77
        netmask 255.255.255.0
        gateway 198.252.153.1
        bridge_ports eth1
        bridge_stp off
        bridge_fd 0
        bridge_maxwait 0

auto br-private
iface br-private inet static
        address 10.0.1.68
        netmask 255.255.255.0
        bridge_ports eth0
        bridge_stp off
        bridge_fd 0
        bridge_maxwait 0

setup hosts/nodename

Make sure that /etc/hostname contains something like this; it should be the FQDN on the private network:

weaver-pn.riseup.net

Set up /etc/hosts like this, so it has the cluster master’s private network address listed (it will get modified by ganeti). Make sure the hosts file has the machine’s own private name first, then the short name, and then the private name of the cluster master:

10.0.1.81	barbet-pn.riseup.net	barbet-pn
10.0.1.83	monal-pn.riseup.net	monal-pn

basic ssh setup

Setup site_hiera/files/node-pn.yaml to have this:

sshd::authorized_keys_file: '%h/.ssh/authorized_keys /etc/ssh/authorized_keys/%u'

Reboot and make sure networking is working

Setup puppet

Add to the node something like this:

class { 'ganeti::node': with_drbd => false }

Only set with_drbd => false if the node is not going to have secondaries.

Add the kvm, or xen class, depending on what it is:

include kvm

or:

include xen

Initialize a new cluster

On the master, after puppet has run, you would run something like this for a KVM cluster:

gnt-cluster init --uid-pool=4000-4020 --nic-parameters link=br-private --node-parameters ssh_port=4422 --vg-name=ganetivg0 --master-netdev=br-private --enabled-hypervisors=kvm -H kvm:vhost_net=True,security_model=pool,migration_bandwidth=100,kvm_extra="-smbios type=0\,vendor=ganeti",vnc_bind_address=,kernel_path=,initrd_path=,serial_console=True,serial_speed=115200 iora-pn

Or like this for a single-node Xen cluster:

gnt-cluster init --uid-pool=4000-4020 --enabled-disk-templates=plain --nic-parameters link=br-private --vg-name=ganetivg0 --node-parameters=ssh_port=4422 --master-netdev=br-private --enabled-hypervisors=xen-pvm -H xen-pvm:xen_cmd=xl,kernel_path=/usr/lib/grub-xen/grub-x86_64-xen.bin,kernel_args='(hd0\,0)/boot/grub/menu.lst',initrd_path=,bootloader_path='',bootloader_args='',root_path='' passeri-pn

add nodes to the cluster

Temporarily set /etc/ssh/sshd_config to have:

PermitRootLogin yes
PasswordAuthentication yes

This is temporary for node addition; restart ssh to put it in place (and remember to revert it and restart ssh again once the node has been added).

Then add the node from the master:

gnt-node add --verbose --secondary-ip 10.0.1.82 --master-capable=yes --vm-capable=yes --node-parameters=ssh_port=4422 thrush-pn.riseup.net

Now you should see both nodes in the cluster when you do this:

# gnt-node list
Node               DTotal DFree MTotal MNode MFree Pinst Sinst
tanager.riseup.net  37.3G  144M   7.8G  1.2G  7.0G     1     0
thrush.riseup.net   37.3G  144M   7.8G  339M  7.6G     0     1

security notes

The host will have its SSH host key replaced with the one of the cluster (which is the one the initial node had at the cluster creation).

A new public key will be added to root’s authorized_keys file, granting root access to all nodes of the cluster. The private part of the key is also distributed to all nodes. Old files are renamed.

As you can see, as soon as a node is joined, it becomes equal to all other nodes in the cluster, and the security of the cluster is determined by the weakest node.

verify the cluster

Make sure everything is setup right by running a cluster verification step:

root@tanager# gnt-cluster verify
Wed May  9 13:27:22 2012 * Verifying global settings
Wed May  9 13:27:22 2012 * Gathering data (2 nodes)
Wed May  9 13:27:23 2012 * Gathering disk information (2 nodes)
Wed May  9 13:27:23 2012 * Verifying node status
Wed May  9 13:27:23 2012 * Verifying instance status
Wed May  9 13:27:23 2012 * Verifying orphan volumes
Wed May  9 13:27:23 2012   - ERROR: node thrush.riseup.net: volume vg_thrush0/var is unknown
Wed May  9 13:27:23 2012   - ERROR: node thrush.riseup.net: volume vg_thrush0/root is unknown
Wed May  9 13:27:23 2012   - ERROR: node tanager.riseup.net: volume vg_tanager0/var is unknown
Wed May  9 13:27:23 2012   - ERROR: node tanager.riseup.net: volume vg_tanager0/root is unknown
Wed May  9 13:27:23 2012 * Verifying orphan instances
Wed May  9 13:27:23 2012 * Verifying N+1 Memory redundancy
Wed May  9 13:27:23 2012 * Other Notes
Wed May  9 13:27:23 2012 * Hooks Results

Everything is fine above, but ganeti is finding some of the host’s LVM volumes and complaining because it doesn’t know what they are. We can configure ganeti to ignore those volumes.
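
One way to do this (a sketch using the reserved LVs cluster parameter; check gnt-cluster(8) on your version for the exact syntax) would be:

gnt-cluster modify --reserved-lvs='vg_tanager0/root,vg_tanager0/var,vg_thrush0/root,vg_thrush0/var'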

adding instances

We now add an instance. In this example we create a 37G instance on the two nodes tanager and thrush, with 7.5G of memory, specifying its public and private bridge interfaces and IPs, and its name (iora.riseup.net):

weaver cluster example:

# gnt-instance add -d -t drbd -s 10G -o debootstrap+default -n roller-pn.riseup.net:rail-pn.riseup.net -B vcpu=1 -B memory=2G --net=0:ip=198.252.153.##,network=riseup_pub0 --net=1:ip=10.0.1.##,network=riseup_priv0 foo.riseup.net

martin cluster example:

gnt-instance add -d -t drbd -s 10G -o debootstrap+default -B vcpu=2 -B memory=2G --net=0:ip=198.252.153.xx,link=br-public --net=1:ip=10.0.1.xx,link=br-private xxxxx.riseup.net

While that is building you can watch ‘cat /proc/drbd’.

If you are building without drbd:

gnt-instance add --no-ip-check --no-name-check -o debootstrap+default -s 20G -B memory=768M --net=0:ip=204.13.164.XXX,link=br-public foo

If you want to specify an OS version, use something like “debootstrap+jessie” for the -o option.

post-instance work

Once the instance has been created, it should be started and running. Sometimes this doesn’t work right, and you need to look at why.

Once it is running, connect to the console and log in as root with the password from the instance creation output. Once logged in, change the password and then note the new one in the right place.
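
For example, from the cluster master (the instance name here is a placeholder):

gnt-instance console foo.riseup.net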

setup the networking in the instance

Since you are there, set up /etc/network/interfaces.
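
A minimal sketch of what the instance’s /etc/network/interfaces might contain (the addresses are placeholders for this cluster’s ranges):

auto eth0
iface eth0 inet static
        address 198.252.153.xx
        netmask 255.255.255.0
        gateway 198.252.153.1

auto eth1
iface eth1 inet static
        address 10.0.1.xx
        netmask 255.255.255.0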

Now you should be able to reboot the instance and get grub on the console, and networking working.

Security

Be sure to review the security section of the Ganeti documentation to understand how things are setup, what security considerations there are, and different changes you might want to do to increase the security of your cluster.

working with ganeti

getting information on the cluster

You can get instance specific information, or cluster-wide information as follows:

gnt-instance list
gnt-instance info iora
gnt-instance info --all
gnt-cluster info
gnt-node list
gnt-node info

modifying instances

gnt-instance modify -B vcpus=2 vmname
gnt-instance reboot vmname
gnt-instance modify -B memory=4g vmname
gnt-instance reboot vmname
gnt-instance grow-disk vmname 0 2g   # grow disk 0 by 2G
gnt-instance reboot vmname
vmname# fdisk                        # inside the VM: grow the partition to cover the new space
gnt-instance reboot vmname
vmname# resize2fs /dev/vda1          # then grow the filesystem
gnt-instance info vmname |grep 'IP:'  # look for instance's IP address
gnt-instance modify --net 0:modify,ip=204.13.164.xx vmname  # change it
gnt-instance modify --net 0:modify,link=br-public vmname # change the link used
gnt-instance info vmname |grep 'IP:'  # confirm change
gnt-instance stop vmname  # full stop and start to take effect
gnt-instance start vmname

rebooting nodes

Before rebooting a node, you need to shut down any instances that you haven’t migrated off. But you want them to come back up after the reboot, so do this:

gnt-instance shutdown --no-remember --primary <node>
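
If some instances should instead keep running, you can first migrate them to their secondaries (drbd instances only; this is the same command used in the “moving things around by hand” section below):

gnt-node migrate <node>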

Then run shutdown on the node. It should not prompt you with a warning that you will take down instances; if it does, the above isn’t done yet. It will prompt you for the hostname (molly-guard).

After the node comes back up, run this on the cluster master:

gnt-instance list -oname,pnode,snodes,oper_vcpus,oper_ram,disk.sizes,disk_usage,disk.count |grep <nodefoo>

to watch for the instances associated with that node to come back up. It might take a couple of minutes for ganeti to realize the node is up and restart things.

balancing the cluster

You can use the hbal tool to balance the instances across the cluster.

print the list of moves to optimize the cluster, using the Luxi backend

hbal -L

same, but with a list of detailed commands that will be run

hbal -C -L

same, but avoid expensive disk moves (less optimal, but faster)

hbal -C -L --no-disk-moves

print AND execute the commands to balance the cluster

hbal -C -X -L

The hbal command can also ensure that particular instances never share the same node, to prevent a single point of failure for redundant services. It can do this with the “--exclusion-tags” option, or be set to automatically exclude tags.
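
For example, to tell hbal explicitly which instance tags should never share a node (a sketch; the cluster-level htools:iextags tags shown below achieve the same thing automatically):

hbal -C -L --exclusion-tags=fews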

add/list/remove a tag on an instance(s):

gnt-instance add-tags piha fews
gnt-instance list-tags piha
gnt-instance remove-tags piha fews

add that tag to the cluster as an exclusion tag:

gnt-cluster add-tags htools:iextags:fews

list existing cluster exclusion tags:

gnt-cluster list-tags |grep htools:iextags

find what instances are using a tag:

gnt-cluster search-tags dns

We wrote a couple of shell functions to make it easier to look at things.

list tags and the instances that are using them:

gnt-listtags

list instances that have tags and the tags they have:

gnt-showtags
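
These are local helper functions rather than ganeti commands; a rough sketch of what gnt-showtags might look like (an assumed implementation, the real one may differ):

gnt-showtags() {
    # for each instance, print its tags (if it has any)
    for inst in $(gnt-instance list --no-headers -o name); do
        tags=$(gnt-instance list-tags "$inst" | tr '\n' ' ')
        [ -n "$tags" ] && echo "$inst: $tags"
    done
}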

Current riseup tags (all in weaver cluster):

  • db: brevipes.riseup.net drake.riseup.net incana.riseup.net scaup.riseup.net
  • dbwrite: brevipes.riseup.net scaup.riseup.net
  • dns: dns1.riseup.net owl.riseup.net primary.riseup.net screech
  • fews: bell.riseup.net capuchin.riseup.net piha.riseup.net
  • inmx: mx1.riseup.net
  • k8: kube01.hexacab.org kube02.hexacab.org kubeadmin.hexacab.org
  • nestdb: drake.riseup.net scaup.riseup.net
  • sympadb: brevipes.riseup.net incana.riseup.net

moving things around by hand

If for some reason hbal doesn’t do what you want or you need to move things around for other reasons, here are a few commands that might be handy.

make an instance switch to using its secondary

gnt-instance migrate instancename

make all instances on a node switch to their secondaries

gnt-node migrate nodename

change an instance’s secondary

gnt-instance replace-disks -n nodename instancename

At times it might make sense to use both of the above to move things around. The replace-disks command does have a -p flag for changing the primary.

moving the master

If you need to work on the current master you can make a different node the master instead. WARNING: moving the master is risky; we’ve had cases where things got super confused and we ended up with “split brain”. So we try to only move the master if the existing master is going to be down for a long time (and we’ll need to be able to do master-type operations). It’s fine to just reboot the existing master for a new kernel or hardware changes, etc.

masternode# gnt-cluster verify  # confirm things are ok
othernode# gnt-cluster master-failover
othernode# gnt-cluster verify  # confirm things are ok

and then when you are done, do the same thing to move it back.

cloning an instance

You might, for various reasons, want to clone an instance. For example, maybe you want to try a relatively destructive operation but not do it on the live machine, like upgrading to a new debian version, or changing from apache to nginx, or moving to a new version of php and you want to work out the issues before upgrading the live site. To do that, you should do the following:

First you export the instance. Doing this will write the instance’s data to a destination node, into the directory /var/lib/ganeti/export/instance. You probably do not have enough disk space there, so make sure you do before you start, or you will just fill up /var. One possibility is to carve out an LV from an available VG, make a filesystem on it, and temporarily mount it on that directory, for example:

lvcreate -L 110G ganetivg0 -n export; mke2fs -t ext4 /dev/mapper/ganetivg0-export ; mkdir -p /var/lib/ganeti/export; mount /dev/mapper/ganetivg0-export /var/lib/ganeti/export

but if you are doing this temporarily, you will want to be sure to remove that when finished.

Now export the instance. Typically you should do this while the instance is stopped, because otherwise you will get inconsistencies. If you are feeling brave and think the lvm snapshot mechanism will work fine, you can pass the --noshutdown option to the following command; otherwise it will shut down the instance. It is recommended that you allow the shutdown to happen.

Do this on the cluster master:

# gnt-backup export -n nodename instancename

NOTE:

  • the snapshots get created on the instance’s primary node, so you need to make sure you have enough space there, or migrate the instance if the secondary has more room
  • if you stop the instance first, it doesn’t need to use lvm snapshots; it can just export from the lvols directly. Depending on what you are doing you might want this anyway, to prevent data diverging.

Now that you have waited forever, you can see the results with:

# gnt-backup list
Node           Export
nodename  instancename

Also, have a look at what is in nodename:/var/lib/ganeti/export/instancename. You will see a uuid and a config.ini; that config.ini contains your instance’s parameters, which will be used on the import unless you override them.

Now to import the image, you reverse the process. You can specify different resource parameters, like the ones you use when you add instances, to override the ones that would otherwise be used to mirror the instance you exported. If you do not specify other parameters, the same ones the original instance had will be used.

# gnt-backup import --src-node nodename --src-dir /var/lib/ganeti/export/instancename -t drbd -s 100G -n primarynodename:secondarynodename --net 0:mac=generate --no-name-check --no-ip-check newinstancename

The above will pull the exported instance from ‘nodename’, from the ‘/var/lib/ganeti/export/instancename’ directory. It will create a new instance using disk type drbd with a size of 100G (overriding whatever might be in the config.ini of the exported instance), and it will place the new instance, called ‘newinstancename’, onto the primary and secondary nodes specified (like other ganeti commands, if you omit this the iallocator will attempt to find a good node to place the instance).

If there is more than one network device, you might need to specify that its MAC should also be generated, by passing a second --net parameter with mac=generate to the import for that NIC. If you do not pass mac=generate, it will re-use the existing MAC from the config and you may get this type of error:

MAC address aa:00:00:49:ff:44 already in use in cluster

Once that finishes, the new instance will not be started. You don’t want it to start right away, because you need to clean it up a little: change its IP, make sure grub is properly installed, and regenerate its ssh host keys:

# gnt-instance activate-disks newinstancename
primarynode:disk/0:/dev/drbdsomething

… go to the primary node…

primary node# kpartx -a /dev/drbdsomething
primary node# mount /dev/mapper/drbdsomethingp1 /mnt
primary node# mount -o bind /dev /mnt/dev; mount -t proc proc /mnt/proc
primary node# chroot /mnt
/mnt# <edit /etc/network/interfaces to have the right IP; it will still have the IP of the original instance>
/mnt# rm /etc/ssh/ssh_host* ; ssh-keygen -A
/mnt# update-grub
/mnt# grub-install /dev/drbdsomething
/mnt# exit
primary node# umount /mnt/dev; umount /mnt/proc; umount /mnt; kpartx -d /dev/drbdsomething

…back on the cluster master….

# gnt-instance deactivate-disks newinstancename

Now make sure the IPs are properly allocated in ganeti:

# gnt-instance modify --net 0:modify,ip=xxx.xxx.xxx.xxx newinstancename

and then start the instance, and make sure it starts properly:

# gnt-instance start newinstancename; gnt-instance console newinstancename

Now you have a cloned instance, with a different IP.

The exported instance will still be on that system, in that /var/lib/ganeti/export/instancename directory, so you may want to remove that, and remove the LV that you created.
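
If you used the temporary export LV from earlier, the cleanup on that node might look something like this (paths as in the example above):

# removing the temporary LV also removes the exported data on it
umount /var/lib/ganeti/export
lvremove /dev/ganetivg0/export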

failure scenarios

We only have two nodes in our setup, so different failures require a bit of a dance that isn’t needed when there are three nodes. The cluster master has to be failed over to the secondary, and we need to skip voting due to the two-node setup. An ideal minimum deployment is 3 machines… but 2 machines is doable (it just requires some extra manual work when one of the nodes fails).

Primary node fails

This failure scenario is when the primary node fails hard, such as hanging. This was simulated in our setup by yanking the power on the primary node.

In our situation the node ‘tanager’ is the primary, ‘thrush’ is the secondary, and ‘iora’ is the instance.

We can’t look at the cluster information yet because the commands typically need to be run on the master (if you try to run them on another node, it will tell you to connect to the master node), but the master is down:

root@thrush# gnt-node info tanager
Failure: prerequisites not met for this operation:
This is not the master node, please connect to node 'tanager.riseup.net' and rerun the command
root@thrush# gnt-instance info iora
Failure: prerequisites not met for this operation:
This is not the master node, please connect to node 'tanager.riseup.net' and rerun the command

So in order to do operations, we need to get the master running on the secondary, and the only way to do that is to disable ‘voting’. Voting is typically how the nodes determine who is the master and where instances should be migrated to during a failure. This is part of the ‘dance’ we have to do because we only have two nodes; it would not be necessary in a 3-node setup. So we start the master daemon on thrush by disabling voting:

root@thrush# EXTRA_MASTERD_ARGS=--no-voting  /etc/init.d/ganeti restart
Starting Ganeti cluster:ganeti-noded...done.
ganeti-masterd...The 'no voting' option has been selected.
This is dangerous, please confirm by typing uppercase 'yes': YES
done.
ganeti-rapi...done.
ganeti-confd...done.

Now we can look at the cluster and node configuration. Because tanager has failed, we cannot get any information about the instance:

root@thrush# gnt-instance info  iora
Failure: command execution error:
Error checking node tanager.riseup.net: Error 7: Failed connect to 198.252.153.67:1811; Operation now in progress

and the node list shows a problem:

root@thrush# gnt-node list
Node               DTotal DFree MTotal MNode MFree Pinst Sinst
tanager.riseup.net      ?     ?      ?     ?     ?     1     0
thrush.riseup.net   37.3G  144M   7.8G  378M  7.5G     0     1

So the first thing we need to do is mark tanager offline, since it is down:

root@thrush# gnt-node modify -O yes tanager.riseup.net
Wed May  9 12:58:43 2012  - WARNING: Communication failure to node tanager.riseup.net: Error 7: Failed connect to 198.252.153.67:1811; Success
Modified node tanager.riseup.net
 - master_candidate -> False
 - offline -> True

Now we fail-over the instance so that it is running on thrush:

root@thrush# gnt-instance failover --ignore-consistency iora
Failover will happen to image iora. This requires a shutdown of the
instance. Continue?
y/[n]/?: y
Wed May  9 13:00:26 2012 * checking disk consistency between source and target
Wed May  9 13:00:26 2012 * shutting down instance on source node
Wed May  9 13:00:26 2012  - WARNING: Could not shutdown instance iora.riseup.net on node tanager.riseup.net. Proceeding anyway. Please make sure node tanager.riseup.net is down. Error details: Node is marked offline
Wed May  9 13:00:26 2012 * deactivating the instance's disks on source node
Wed May  9 13:00:26 2012  - WARNING: Could not shutdown block device disk/0 on node tanager.riseup.net: Node is marked offline
Wed May  9 13:00:26 2012 * activating the instance's disks on target node
Wed May  9 13:00:26 2012  - WARNING: Could not prepare block device disk/0 on node tanager.riseup.net (is_primary=False, pass=1): Node is marked offline
Wed May  9 13:00:27 2012 * starting the instance on the target node

Now we can see in the node info that tanager has iora as a secondary, and thrush has it as a primary (before the failover it was the other way around):

root@thrush# gnt-node info 
Node name: tanager.riseup.net
  primary ip: 198.252.153.67
  secondary ip: 10.0.1.84
  master candidate: False
  drained: False
  offline: True
  master_capable: True
  vm_capable: True
  primary for no instances
  secondary for instances:
    - iora.riseup.net
  node parameters:
    - oob_program: default (None)
Node name: thrush.riseup.net
  primary ip: 198.252.153.68
  secondary ip: 10.0.1.85
  master candidate: True
  drained: False
  offline: False
  master_capable: True
  vm_capable: True
  primary for instances:
    - iora.riseup.net
  secondary for no instances
  node parameters:
    - oob_program: default (None)

and if we look at the instance info for iora, we will see that the drbd disk is degraded:

root@thrush# gnt-instance info iora
Instance name: iora.riseup.net
UUID: 0da6a680-d148-4535-9e4a-55da7e0ba89a
Serial number: 8
Creation time: 2012-05-03 07:57:45
Modification time: 2012-05-09 13:00:26
State: configured to be up, actual state is up
  Nodes:
    - primary: thrush.riseup.net
    - secondaries: tanager.riseup.net
  Operating system: debootstrap+default
  Allocated network port: 11016
  Hypervisor: kvm
    - acpi: default (True)
    - boot_order: default (disk)
    - cdrom2_image_path: default ()
    - cdrom_disk_type: default ()
    - cdrom_image_path: default ()
    - disk_cache: default (default)
    - disk_type: default (paravirtual)
    - floppy_image_path: default ()
    - initrd_path: default (/boot/initrd-2.6-kvmU)
    - kernel_args: default (ro)
    - kernel_path: 
    - kvm_flag: default ()
    - mem_path: default ()
    - migration_downtime: default (30)
    - nic_type: default (paravirtual)
    - root_path: default (/dev/vda1)
    - security_domain: default ()
    - security_model: default (none)
    - serial_console: default (True)
    - usb_mouse: default ()
    - use_chroot: default (False)
    - use_localtime: default (False)
    - vhost_net: default (False)
    - vnc_bind_address: 
    - vnc_password_file: default ()
    - vnc_tls: default (False)
    - vnc_x509_path: default ()
    - vnc_x509_verify: default (False)
  Hardware:
    - VCPUs: 1
    - memory: 7680MiB
    - NICs:
      - nic/0: MAC: aa:00:00:0d:5a:23, IP: 198.252.153.70, mode: bridged, link: kvm-br-public
      - nic/1: MAC: aa:00:00:a0:fe:e6, IP: 10.0.1.92, mode: bridged, link: kvm-br-private
  Disk template: drbd
  Disks:
    - disk/0: drbd8, size 37.0G
      access mode: rw
      nodeA:       tanager.riseup.net, minor=0
      nodeB:       thrush.riseup.net, minor=0
      port:        11017
      auth key:    cbc9b4893a857f9f893005c66e4b92b4070a5ddf
      on primary:  /dev/drbd0 (147:0) in sync, status *DEGRADED*
      child devices:
        - child 0: lvm, size 37.0G
          logical_id: vg_ocines0/53f9b50c-54d8-40ad-9106-722d070e6b23.disk0_data
          on primary: /dev/vg_ocines0/53f9b50c-54d8-40ad-9106-722d070e6b23.disk0_data (254:5)
        - child 1: lvm, size 128M
          logical_id: vg_ocines0/53f9b50c-54d8-40ad-9106-722d070e6b23.disk0_meta
          on primary: /dev/vg_ocines0/53f9b50c-54d8-40ad-9106-722d070e6b23.disk0_meta (254:6)

The important part to notice is the “status DEGRADED” above; the instance is not currently running on redundant disks.

If this were a three-node cluster, we would be able to ‘evacuate’ the disks away from the broken node, restoring redundancy for the instances.
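
On a cluster with a spare node, that might look something like this (a sketch; see gnt-node(8) on your version for the exact options):

gnt-node evacuate -s tanager.riseup.net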

The instance is running now, and you can connect to it via the console.

Now we bring tanager back online. When it comes up it will try to start ganeti, and that will fail:

Starting Ganeti cluster:ganeti-noded...done.
ganeti-masterd...CRITICAL:root:It seems we are not the master (top-voted node is thrush.riseup.net with 1 out of 1 votes)
failed (exit code 1).
ganeti-rapi...done.
ganeti-confd...done.

Right now both nodes think that they are the master, but that is not true, so we need to reconcile the configs between the nodes. This can be done on thrush:

root@thrush# gnt-cluster redist-conf

Then we need to add tanager back as online:

root@thrush# gnt-node modify -O no tanager
Wed May  9 13:24:52 2012  - INFO: Auto-promoting node to master candidate
Wed May  9 13:24:52 2012  - WARNING: Transitioning node from offline to online state without using re-add. Please make sure the node is healthy!
Modified node tanager
 - master_candidate -> True
 - offline -> False

Now we can see in the node list that it is back:

root@thrush# gnt-node list
Node               DTotal DFree MTotal MNode MFree Pinst Sinst
tanager.riseup.net  37.3G  144M   7.8G   95M  7.7G     0     1
thrush.riseup.net   37.3G  144M   7.8G  862M  7.0G     1     0

If we run the ‘gnt-instance info iora’ command again, we will find that the drbd devices are no longer in a degraded state:

root@thrush# gnt-instance info iora
[snip...]
  Disk template: drbd
  Disks:
    - disk/0: drbd8, size 37.0G
      access mode:  rw
      nodeA:        tanager.riseup.net, minor=0
      nodeB:        thrush.riseup.net, minor=0
      port:         11017
      auth key:     cbc9b4893a857f9f893005c66e4b92b4070a5ddf
      on primary:   /dev/drbd0 (147:0) in sync, status ok
      on secondary: /dev/drbd0 (147:0) in sync, status ok
      child devices:
        - child 0: lvm, size 37.0G
          logical_id:   vg_ocines0/53f9b50c-54d8-40ad-9106-722d070e6b23.disk0_data
          on primary:   /dev/vg_ocines0/53f9b50c-54d8-40ad-9106-722d070e6b23.disk0_data (254:5)
          on secondary: /dev/vg_ocines0/53f9b50c-54d8-40ad-9106-722d070e6b23.disk0_data (254:5)
        - child 1: lvm, size 128M
          logical_id:   vg_ocines0/53f9b50c-54d8-40ad-9106-722d070e6b23.disk0_meta
          on primary:   /dev/vg_ocines0/53f9b50c-54d8-40ad-9106-722d070e6b23.disk0_meta (254:6)
          on secondary: /dev/vg_ocines0/53f9b50c-54d8-40ad-9106-722d070e6b23.disk0_meta (254:6)

As a final check, we should make sure the cluster verifies fine:

root@thrush# gnt-cluster verify
Wed May  9 13:32:33 2012 * Verifying global settings
Wed May  9 13:32:33 2012 * Gathering data (2 nodes)
Wed May  9 13:32:34 2012 * Gathering disk information (2 nodes)
Wed May  9 13:32:34 2012 * Verifying node status
Wed May  9 13:32:34 2012 * Verifying instance status
Wed May  9 13:32:34 2012 * Verifying orphan volumes
Wed May  9 13:32:34 2012 * Verifying orphan instances
Wed May  9 13:32:34 2012 * Verifying N+1 Memory redundancy
Wed May  9 13:32:34 2012 * Other Notes
Wed May  9 13:32:34 2012 * Hooks Results

wacky stuff

move an instance from one cluster to another, from xen to kvm

First create an empty instance on the target machine. The important flags for creation are the ones that keep the install from happening and the instance from starting; the rest are whatever flags are needed for the VM:

root@warbler-pn:~# gnt-instance add --no-install --no-start --no-ip-check --no-name-check -o debootstrap+default -n warbler-pn --disk 0:size=15G,vg=ganetivg0 -B memory=2048M,vcpus=1 --net 0:ip=199.254.238.42,link=br-public foo.riseup.net

If drbd, then you need to specify a secondary too:

root@weaver-pn:~# gnt-instance add --no-install --no-start --no-ip-check --no-name-check -o debootstrap+default -n warbler-pn:fruiteater-pn --disk 0:size=15G,vg=ganetivg0 -B memory=2048M -B vcpus=1 --net 0:ip=199.254.238.42,link=riseup_pub0 foo.riseup.net

Then make the disks available to the host system (the activate-disks command needs to be run on the cluster master node, and then kpartx needs to be run on the node the instance resides on; in the first step below the cluster master is the same node the instance resides on). Note: activate-disks requires that the instance is stopped:

root@warbler-pn:~# gnt-instance activate-disks <destination instance name>
warbler-pn.riseup.net:disk/0:/dev/ganetivg0/727c9f7d-dfbe-4e1a-82d3-442326b7ed7f.disk0
root@warbler-pn:~# kpartx -a /dev/ganetivg0/727c9f7d-dfbe-4e1a-82d3-442326b7ed7f.disk0

if drbd:

master# gnt-instance activate-disks admin.riseup.net
master# gnt-node volumes raggiana-pn.riseup.net
(find the volume for the instance you want)
node# kpartx -a /dev/mapper/ganetivg0-088e1005--7f4f--49a4--852b--333f8f6ec315.disk0_data

Now also make the source disk available to the host system.

if ganeti (turkey example):

root@monal-pn:~# gnt-instance activate-disks <source instance name>
turkey-pn.riseup.net:disk/0:/dev/ganetivg0/716efe13-a7d5-4052-8c32-43c545b952b3.disk0
root@monal-pn:~#
... connect to the node that this instance is on...
root@turkey-pn:~# kpartx -a /dev/ganetivg0/716efe13-a7d5-4052-8c32-43c545b952b3.disk0

if kvm-manager (admin on skua example):

root@skua:~# kpartx -a /dev/mapper/vg_skua0-admin
root@skua:~# ls -la /dev/mapper/vg_skua0-admin*
lrwxrwxrwx 1 root root 7 Oct 21 19:01 /dev/mapper/vg_skua0-admin -> ../dm-3
lrwxrwxrwx 1 root root 8 Oct 31 22:51 /dev/mapper/vg_skua0-admin1 -> ../dm-10
lrwxrwxrwx 1 root root 8 Oct 31 22:51 /dev/mapper/vg_skua0-admin2 -> ../dm-11

One is the root filesystem (admin1), the other is swap (admin2).

Now dd onto that disk from the source machine (woohoo!):

root@turkey-pn:~# dd if=/dev/mapper/ganetivg0-716efe13--a7d5--4052--8c32--43c545b952b3.disk0 bs=64K | ssh -p 4422 root@warbler-pn 'dd of=/dev/mapper/ganetivg0-727c9f7d--dfbe--4e1a--82d3--442326b7ed7f.disk0'

Now reverse the process, first remove access to the disk from the node:

root@warbler-pn:~# kpartx -d /dev/ganetivg0/727c9f7d-dfbe-4e1a-82d3-442326b7ed7f.disk0
root@warbler-pn:~# gnt-instance deactivate-disks <destination instance name>

do the same on the source machine as well:

root@turkey-pn:~# kpartx -d /dev/ganetivg0/716efe13-a7d5-4052-8c32-43c545b952b3.disk0
... connect to the cluster master...
root@monal-pn:~# gnt-instance deactivate-disks <source instance name>
root@monal-pn:~#

Now try to start the instance on the new machine.

Troubleshooting

Which LV does a ganeti instance use?

If you are trying to figure out which LV a ganeti instance uses, you can run:

lvs -o+tags /dev/<vg>/<lv>
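
The tags ganeti puts on its LVs generally include the instance name, so you can also search in the other direction (a sketch; the exact tag format may differ):

lvs -o lv_name,vg_name,tags | grep <instancename>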

I’m trying to track down drbd stuff

If you are trying to track down drbd stuff, a terse version is in /proc/drbd, but you can get more info using the drbdsetup command (the drbdadm command has some useful stuff too):

drbdsetup status

building an instance results in 100% synchronized drbd forever… it never finishes!?

Are messages like these getting you down:

Wed Oct 28 05:35:34 2020  - INFO: - device disk/0: 100.00% done, 0s remaining (estimated)

Then you came to the right section of these docs!

This will never finish, and it will block jobs from happening in the cluster, so you need to resolve it.

First, find the node the instance was being installed on (its primary node). If you don’t have that in your scrollback, you can find it in the ganeti logs on the master by searching around a bit.
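
For example, on the master (a sketch; log file names vary between ganeti versions):

grep -ri <instancename> /var/log/ganeti/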

Once you know the primary node, go to that machine and look at /proc/drbd; you will likely see something like this:

    root@barbet-pn:~# cat /proc/drbd 
    version: 8.4.10 (api:1/proto:86-101)
    srcversion: 473968AD625BA317874A57E 
    [snip]
    11: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
        ns:10530816 nr:0 dw:10485760 dr:46285464 al:2568 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
            [===================>] sync'ed:100.0% (0/10240)M
            finish: 0:00:00 speed: 52 (52) K/sec (stalled)

This says resource11 has the problem, so you will need to tear down the drbd device:

# drbdsetup resource11 down

Now if you look at /proc/drbd it shouldn’t show that device in a stalled state any longer, and the ganeti task should finally finish. However, the VM is not actually set up, so you will need to remove it (gnt-instance remove) and then re-create it, this time passing the --no-wait-for-sync option.

cleaning up VM creation failures

Sometimes if a VM creation goes wrong, it might be hard to remove. When the VM is being bootstrapped, the scripts mount the disk and then chroot into it to run puppet. If this process doesn’t work and puppet doesn’t exit, you have to back things out. The error looks like:

WARNING: Could not remove disk 0 on node colibri-pn.riseup.net, continuing anyway: drbd7: can't shutdown drbd device: resource7: State change failed: (-12) Device is held open by someone\nadditional info from kernel:\nfailed to demote\n; Can't lvremove: exited with exit code 5 -   Logical volume ganetivg0/bd7b9e6c-418d-4bb2-913d-6881231945b1.disk0_data in use.\n; Can't lvremove: exited with exit code 5 -   Logical volume ganetivg0/bd7b9e6c-418d-4bb2-913d-6881231945b1.disk0_meta in use.\n

To clean it up:

primary# mount |grep /tmp          # find the temporary bootstrap mount
primary# lsof /tmp/whatever        # see what is holding it open
primary# kill pid from ^^^
primary# kpartx -p- -d /dev/drbd6  # remove the partition mappings

reserved IPs

Sometimes ganeti will think an IP is reserved when it’s not. Here is how to unreserve it:

gnt-network modify --remove-reserved-ips=198.252.153.189 riseup_pub0