How to use the initramfs shell to save the day

For some reason the SCSI modules required to bring my RAID online weren’t loaded early enough in the boot process, so when the RAID assembly was attempted the disks weren’t available and the boot failed. This is how I fixed it.

The failure looked like this:

Begin: Assembling all MD arrays ... mdadm: No devices listed in conf file were found.
Failure: failed to assemble all arrays.
done.
Begin: Waiting for udev to process events ... done.
[   11.218869] device-mapper: uevent: version 1.0.3
[   11.224825] device-mapper: ioctl: 4.13.0-ioctl (2007-10-18) initialised: dm-devel@redhat.com
  Volume group "vg_fulmar0" not found
cryptsetup: source device /dev/md2 not found
done.

Oddly, the SCSI modules seemed to load immediately afterwards, but by then it was too late, and the boot hung waiting for the root device. If you wait long enough, you get dropped into the initramfs shell, which is a handy thing to use for debugging (it even has vi!).
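
As an aside, you don’t have to wait out the timeout to get here: Debian’s initramfs-tools also understands a ‘break’ boot parameter, so adding something like the following to the kernel command line drops you into this shell on purpose (break=mount stops just before the root filesystem would be mounted):

break=mount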

So I used this handy shell to poke around and get things manually bootstrapped:

Gave up waiting for root device.  Common problems:
 - Boot args (cat /proc/cmdline)
   - Check rootdelay= (did the system wait long enough?)
   - Check root= (did the system wait for the right device?)
 - Missing modules (cat /proc/modules; ls /dev)
ALERT! /dev/mapper/vg_fulmar0-root does not exist. Dropping to a shell!


BusyBox v1.10.2 (Debian 1:1.10.2-2) built-in shell (ash)
Enter 'help' for a list of built-in commands.

/bin/sh: can't access tty; job control turned off
(initramfs) 
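
Before fixing anything, it’s worth confirming what the kernel can actually see at this point. Something along these lines does the trick (mptsas below is just a placeholder; the right module depends on your controller):

(initramfs) cat /proc/modules     # which drivers actually got loaded?
(initramfs) ls /dev/sd*           # are the disks visible yet?
(initramfs) modprobe mptsas       # if not, load the controller driver by hand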

First I checked my mdadm.conf, which looked right, so I assembled my RAID devices:

(initramfs) cat /etc/mdadm/mdadm.conf 
DEVICE partitions
ARRAY /dev/md0 level=raid1 num-devices=3 UUID=91a6fb32:068ddf98:abb9717f:82498d3b
ARRAY /dev/md1 level=raid1 num-devices=3 UUID=92074456:1970e3c1:080ac5b4:08dc21c7
ARRAY /dev/md2 level=raid1 num-devices=2 UUID=c94d1439:061b6537:8a478b6b:fc2267fc
(initramfs) mdadm --assemble --scan
[  283.492059] md: md0 stopped.
[  283.657941] md: bind<sdb1>
[  283.668071] md: bind<sdc1>
[  283.668179] md: bind<sda1>
[  283.700036] raid1: raid set md0 active with 3 out of 3 mirrors
mdadm: /dev/md0 [  283.704560] md: md1 stopped.
has been started with 3 drives.
[  283.759516] md: bind<sdb3>
[  283.762288] md: bind<sdc3>
[  283.769290] md: bind<sda3>
[  283.797292] raid1: raid set md1 active with 3 out of 3 mirrors
mdadm: /dev/md1 [  283.803697] md: md2 stopped.
has been started with 3 drives.
[  283.870076] md: bind<sdc4>
[  283.874307] md: bind<sdb4>
[  283.908465] raid1: raid set md2 active with 2 out of 2 mirrors
mdadm: /dev/md2 has been started with 2 drives.
(initramfs) cat /proc/mdstat 
Personalities : [raid1] 
md2 : active (auto-read-only) raid1 sdb4[0] sdc4[1]
      107820160 blocks [2/2] [UU]
      
md1 : active (auto-read-only) raid1 sda3[0] sdc3[2] sdb3[1]
      34081792 blocks [3/3] [UUU]
      
md0 : active (auto-read-only) raid1 sda1[0] sdc1[2] sdb1[1]
      489856 blocks [3/3] [UUU]
      
unused devices: <none>
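
Note that the arrays come up ‘auto-read-only’; md switches them to read-write automatically at the first write, so this is harmless. If it bothers you, mdadm can flip them right away:

(initramfs) mdadm --readwrite /dev/md0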

Then I set up the crypto layer, as this layer sits on top of the RAID (with LVM on top of the crypto):

(initramfs) cryptsetup luksOpen /dev/md1 md1_crypt
Enter LUKS passphrase: 
key slot 0 unlocked.
Command successful.
(initramfs) cryptsetup luksOpen /dev/md2 md2_crypt
Enter LUKS passphrase: 
key slot 0 unlocked.
Command successful.
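
To double-check that the decrypted devices actually showed up before moving on, cryptsetup can report on them:

(initramfs) ls /dev/mapper
(initramfs) cryptsetup status md1_crypt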

Then I brought up the LVM:

(initramfs) lvm
lvm> vgscan
  Reading all physical volumes.  This may take a while...
  Found volume group "vg_fulmar0" using metadata type lvm2
lvm> vgchange -ay
  3 logical volume(s) in volume group "vg_fulmar0" now active
lvm> lvs
  LV   VG         Attr   LSize   Origin Snap%  Move Log Copy%  Convert
  root vg_fulmar0 -wi-a- 952.00M                                      
  usr  vg_fulmar0 -wi-a-   4.66G                                      
  var  vg_fulmar0 -wi-a-   4.66G                                      
lvm> 
(initramfs)
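
That finally created the device node the boot had been waiting for, which is easy to verify before leaving the shell:

(initramfs) ls -l /dev/mapper/vg_fulmar0-root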

Then I could exit the initramfs shell and the system booted fine!

That was a relief, but my actual problem wasn’t solved, and I didn’t want to do that on every boot. So I did some sleuthing and found that there is a kernel boot parameter called ‘rootdelay’ that delays the hunt for the root device long enough for the SCSI devices to settle. I added ‘rootdelay=10’ to my kopt line in /boot/grub/menu.lst and updated GRUB:

# kopt=root=/dev/mapper/vg_fulmar0-root ro console=tty0 console=ttyS0,115200n8 rootdelay=10
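
That ‘# kopt=’ line looks like a comment, but it’s a directive that Debian’s update-grub reads when regenerating the kernel entries in menu.lst, so after editing it you have to run:

update-grub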

Now the boot works perfectly!
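
For completeness: another fix that should work (though I went with rootdelay) is to make the initramfs load the controller driver early by listing it in /etc/initramfs-tools/modules and rebuilding the image. Again, mptsas is just an example module name:

echo mptsas >> /etc/initramfs-tools/modules    # your controller's driver here
update-initramfs -u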