The Great Migration of Manjaro
It was just before lunch in the library, and I was checking my university emails on my travelling laptop that runs Manjaro OpenRC. While that was going on, I was inducing a few updates that it notified me about - and I started to install them with yaourt -Syua
. First mistake.
During the installation, it decided to upgrade OpenRC to the version in the AUR (Arch User Repository), but I didn't think anything of it particularly - I knew that Manjaro OpenRC was dying deprecated. Second mistake.
Once the updates were complete, I shut it down and sent on my way - or at least I tried to - it wouldn't shut down, instead proceeding to log out and leave it at that. I resolved to investigate the problem when I got home. Third mistake.
By the time I came to use it again, I was greeted with an ominous message:
[Firmware Bug]: TSC_DEADLINE disabled due to Errata; please update microcode to version: 0x52 (or later)
Failed to execute /init (error -2)
Kernel panic - not syncing: No working init found. Try passing init= option to kernel. See Linux Documentation/admin-guide/init.rst
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.13.2-1-MANJARO #1
Hardware name: Entroware Apollo/Apollo, BIOS 1.05.05 04/27/2017
Hrm. That's odd. Maybe something went wrong in the update? Linux has what's called kernel parameters that tell it how to boot. They specify things like "here's the root partition of the system", and "please let me edit files on the system after booting". To understand how this fits into the next part of the story, it's first necessary to take a quick retour and look at how, precisely the linux kernel goes about booting a system. This is best explained with a diagram:
(Rendered with Ascidia. Textual diagram source available here)
- BIOS / UEFI POST - The starting point of the boot process. The BIOS / UEFI turns on all the devices, runs some basic hardware checks, and (usually) gives the user a choice of what they want to boot from.
- rEFInd - grub may be used instead of rEFInd, but the basic principle is the same: it asks the user how they want to boot from the hard drive. Kernel parameters are decided on here.
- Initialisation: The Linux kernel is executed by the bootloader, and it proceeds to initialise itself and the connected devices.
- Mount initial RAM disk: The Linux encounters a chicken-and-egg problem rather early on: How can it start talking to the connected devices if it doesn't know how to talk to them? The initial RAM disk solves that problem: It contains a bunch of drives and other such components that the kernel needs to initialise all the connected devices. It's like a cut-down root file system, in a sense.
- Load drivers: The Linux kernel loads the drivers from the initial RAM disk (aka
initrd
) and starts initialising all the connected devices. - Mount root (read-only): The main root file system is mounted next, but only in read-only mode while the boot process finishes.
- Execute init: It is at this point that the very first process is executed. It usually presides at
/sbin/init
, but this can be changed through theinit
kernel parameter. - Mount root (read-write): The
init
process (under SysVinit at least) then remounts the root filesystem such that it is writeable. - Mount other partitions: The next job is the mounting of the other partitions in
/etc/fstab
. This is also done by SysVinit if I recall correctly. - Reach runlevels: The main runlevels managed by the service manager (e.g. OpenRC) are now executed in order by the service manager.
Phew, that took more explaining than I thought! And to think it all happens in the span of about 10 seconds....! With that out of the way, let's continue with the story.
Let's try specifying the init
kernel parameter - maybe the update cleared it for some random reason....? I had no idea what I was getting myself into :P
Unexpectedly, specifying init=/sbin/init
didn't work. Neither did specifying init=/bin/sh
. At this point, I suspected that there was something seriously wrong. I (correctly) guessed that it was the update I performed that morning that was to blame. After a bunch of backing and forthing, I managed to get hold of a previous copy of the openrc
package that was replaced by the 0.27 version from the AUR. After doing a full backup, I tried installing it and removing the new openrc-sysvinit
package that was also installed.
Before we continue further, I should probably explain how I managed to install the previous package version. Didn't I just explain that my system wasn't bootable? Well, yes. But I also had the original manjaro-architect installation media that I used to build the system in the first place. With that in hand, I could use rEFInd to boot from that (my UEFI firmware makes it a bit of a pain otherwise!), and then mount the root partition of the broken system and chroot into it. This process allows me to pretend that the system is actually booted, while piggybacking off the live installation media of the boot process. It works a bit like this:
lsblk # Find the root partition
mkdir /mnt/os;
mount /dev/sdZY /mnt/os # Mount the root partition
mount /dev/sdAB /mnt/os/boot/efi # Mount the EFI partition
manjaro-chroot /mnt/os bash # Enter the chroot and execute bash
Back to the story. Sadly, valiant though my effort was to replace the openrc
and openrc-sysvinit
packages was, it did not solve the problem. Eventually, I ended up having to perform a blind migration to Artix Linux, the spiritual successor to both Manjaro OpenRC and Arch OpenRC (apparently the developers of both came together to create Artix Linux).
Eventually, I ended up with a successful migration that I performed inside the chroot, and the system was bootable again! Next time, I'll always run pacman -Syu
before yaourt -Syua
. I'll also set up a temporary backup solution for my system files (I've already got one in place for my personal files) while I figure out a more permanent one that backs up across the network.