Part 6: Root filesystems

In the second module we established that a Linux-based system is composed of three main pieces: the bootloader, the kernel and the root filesystem. So far, we have studied the first two, focusing on understanding the process through which we can generate them by ourselves. In this module we will complete our (basic!) understanding of Linux in the context of embedded systems by taking some time to properly define what a root filesystem is, consider the options available when it comes to choosing its type, and dissect its contents. We will also discuss how to create a root filesystem making use of a tool that will automate the process, and which constitutes a valid option to consider for a real-world project.

Dissecting a root filesystem

Computers store files in a tree-like hierarchy, with directories as internal nodes and proper files as leaves. In the context of UNIX-like operating systems, the root filesystem is the collection of directories and files that “start” at the root of this tree. The full name of a file in this tree is the path from the root directory to the file itself, using the / (slash) sign as directory separator. The root directory is denoted by /.

The entry that corresponds to the root directory has no name – its path is the “empty” part before the initial directory separator.

In principle, the Linux kernel does not require a specific layout for the root filesystem, but in practice the vast majority of systems abide by the so-called Filesystem Hierarchy Standard. While not an enforced standard, the document details the traditional purpose of top-level directories such as /bin, /etc, /var and so on.

Broadly speaking, directories can be classified into two categories:

Static directories typically contain executable programs and configuration data that do not change between boots of the system.
Dynamic directories contain information that must be written (or re-written) by the running system. These directories are further categorized into volatile, whose data is normally lost on each reboot, and non-volatile, whose data must persist between boots.

The files and directories in a root filesystem do not all have to “come” from the same “source”. Some of them can be stored on physical devices such as a hard drive or an SD card, but they can also exist only in volatile RAM. It is also possible to have a root filesystem in which files and directories are stored in different partitions of the same physical device. Linux offers the illusion of a single, uniform tree by using mount points. Simply put, the kernel allows one filesystem to be mounted onto a directory of another, making the contents of the former appear as a directory within the latter. The process can be thought of as attaching a whole subtree to a leaf node of the original tree.
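
As a rough illustration of how a single tree is assembled from several filesystems, the following sketch reads the kernel's mount table and reports the type of the filesystem mounted at a given directory. The function name fs_type_at is made up for this example, and the sketch assumes a Linux system with /proc mounted.

```shell
# Print the type of the filesystem mounted at the mount point $1.
# Each line of /proc/self/mounts has the form:
#   <source> <mount point> <type> <options> <dump> <pass>
# If several filesystems are stacked on the same mount point, the last
# entry (the one that is currently visible) wins.
fs_type_at() {
    awk -v mp="$1" '$2 == mp { t = $3 } END { if (t != "") print t }' /proc/self/mounts
}

fs_type_at /proc    # typically prints "proc"
```

Calling the function for /, /tmp and /dev on the same machine will usually show different filesystem types, even though all three directories belong to the same tree.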

The Filesystem Hierarchy Standard specifies the following directories and their intended usage:

/sbin is meant to contain essential system programs that must be present to boot the system. Typically, the init program (the first user-space process) resides in this directory.
/bin contains essential command programs, such as ls, mkdir, rm, etc. The system shell is usually stored here.
/lib contains essential binary libraries, including libraries used for dynamic linking with programs at execution time.
/dev is used as the connection between user-space programs and the kernel-space device drivers.
/etc contains host-specific configuration information. Traditionally, this directory contains no binary programs.

Virtually all root partitions will include the five directories listed above, which were designed to comprise the minimum components necessary to bring the system to a fully booted stage.
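
A quick way to sanity-check a root filesystem tree on the host before deploying it is to look for these five directories. The sketch below does just that; the function name missing_essential_dirs is hypothetical, and the check is deliberately shallow (it does not look at the contents of the directories).

```shell
# Print every essential top-level directory that is missing from the
# root filesystem tree rooted at $1. Prints nothing if all are present.
missing_essential_dirs() {
    for dir in sbin bin lib dev etc; do
        # A symbolic link also counts: some layouts make /bin, /sbin and
        # /lib links into /usr.
        [ -d "$1/$dir" ] || [ -L "$1/$dir" ] || echo "/$dir"
    done
}
```

For example, running missing_essential_dirs on the directory where a root filesystem has been extracted should print nothing.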

The /usr hierarchy is usually static, generally containing only program and information files that do not change from one boot to the next. This directory contains files that are not needed to boot the system, making it possible to package the directory in a separate partition if desired. This directory traditionally contains the following subdirectories:
- /usr/bin and /usr/sbin contain user and system programs that are not essential during system startup (and therefore do not need to be available until a separate partition is mounted).
- /usr/lib contains libraries needed only by programs in the /usr tree.
- /usr/include contains header files needed to compile C and C++ programs.
- /usr/share contains architecture-independent data files used by programs in the /usr tree.

Filesystem types

Filesystems are an abstraction provided by the operating system to the user. As such, they require data structures to be represented internally, and can either be backed by real, physical storage (such as the blocks in a hard drive) or exist entirely in volatile RAM. In practice, there are quite a few implementations of the concept of a filesystem, each with its own pros and cons.

Technically speaking, any of the many filesystem types supported by the Linux kernel may be used in an embedded project. That being said, it is usually preferred to use those that work well in small-footprint systems using Memory Technology Devices (MTDs) and RAM, require minimum maintenance, and have adequate ways to trade space for speed and vice versa.

Journaling filesystems are preferred in systems that might reboot unattended after a power failure. With such filesystems, whenever a file is to be written, an entry is first made to a journal, detailing the operation that will occur. The operation is then performed, and the journal entry removed. In the usual case, multiple journal entries are made concurrently by different client programs, with locking mechanisms making each journal operation atomic. On boot, the system looks for orphaned journal entries, completing their operations if the entry is coherent, and ignoring the operation if the entry is invalid. Examples of journaling filesystems supported by the Linux kernel include ext3, ext4, JFS, XFS and btrfs.
Compressed read-only filesystems are particularly important in embedded projects, given that they are quite conservative with space and relatively impervious to accidental corruption.
- cramfs stands for Compressed ROM Filesystem and is read-only by design. It is being incrementally superseded by squashfs.
- squashfs is similar to cramfs, but also more capable: it is faster and allows a maximum size of up to 2⁶⁴ bytes for the filesystem as well as individual files. It is also read-only. squashfs tends to be smaller and faster than journaling filesystems, so it is often the case that embedded projects will package static directories in it.
RAM filesystems are used in virtually all Linux-based systems. Typically, the /tmp directory is mounted on a RAM filesystem.
- ramfs uses interfaces provided by the kernel to dynamically allocate and free memory used to implement a filesystem. Starting with zero memory, the amount used keeps up with the amount demanded, and then is freed automatically for other uses if the filesystem empties. A potential problem with ramfs is that there are no limits to the amount of physical memory that can be allocated, so usually only root has access to a ramfs filesystem.
- tmpfs is a derivative of ramfs with size limits and the ability to write data to swap space.
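
The journaling scheme described earlier in this list (record the intent, perform the operation, remove the entry, and clean up on boot) can be mimicked with a toy example that uses plain files. This is only a sketch of the idea: real journaling filesystems work on blocks and metadata inside the kernel, and the function names and directory layout below are invented for illustration.

```shell
# Toy write-ahead journal. $1 is a directory that contains two
# subdirectories: "journal" (pending operations) and "data" (the files).

# Write $3 into the file named $2, recording the intent in the journal first.
journal_write() {
    # 1. Record the operation. The entry is written under a temporary name
    #    and renamed once complete, so a half-written entry is recognisable.
    printf '%s\n' "$3" > "$1/journal/$2.tmp"
    mv "$1/journal/$2.tmp" "$1/journal/$2"
    # 2. Perform the operation.
    printf '%s\n' "$3" > "$1/data/$2"
    # 3. Remove the journal entry.
    rm -f "$1/journal/$2"
}

# Run at "boot": finish coherent entries, discard incomplete ones.
journal_recover() {
    for entry in "$1"/journal/*; do
        [ -e "$entry" ] || continue          # the journal is empty
        case "$entry" in
            *.tmp) rm -f "$entry" ;;         # incomplete entry: ignore it
            *)     cp "$entry" "$1/data/${entry##*/}"
                   rm -f "$entry" ;;         # coherent entry: replay it
        esac
    done
}
```

If the "system" loses power between steps 1 and 3 of journal_write, the entry is still in the journal directory, and journal_recover completes the write the next time it runs.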

Putting it all together, the following could be an approach to package the root filesystem used by an embedded project:

Static directories could be packaged in a squashfs filesystem that is backed by an MTD device.
Variable, non-volatile directories could be packaged in a journaling filesystem that makes careful use of an MTD device.
Variable, volatile directories could be packaged in a tmpfs filesystem.
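
For the volatile case, the usual way to request a tmpfs mount is an entry in /etc/fstab, which is processed when mount -a runs during startup. The sketch below prints such an entry; the function name, the mount point and the size are illustrative, and the option list (size=, mode=, nosuid, nodev) is one reasonable choice rather than a requirement.

```shell
# Print an /etc/fstab entry that mounts a size-limited tmpfs.
#   $1 = mount point, $2 = maximum size (for example 16m)
# Field order: <source> <mount point> <type> <options> <dump> <pass>
tmpfs_fstab_entry() {
    printf 'tmpfs %s tmpfs size=%s,mode=1777,nosuid,nodev 0 0\n' "$1" "$2"
}

tmpfs_fstab_entry /tmp 16m
```

The size limit is what distinguishes this from ramfs: once the limit is reached, writes fail instead of consuming all available memory.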

Please note that there is no “right” answer here. Each filesystem type comes with a set of tradeoffs, and ultimately it is up to the engineer to make a choice based on the requirements of the project and the characteristics of the hardware.

Generating a root filesystem

Creating a root filesystem is a process that could conceivably be done manually. Doing so can genuinely be worthwhile for optimization purposes, since it allows us to include only the bare minimum required by the system to do its job. That said, the most common approach is to use a third-party tool that automates the process.

There are quite a few options to consider here, such as OpenEmbedded or Yocto. Both, and particularly the latter, are commonly used by teams in embedded projects, so it is worth taking some time to familiarize yourself with them. That being said, due to time constraints, we will focus on a simpler alternative, Buildroot, which is perfectly fine for our purposes.

Buildroot is able to “build” a complete Linux-based system. As we have mentioned before, such a system requires three components: a bootloader, a kernel and a root filesystem. Buildroot can generate all three.

Download a recent release of Buildroot (the commands below use version 2024.02.7) and uncompress the archive.

$ wget https://buildroot.org/downloads/buildroot-2024.02.7.tar.xz
$ tar -xf buildroot-2024.02.7.tar.xz
$ cd buildroot-2024.02.7

Buildroot uses the now-familiar process based on Kconfig, where we first create a configuration tailored to our needs, and then proceed to “build” the root filesystem. If you take a look inside the configs directory, you will notice many configuration files; we are interested in beaglebone_defconfig.

$ make beaglebone_defconfig

Take some time to walk through the configuration to familiarize yourself with it.

$ make menuconfig

Under Kernel, disable Linux Kernel, since we already built it separately.
Under Bootloaders, disable U-Boot, since we already built it separately.
Under System configuration, modify the string in System banner. This will be shown above the login prompt once the system finishes initializing. It doesn’t matter what you put in here; the goal is to set something that can later be used to check that we are using the generated root filesystem.
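
After saving and leaving menuconfig, the choices end up in the .config file at the top of the Buildroot tree, so they can be double-checked from the command line. The sketch below assumes that the three menu entries correspond to the symbols BR2_LINUX_KERNEL, BR2_TARGET_UBOOT and BR2_TARGET_GENERIC_ISSUE; it is worth confirming this against your Buildroot version (the help text of each entry in menuconfig shows its symbol). The function name show_boot_settings is made up.

```shell
# Print the lines of a Buildroot .config ($1) that concern the kernel,
# U-Boot and the system banner. A disabled option appears in Kconfig
# output as a comment of the form "# SYMBOL is not set".
show_boot_settings() {
    grep -E '^(# )?(BR2_LINUX_KERNEL|BR2_TARGET_UBOOT|BR2_TARGET_GENERIC_ISSUE)[ =]' "$1"
}
```

Running show_boot_settings .config from the Buildroot directory should report the kernel and U-Boot as "is not set", together with the banner string you typed.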

Build the root filesystem with the configuration generated in the previous step.

$ make -j $(nproc)

(It may be necessary to install the rsync package on the build platform.)
This will take a while, so you can go grab a coffee in the meantime.

This version of Buildroot terminates with the following error:

ERROR: file(MLO): stat(...) failed: No such file or directory
ERROR: vfat(boot.vfat): could not setup MLO

This seems to be caused by Buildroot's image-generation step still looking for the MLO file produced by U-Boot, even though we told it not to build U-Boot. It is safe to ignore this error.

Once the build process finishes, the root filesystem can be found at output/images/rootfs.tar. If you tell Buildroot to also build the bootloader and the kernel, those files will be located here as well.

Try to boot the BeagleBone Black using this new root filesystem. If you are booting from the uSD card, mount it on the host platform and remove the contents of the second partition (e.g., /media/training/root). If you are booting through NFS, remove the contents of /srv/beagle/nfs. Extract the new root filesystem at the appropriate location and reboot the board. You should notice that the boot process is now significantly faster, mainly because the new root filesystem is smaller and requires fewer steps to be completed before the login prompt appears. You should also see the message specified under System banner in a previous step.
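
The "empty the target, then extract" step can be wrapped in a small helper, sketched below. The function name install_rootfs is hypothetical. On a real target directory it should be run as root so that tar can restore file ownership and device nodes, and the target path deserves a second look before running it, since everything below it is deleted.

```shell
# Replace the contents of directory $2 with the files stored in tarball $1.
install_rootfs() {
    # Refuse to do anything unless both arguments look sane.
    [ -f "$1" ] && [ -d "$2" ] || return 1
    # Delete everything below the target, but keep the directory itself
    # (it may be a mount point or an NFS export).
    find "$2" -mindepth 1 -delete || return 1
    tar -xf "$1" -C "$2"
}
```

For the uSD case, the call would look like install_rootfs output/images/rootfs.tar /media/training/root, with the path adjusted to wherever the second partition is mounted on your host.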

Busybox

With this root filesystem, if you inspect the programs under /bin, you will notice something peculiar about them: they are links to a program called busybox.

Busybox implements many different programs within a single binary, traditionally installed at /bin/busybox. Many other files with the same names as familiar utilities, such as /bin/ls, are actually links to /bin/busybox. Since the name by which the program is invoked is available (as argv[0]), Busybox mimics the behavior of the program used to invoke it.
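
The same trick can be reproduced in miniature with a shell script, where $0 plays the role of argv[0]. The sketch below is not how Busybox is implemented (Busybox is a C program); it only demonstrates dispatching on the invocation name. The script and link names are invented, and the example assumes that the temporary directory allows executing files.

```shell
# Create a tiny multi-call script and two links to it, in the spirit of
# Busybox. The script inspects the name it was invoked with and behaves
# accordingly.
demo=$(mktemp -d)

cat > "$demo/multicall" <<'EOF'
#!/bin/sh
case "$(basename "$0")" in
    hello)   echo "Hello!" ;;
    goodbye) echo "Goodbye!" ;;
    *)       echo "usage: invoke me as hello or goodbye" >&2; exit 1 ;;
esac
EOF
chmod +x "$demo/multicall"

# Both links point at the same file; only the name differs.
ln -s multicall "$demo/hello"
ln -s multicall "$demo/goodbye"

"$demo/hello"      # prints "Hello!"
"$demo/goodbye"    # prints "Goodbye!"
```

Adding a new "program" only requires a new case branch and a new link, which is essentially why a single Busybox binary can stand in for so many utilities.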

Busybox provides the vast majority of the command line utilities that one would require. These are usually simplified to the bare minimum to reduce memory and storage requirements, so not all features are normally included. The tool is also very modular and allows commands to be selectively included or excluded from the resulting binary.

Busybox can be found as the basis of almost all embedded Linux systems. It is very relevant for your day-to-day job as an embedded engineer to familiarize yourself with it. As such, take some time to try and build Busybox by yourself directly from its source code.

Apart from the utilities under /bin and /sbin, Busybox also serves as the init process, that is, /sbin/init. If you recall from an earlier module, at the very end of the boot sequence, Linux starts the first user process, loading it from a file in the root filesystem (the process ID of init is always 1). The code to start this process is in /path-to-linux-source/init/main.c:

static int kernel_init(void* unused) {
  /* (...) */
  if (ramdisk_execute_command) {
    ret = run_init_process(ramdisk_execute_command);
    if (!ret)
      return 0;
    pr_err("Failed to execute %s (error %d)\n",
           ramdisk_execute_command, ret);
  }
  
  /*
   * We try each of these until one succeeds.
   *
   * The Bourne shell can be used instead of init if we are
   * trying to recover a really broken machine.
   */
  if (execute_command) {
    ret = run_init_process(execute_command);
    if (!ret)
      return 0;
    panic("Requested init %s failed (error %d).",
          execute_command, ret);
  }

  if (CONFIG_DEFAULT_INIT[0] != '\0') {
    ret = run_init_process(CONFIG_DEFAULT_INIT);
    if (ret)
      pr_err("Default init %s failed (error %d)\n",
             CONFIG_DEFAULT_INIT, ret);
    else
      return 0;
  }
  
  if (!try_to_run_init_process("/sbin/init") ||
      !try_to_run_init_process("/etc/init") ||
      !try_to_run_init_process("/bin/init") ||
      !try_to_run_init_process("/bin/sh"))
    return 0;

  panic("No working init found.  Try passing init= option to kernel. "
        "See Linux Documentation/admin-guide/init.rst for guidance.");
}

The above information, especially the fact that the location of init on the filesystem can be passed to the kernel as an argument, can be useful in debugging. Also notice how the kernel assumes that the init process does not terminate.
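
In the excerpt, execute_command holds the value of the init= kernel argument, while ramdisk_execute_command corresponds to rdinit=. When debugging, it helps to confirm what the kernel was actually told. The sketch below extracts init= from a command line string; the function name is made up, and if the argument appears more than once the sketch simply keeps the last occurrence.

```shell
# Print the value of the init= argument found in the kernel command
# line $1, or nothing if the argument is absent.
init_from_cmdline() {
    # One argument per line, keep only init=..., strip the prefix.
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^init=//p' | tail -n 1
}

# On a running Linux system the command line is exposed in /proc/cmdline.
if [ -r /proc/cmdline ]; then
    init_from_cmdline "$(cat /proc/cmdline)"
fi
```

For instance, appending init=/bin/sh to the kernel arguments in the bootloader makes the kernel start a shell instead of the regular init, which is the recovery scenario mentioned in the source comment.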

init is the ultimate parent of all processes in the system. When a process terminates (either normally or through a fault or fatal signal), the kernel cleans up all resources used by the process except the exit status code, which is kept so that the parent may learn the disposition of the child. However, in the case where the actual parent of the process has already terminated, the kernel then adjusts the process residue, marking the parent process as PID 1 (init). After this, the kernel will send a notification to init (signal SIGCHLD) alerting it to the need to go through the motions of reaping the status of the child so that the residue of its existence can be reclaimed and expunged.

As we have seen, Buildroot will build a root filesystem that uses Busybox as the init process by default. The configuration data used by this program is stored in /etc/inittab. It is fairly straightforward to read:

$ cat /etc/inittab
# /etc/inittab
#
# Format for each entry: <id>:<runlevels>:<action>:<process>
#
# id        == tty to run on, or empty for /dev/console
# runlevels == ignored
# action    == one of sysinit, respawn, askfirst, wait and once
# process   == program to run

# Startup the system
::sysinit:/bin/mount -t proc proc /proc
::sysinit:/bin/mount -o remount,rw /
::sysinit:/bin/mkdir -p /dev/pts /dev/shm
::sysinit:/bin/mount -a
::sysinit:/bin/mkdir -p /run/lock/subsys
::sysinit:/sbin/swapon -a
null::sysinit:/bin/ln -sf /proc/self/fd /dev/fd
null::sysinit:/bin/ln -sf /proc/self/fd/0 /dev/stdin
null::sysinit:/bin/ln -sf /proc/self/fd/1 /dev/stdout
null::sysinit:/bin/ln -sf /proc/self/fd/2 /dev/stderr
::sysinit:/bin/hostname -F /etc/hostname
# now run any rc scripts
::sysinit:/etc/init.d/rcS

# Put a getty on the serial port
console::respawn:/sbin/getty -L  console 0 vt100 # GENERIC_SERIAL

# Stuff to do for the 3-finger salute
#::ctrlaltdel:/sbin/reboot

# Stuff to do before rebooting
::shutdown:/etc/init.d/rcK
::shutdown:/sbin/swapoff -a
::shutdown:/bin/umount -a -r

The comment at the top of the file does not mention the action keywords ctrlaltdel and shutdown. However, a quick read of /path-to-busybox-source/init/init.c fills in the details:

sysinit executes the command once (on system initialization) and waits for the command to complete before moving on.
respawn executes the command once, and then again if the command exits or is killed. A common use of respawn is to restart the mechanism for logging in once a user has logged off.
askfirst requires the user to press the Enter key before the command is run. A child process is created for the job of prompting for a response.
wait will cause init to wait for the command to complete before moving on, like the sysinit entries. The difference is that all of the sysinit commands will be run before the wait ones. (This is regardless of the order in which they appear in /etc/inittab.)
once is run and forgotten: the process is not respawned if it exits.
ctrlaltdel is run if the so-called three-finger salute (Ctrl-Alt-Del) has been pressed on the console.
shutdown executes the command when the system is being shut down, after which the kernel is halted or rebooted.
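
To see at a glance which commands are attached to a given action, the file can be split on its colon-separated fields. The sketch below is a simplified reader written for this purpose, not the parser Busybox uses: it skips comment and blank lines and leaves any trailing comment in the process field untouched. The function name inittab_entries is hypothetical.

```shell
# List the entries of an inittab file ($1) that have the given action ($2),
# printing "<id or /dev/console> -> <process>" for each one.
inittab_entries() {
    grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1" |
    while IFS=: read -r id runlevels action process; do
        # The last variable receives the rest of the line, so colons
        # inside the command itself are preserved.
        [ "$action" = "$2" ] || continue
        printf '%s -> %s\n' "${id:-/dev/console}" "$process"
    done
}
```

On the target, inittab_entries /etc/inittab sysinit lists the startup commands in file order, and inittab_entries /etc/inittab respawn shows the getty that provides the login prompt.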

Labs

Taking into account everything we have discussed so far, start with an empty root filesystem and create one yourself, using Busybox as the foundation. Make it so that you are able to at least do the following:
- Log in as root, but also be able to create new users.
- Execute programs written in C that depend on the standard library.