Thursday, August 24, 2023

The trap of Unix

Ever since the initial implementations of Unix at Bell Labs in the early 1970s, we have been stuck with some basic design decisions that have persisted in the programming interfaces of our operating systems for decades. To name a few: a process hierarchy based on a parent-child relationship between processes, a hierarchical filesystem with an embedded permission model, IO as a stream of bytes, programs stored as just another binary file, the notion of a controlling terminal attached to user processes, and the notion of a "shell" that mediates an interactive session between a terminal and the user.

While those decisions may have been perfectly adequate in the "time-sharing", "minicomputer" era of the PDP-11, I think we have long outgrown them. In 2023, we build and use complex systems that solve completely new high-level problems, using these primitives as the basic building blocks without ever questioning their existence. How can we do better by rethinking those primitives?

The parent-child process hierarchy

In a typical Unix system, the process hierarchy is made up of independent processes having a parent-child relationship between them. The parent needs to wait for the termination of its direct children in order to obtain their exit code and release their PID for reuse.
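
To make the waiting requirement concrete, here is a minimal sketch in Python, whose os module wraps the underlying system calls fairly directly. The function name run_child_and_reap is made up for this illustration. Between the child's exit and the parent's waitpid() call, the child is a "zombie": it no longer runs, but its PID and exit status are still held by the kernel.

```python
import os

def run_child_and_reap(exit_code):
    """Fork a child that exits with exit_code, then reap it and return that code."""
    pid = os.fork()
    if pid == 0:
        # Child: terminate immediately, skipping the interpreter's cleanup handlers.
        os._exit(exit_code)
    # Parent: waitpid() blocks until the child has terminated, hands back its
    # status, and lets the kernel release the PID for reuse.
    reaped_pid, status = os.waitpid(pid, 0)
    if reaped_pid != pid or not os.WIFEXITED(status):
        raise RuntimeError("child did not exit normally")
    return os.WEXITSTATUS(status)
```

Calling run_child_and_reap(7) should hand back the 7 that the child passed to its exit call.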

In the image above, the init process has a number of getty processes and daemons as its direct children. Each getty process listens for an incoming connection from a particular terminal, identified by a filename. Once a terminal connection is established, the getty process prompts the user for their username, and invokes the /bin/login program with the entered username as an argument. This invocation is done in-place using the exec() system call, which replaces the whole program image and preserves the place of the process in the hierarchy, retaining the PID. Then, in turn, /bin/login prompts for the user's password and if it is correct, starts the /bin/sh program, again using exec().
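
The claim that exec() keeps the PID can be checked with a small experiment. This is only a sketch: the function name pid_survives_exec is made up, and a second Python interpreter stands in for /bin/login. The child replaces its program image, and the new program reports its own PID back through a pipe.

```python
import os
import sys

def pid_survives_exec():
    """Return True if a program started with exec() runs under the forked child's PID."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(read_fd)
            # Make the pipe the child's standard output, then replace the image.
            os.dup2(write_fd, 1)
            os.execv(sys.executable,
                     [sys.executable, "-c", "import os; print(os.getpid())"])
        finally:
            os._exit(127)  # reached only if exec() failed
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        reported = int(pipe.read())
    os.waitpid(pid, 0)
    # The PID printed by the new program equals the PID fork() gave the parent.
    return reported == pid
```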

Now the shell process has a "controlling terminal" (for example, /dev/tty2) and handles the interactive session with the user. The user types a command; the shell interprets it and starts a sub-hierarchy of child processes with the same controlling terminal, then waits for their termination. The beauty of this approach is that for the user programs (cat and grep in the example), input and output are done through file descriptors 0 and 1 (standard input and standard output), with ordinary system calls. Those programs normally don't care about the specifics of the particular terminal they are writing to, because its driver is implemented in the kernel.
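
The descriptor convention can be sketched as follows. Both function names are made up for the illustration, and pipes stand in for the terminal. filter_lines plays the part of a grep-like program that only ever touches descriptors 0 and 1, while run_with_redirected_io plays the part of the shell, deciding what those descriptors point to before the program body runs. The sketch assumes a small input that fits in a pipe buffer.

```python
import os

def filter_lines(pattern):
    """A grep-like program body: read fd 0 to EOF, write matching lines to fd 1."""
    data = b""
    while True:
        chunk = os.read(0, 4096)
        if not chunk:
            break
        data += chunk
    for line in data.splitlines(keepends=True):
        if pattern in line:
            os.write(1, line)

def run_with_redirected_io(text, pattern):
    """Play the shell's role: wire a child's fd 0 and fd 1 to pipes and run it."""
    in_r, in_w = os.pipe()
    out_r, out_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            os.dup2(in_r, 0)   # the child's standard input is now the first pipe
            os.dup2(out_w, 1)  # the child's standard output is now the second pipe
            for fd in (in_r, in_w, out_r, out_w):
                os.close(fd)
            filter_lines(pattern)
            status = 0
        finally:
            os._exit(status)
    os.close(in_r)
    os.close(out_w)
    os.write(in_w, text)
    os.close(in_w)  # signals end of input to the child
    chunks = []
    while True:
        chunk = os.read(out_r, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(out_r)
    os.waitpid(pid, 0)
    return b"".join(chunks)
```

Note that filter_lines contains nothing specific to pipes; the same body would work unchanged if descriptors 0 and 1 referred to a terminal or a file.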

There are also "daemon" programs, distinguished by the fact that they are not attached to a terminal and have their 0, 1 and 2 descriptors closed or redirected to files.
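
A simplified sketch of how a process detaches itself in this way is shown below. The function name spawn_detached_child is made up, /dev/null is used as the redirection target, and real daemons usually take further steps that are omitted here. The child starts a new session with setsid(), which leaves it without a controlling terminal, and then points descriptors 0, 1 and 2 away from whatever they referred to before.

```python
import os

def spawn_detached_child():
    """Fork a child that detaches like a daemon.

    Returns (is_session_leader, stdin_is_a_terminal) as seen by the child.
    """
    report_r, report_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            os.close(report_r)
            os.setsid()  # new session, hence no controlling terminal
            devnull = os.open(os.devnull, os.O_RDWR)
            for fd in (0, 1, 2):
                os.dup2(devnull, fd)  # redirect stdin, stdout and stderr
            leader = os.getsid(0) == os.getpid()
            report = "%d %d" % (leader, os.isatty(0))
            os.write(report_w, report.encode())
            status = 0
        finally:
            os._exit(status)
    os.close(report_w)
    with os.fdopen(report_r) as pipe:
        leader, has_tty = pipe.read().split()
    os.waitpid(pid, 0)
    return leader == "1", has_tty == "1"
```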

You can now see that the whole idea of this hierarchy revolves around the way this system was used in the 1970s. Multiple users would log into the system from various terminals. They would start small programs via the shell, and those small programs would interact with the terminal seamlessly. Had there been a "flat" process arrangement, this wouldn't be so easy to achieve - individual user processes would have to "open" the active terminal in one way or another.

The question is, can we make do with a simpler process arrangement, now that physical terminals are long gone? What if every process were started and terminated independently and uniquely identified by a non-reusable, system-wide identifier, handed to individual processes according to some security policy? Then we wouldn't need to wait() for their termination, or worry about "zombie processes".

Hierarchical filesystem

The primary metaphor for a hierarchical filesystem is that of a file cabinet with sorted folders inside. While this can be very convenient for some kinds of files, such as user documents, photos or songs, it can cause a lot of headaches when you start putting binary programs and their configuration state, stored as text files, in that same global tree. Things break because a file that a program expects in that tree is often missing or in an unexpected state.

We could still retain that tree structure for what it's good at, but we could also implement a binary versioning mechanism at the lowest level in the kernel, without relying on fragile package/container/whatever managers dealing with the complexity of the file tree. Programs, libraries and their configuration state can be managed via global identifiers and version numbers, and a corresponding system API.

IO as a (blocking) stream of bytes

The default mode of operation for Unix programs is to block while waiting for input data from the terminal or from a file, and likewise to block while writing data to the terminal or to a file. To work around this, a number of readiness-notification APIs have appeared, starting with the select() system call and continuing with poll(), epoll and kqueue.
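
As a rough illustration of what select() buys us, the sketch below asks whether a descriptor is readable before reading from it, so the caller never blocks longer than the timeout. The names read_if_ready and demo are made up, and a pipe stands in for the terminal.

```python
import os
import select

def read_if_ready(fd, timeout):
    """Return available bytes from fd, or None if nothing arrives within timeout seconds."""
    ready, _, _ = select.select([fd], [], [], timeout)
    if not ready:
        return None  # a plain read() would have blocked here
    return os.read(fd, 4096)

def demo():
    """Poll an empty pipe, then poll it again after data has been written."""
    read_fd, write_fd = os.pipe()
    try:
        before = read_if_ready(read_fd, 0.05)
        os.write(write_fd, b"hello")
        after = read_if_ready(read_fd, 0.05)
        return before, after
    finally:
        os.close(read_fd)
        os.close(write_fd)
```

The program still has to ask about each descriptor and then issue a separate read() for the data, which is the kind of back-and-forth questioned in the next paragraph.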

What if programs were just long-lived services responding to events from the kernel through their own registered callbacks? The kernel would wake the service with some structured data ready for consumption. We wouldn't need so many round trips and context switches around a "file descriptor".
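
To give a feel for the programming model, here is a user-space approximation built on Python's selectors module. It is not the kernel interface imagined above: it still sits on top of a file descriptor, and the newline-delimited JSON event format and the names run_service and demo are assumptions made up for this sketch. What it does show is the shape of the idea: the service registers a callback and is only ever handed complete, structured events.

```python
import json
import os
import selectors

def run_service(fd, on_event, max_events):
    """Call on_event(event) for each JSON line read from fd; return the number handled."""
    selector = selectors.DefaultSelector()
    selector.register(fd, selectors.EVENT_READ, data=on_event)
    handled = 0
    buffer = b""
    try:
        while handled < max_events:
            for key, _mask in selector.select():
                chunk = os.read(key.fd, 4096)
                if not chunk:
                    return handled  # the event source was closed
                buffer += chunk
                while b"\n" in buffer and handled < max_events:
                    line, buffer = buffer.split(b"\n", 1)
                    key.data(json.loads(line))  # invoke the registered callback
                    handled += 1
        return handled
    finally:
        selector.unregister(fd)
        selector.close()

def demo():
    """Feed two events through a pipe and collect what the callback receives."""
    read_fd, write_fd = os.pipe()
    received = []
    os.write(write_fd, b'{"type": "key", "value": "a"}\n{"type": "tick", "value": 1}\n')
    os.close(write_fd)
    try:
        run_service(read_fd, received.append, max_events=10)
    finally:
        os.close(read_fd)
    return received
```

In a system designed this way from the start, the buffering and parsing inside run_service would presumably live below the API boundary, and the callback would be the only code the program author writes.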