My Work on Oils for Unix


I’m a contributor to a project called Oils for Unix (Oils for short). Oils is a new Unix shell. The most widely deployed Unix shell today is bash, and Oils aims to replace bash[1] because it has some notable warts. Most notable are its unreliable (by default) error handling; its dynamic parsing of commands, which has led to major exploits like Shellshock; and the fact that it represents everything as strings, which makes manipulating data cumbersome. These things all make the shell inaccessible to programmers who spend more time writing in languages like Python, JavaScript, Go, etc. Removing these barriers will allow more people to use the shell confidently, which is an essential skill for building complex software systems.

If we want to fix the problems described above or meaningfully change the experience of using a Unix shell, it is helpful to recognize that a shell is two things:

  1. an interface to your computer’s operating system, and
  2. a programming language

When you open your terminal to check the status of a network interface or monitor CPU usage, you’re using your shell primarily as an interface. When you write a script to simplify some repetitive task, you’re using it as a language.

Regardless of which mode you’re using it in, your shell must implement an interpreted programming language. However, many shells, bash included, are implemented primarily as an OS interface. Shells implemented this way interleave the parsing of commands with their execution (the mutation of OS state). This leads to awkward syntax and a brittle design that is difficult to evolve, which is why we haven’t moved beyond the status quo for over thirty years. By contrast, Oils decouples these concerns, which yields a design that can more readily accommodate new features and constructs. We achieve this decoupling by embracing the fact that we’re implementing a programming language.
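To make the idea of decoupling concrete, here is a minimal sketch in Python. It is not taken from the Oils codebase: the names `SimpleCommand`, `parse`, and `execute` are hypothetical, and the "grammar" is just one whitespace-separated command per line, tokenized with the standard library's `shlex`. The point is only that the whole program is parsed into a data structure before anything runs, so a syntax error is reported before any OS state is touched.

```python
import shlex
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SimpleCommand:
    """Hypothetical AST node: one command and its arguments."""
    argv: List[str]


def parse(source: str) -> List[SimpleCommand]:
    """Parse the entire program up front. Nothing is executed here.

    shlex.split raises ValueError on malformed input such as an
    unterminated quote, so errors surface before execution starts.
    """
    program: List[SimpleCommand] = []
    for line in source.splitlines():
        words = shlex.split(line, comments=True)
        if words:  # skip blank lines and comment-only lines
            program.append(SimpleCommand(words))
    return program


def execute(program: List[SimpleCommand],
            run: Callable[[List[str]], int]) -> int:
    """Walk the already-parsed program, delegating each command to `run`.

    `run` is an injected callback (an assumption of this sketch) that
    performs the actual side effect and returns an exit status.
    """
    status = 0
    for cmd in program:
        status = run(cmd.argv)
    return status
```

Because `execute` only ever sees a fully parsed program, a script with a syntax error on its last line never runs its first line. A shell that parses and executes line by line cannot make that guarantee.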

Bash’s ubiquity makes migrating away from it very difficult, though. So, Oils exists in two parts. OSH is a bash-compatible shell with static parsing and better error handling. It is an intermediate step away from bash: you can switch existing bash scripts to OSH today with minimal effort. Then, by gradually enabling features, you can migrate to YSH, the new shell language with structured data and types other than strings.

Oils has working implementations of OSH and YSH written in Python. This is an unconventional choice for systems software like a shell (many of which are written in, say, C). Andy Chu, the project's maintainer, explains why on the project blog:

I actually started writing it in C++. But after getting to 3K lines of code in the spring, it began to feel onerous.

The challenge is really understanding all the nooks and crannies of the shell language. If I misunderstood a syntactic feature, which happened constantly, I would have to tweak a class definition. Redundant header files and long build times make that an annoyance.

So often I would write little sketches in Python first, to test if my parsing algorithm matched reality. Over time I decided to just implement the whole thing in Python, and port it to C++ later.

Python is malleable and good at text manipulation, so it was pretty nice for writing the lexer and multiple parsers. But it also has built-in bindings to raw system calls like fork(), exec(), and dup2(), so I wrote the runtime portion as well. (Though perhaps it isn't "production quality" due to issues with signals and so forth).
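As a rough illustration of the system-call bindings mentioned in the quote, here is a sketch of running a command with its standard output redirected to a file, using only `os.fork`, `os.dup2`, `os.execvp`, and `os.waitpid` from Python's standard library. The function name `run_redirected` is hypothetical, and this is not how Oils itself is implemented; it ignores signals and the other details a real shell has to handle.

```python
import os
from typing import List


def run_redirected(argv: List[str], out_path: str) -> int:
    """Run argv in a child process with stdout redirected to out_path.

    Returns the child's exit status (or 128 + signal number if it was
    killed by a signal, following common shell convention).
    """
    pid = os.fork()
    if pid == 0:
        # Child process: set up the redirect, then replace ourselves
        # with the requested program.
        try:
            fd = os.open(out_path,
                         os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
            if fd != 1:
                os.dup2(fd, 1)  # make file descriptor 1 (stdout) point at the file
                os.close(fd)
            os.execvp(argv[0], argv)  # only returns (by raising) on failure
        finally:
            # Reached only if open() or execvp() failed.
            os._exit(127)

    # Parent process: wait for the child and decode its status.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return 128 + os.WTERMSIG(status)
```

Calling `run_redirected(["ls", "-l"], "listing.txt")` behaves roughly like `ls -l > listing.txt` in a shell.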

For OSH to be a viable stepping stone away from bash, it has to do more than just faithfully execute existing scripts. It also needs to execute those scripts about as fast as bash does and run everywhere bash runs, including on resource-constrained devices. The Python implementation is incompatible with these goals for two big reasons:

  1. You end up running two interpreters! First Python has to interpret the Oils language and runtime, which then interprets user code. This is a lot of overhead.
  2. You need to be able to use a shell in environments that don’t have a Python interpreter.

To overcome these obstacles, you could rewrite the entire project in a static language like C++ or Rust, but this would require substantial effort: a lot of code goes into a programming language.

So, Oils has a subproject called mycpp that we use to translate the Python implementation to C++. Python and C++ are fundamentally different languages with completely different memory models. So, to make this translation possible, we have to constrain the Python implementation of OSH. This means writing it in a style that only uses constructs that map cleanly to similar constructs in C++. We also make extensive use of type annotations that get checked by mypy, a popular Python type checker. mycpp is an extension to mypy (hence the “my”). Andy has written several good posts about mycpp that are worth reading for more details.
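To give a feel for what "a style that maps cleanly to C++" might look like, here is a small illustrative sketch. It is not from the Oils codebase, and I am not claiming this is exactly the subset mycpp accepts; the `Token` class and `count_kinds` function are made up for the example. Every variable, parameter, and return value has a fixed, annotated type, and the code avoids dynamic features such as changing a variable's type or adding attributes at runtime.

```python
from typing import Dict, List


class Token:
    """Hypothetical token type with a fixed set of typed fields."""

    def __init__(self, kind: str, text: str) -> None:
        self.kind = kind
        self.text = text


def count_kinds(tokens: List[Token]) -> Dict[str, int]:
    """Count how many tokens of each kind appear.

    Each annotated type here has an obvious statically typed analogue
    (a list of Token objects, a map from string to integer), which is
    the property that makes a mechanical translation plausible.
    """
    counts: Dict[str, int] = {}
    for tok in tokens:
        if tok.kind in counts:
            counts[tok.kind] += 1
        else:
            counts[tok.kind] = 1
    return counts
```

Code like this passes a type checker such as mypy without needing any `Any` types, which is what gives a translator enough information to pick a concrete type for every expression.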

Even the limited subset of Python used by the project is pretty broad in scope, so mycpp was written incrementally. Not all of the functionality supported in the Python implementation was initially supported in the translated C++ version. Much of my work on the project has involved extending mycpp to support the language features required by functionality that makes the C++ version of Oils viable. At the time of writing these contributions include:

I will write more detailed posts about each of these projects. Hopefully this post has provided some helpful background to motivate and contextualize them.


[1] Andy Chu, the project’s maintainer, has written posts on the project blog that give a much more thorough treatment of the question of why we might want yet another Unix shell.
