What Made UNIX Great and Why the Desktop is in Such Bad Shape

by Casandro

A few years ago, I wrote my diploma thesis.

For this I had to do a lot of data processing.  Now I'm a Pascal person.  I don't like C particularly, so whenever I need something, I write a little Pascal program.

During my thesis, I was amazed at how well Pascal fit in with the other tools I have on my little Linux box.  For example, Sound Exchange (SoX) has a special text-based format which is trivial to read and write in Pascal.  gnuplot also takes text input and produces beautiful graphs.  It all just seemed to click into place, just like LEGO.  It was great fun to play around with, and any idea I had could be realized within minutes.  Later, I heard of something called the "UNIX Philosophy" and read The Art of UNIX Programming (available online).  In this article, I'm going to be lazy and use the word "UNIX" for systems following that philosophy.  "UNIX" is simply shorter than "UNIXoid system" or "system complying with the UNIX philosophy."
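
As a rough sketch of what that workflow looked like (the file names are placeholders, and the grep step is just my way of stripping SoX's comment header so gnuplot will accept the data):

    sox recording.wav samples.dat             # SoX writes the samples as plain text
    grep -v '^;' samples.dat > samples.txt    # drop SoX's ';' comment lines
    gnuplot -e "set terminal png; set output 'wave.png'; plot 'samples.txt' with lines"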

Suddenly this all fell into place.  In my view, the main ingredient of UNIX is the idea that everything is a file, and that those files are, wherever possible, simple text files in one of a few basic formats.  Look at the password file on every UNIX system.  It is simply a text file with columns separated by colons, and it is trivial to parse: you read in a line, look for the colons, and separate the fields.  There is nothing programming-language- or processor-specific in those files.
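
A few lines of shell are enough to pull the fields apart (the same would be just as easy in Pascal or anything else):

    # print every account name together with its home directory
    while IFS=: read -r name passwd uid gid gecos home shell; do
        echo "$name -> $home"
    done < /etc/passwd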

In fact, there are UNIX tools like AWK, cut, and paste which thrive on those simple text formats.  Again, it all just clicks into place, simply because it's all text and simple commands.
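
A couple of one-liners show the idea:

    cut -d: -f1 /etc/passwd                        # just the user names
    awk -F: '$3 >= 1000 { print $1 }' /etc/passwd  # accounts with a UID of 1000 or more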

Imagine running the computer system at a school.  If you'd like to have a Windows user account for every pupil, you have to either create those accounts manually or use a special tool which may or may not read your source list and add the users.  On UNIX systems, the problem is trivial to solve.  You make sure you have a list of all pupils and write a little shell script executing the adduser command for each one of them.  Within a short amount of time, you will have all the users added.  If you want to make the process faster, you can even write the new password file entries directly.  Things which are trivial are trivial.  You don't need to mess with complicated interfaces.  Everything you need is documented precisely where you need the documentation.
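
A minimal sketch, assuming the pupils' login names sit one per line in a file called pupils.txt, and using useradd, the non-interactive low-level tool:

    while read -r pupil; do
        useradd -m "$pupil"    # -m also creates the home directory
    done < pupils.txt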

I believe the reason for relying on text lies in the weaknesses of the C language.  C is not actually very portable.  For example, I used to have an iBook running Linux.  Since it had a G3 processor, it stored integers in a different byte order than my desktop PC.  While my PC stored the least significant byte first and then progressed to the more significant ones, the Mac did it precisely the other way around.  And those machines were still fairly similar; both were 32-bit machines.  In the past, there were 18- or 36-bit machines, so even the number of bits in an integer varied.  Transferring binary files from one computer to the next must have been a nightmare.  With text, however, it's trivial.  You can always get text into some standard format, for example, Baudot on five-hole paper tape, or perhaps punch cards.  The problem of transferring text from one machine to another was already solved when UNIX emerged.
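
You can see the difference on any Linux box with od: the same four bytes mean different numbers depending on byte order, while the text form is unambiguous.

    printf '\052\000\000\000' > num.bin   # the 32-bit integer 42, least significant byte first
    od -An -t d4 num.bin                  # prints 42 on a little-endian PC,
                                          # 704643072 on a big-endian machine like the G3
    echo 42 > num.txt                     # the text form reads the same everywhere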

There is another place where text is used: if you want to interface with a subsystem on UNIX, you traditionally do it with text.  For example, there is a sendmail command which takes text as input and sends out emails.  Since it is a command, you can simply add options to it.  However, since the scope of the command is limited (another great idea behind UNIX), you'd rarely need to completely rework the interface.  If you do, you can simply start a new tool, or write a tool that takes the new input format and reformats it into the old one.  In fact, this is what old versions of bc, a UNIX calculator tool, used to do.  bc reformatted its input into the form needed by dc, the "desk calculator," which did the actual calculation.  That way, you didn't need to maintain two sets of arithmetic routines.
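
Both interfaces are still around to play with (message.txt here is just a placeholder for a plain-text message with headers):

    sendmail -t < message.txt   # recipients are taken from the To: header in the text
    echo '2 + 3 * 4' | bc       # bc takes ordinary infix notation...
    echo '2 3 4 * + p' | dc     # ...and dc wants reverse Polish; both print 14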

Now there is an unsettling development in the UNIX world.  It probably started with the TCP/IP stack.  Suddenly you had to use special functions to open network sockets.  People didn't mind much yet, as the result still behaved like a file, and after all, today you can simply use netcat to open sockets in shell scripts.
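
For example, a shell script can still speak HTTP with nothing more than netcat:

    printf 'GET / HTTP/1.0\r\nHost: example.com\r\n\r\n' | nc example.com 80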

Then came the sound systems.  Back when I started with Linux, the kernel spoke the Open Sound System (OSS) interface: you could simply type cat /dev/dsp > somefile and record audio.  You could play it back with cat somefile > /dev/dsp.

The sound card was just a device you could read from and write to, just like a serial port.  Then came the Advanced Linux Sound Architecture (ALSA), and you suddenly had to link against a library.  At least there still were decent command line tools, so you could set things like the volume without having to link to anything.  Now we have PulseAudio, an overly complex and fragile system.  Yes, it does have a command line tool to control it... but that tool's output depends on the locale, so it's virtually impossible to parse reliably.
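
Compare the two from a script's point of view (a rough illustration):

    amixer set Master 80%   # ALSA's mixer tool takes and prints terse, stable text
    pactl list sinks        # PulseAudio's listing is translated into the user's language,
                            # so a script that parses it breaks as soon as the locale changes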

More and more systems are built on top of opaque systems.  There is, for example, D-Bus, a system apparently designed to state the obvious... in 400 messages if necessary.  Sure, it seems like a good idea to be able to pass messages around, but aren't there simpler ways than creating a daemon which sometimes even crashes?

I could go on ranting about various systems, but there is little point.  Everyone knows the problems, and, in fact, there are valid reasons for doing it the way the developers have done it.  Maybe the problems lie in our current UNIXes themselves.

Let me tell you about a world where people have taken the philosophy behind UNIX to the next level - the world of Plan 9.  Unfortunately, I haven't been able to try out this operating system, named after the science fiction movie Plan 9 from Outer Space, so a lot of what I say is based on hearsay.  Nevertheless, there are ideas in it which are worth considering for future versions of UNIX systems.  First, let me remind you of two features that have actually made it to Linux.  The first, and probably the most popular, is UTF-8.  With it, I have a largely compatible way of using multi-language text wherever I was previously able to use plain ASCII.  The other feature is the /proc file system, which exposes a lot of information about the system as well as about every process currently running.
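
The /proc idea fits the old philosophy perfectly - everything is readable with the ordinary text tools:

    cat /proc/version            # the running kernel, as a single line of text
    grep VmRSS /proc/$$/status   # memory use of the current shell, one "Field: value" per line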

Plan 9 takes the idea that everything is a file to the next level.  File systems are the natural interface between all parts of the system.  Networking, for example, is presented as a file system, and you can open a socket by writing to a file.  An IRC client would provide you with a directory in which writing to a control file opens a connection to an IRC server; that creates a new directory for the connection.  In that directory, writing to another file makes the client join a channel, which in turn creates a directory for that channel containing a file representing everything being said there and a file for saying something yourself.  Of course, Plan 9 has its own network file system which allows you to export those virtual file systems.  That way you can export the networking stack over the network, a useful feature when you only have a limited number of public IP addresses.
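
I can only sketch what such a session might look like; the paths and file names below are invented for illustration (real file servers of this kind differ in the details), but the principle is a plain shell writing plain text into files:

    echo 'connect irc.example.net' > /n/irc/ctl              # ask the client to connect
    echo 'join #plan9' > /n/irc/irc.example.net/ctl          # join a channel
    cat /n/irc/irc.example.net/#plan9/data &                 # everything said appears here as text
    echo 'hello from the shell' > /n/irc/irc.example.net/#plan9/data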

Now imagine we had a similar system on the desktop.  Instead of having to link GUI toolkit libraries into your program, you could just call a program which opens a GUI element on the screen as well as a directory in your virtual file system.  You could then add more and more GUI elements.  The great thing is that if you want to change or extend your GUI toolkit, you'd just swap programs.  It wouldn't even matter what language those programs are written in.  You could try out new elements in shell script and later move them to C or Pascal or whatever.  If you wanted to port your GUI toolkit to a mobile device, you'd just replace the executables.  And even if you added new features, it would still be compatible.
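
Purely as a thought experiment (none of these paths exist anywhere), a shell session against such a toolkit might look like this:

    mkdir /gui/hello                              # creating a directory creates a window
    echo 'label Hello, world' > /gui/hello/ctl    # add a text label
    echo 'button Quit' > /gui/hello/ctl           # add a button
    cat /gui/hello/events                         # button presses arrive as lines of text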

This is the great thing about text-based formats.  It's trivial to write software that simply ignores extra columns at the end of a line.  It's much harder to write software which can deal with unknown sizes of binary structures.  It is also trivial to call a program that has command line options you don't know about - you simply don't use them.  It's much harder to dynamically link to a binary library if you don't know the complete structure of its interface.
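
The password file makes the point: a reader that only asks for the fields it knows about never notices extra ones.

    cut -d: -f1,6 /etc/passwd    # name and home directory; any further fields are simply ignored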

Text interfaces are simply more versatile and flexible.  They can tolerate quite a lot of change.  And change is a good thing.  Designing interfaces is hard.  Virtually nobody gets it right the first time, so it's good to have several chances.

To me, this is what UNIX is all about.
