unix pipe as functional language

perm url with updates: http://xahlee.org/comp/unix_pipes_and_functional_lang.html.

Xah Lee, 2010-01-25

Found the following juicy interview snippet today:

Is there a connection between the idea of composing programs together from the command line through pipes and the idea of writing little languages, each for a specific domain?

Alfred Aho: I think there's a connection. Certainly in the early days of Unix, pipes facilitated function composition on the command line. You could take an input, perform some transformation on it, and then pipe the output into another program. ...

When you say “function composition”, that brings to mind the mathematical approach of function composition.

Alfred Aho: That's exactly what I mean.

Was that mathematical formalism in mind at the invention of the pipe, or was that a metaphor added later when someone realized it worked the same way?

Alfred Aho: I think it was right there from the start. Doug McIlroy, at least in my book, deserves the credit for pipes. He thought like a mathematician and I think he had this connection right from the start. I think of the Unix command line as a prototypical functional language.

This is from an interview with Alfred Aho, one of the creators of AWK. The source is the book Masterminds of Programming: Conversations with the Creators of Major Programming Languages (2009), by Federico Biancuzzi et al.

Since about 1998, when i got into the unix programing industry, i have seen the pipe as a postfix notation, and sequencing pipes as a form of functional programing, though overall an extremely badly designed one. I've written a few essays explaining the functional programing connection and exposing the lousy syntax (mostly around the year 2000). However, i had never seen another person express the idea that unix pipes are a form of postfix notation and functional programing. It is a great satisfaction to see one of the main unix authors say so.

Unix Pipe As Functional Programing

The following email (slightly edited) was posted to a Mac OS X mailing list in 2002-05.

From: xah / xahlee.org
Subject: Re: mail handling/conversion between OSes/apps
Date: May 12, 2002 8:41:58 PM PDT
Cc: macosx-talk / omnigroup.com

Yes, unix has this beautiful philosophy. The philosophy is functional programing. For example, define:

power(x, n) := x^n

so “power(3, 2)” returns “9”.

Here “power” is a function that takes 2 arguments. The first parameter specifies the number to be raised to a power, the second how many copies of it are multiplied together.

functions can be nested,

f(g(h(x)))

or composed

compose(f,g,h)(x)

Here “compose” is itself a function, which takes other functions as arguments, and the output of compose is a new function that is equivalent to nesting f, g, h.
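As a minimal sketch of the idea, here is one way “compose” might be written in Python. The bodies of f, g, h are made-up stand-ins chosen only so the result can be checked; nothing here is specific to any particular language's library.

```python
from functools import reduce

def compose(*funcs):
    """Return a function equivalent to nesting funcs, rightmost applied first."""
    def composed(x):
        # Apply the last function first, so compose(f, g, h)(x) == f(g(h(x))).
        return reduce(lambda acc, fn: fn(acc), reversed(funcs), x)
    return composed

# Hypothetical stand-ins for f, g, h.
def f(x):
    return x + 1

def g(x):
    return x * 2

def h(x):
    return x - 3

nested = f(g(h(10)))
composed = compose(f, g, h)(10)
print(nested, composed)  # prints: 15 15
```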

Nesting does not necessarily involve nested syntax. Here's a postfix notation in Mathematica for example:

x // h // g // f

or prefix notation:

f @ g @ h @ x

or in lisp

(f (g (h x)))

The principle is that everything is either a function definition or a function application, and a function's behavior is strictly determined by its arguments.

Apple around 1997 or so had this OpenDoc technology, which is a similar idea applied more broadly across the OS. That is, instead of one monolithic browser or big image editor or other software, you have lots of small tools or components that each do one specific thing, and all can call each other or be embedded in an application framework as services or the like. For example, in an email app, you could use BBEdit to write your email, use Microsoft's spell checker, and use XYZ brand of recorder to record a message, without having to open many applications or use the Finder the way we would do today. This multiplies flexibility. (OpenDoc was killed around 1997, when Steve Jobs returned to Apple and did some serious house-keeping, to the ghastly anger of Mac developers and fanatics. I'm sure many of you remember this piece of history.)

The unix pipe syntax “|” is a postfix notation for nesting, e.g.

ps auwwx | awk '{print $2}' | sort -n | xargs echo

in conventional syntax it might look like this:

xargs(  echo, sort(n, awk('print $2', ps(auwwx)))  )
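The equivalence between the two forms can be sketched in Python. This is only an illustration under stated assumptions: the three stage functions are made-up stand-ins for awk, sort -n and xargs echo, and the sample lines are invented data standing in for the output of ps (with its header line left out so that every second field is a number).

```python
from functools import reduce

def pipe(value, *stages):
    """Feed value through stages left to right, like cmd1 | cmd2 | cmd3."""
    return reduce(lambda acc, stage: stage(acc), stages, value)

# Stand-ins for the commands above, each working on a list of text lines.
def second_field(lines):      # plays the role of: awk '{print $2}'
    return [line.split()[1] for line in lines]

def sort_numeric(lines):      # plays the role of: sort -n
    return sorted(lines, key=int)

def join_words(lines):        # plays the role of: xargs echo
    return " ".join(lines)

# Invented sample standing in for the output of: ps auwwx
ps_output = [
    "root   312 0.0 init",
    "xah     45 0.1 zsh",
    "xah   1207 0.3 emacs",
]

postfix = pipe(ps_output, second_field, sort_numeric, join_words)
nested = join_words(sort_numeric(second_field(ps_output)))
print(postfix)            # prints: 45 312 1207
print(postfix == nested)  # prints: True
```

The postfix form reads in the order the data flows, while the nested form reads inside out, which is the whole difference between the two notations.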

So when you use “pipe” to string together many commands in unix, you are doing supreme functional programing. That's why it is so flexible and useful: each component or function does one thing, and you can combine them in a myriad of ways. However, this beautiful functional programing idea, as implemented by the unix heads, becomes a fucking mess. Nothing works and nothing works right.

I don't feel like writing a comprehensive exposition on this at the moment. Here's a quick summary:

  • Fantastically stupid syntax.
  • Inconsistencies everywhere. Everywhere.
  • Fucking stupid reliance on global variables, called environment variables, which fucks up the whole functional programing paradigm.
  • Implicit stuff everywhere.
  • Totally incompetent commands and their parameters. (promiscuously non-orthogonal, and missing things, and fucked up in just more ways than one can possibly imagine. There are a million ways to do one thing, and none are correct, and many simple needs CANNOT be met! That's why there are a gazillion shells, each smart-ass improving upon the other, and that's why Perl was born too! But asinine Larry Wall doesn't know shit and smugly created another complexity that doesn't do much.)
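To make the point about environment variables concrete, here is a small Python sketch. The variable name DEMO_UPPER and both functions are invented for illustration. The first function breaks the rule that a function's behavior is determined by its arguments; the second takes the same setting as an explicit parameter.

```python
import os

def shout_implicit(text):
    # Result depends on hidden global state, not only on the argument.
    # DEMO_UPPER is a made-up variable name for this sketch.
    if os.environ.get("DEMO_UPPER") == "1":
        return text.upper()
    return text

def shout_explicit(text, upper=False):
    # Everything that affects the result is passed in as an argument.
    return text.upper() if upper else text

os.environ["DEMO_UPPER"] = "0"
first = shout_implicit("hello")
os.environ["DEMO_UPPER"] = "1"
second = shout_implicit("hello")   # same argument, different result

print(first, second)  # prints: hello HELLO
print(shout_explicit("hello"), shout_explicit("hello", upper=True))  # prints: hello HELLO
```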

Maybe some other day when i'm pissed, i'll write a better exposition on this issue. I've been wanting to write a final-say essay on this for a long time. Don't feel like it now.

Unix Syntactical and Semantical Stupidity Exposition

The following was posted to a Mac OS X mailing list.

From: xah@xahlee.org
Subject: unix inanity: shell env syntax
Date: June 7, 2002 12:00:29 AM PDT
To: macosx-talk@omnigroup.com

Unix Syntactical and Semantical Stupidity Exposition. (this is one of the many technical expositions of unix stupidity)

(this is currently unpolished, but the meat is there. Input welcome.)

Optional arguments are given with a dash prefix, e.g.

ls -a -l

Order (usually) does not matter. So,

ls -a -l

is the same as

ls -l -a

but arguments can be combined, e.g.

ls -al

means the same thing as

ls -a -l

However, some options consist of more than one character, e.g.

perl -version
perl -help

Therefore, the meaning of an option string "-ab" is ad hoc, dependent on the program. It can be "-a -b" or just one option named "ab".
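The ambiguity can be reproduced with Python's standard argparse module. The two "programs" below are hypothetical; they differ only in which options they declare, and they read the identical string "-ab" in two different ways.

```python
import argparse

# Program A declares two one-letter flags, -a and -b.
prog_a = argparse.ArgumentParser(prog="prog_a")
prog_a.add_argument("-a", action="store_true")
prog_a.add_argument("-b", action="store_true")

# Program B declares a single flag whose name is "ab".
prog_b = argparse.ArgumentParser(prog="prog_b")
prog_b.add_argument("-ab", action="store_true")

args_a = prog_a.parse_args(["-ab"])
args_b = prog_b.parse_args(["-ab"])
print(vars(args_a))  # prints: {'a': True, 'b': True}
print(vars(args_b))  # prints: {'ab': True}
```

Nothing in the string itself tells the user which reading applies; it has to be looked up per program.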

Then, sometimes there are two versions of the same optional argument. e.g.

perl -help
perl -h
perl -version
perl -v

this equivalence is ad hoc.

Different programs disagree on common options. For example, to get the version, here are common varieties:

-v
-V
-version

Sometimes v/V stands for "verbose mode", i.e. to output more detail.

Sometimes, if an option is given more than once, it specifies a degree of that option. For example, some commands accept -v for "verbose", meaning that they will output more detail, and sometimes there are a few levels of detail. The number of times the option is given determines the level of detail, e.g. on Solaris 8,

/usr/ucb/ps -w
/usr/ucb/ps -w -w

Thus, a repeated option may have a special meaning, depending on the program.
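As an illustration of the "repeat the option to raise the level" convention, here is a sketch using the count action of Python's argparse. The program name "demo" is hypothetical, and this shows only how one parser library models the convention, not how ps on Solaris is implemented.

```python
import argparse

parser = argparse.ArgumentParser(prog="demo")
# Each occurrence of -v adds one to the verbosity level.
parser.add_argument("-v", action="count", default=0)

levels = [
    parser.parse_args(argv).v
    for argv in ([], ["-v"], ["-v", "-v"], ["-vv"])
]
print(levels)  # prints: [0, 1, 2, 2]
```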

Oftentimes some options automatically turn on or suppress a bunch of others, e.g. on Solaris 8,

/usr/bin/ls -f

When a named optional parameter is of a boolean type, that is, a toggle of yes/no, true/false, exist/nonexist, then often, instead of taking a boolean value, its sole existence or non-existence defines its value.

Toggle options are sometimes represented by two different option names, one for yes and another for no, and when both are present, the behavior is program dependent.
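Both toggle styles can be sketched with Python's argparse. The option names -x, --color and --no-color are invented for this example. In this particular sketch the last of two conflicting options wins, but that is only how this parser is set up here; other programs may take the first, or reject the combination.

```python
import argparse

parser = argparse.ArgumentParser(prog="demo")
# Style 1: the mere presence of -x means "yes".
parser.add_argument("-x", action="store_true")
# Style 2: one name for yes, another for no, both writing to the same setting.
parser.add_argument("--color", dest="color", action="store_true")
parser.add_argument("--no-color", dest="color", action="store_false")
parser.set_defaults(color=False)

absent = parser.parse_args([])
present = parser.parse_args(["-x"])
yes_then_no = parser.parse_args(["--color", "--no-color"])
no_then_yes = parser.parse_args(["--no-color", "--color"])

print(absent.x, present.x)                   # prints: False True
print(yes_then_no.color, no_then_yes.color)  # prints: False True
```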

For named options that take a value, the syntax is slack and the accepted forms depend on the program, i.e. not all of the following work for every program:

command -o="myvalue"
command -omyvalue
command -o myvalue
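Even two option parsers shipped in the same standard library disagree here. The sketch below feeds the three forms to Python's getopt and argparse modules; the option -o and the program name "demo" are just the example from above, not any real command.

```python
import argparse
import getopt

forms = (["-o=myvalue"], ["-omyvalue"], ["-o", "myvalue"])

parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("-o")

results = []
for argv in forms:
    # getopt returns (options, leftover); take the value of the first option.
    via_getopt = getopt.getopt(argv, "o:")[0][0][1]
    via_argparse = parser.parse_args(argv).o
    results.append((via_getopt, via_argparse))
    print(argv, via_getopt, via_argparse)
```

For the first form, getopt keeps the equal sign as part of the value ("=myvalue") while argparse strips it ("myvalue"); for the other two forms both give "myvalue".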

Often one option may have many synonyms...

An example of a better design... (Mathematica, Scheme, Dylan, Python, Ruby... there's quite a lot of elegance and practicality, yet distinct designs and purposes and styles ...)

(Recall that unix has a bad design to begin with; it's a donkey shit pile from the beginning and in its continuation. Again, unix is not simply technically incompetent. If it were only that, it would be easy to improve, and i wouldn't have a problem with it, since there are other things considered horrendous in one way or another by today's standards, like COBOL or FORTRAN or DOS. But unix is a brain-washing idiot-making machine, churning out piles and piles of religiously idiotic and pigheaded keyboard punchers. For EVERY opportunity to improve engineering methodology or advance language design, unixers will unanimously turn it down.)

Inevitably someone will ask me what's my point. My point in my series of unix-showcasing articles has always been clear to those who study them: Unix is a crime that caused society inordinate harm, and i want unix geeks to wake up and realize it.

Microsoft PowerShell

Note: Microsoft's new shell programing language, PowerShell (released 2006), adopted much of the unix shell's syntax and the pipe paradigm, but with a much more consistent and formal design. (see: Xah's PowerShell Tutorial)
