One Language to Rule Them All?

Perm url with updates: http://xahlee.org/comp/what_lang_to_use_for_find_replace.html

Xah Lee, 2011-02-08

This is my personal account of a struggle in choosing languages, and of trying to maintain some notion of efficiency by using one single system instead of a mishmash of components. This essay was originally a post to comp.lang.lisp (source: groups.google.com).

Prologue: How to Write grep in Emacs Lisp.

Or, What Language to Use for Find Replace?

... never really got into bash for shell scripting... sometimes tried but the ratio of power/syntax isn't tolerable. Knowing perl well pretty much killed any possible incentive left.

... in the late 1990s, my thought was that i'd just learn perl well and never need to learn another lang or shell for any text processing and sys admin tasks for personal use. The thinking was that it'd be efficient, in the sense of not having to waste time learning multiple langs for doing the same thing (not counting job requirements at a company). So i wrote a lot of perl scripts for find & replace and file management stuff, and tried to make them as general as possible. lol. But what turned out is that, over the years, for one reason or another, i just learned python, php, then in 2007 elisp. Maybe the love for languages inevitably won over my one-language efficiency obsession. But also, i ended up rewriting many of my text processing scripts in each lang. I guess part of it is exercise when learning a new lang.

... anyway, i guess i'm babbling randomly, but one thing i learned is that for misc text processing scripts, the idea of writing a generic, flexible, powerful one once and for all just doesn't work, because the coverage is too wide and the tasks that need to be done at any one time are too specific. (and i think this makes sense, because the idea of one language or one generic script for everything stems from ideology, not from real world practice. If we look at the real world, it's almost always a disparate mess of components and systems.)

my text processing scripts ended up being a mess. There are several versions in different langs. A few are general, but most were basically used once or in a particular year only. (many are branched off from a generic one but customized for specific needs, used, then thrown away.) When i need to do some particular task, i find it easier just to write a new one in whatever lang is currently in my brain's memory than to spend time fishing out and revisiting old scripts.

some concrete examples...

e.g. i wrote this general script in 2000, intended to be one-stop for all find/replace needs. See: Perl: Find & Replace on Multiple Files.

in 2005, while i was learning python, i wrote (several) versions in python. e.g. Python: Find & Replace Strings in Unicode Files.

it's not a port of the perl code. The python version doesn't have as many features as the perl one. But for some reason, i have stopped using the perl version. I didn't need all of the perl version's features, and when i do need them, i have several other scripts that each address a particular need. (e.g. one for searching unicode encoded files, one for changing Windows/unix line endings, one for converting file encoding, one for multiple pairs of find/replace in one shot, one for regex, one for plain text, one for find only, one for find+replace, several for find/replace only if a particular condition is met (e.g. if the file contains a particular string, or the search string is inside a particular tag), etc.)
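To give a feel for how small each of these single-purpose scripts can be, here is a minimal python sketch combining two of the variants listed above: multiple find/replace pairs in one shot, applied only if the text contains a particular string. It is not the code from any of the scripts mentioned; the function name `replace_pairs` and its parameters `pairs` and `require` are made up for illustration, and it handles plain text only (no regex search strings).

```python
import re


def replace_pairs(text, pairs, require=None):
    """Apply several plain-text find/replace pairs to text in one pass.

    pairs   -- dict mapping search strings to their replacements (hypothetical API)
    require -- optional string; if given and not found in text, text is returned unchanged
    """
    # Conditional variant: skip the text unless it contains the required string.
    if require is not None and require not in text:
        return text
    if not pairs:
        return text
    # Build one alternation of all search strings, escaped so they match literally.
    # Longest first, so a longer search string wins over one that is its prefix.
    keys = sorted(pairs, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # A single pass means already-replaced text is never scanned again.
    return pattern.sub(lambda m: pairs[m.group(0)], text)
```

Because everything happens in one pass, a pair that swaps two words works as expected: `replace_pairs("a cat and a dog", {"cat": "dog", "dog": "cat"})` gives `"a dog and a cat"`, which a sequence of separate replace calls would not.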

then in 2006, i fell into the emacs lisp hole. In the process, i realized that elisp for text processing is more powerful than perl or python. Not due to lisp the lang, but more due to emacs the text-editing environment and system. I tried to explain this in a few places, but mostly here: Text Processing: Emacs Lisp vs Perl.

so, all my new scripts for text processing are in elisp. A few of my perl and python scripts i still use, but almost everything is now in elisp.

also, sometime in 2008, i grew a shell script that processes my web log using the usual unix bag of tools: cat, grep, awk, sort, uniq. It's about 100 lines. You can see it here: weblog_process.sh.

at one time i wondered, why do i have this 100 line shell script? Where did my idea go that perl should replace all shell scripts? I gave it a little thought, and i think the conclusion is that for this task, the shell script is actually more efficient and simpler to write. Possibly, if i had started with perl for this task, i might have ended up with well-structured code that's not necessarily less efficient... but you know, things in life aren't all planned. It began when i just needed a few lines of grep to see something in my web log. Then, over the years, i added another line, another line, then another, all need-based. If at any of those times i had thought “let's scrap this and restart with perl”, that'd have been wasting time. Besides that, i have some doubt that perl would do a better job for this. With shell tools, each line just does one simple thing with piping. To do it in perl, one would have to read in the huge log file, then maintain some data structure and try to parse it... too much memory and thinking would be involved. If i code perl by emulating the shell code line-by-line, then it makes no sense to do it in perl, since it's just the shell bag in perl.
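To make the comparison concrete, here is a rough sketch of what emulating such a pipeline step by step looks like in a scripting language (python here, rather than perl). It is not taken from weblog_process.sh. It assumes a pipeline of the general shape `grep GET | awk '{print $7}' | sort | uniq -c | sort -rn | head` and a log in the common Apache access-log layout, where the requested path is the 7th whitespace-separated field. The function name `top_pages` and its parameters are hypothetical.

```python
from collections import Counter


def top_pages(lines, must_contain="GET", field=6, n=10):
    """Count the most requested paths in an iterable of log lines.

    Assumed (not from the original script): common Apache log layout,
    where index 6 after split() is the requested path (awk's $7).
    """
    counts = Counter()
    for line in lines:                 # reads one line at a time, like a pipe
        if must_contain not in line:   # the "grep" step
            continue
        parts = line.split()           # the "awk" step: split into fields
        if len(parts) > field:
            counts[parts[field]] += 1  # the "sort | uniq -c" step
    return counts.most_common(n)       # the "sort -rn | head" step
```

Passing an open file object as `lines` keeps only one counter entry per distinct path in memory, not the whole log. Even so, every comment in the sketch is just the name of the shell tool it stands in for, which is the point made above: written this way, it is the shell bag in another language.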

Also note, this shell script can't be replaced by elisp, because elisp is not suitable when the file size is large.

well, that's my story — extempore! ☺