2011-07-23

Lisp Celebrities and Computing History from Worse Is Better

Perm url with updates: http://xahlee.org/comp/lisp_celebrities_worse_is_better.html

Xah Lee, 2011-07-24

I just discovered the identities of 2 semi-fictional characters in lisp lore.

There's an infamous article, known as Worse is Better. It was very popular in the 1990s, and still is in lisp circles. The article is:

The Rise of “Worse is Better” (1991) By Richard P Gabriel. @ dreamsongs.com

In the article, there's this passage:

Two famous people, one from MIT and another from Berkeley (but working on Unix) once met to discuss operating system issues. The person from MIT was knowledgeable about ITS (the MIT AI Lab operating system) and had been reading the Unix sources. He was interested in how Unix solved the PC loser-ing problem. The PC loser-ing problem occurs when a user program invokes a system routine to perform a lengthy operation that might have significant state, such as IO buffers. If an interrupt occurs during the operation, the state of the user program must be saved. Because the invocation of the system routine is usually a single instruction, the PC of the user program does not adequately capture the state of the process. The system routine must either back out or press forward. The right thing is to back out and restore the user program PC to the instruction that invoked the system routine so that resumption of the user program after the interrupt, for example, re-enters the system routine. It is called PC loser-ing because the PC is being coerced into loser mode, where loser is the affectionate name for user at MIT.

The MIT guy did not see any code that handled this case and asked the New Jersey guy how the problem was handled. The New Jersey guy said that the Unix folks were aware of the problem, but the solution was for the system routine to always finish, but sometimes an error code would be returned that signaled that the system routine had failed to complete its action. A correct user program, then, had to check the error code to determine whether to simply try the system routine again. The MIT guy did not like this solution because it was not the right thing.

The New Jersey guy said that the Unix solution was right because the design philosophy of Unix was simplicity and that the right thing was too complex. Besides, programmers could easily insert this extra test and loop. The MIT guy pointed out that the implementation was simple but the interface to the functionality was complex. The New Jersey guy said that the right tradeoff has been selected in Unix -- namely, implementation simplicity was more important than interface simplicity.

I discovered that the MIT guy is Daniel Weinreb, and the New Jersey guy is Bill Joy.

I discovered this from Daniel's blog. The “Worse is Better” idea and the future of Lisp (2009-06-07) By Daniel Weinreb. @ Source danweinreb.org.

Worse Is Better is a characterization of software success. It is a seminal article, and in my opinion one of the best essays on the topic. See also: The Nature of the Unix Philosophy.

When you look at computing history, many well-known figures had various connections. Daniel and Gabriel are both from the lisp circle. There are many blogs i've written in the past involving these programing celebrities from my diggings of computing history, especially involving lisp, emacs, unix. Following is a summary.

• I do a lot of provocative writing. Around 2000, Richard P Gabriel made some posts to comp.lang.lisp, and i criticized one of his posts about software engineering. He politely asked what my opinion was. See: What and Why of Math. (This was the period when i was reading comp.lang.lisp mostly for Erik Naggum's posts. See: Death of a Troll — My Memory of Erik Naggum, and Why do I Rant In comp.lang.lisp?)

• My review of Richard Gabriel's 1996 book. It's quite scathing. Book Review: Patterns of Software.

• Sometime in 2007, lisp cons cropped up again in comp.lang.lisp. I usually attack it to no end. Daniel kindly asked what my objection to lisp's cons is. Here's my reply, among other meandering rants on lisp's cons: Lisp's List Problem. See also: Programing Language: The Glory of Lisp's cons.

• Bill Joy is a founder of Sun Microsystems, and is the author of vi. (See: Emergency vi (vi tutorial)) In 2000, he wrote a popular essay titled “Why the future doesn't need us”. I wrote a blog about it: Futuristic Calamity.

For how vi's keys began, in particular the H J K L keys for cursor movement, see: Keyboard Hardware's Influence on Keyboard Shortcut Design.

The GNU Emacs Manual begins thus:

Emacs is the extensible, customizable, self-documenting real-time display editor. This is the Sixteenth edition of the GNU Emacs Manual, updated for Emacs version 23.3.

Wonder why it calls itself a “real-time display editor”? Why “real-time”? Why “display”? And how did vi's “modal” operation come to be? See bottom of: GNU Emacs and Xemacs Schism, by Ben Wing.

• Both Daniel and Gabriel are emacs users, of course, and emacs is a major cause of RSI. Daniel commented on how emacs's keys began, in reply to an article i posted in a newsgroup. Search for “daniel” in: Why Emacs's Keyboard Shortcuts Are Painful.

• In discussing how emacs keybinding came to be, Daniel mentioned Guy Steele (most famous for being the co-inventor of Scheme Lisp.). See: Guy Steele on Parallel Programing: Get rid of cons!.

• Jamie W Zawinski (JWZ) was hired by Gabriel to work at his company, Lucid Inc. One of its products was an IDE based on emacs, which eventually became XEmacs when the company disbanded. It was JWZ who was responsible for spreading the article Worse Is Better. (Since the beginning of the web (~1994), the most popular website hosting the “Worse Is Better” article was JWZ's website. Only around 2007 did Gabriel start his own website and host the article himself.)

Most of today's programers have probably heard of Jamie, if they've heard of any of these people at all. He was a star back in the Netscape days (1990s). He's one of the more provocative types and writes non-stop. There are plenty of articles about him in mainstream media. Richard Stallman blames Jamie as the one who caused the emacs vs xemacs schism. Both Jamie and Stallman suffered severe RSI due to emacs.

2011-08-02

lisp history, MULTICS vs UNIX, PL/I, …

Got this accolade; made my day:

The original version of MULTICS (the predecessor of UNIX (TM) , the precedessor of Linux) was written in PL/I. (Yes, I'm as old as that.......) kind regards, andy

PS. and one more note: Xah Lee wrote very well about the history of LISP/AI/functional programming, to my opinion.

From this thread:

Newsgroups: comp.lang.lisp
From: “Antti Ylikoski” 〔antti.yliko…@elisanet.fi〕
Date: Sun, 31 Jul 2011 18:14:23 +0300
Local: Sun, Jul 31 2011 8:14 am
Subject: Re: Lisp Celebrities and Computing History from Worse Is Better

Source groups.google.com

It was a comment to my article: Lisp Celebrities from Worse Is Better. Mark Tarver, the elusive computer scientist who created the Qi language, also commented.

See also:

jcs's lisp code for validating matching pairs

jcs wrote 2 versions in elisp for our previous programing challenge on Validate Matching Brackets.

I'll be going over his code soon. His blog is well annotated, so it is a good read if you are learning emacs lisp and want a different style than mine. His article is at: Xah's Challenge (Part 2) (2011-07-21) By jcs. @ Source irreal.org.

2011-07-22

Perl & Python: Print Version String from Script

Perm url with updates: http://xahlee.org/perl-python/print_version.html

Xah Lee, 2011-07-22

In perl or python, you can print the version of the interpreter from the command line, e.g. perl --version, python --version. However, often you have several versions installed, and you want to know which version your script is actually running on. On the command line, there are aliases, links, path environment variables, and various shell init scripts (.rc, .profile, .bash_profile), and it is often hard to trace exactly which version your script ends up running on.

One absolutely accurate way to know is simply to have the script print the version string itself.

Python

#-*- coding: utf-8 -*-
# python

import sys
print(sys.version)  # the parentheses make this line work in both python 2 and python 3

# sample output:
# 2.6.5 (r265:79063, Jun 12 2010, 17:07:01) 
# [GCC 4.3.4 20090804 (release) 1]

http://docs.python.org/library/sys.html
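
If the version string alone is not enough, the script can also report which interpreter binary it is running under. Here is a minimal sketch using only the standard “sys” module. The helper name “describe_interpreter” is made up for illustration and is not part of any library.

```python
# -*- coding: utf-8 -*-
import sys

def describe_interpreter():
    """Return a short report: version number and path of the running interpreter."""
    # sys.version_info holds the version as numbers: (major, minor, micro, ...)
    version = "%d.%d.%d" % tuple(sys.version_info[:3])
    # sys.executable is the path of the interpreter binary
    # (it can be empty or None if python cannot determine it)
    return "python %s at %s" % (version, sys.executable)

print(describe_interpreter())
```

The path is the useful part when aliases and links make it unclear which install the command line picked.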

Perl

#-*- coding: utf-8 -*-
# perl

print $^V; # prints version string

# sample output: v5.10.1

perldoc perlvar

Motherboard Specification: MSI MS-7548 (Aspen)

Perm url with updates: http://xahlee.org/mswin/msi_ms-7548_motherboard.html

Xah Lee, 2011-05-25

The following info is from: Source bizsupport1.austin.hp.com

HP Support document

Figure 1: The MSI MS-7548 (Aspen) motherboard

Motherboard description

Manufacturer's motherboard name: MSI MS-7548

HP/Compaq name: Aspen-GL8E

Form Factor

Micro-ATX: 24.4 cm (9.6 inches) x 24.4 cm (9.6 inches)

Chipset

AMD 780G

Front-side bus speed

Up to 5200MT/s (5.2 GT/s) when using compatible AM2+ or AM3 processor

Processor upgrade information

Socket type: AM2+
TDP: 125 watt

Motherboard supports the following processor upgrades:

    • Athlon X2 BE-2xxx (Brisbane) (AM2)
    • Athlon 64 X2 up to 6000+ with Dual Core (Brisbane) technology (AM2)
    • Athlon 64 X2 up to 6000+ with Dual Core (Windsor) technology (AM2)
    • AMD Phenom Triple-Core (Toliman) up to 8xxx series (AM2+)
    • AMD Phenom Quad-Core (Agena) up to 9950, (AM2+)

Memory upgrade information

    • Dual channel memory architecture
    • Four DDR2 DIMM (240-pin) sockets
    • Supported DIMM types:
         PC2-6400 (800 MHz)
         PC2-5300 (667 MHz)
    • Non-ECC memory only, unbuffered
    • Supports 4GB DDR2 DIMMs
    • Supports up to 16 GB on 64-bit PCs
    • Supports up to 4 GB* on 32-bit PCs

 NOTE: *Actual available memory may be less

Video

Integrated graphics using ATI Radeon™ HD 3200 (Supports DirectX® 10)

*Integrated video is not available if a graphics card is installed.

    • Supports concurrent use of dual displays connected to onboard DVI-D and VGA connectors.
    • Also supports PCI Express x16 graphics cards as independent graphics adapters.*

Audio

Integrated Realtek ALC888S Audio

*Integrated audio is not available if a sound card is installed.

    • Audio CODEC: Realtek ALC888S
    • High Definition 8 channel audio
    • Supports one S/PDIF digital connection

Network

LAN: 10-Base-T

    • Interface: Integrated into motherboard
    • Technology: Realtek RTL8111C
    • Data transfer speeds: up to 10/100 Mb/s
    • Transmission standards: 10-Base-T Ethernet

Expansion Slots

    • One PCI Express x16
    • Three PCI Express x1

I/O Ports

Back I/O ports

Figure 2: Back I/O panel

1 - PS/2 mouse (green)
2 - S/PDIF coaxial out
3 - Video Graphics Adapter port
4 - IEEE 1394 (FireWire)
5 - RJ-45 Network (LAN)
6 - Audio: Center/Subwoofer (yellow orange)
7 - Audio: Rear Speaker Out (black)
8 - Audio: Line In (light blue)
9 - Audio: Line Out (lime)
10 - Audio: Microphone (pink)
11 - Audio: Side Speaker Out (gray)
12 - USB 2.0: 4
13 - DVI
14 - PS/2 keyboard (purple)

Internal Connectors

    • One 24-pin ATX power connector
    • One 4-pin ATX power connector
    • Six SATA connectors
    • One floppy drive connector
    • Two 12V fan connectors for CPU fan and chassis fan
    • One 9-pin header for power button, reset button, power LED, and HDD LED
    • One S/PDIF digital audio output header
    • One front line input connector
    • One 9-pin audio header for headphone-out and microphone-in (yellow, Vista capable, requires matching front audio jack module)
    • 6 USB 2.0 headers
    • One 1394a header
    • One SPI (ROM programming) connector
    • One jumper for resetting BIOS settings
    • One jumper to disable BIOS password checking

Motherboard layout

Figure 3: layout

Clearing the CMOS settings

CAUTION: Do not change any jumper setting while the computer is on. Damage to the motherboard can result.

This motherboard has a jumper to clear the Real Time Clock (RTC) RAM in CMOS.

To clear CMOS, follow these steps:

① Temporarily set jumper CLEAR_CMOS to pins 2-3.
② Wait 5-10 seconds and then return the jumper to pins 1-2.
③ When you start the PC you will need to enter BIOS setup to reset any custom BIOS settings.

Clearing the BIOS password

The BIOS password is used to protect BIOS settings from unwanted changes. If you have forgotten your password you may disable password checking.

To erase the BIOS password follow these steps:

    ① Turn OFF the computer and unplug the power cord.
    ② Locate the jumper labeled CLEAR_PWD.
    ③ Move the jumper CLEAR_PW to pins 2-3.
    ④ Plug in the power cord and turn ON the computer.
    ⑤ Hold down the F10 key during the startup process and enter BIOS setup to change or clear the password.
    ⑥ After changing or clearing the BIOS passwords, remember to reset the jumper to pins 1-2.

Emacs Lisp: Getting Command Line Arguments

Perm url with updates: http://xahlee.org/emacs/elisp_command_line_argv.html

Xah Lee, 2011-07-22

Remember that elisp scripts can be run from the OS's command line. Like this:

emacs --script process_log.el

(For detail, see: Text Processing with Emacs Lisp Batch Style.)

Sometimes you want to pass an argument. For example, like this:

emacs --script process_log.el ~/weblog.txt

In your elisp script, you can get the argument from the builtin variable “argv”. The “argv” is an alias of “command-line-args-left”. Its value is a list. Each element is an unprocessed item from the command line. The complete list of command line items, including the emacs invocation and any options passed to emacs (e.g. your script name), is stored in “command-line-args”.

Here's a sample test script:

;; -*- coding: utf-8 -*-
;; emacs lisp
;; 2011-07
;; a test script. getting arguments from command line

(message "Index 0: %S" (elt argv 0)) ; %S is for lisp code (aka sexp, s-expression)
(message "Index 1: %S" (elt argv 1))
(message "Index 2: %S" (elt argv 2))
(message "Index 3: %S" (elt argv 3))

(message "Whole value of argv: %S" argv) ; “argv” is same as “command-line-args-left”
(message "Whole value of command-line-args: %S" command-line-args)

If you save and name the above as “xx.el” and run it like this:

emacs --script xx.el uni 2 -tri

Here's the output:

$ emacs --script xx.el uni 2 -tri
Index 0: "uni"
Index 1: "2"
Index 2: "-tri"
Index 3: nil
Whole value of argv: ("uni" "2" "-tri")
Whole value of command-line-args: ("emacs" "-scriptload" "xx.el" "uni" "2" "-tri")
Unknown option `-tri'

(info "(elisp) Command-Line Arguments")

2011-07-22 Thanks to Piotr Chamera 〔piotr_cham…@poczta.onet.pl〕, Swami Tota Ram Shankar 〔tota_…@india.com〕.

English Writing Style: on The Second Objection to Lots of Fun

Perm url with updates: http://xahlee.org/lit/english_style_lots_of_fun.html

Xah Lee, 2011-07-22

On the Second Objection to Lots of Fun

By Xah Lee
July 21, 2011
Department of Philology
Bovine University

Abstract

In the human endeavor of composition with a writing system that is known as English language in most western world today, there is the objection of style known as the First Fundamental Objection to the use of lots in precedence of fun with a preposition of of as to form the clause lots of fun.

We present here the Second Fundamental Objection to lots of fun on the grounds of logic, from an application of logical positivism's interpretation of Occam's razor.

The First Fundamental Objection is well known in the work A Dialogue Between Men of Letters: “Lots” of “Fun”! [1]. We quote the passage:

lots has lots of meanings like a meaning lot, the pedantic lot, almost chosen by lot. When we are in the lot where lots meanings are allotted in lots of ways like lottery, it's a lot of trouble to decipher, and is not fun, despite lots of right in front.

Its objection is based on the philosophy of reductionism and the esthetic school of minimalist semantics, as a strategy of reducing the multiplicity valence of a sememe's adjectival power in pragmatics. The effectiveness of such theory has been demonstrated by the cohort model in neurolinguistics.

However, the recently compiled corpus of webologue showed that the First Fundamental Objection has not stopped the populace from such usage. Researchers in our field have been puzzled by this for over the past half century. Until recently, that is, when we discovered that it is caused by the guild of stylists' failure of their collective force in molding a viscous infrastructure of writing. We think that the First Fundamental Objection suffered the so-called Loneliness Syndrome, and must be complemented with its natural conjugate, the Second.

The Second Fundamental Objection is based on counting principles. Objective Noun can be had in the plurals, e.g. lots of chairs. When lots of is applied to chairs, we obtain multitudes of chairs. However, when lots of is applied to fun, we obtain lots of fun, but is it a multitude of fun? Here we arrive at a falsidical paradox. It suggests that fun is not countably infinite, but uncountably infinite, unlike chairs. Analogous to how water is to salt, we know that water and fun both have cardinality of the continuum [Zermelo–Fraenkel set theory with or without Axiom of Choice]. Lots of has an equivalence relation to many, e.g. lots of chairs = many chairs, but now the absurdity comes to light when you replace chairs by fun. “lots of fun =? many fun”. It is on this basis we propose the Second Fundamental Objection.

For the complete monograph, please paypal to xah@xahlee.org USD$6.

References

2011-07-21

Lisp, Python, Perl, Ruby Code to Validate Matching Brackets

The verdict is out: Preliminary report of a little programing challenge last week: Lisp, Python, Perl, Ruby Code to Validate Matching Brackets.

A Dialog Between Men of Letters: “Lots” of “Fun”!

Perm url with updates: http://xahlee.org/lit/a_dialog_between_men_of_letters.html

Xah Lee, 2011-07-21

Xah Lee wrote:

a very fun exercise. …

Peter Moylan wrote:

Since you're cross-posting this to alt.usage.english, it's worth pointing out that we don't say “very fun” here.

Xah Lee wrote:

very means like “very much”. Fun means fun. You don't say “vary fun”? Like, you mean varing degrees of fun or vagaries of fun but you don't say it? Or, but you write it and speak it?

I know not what it means.

Stephen wrote:

“Fun” is a noun and most English speakers object to it being used as an adjective, and even more to an adjectival intensifier such “so” or “very” being attached to it.

Unless you couldn't care less about English usage, as a native speaker you would write “so much fun.” Your example would have to be amended, too, but it's a little harder to find an alternative that does not require more words (maybe this is a reason for accepting such monstrosities, but I still don't like them). I would suggest:

“a very enjoyable exercise”

“an exercise that was lots of fun”

Xah wrote:

thank you Stephen for your kind explanation but in line with alt.usage.english spirit and kind return to Peter Moylan, i object to the use of “lots” because it's contrary to the style in my book. [1]

lots has lots of meanings like a meaning lot, the pedantic lot, almost chosen by lot. When we are in the lot where lots meanings are allotted in lots of ways like lottery, it's a lot of trouble to decipher, and is not fun, despite lots of right in front.

[1] Lee, Xah. The Writing Style on XahLee.org @ http://xahlee.org/Periodic_dosage_dir/bangu/xah_style.html

Athel Cornish-Bowden 〔acorn…@ifr88.cnrs-mrs.fr〕 wrote:

Look. Are you interested in English usage, or not? If Peter says we don't say “very fun” then we don't say “very fun”.

Xah wrote:

… my writing is razor blades in hot buns to grammarians, chocking dagger to mouthing moralists, logic bomb to irreflecting morons, eye opener to epochal theorists, immaculate calculus to logicians, euphoric oxygen to English masters, orgasmic honey to poetic chicks. That is to say, when i wanna be on the right occasion, too.

you see, English under me is like a love slave. I say jump and she jumps, I say kiss and she kisses. And when i need to vent, she bends double and pleads cum. Of course, it is not to say my theories are unerring or i'm impeccable or sans foibles and grammatical trespassings. But all things considered…

it is often the case,
that i do contemplate,
the degree of cockiness,
that i should exhibit myself.

if showing-off too much,
i then beget revulsion.
if i show not enough,
then i'm not man enough.

therefore i trounce,
when being pressed,
on the delicate balance,
that i might have trashed.

now you understand,
'tis not my loftiness,
but my frailties,
that you should endorse.

on the other hand,
with regard to the universe,
my name is Xah Lee,
and i'm still matchless.

— Xah Lee, 2004

This is originally a thread in newsgroup “alt.usage.english” (2011-07-19) @ Source groups.google.com

new blog: Xah's Belles-lettres Blog

Started a new blog: Xah's Belles-lettres Blog @ http://xahlee.org/lit/blog.html.

It's on topics of {vocabulary, english writing, literature, languages, linguistics matters}. (go there to read the detail.)

If you want to subscribe to just those, go there and subscribe. New articles there may or may not be shown here.

2011-07-20

Emacs Lisp: Batch Script to Validate Matching Brackets

Perm url with updates: http://xahlee.org/emacs/elisp_validate_matching_brackets.html

Xah Lee, 2011-07-19

This page shows you how to write an elisp script that checks thousands of files for mismatched brackets.

The Problem

Summary

I have 5 thousand files containing many matching pairs. I want to know if any of them contains mismatched brackets.

Detail

The matching pairs include these: () {} [] “” ‹› «» 〈〉 《》 【】 〖〗 「」 『』.

The program should be able to check all files in a dir, and report any file that has mismatched bracket, and also indicate the line number or position where a mismatch occurs.

For those curious, if you want to know what these brackets are, see:

For conveniences about selecting or navigating brackets in emacs, see:

For conveniences of inserting unicode brackets (in pairs) in emacs or OS system-wide, see:

Solution

Here's an outline of the steps.

  • Go thru the file char-by-char, find a bracket char.
  • Check if the one on top of the stack is the matching opening char. If so, remove it. Else, push the current char onto the stack. (Think of the stack as a stack of books. You put one on top (called “push”), and take one out from the top too (called “pop”).)
  • Repeat the above till there are no more bracket chars in the file.
  • If the stack is not empty, then the file has mismatched brackets. Report it.
  • Do the above on all files.
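
The steps above can be sketched in a few lines of Python, for readers who want to see the stack logic in isolation before the elisp version. This is only an illustration, not the script used on this page: the names “PAIRS”, “CLOSERS” and “find_mismatches” are made up, only a few of the bracket pairs are listed, and the reported position is the 1-based index of the char itself.

```python
# -*- coding: utf-8 -*-
# opening char -> closing char (a subset of the pairs, for illustration)
PAIRS = {"(": ")", "{": "}", "[": "]", "“": "”"}
# closing char -> opening char
CLOSERS = dict((close, opener) for opener, close in PAIRS.items())

def find_mismatches(text):
    """Return the leftover stack as a list of (char, position).
    An empty list means every bracket in TEXT is matched."""
    stack = []
    for pos, ch in enumerate(text, 1):
        if ch not in PAIRS and ch not in CLOSERS:
            continue  # not a bracket char, skip it
        if stack and CLOSERS.get(ch) == stack[-1][0]:
            stack.pop()  # top of stack is the matching opening char
        else:
            stack.append((ch, pos))  # opening char, or an unmatched closing char
    return stack

print(find_mismatches("(a {b} c)"))  # balanced, so the stack ends up empty
print(find_mismatches("( ( )"))      # one opening char is left on the stack
```

Note that an unmatched closing char is pushed too, so everything after it tends to stay on the stack. The bottom entry is the first problem in the file, the top entry is the last.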

Here's some interesting use of lisp features to implement the above.

Define Matching Pair Chars as “alist”

We begin by defining the chars we want to check, as an “association list” (aka “alist”). Like this:

(setq matchPairs '(
                   ("(" . ")")
                   ("{" . "}")
                   ("[" . "]")
                   ("“" . "”")
                   ("‹" . "›")
                   ("«" . "»")
                   ("【" . "】")
                   ("〖" . "〗")
                   ("〈" . "〉")
                   ("《" . "》")
                   ("「" . "」")
                   ("『" . "』")
                   )
      )

If you care only to check for curly quotes, you can remove elements above. This is convenient because some files necessarily have mismatched pairs such as the parenthesis, because that char is used for many non-bracketing purposes (e.g. ASCII smiley).

An “alist” in lisp is basically a list of pairs (called key and value), with the ability to search for a key or a value. The first element of a pair is called its key, the second element is its value. Each pair is a “cons”, like this: (cons mykey myvalue), which can also be written using this syntax: (mykey . myvalue) for easier reading.

The purpose of lisp's “alist” is similar to Python's dictionary or Pretty Home Page's array. It is also similar to hashmap, except that alist can have duplicate keys, can search by values, maintains order, and alist is not intended for massive number of elements. Elisp has a hashmap datatype if you need that. (See: Emacs Lisp Tutorial: Hash Table.)

(info "(elisp) Association Lists")
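
For readers coming from Python, here is a rough sketch of the same idea: a list of (key, value) tuples stands in for the alist, and two small helper functions imitate elisp's “assoc” and “rassoc”. The helpers are written here for illustration; they are not Python builtins.

```python
# -*- coding: utf-8 -*-
# A list of (key, value) tuples plays the role of the alist.
match_pairs = [("(", ")"), ("{", "}"), ("[", "]"), ("“", "”")]

def assoc(key, alist):
    """Imitate elisp assoc: return the first pair whose key equals KEY, else None."""
    for pair in alist:
        if pair[0] == key:
            return pair
    return None

def rassoc(value, alist):
    """Imitate elisp rassoc: return the first pair whose value equals VALUE, else None."""
    for pair in alist:
        if pair[1] == value:
            return pair
    return None

print(assoc("(", match_pairs))   # search by key
print(rassoc("}", match_pairs))  # search by value

# A dictionary gives fast lookup by key, but it cannot hold duplicate keys.
as_dict = dict(match_pairs)
print(as_dict["["])
```

Like the alist, the list of tuples keeps its order and allows searching from either side, at the cost of a linear scan.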

Generate Regex String from alist

To search for a set of chars in emacs, we can read the buffer char-by-char, or, we can simply use “search-forward-regexp”. To use that, first we need to generate a regex string from our matchPairs alist. For example, if we want to search “〈〉《》”, then our regex string should be "〈\\|〉\\|《\\|》".

First, we define/declare the string. Not a necessary step, but we do it for clarity.

(setq searchRegex "")

Then we go thru the matchPairs alist. For each pair, we use “car” and “cdr” to get the chars and “concat” them to the string. Like this:

(mapc
 (lambda (mypair) ""
   (setq searchRegex (concat searchRegex (regexp-quote (car mypair)) "|" (regexp-quote (cdr mypair)) "|") )
   )
 matchPairs)

Then we remove the ending |.

(setq searchRegex (substring searchRegex 0 -1)) ; remove the ending “|”

Then, change | to \\|. In elisp regex, the | is literal. The “regex or” is \|. Elisp does not have a special regex string syntax; it only understands normal strings. So, to feed \| to the regex engine, you need to escape the backslash. So, the string for the regex needs to be \\|. Here's how we do it:

(setq searchRegex (replace-regexp-in-string "|" "\\|" searchRegex t t)) ; change | to \\| for regex “or” operation

See also: emacs regex tutorial.
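
For comparison, the same regex generation can be sketched in Python. One difference to keep in mind: in Python's “re” module a bare | already means “or”, so the extra step of changing | to \\| is not needed. The function name “build_search_regex” is made up for this example, and only a few pairs are listed.

```python
# -*- coding: utf-8 -*-
import re

match_pairs = [("(", ")"), ("[", "]"), ("〈", "〉"), ("《", "》")]

def build_search_regex(pairs):
    """Join every opening and closing char into one alternation regex."""
    chars = [ch for pair in pairs for ch in pair]
    # re.escape plays the role of elisp's regexp-quote:
    # it makes chars like ( and [ match literally
    return "|".join(re.escape(ch) for ch in chars)

search_regex = build_search_regex(match_pairs)
found = re.findall(search_regex, "a (b) 〈c》 [d")
print(found)
```

Here “found” holds the bracket chars in the order they appear in the text, which is what the stack check needs as input.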

Implement Stack Using Lisp List

The stack is implemented with lisp's list, e.g. '(1 2 3). The top of the stack is the first element. To add to the stack, do it like this: (setq mystack (cons newitem mystack)). To remove an item from the stack, do this: (setq mystack (cdr mystack)). The stack starts as an empty list: '().

For each entry in the stack, we put the char and also its position, so that we can report the position if the file does have mismatched pairs.

We use a vector as entries for the stack. Each entry is like this: (vector char pos). (See: Emacs Lisp Tutorial: List & Vector.)

Here's how to look up a char in the alist, push to the stack, and pop from the stack.

; check if current char is a closing char and is in our match pairs alist.
; use “rassoc” to check alist's set of “values”. 
; It returns the first key/value pair found, or nil
(rassoc char matchPairs)

; add to stack
(setq myStack (cons (vector char pos) myStack) )

; pop stack
(setq myStack (cdr myStack) )

Complete Code

Here's the complete code.

;; -*- coding: utf-8 -*-
;; 2011-07-15
;; go thru a file, check if all brackets are properly matched.
;; e.g. good: (…{…}… “…”…)
;; bad: ( [)]
;; bad: ( ( )

(setq inputFile "xx_test_file.txt" ) ; a test file.
(setq inputDir "~/web/xahlee_org/p/time_machine/") ; must end in slash

(defvar matchPairs '() "a alist. For each pair, the car is opening char, cdr is closing char.")
(setq matchPairs '(
                   ("(" . ")")
                   ("{" . "}")
                   ("[" . "]")
                   ("“" . "”")
                   ("‹" . "›")
                   ("«" . "»")
                   ("【" . "】")
                   ("〖" . "〗")
                   ("〈" . "〉")
                   ("《" . "》")
                   ("「" . "」")
                   ("『" . "』")
                   )
      )

(defvar searchRegex "" "regex string of all pairs to search.")
(setq searchRegex "")
(mapc
 (lambda (mypair) ""
   (setq searchRegex (concat searchRegex (regexp-quote (car mypair)) "|" (regexp-quote (cdr mypair)) "|") )
   )
 matchPairs)

(setq searchRegex (substring searchRegex 0 -1)) ; remove the ending “|”

(setq searchRegex (replace-regexp-in-string "|" "\\|" searchRegex t t)) ; change | to \\| for regex “or” operation

(defun my-process-file (fpath)
  "process the file at fullpath FPATH …"
  (let (myBuffer myStack ξchar ξpos)

    (setq myStack '() ) ; each entry is a vector [char position]
    (setq ξchar "") ; the current char found

    (when t
      ;; (not (string-match "/xx" fpath)) ; in case you want to skip certain files

      (setq myBuffer (get-buffer-create " myTemp"))
      (set-buffer myBuffer)
      (insert-file-contents fpath nil nil nil t)

      (goto-char 1)
      (while (search-forward-regexp searchRegex nil t)
        (setq ξpos (point)  )
        (setq ξchar (buffer-substring-no-properties ξpos (- ξpos 1))  )

        ;; (princ (format "-----------------------------\nfound char: %s\n" ξchar) )

        (let ((isClosingCharQ nil) (matchedOpeningChar nil) )
          (setq isClosingCharQ (rassoc ξchar matchPairs))
          (when isClosingCharQ (setq matchedOpeningChar (car isClosingCharQ) ) )

          ;; (princ (format "isClosingCharQ is: %s\n" isClosingCharQ) )
          ;; (princ (format "matchedOpeningChar is: %s\n" matchedOpeningChar) )

          (if
              (and
               (car myStack) ; not empty
               (equal (elt (car myStack) 0) matchedOpeningChar )
               )
              (progn
                ;; (princ (format "matched this top item on stack: %s\n" (car myStack)) )
                (setq myStack (cdr myStack) )
                )
            (progn
              ;; (princ (format "did not match this top item on stack: %s\n" (car myStack)) )
              (setq myStack (cons (vector ξchar ξpos) myStack) ) )
            )
          )
        ;; (princ "current stack: " )
        ;; (princ myStack )
        ;; (terpri )
        )

      (when (not (equal myStack nil)) 
        (princ "Error file: ")
        (princ fpath)
        (print (car myStack) )
        )
      (kill-buffer myBuffer)
      )
    ))

(require 'find-lisp)

(let (outputBuffer)
  (setq outputBuffer "*xah match pair output*" )
  (with-output-to-temp-buffer outputBuffer
    ;; (my-process-file inputFile) ; use this to test one one single file
    (mapc 'my-process-file (find-lisp-find-files inputDir "\\.html$")) ; do all html files
    (princ "Done deal!")
    )
  )

I added many comments and debug code for easy understanding. If you are not familiar with elisp idioms such as opening files, using buffers, and printing to output, see: Emacs Lisp Idioms (for writing interactive commands) and Text Processing with Emacs Lisp Batch Style.

To run the code, simply open it in emacs. Edit the line at the top for “inputDir”. Then call “eval-buffer”.

Here's a sample output:

Error file: c:/Users/h3/web/xahlee_org/p/time_machine/Hettie_Potter_orig.txt
[")" 3625]
Error file: c:/Users/h3/web/xahlee_org/p/time_machine/Hettie_Potter.txt
[")" 2338]
Error file: c:/Users/h3/web/xahlee_org/p/arabian_nights/xx/v1fn.txt
["”" 185795]
Done deal!

The weird ξ you see in my code is the Greek letter xi. I use unicode chars in variable names for experimental purposes. You can just ignore it. (See: Programing Style: Variable Naming: English Words Considered Harmful.)

Advantages of Emacs Lisp

Note that the great advantage of using elisp for text processing, instead of {perl, python, ruby, …} is that many things are taken care by the emacs environment.

I don't need to write code to deal with file encoding (emacs automatically does it). No reading file is involved. Just “open” or “save” the file. Processing a file is simply moving cursor thru characters or lines, changing parts of it. No code needed for doing safety backup. Emacs automatically does backup if you made any changes, and can be turned off by setting the builtin var “make-backup-files” to nil. For file paths in the output, you can easily open it by a click or key press. I can add just 2 lines so that clicking on the error char in the output jumps to the location in the file.

Any elisp script you write inside emacs automatically becomes an extension of emacs and can be used in an interactive way. Or, you could run it in a command line shell, e.g. emacs --script process_log.el.

This problem was posted to a few comp.lang newsgroups as a fun challenge. See: Lisp, Python, Perl, Ruby Code to Validate Matching Brackets.

2011-07-19

emacs tip: inserting source code in org-mode

When using org-mode, you can insert a snippet of programming language code.

Type <s then Tab. It will insert this markup:

#+begin_src ▮

#+end_src

The above is the syntax for literal text (similar to the concept of heredoc in perl and PHP).

Then, type “emacs-lisp” so you have:

#+begin_src emacs-lisp

#+end_src

This tells org-mode that the snippet is emacs-lisp code. The “emacs-lisp” there can be any mode, e.g. “html”, “perl”, “haskell”, etc. (technically: the value of the variable “major-mode”, minus the “-mode” string at the end.)

orgmode.org 15.2 Easy Templates

For basics of org-mode, see: Emacs: outline-mode and org-mode tutorial.

2011-07-18

Emacs Lisp: Processing HTML: Transform Tags from ‹span class=w› to ‹b›

Perm url with updates: http://xahlee.org/emacs/elisp_batch_html_tag_transform_bold.html

Xah Lee, 2011-07-18

This page shows a simple practical elisp script for HTML tag transformation.

The Problem

Summary

I want to batch transform the tag <span class="w">xyz</span> to <b>xyz</b>, for over a hundred files, and print a report of the changes so that i can scan it to make sure there are no errors. (for example, in the case that the HTML file has a mismatched span tag.)

Detail

In my English vocabulary and literature study projects, many interesting words are marked up by this tag: <span class="w">xyz</span>. With CSS, it is rendered in bold. I think that markup is too elaborate, and i want to replace it simply with <b>xyz</b>, for over a few hundred files.

Sidenote

The following is a little sidenote on why i had “span.w” in the first place. (you can skip this section.)

I have the following “span” markups: { “span.w”, “span.b”, “span.r” }. The “span.w” means interesting word that's new, rendered as bold. They are typically difficult words new to me.

Sometimes many college-level words are still interesting, and i want to highlight them too, for highschool or ESL students and myself. Sometimes these are familiar words but used in a sense that's not common (e.g. “seedy” hotel). For these words, i markup with “span.b”. They are rendered in blue (they are typically college level words).

The “span.r” is for highlighting interesting {word, phrase, sentence} of the work, not necessarily for vocabulary study purposes. e.g. an interesting thought, quotable passage, interesting deviation from standard grammar. They are rendered in red.

As an example of how i used these markups, here's an excerpt from Gulliver's Travels. PART I — A VOYAGE TO LILLIPUT. Quote:

The declivity was so small, that I walked near a mile before I got to the shore, which I conjectured was about eight o'clock in the evening. I then advanced forward near half a mile, but could not discover any sign of houses or inhabitants; at least I was in so weak a condition, that I did not observe them. I was extremely tired, and with that, and the heat of the weather, and about half a pint of brandy that I drank as I left the ship, I found myself much inclined to sleep. I lay down on the grass, which was very short and soft, where I slept sounder than ever I remembered to have done in my life, and, as I reckoned, about nine hours; for when I awaked, it was just day-light. I attempted to rise, but was not able to stir: for, as I happened to lie on my back, I found my arms and legs were strongly fastened on each side to the ground; and my hair, which was long and thick, tied down in the same manner. I likewise felt several slender ligatures across my body, from my arm-pits to my thighs. I could only look upwards; the sun began to grow hot, and the light offended my eyes.

  • Note the word “declivity”, rendered in bold. (primary interesting words)
  • Note the word “ligatures”, rendered in blue. (secondary interesting words)
  • Note the phrase “light offended my eyes”, rendered in red. (interesting phrase and usage.)

Here's some annotated works you might find interesting:

Solution

Here's an outline of the steps.

  • Open the file. Use regex to search the span markup.
  • Make the replacement.
  • Add the replacement to a list, for later report.
  • Repeat the above.
  • Use a dir traverse function to apply the above to every file.
  • When done, print the list of changes.

Here's the code:

;; -*- coding: utf-8 -*-
;; 2011-07-18
;; replace <span class="w">…</span> to <b>…</b>
;;
;; do this for all files in a dir.

(setq inputDir "~/web/xahlee_org/PageTwo_dir/Vocabulary_dir/" ) ; dir should end with a slash

(setq changedItems '())

(defun my-process-file (fpath)
  "process the file at fullpath FPATH ..."
  (let (mybuff myword)
    (setq mybuff (find-file fpath))

    (widen)
    (goto-char (point-min)) ;; in case buffer already open

    (while (search-forward-regexp "<span class=\"w\">\\([^<]+?\\)</span>" nil t)
      (setq myword (match-string 1))
      (when (< (length myword) 15) ; a little double check in case of possible mismatched tag
        ;; FIXEDCASE and LITERAL are both t: insert the text as is, even if it contains a backslash
        (replace-match (concat "<b>" myword "</b>") t t)
        (setq changedItems (cons (substring-no-properties myword) changedItems))
        ) )

    ;; close buffer if there's no change. Else leave it open.
    (unless (buffer-modified-p mybuff) (kill-buffer mybuff))
    ) )

(require 'find-lisp)

(setq make-backup-files t)
(setq case-fold-search nil)
(setq case-replace nil)

(let (outputBuffer)
  (setq outputBuffer "*xah span.w to b replace output*" )
  (with-output-to-temp-buffer outputBuffer
    (mapc 'my-process-file (find-lisp-find-files inputDir "\\.html$"))
    (print changedItems)
    (princ "Done deal!")
    )
  )

The above is fairly easy to understand. You might refresh the elisp basics at: Text Processing with Emacs Lisp Batch Style and Emacs Lisp Idioms (for writing interactive commands)

Here's the output: elisp_batch_html_tag_transform_bold_output.txt.

There are over 1k changes. The output is extremely useful because i can just take a few seconds to glance at it to know there are no errors. Errors are possible because whenever you use regex to parse HTML, a missing tag, or even an unexpected nested tag, can mean disaster.
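To make those failure cases concrete, here's a minimal sketch of the same pattern and length check in Python, applied to a string instead of a buffer. This is not the elisp script above, just an illustration. The function name “span_w_to_b” and the “max_len” parameter are made up for this example.

```python
import re

# Same pattern as the elisp: the word may not contain "<", so a span that
# has a nested tag inside simply fails to match instead of being mangled.
SPAN_W = re.compile(r'<span class="w">([^<]+?)</span>')


def span_w_to_b(html, max_len=15):
    """Return (new_html, changed_words).

    Matches of max_len chars or more are left untouched, as a guard
    against a mismatched tag swallowing a long run of text.
    (span_w_to_b and max_len are hypothetical names for this sketch.)
    """
    changed = []

    def repl(m):
        word = m.group(1)
        if len(word) < max_len:
            changed.append(word)
            return "<b>" + word + "</b>"
        return m.group(0)  # suspiciously long: keep the original markup

    return SPAN_W.sub(repl, html), changed
```

A span with a nested tag inside, or one whose match is 15 chars or longer, is left as is and doesn't show up in the list of changes. So the report only lists short plain words, which is why a quick glance at it is enough.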

PS I run a Wordy English blog. If you are interested in vocabulary, please subscribe at: Wordy English — the Making of Belles-Lettres.

2011-07-17

Dalai Lama meeting President Obama

Recently the Dalai Lama went to meet President Obama. See:

  • Obama meets with Dalai Lama: US “does not support independence for Tibet” (2011-07-16) By Xeni Jardin. @ Source www.boingboing.net

Folks, the truth must be known. Li Ao has pointed out several things about the Dalai Lama and Tibet issues: Li Ao on Tibet and Dalai Lama.

If you take the time to read it and take it seriously, then it's quite shocking. The question is: are Li Ao's statements actually verifiable facts? That, you'll have to decide yourself.

Test your English vocabulary size

A very nice vocabulary testing page. Test your vocabulary here: http://testyourvocab.com/.

There are 3 pages, but actually just 2 pages of testing. (the 3rd page is an optional survey on Age, Gender, etc.) Be sure to take 10 minutes for the test. And, be honest. Don't check a word if you are not certain about its meaning.

After you've done the test, it'll give you a score, which is an estimate of how many words you know. My score turned out to be 24.5k words (26.3k the second time). This score is actually below that of the average English-speaking adult, according to their survey. Quote:

Based on over 8,000 participations so far, we've got some initial statistics already. Most adults fall in the range 20,000–35,000, with the exact median score being 27,123 words.

I went thru the list carefully. The first page is trivial; the hard ones are on the second page. On the second page, i know almost none of the words in the last column. Most of the others i've seen, but for half of them i forgot what they mean without context. Those i didn't check-mark.

I was rather surprised by my below-average score, since i have had a 2-decade obsession with vocabulary size. From my experience, my vocabulary size is probably average, or slightly above that of college educated adults. I'm guessing most people simply put a check-mark on words they think they know, but are actually wrong about. If the test actually gave a multiple-choice question for each word, i'm sure the scores would be much lower.

See also, collection of about 5k words with usage examples. Wordy English — the Making of Belles-Lettres

Take the test, and comment to let me know what you think of it!

Little Parser Problem Challenge: Matching Pairs Validation

Perm url with updates: http://xahlee.org/comp/validate_matching_brackets.html

Lisp, Python, Perl, Ruby Code to Validate Matching Brackets

Xah Lee, 2011-07-21

This is a preliminary report on scripts of several languages to validate matching brackets.

Problem Description

Little Parser Problem Challenge: Matching Pairs Validation

The problem is to write a script that can check a dir of text files (and all subdirs) and report whether any file has mismatched brackets.

  • The files will be utf-8 encoded (unix style line ending).
  • If a file has mismatched matching-pairs, the script will display the file name, and the line number and column number of the first or last instance where a mismatched bracket occurs (or just the char position, as in emacs's “point”). Exactly which position is considered as the “first” or “last” doesn't matter much, as long as it reports a char that breaks the nesting matching pair syntax.
  • The matching pairs are all single unicode chars. They are these and nothing else: () {} [] “” ‹› «» 【】 〈〉 《》 「」 『』 . Note that ‘single curly quote’ is not considered a matching pair here.
  • Your script must be standalone. It must not use any parser tools, but it can call libs that are part of the standard distribution of your lang.

Here are some examples of mismatched brackets: ([)], (“[[”), ((, 】 etc. (and yes, the brackets may be nested. There is usually text between these chars.)
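As a rough illustration of the problem (not one of the submitted solutions), here's a minimal stack-based sketch in Python. The bracket list is the one given above. The names “first_mismatch” and “check_dir”, and the “.txt” suffix filter, are assumptions made for this example. It reports the char position only, not line and column.

```python
import os

# The matching pairs from the problem statement: opener -> closer.
PAIRS = {
    "(": ")", "{": "}", "[": "]", "“": "”", "‹": "›", "«": "»",
    "【": "】", "〈": "〉", "《": "》", "「": "」", "『": "』",
}
CLOSERS = set(PAIRS.values())


def first_mismatch(text):
    """Return (char, position) of a char that breaks the nesting, or None.

    Position is 1-based, like emacs's "point". (Hypothetical helper name.)
    """
    stack = []  # openers still waiting for their closer, as (char, position)
    for pos, ch in enumerate(text, start=1):
        if ch in PAIRS:
            stack.append((ch, pos))
        elif ch in CLOSERS:
            if not stack or PAIRS[stack[-1][0]] != ch:
                return (ch, pos)  # closer that doesn't match the last opener
            stack.pop()
    # Anything left on the stack was never closed; report the last opener.
    return stack[-1] if stack else None


def check_dir(root, suffix=".txt"):
    """Print every file under root (and subdirs) that has a mismatch.

    The suffix filter is an assumption; the problem only says "text files".
    """
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.endswith(suffix):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8") as f:
                    hit = first_mismatch(f.read())
                if hit:
                    print("Error file:", path, hit)
```

For the examples above, first_mismatch("([)]") reports the “)” at position 3, and first_mismatch("((") reports the second “(”, since it is the last opener left unclosed.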

I'll be writing an emacs lisp solution and will post it in 2 days. I welcome other lang implementations, in particular perl, python, php, ruby, tcl, lua, Haskell, Ocaml. I'll also be able to eval Common Lisp (clisp), Scheme (scsh), and Java. Other langs such as Clojure, Scala, C, C++, or any others are all welcome, but i won't be able to eval them. A javascript implementation would be very interesting too, but please indicate which command line version to use and where to get it.

I hope you'll find this an interesting “challenge”. This is a parsing problem. I haven't studied parsers except some Wikipedia reading, so my solution will probably be naive. I hope to see and learn from your solution too.

I hope you'll participate. Just post your solution here. Thanks.

Solutions

Emacs Lisp

Detailed explanation at Emacs Lisp: Batch Script to Validate Matching Brackets.

Python

This report is incomplete. So far Raymond Hettinger's python 3 code is the only working code. None of the following works on my machine.

For the original post of this problem and the discussion, see: a little parsing challenge ☺ (2011-07-17) @ Source groups.google.com.

Thanks to the many who have written code and made helpful comments. I may come back to clean this up, in the coming weeks. If you can correct one of the following programs, please comment.

Pending Solutions

Python

Ruby

Perl

Common Lisp