starting an emacs community?

My emacs tutorial is getting somewhat popular. I'd like to thank all the people who came and read it, and also the many who have written to me and given feedback.

Over the 13 years that I've had a website, a lot of people have written to me in private email, sometimes at length, and sometimes quite personally. In the early years (1997 to roughly 2004), the vast majority wrote because of my math stuff. There's always been a problem with keeping in touch, though. Often I get someone's email, I heartily reply, and we may exchange a few more emails, but then it's hard to keep in touch. Those were the days before Twitter and Facebook. So, a lot of people I would really have liked to know and become friends with are basically lost. But I'm also not much of a social person. I don't go out much; I've visited a bar maybe once every 5 years. With the handful of friends I've had in my life, the relationship has been almost entirely platonic and very much intellectual.

For the emacs blog, I've been thinking of starting some sort of community: perhaps a mailing list or an online forum, so that people can gather, ask questions, make friends, or even rant back at some of my rants. Part of this desire is also to monetize whatever value I might have created. The donation box brings me a few bucks a month, good for a few cups of good coffee. ☺

Right now, there are a few existing emacs communities. There's the newsgroup “comp.emacs”, which is pretty dead, with maybe a few posts per week. There's the newsgroup “gnu.emacs.help”, mostly limited to questions specific to GNU Emacs, with perhaps 4 messages per day. There's emacswiki.org, which is mostly a wiki; looking at the past few days, it got perhaps 5 edits per day by different people. There's http://planet.emacsen.org/, a collection of about 80 emacs blogs, with perhaps 2 posts per day. There are some emacs-related Twitter groups. There's also “#emacs” on irc.freenode.net.

So, I'm thinking it'd be nice to start some form of forum with Web 2.0 technology where emacsers can all gather and form a community. We can ask each other questions and learn together. I really love emacs.

One notable success is stackoverflow.com, which in just a few years became really popular and effective at getting all sorts of programming questions answered.

Right now xahlee.org is a static site, with no forum or any type of interaction. I'm not sure what technology I can use on my hosted server to make it Web 2.0. Also, I wish to keep the html valid, but I think that's hopeless. Possibly I could start a forum using WordPress.

So, would this be evil? Is there a need? Let me know what you think.

Short Intro of Mathematica For Lisp Programmers (list processing example)

Perm url with updates: http://xahlee.org/UnixResource_dir/writ/notations_mma.html

Xah Lee, 2008-08

The following is an exposition of how the programming language Mathematica is used for a list processing problem. In particular, the essay details the nature of nested syntax in comparison to lisp. This article was originally posted to the “comp.lang.lisp” newsgroup.

Cortez wrote:

I need to traverse a list of lists, where each sublist is labelled by a number, and collect together the contents of all sublists sharing the same label. So if I have the list -

((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q r) (5 s t))

where the first element of each sublist is the label, I need to produce -

((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))

I do this with the following -

(defun test (list)
  (loop for j in list
          for index = (first j)
          for k = (rest j)
          with indices = nil
          if (not (member index indices))
            do (pushnew index indices)
            and collect k into res
          else
            do (nconc (nth index res) k)
          finally (return res)))

I suspect that there is a more efficient and elegant way of doing this, however. Any suggestions welcome.

Brief background: this is part of a program I've written for reading data from SDIF files, a binary format which stores sound description data. The labelled lists represent partials in spectral analysis data (partial-index, time, frequency).

Here's how one would do it in Mathematica.

define the list:

mylist = {0[a, b], 1[c, d], 2[e, f], 3[g, h], 1[i, j], 2[k, l], 4[m, n], 2[o, p], 4[q, r], 5[s, t]};

then do this:

Sort@mylist //. {f___,x_[a__],x_[b__],l___} -> {f,x[a,b],l}

output is:

{0[a, b], 1[c, d, i, j], 2[e, f, k, l, o, p], 3[g, h], 4[m, n, q, r], 5[s, t]}

if you want the result cleaned up so that the integer labels are removed, do this (where “result” stands for the output above):

result /. _Integer[b___] -> {b}

which gives:

{{a, b}, {c, d, i, j}, {e, f, k, l, o, p}, {g, h}, {m, n, q, r}, {s, t}}

The Sort@mylist is syntactically equivalent to Sort[mylist]. It just sorts the list. The result is:

{0[a, b], 1[c, d], 1[i, j], 2[e, f], 2[k, l], 2[o, p], 3[g, h], 4[m, n], 4[q, r], 5[s, t]}

The //. {f___,x_[a__],x_[b__],l___} -> {f,x[a,b],l} means: use pattern matching so that if adjacent elements have the same head, they are merged into one. The //. applies the rule repeatedly until the expression no longer changes.

The shortcut syntax for structural transformation used above is this:

myExpr //. myPattern -> myNewForm

The “myExpr” is any expression. The “myPattern” is any structural pattern (i.e. like a regex, except that it works on list structures and datatypes). The “myNewForm” is like a regex's replacement string.

The syntax myExpr //. myPattern -> myNewForm can also be written in purely nested form, like this:

ReplaceRepeated[myExpr, Rule[myPattern, myNewForm]]

Now, here's some explanation of the shortcut syntax used to match patterns:

  • _ means any single expression. FullForm syntax is Blank[].
  • __ means any sequence of 1 or more expressions. FullForm syntax is BlankSequence[].
  • ___ means any sequence of 0 or more expressions. FullForm syntax is BlankNullSequence[].

x_ is a syntax shortcut for Pattern[x,Blank[]], meaning it is a pattern, to be named “x”, and the pattern is Blank[]. We name the pattern so later you can refer to the captured element as “x”.

f___ is a syntax shortcut for Pattern[f,BlankNullSequence[]]. It means a pattern that matches 0 or more elements, and any expression that matches can later be referred to as “f”.

The x_[a__] means basically an expression with 1 or more elements. Its head we named “x”, and its elements we named “a”. FullForm syntax is Pattern[x, Blank[]][Pattern[a, BlankSequence[]]]. Similarly for x_[b__].

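So, all together, the {f___,x_[a__],x_[b__],l___} just matches a list and captures neighbors that have the same head. Because the same name “x” is used for both heads, the pattern only matches when the two heads are identical.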

The {f,x[a,b],l} just means the new form we want. Note the f,x,a,b,l are just names we used for the captured pattern.

So now, Sort@mylist //. {f___,x_[a__],x_[b__],l___} -> {f,x[a,b],l} gives us the desired result.
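For readers more at home in a general-purpose language, here is a rough Python emulation of what the Mathematica one-liner does. It is a sketch under stated assumptions, not how Mathematica implements ReplaceRepeated: each expression label[e1, e2, ...] is modeled as a tuple (label, [e1, e2, ...]), the helper names merge_once and replace_repeated are made up for illustration, and sorting is by label only (Python's sort is stable, so same-label items keep their input order, which here happens to match Mathematica's result).

```python
# Rough emulation of:  Sort@mylist //. {f___, x_[a__], x_[b__], l___} -> {f, x[a, b], l}
# An expression  label[e1, e2, ...]  is modeled as the tuple (label, [e1, e2, ...]).

def merge_once(items):
    """Apply the rule once: merge the first pair of neighbors with the same head.
    Return None when the pattern does not match anywhere."""
    for i in range(len(items) - 1):
        (x1, a), (x2, b) = items[i], items[i + 1]
        if x1 == x2:
            # {f, x[a, b], l}: keep prefix f and suffix l, join the two element sequences
            return items[:i] + [(x1, a + b)] + items[i + 2:]
    return None

def replace_repeated(items, rule):
    """Like //. : keep applying the rule until it no longer matches."""
    while True:
        new = rule(items)
        if new is None:
            return items
        items = new

mylist = [(0, ["a", "b"]), (1, ["c", "d"]), (2, ["e", "f"]), (3, ["g", "h"]),
          (1, ["i", "j"]), (2, ["k", "l"]), (4, ["m", "n"]), (2, ["o", "p"]),
          (4, ["q", "r"]), (5, ["s", "t"])]

# Sort by label only; the stable sort keeps same-label items in input order.
result = replace_repeated(sorted(mylist, key=lambda e: e[0]), merge_once)
print(result)
# [(0, ['a', 'b']), (1, ['c', 'd', 'i', 'j']), (2, ['e', 'f', 'k', 'l', 'o', 'p']), (3, ['g', 'h']), (4, ['m', 'n', 'q', 'r']), (5, ['s', 't'])]
```

The loop rescans the list after every merge, which mirrors the repeated-rule semantics but is quadratic in the worst case; it is meant to illustrate, not to be efficient.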

Now, to clean up the heads that function as integer labels, we do:

result /. _Integer[b___] -> {b}

which is another expression transformation with a pattern. The _Integer[b___] just means any expression whose head is of type Integer, with its elements named “b”. Any expression matching it is replaced by a list of its elements, expressed as {b}.
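The cleanup rule can be emulated the same way in Python. For comparison, the sketch also includes a single-pass grouping with a dictionary, which is a different technique (not pattern rewriting) and goes straight from Cortez's labelled sublists to the cleaned-up result. Both function names are hypothetical.

```python
def strip_integer_heads(items):
    """Emulates  result /. _Integer[b___] -> {b} : every (head, elems) whose head
    is an integer becomes a plain list of its elements; others are left alone."""
    return [list(elems) if isinstance(head, int) else (head, elems)
            for head, elems in items]

def group_by_label(sublists):
    """Alternative technique: one pass with a dict, keyed by each sublist's first element."""
    groups = {}
    for label, *elems in sublists:
        groups.setdefault(label, []).extend(elems)
    return [groups[k] for k in sorted(groups)]

merged = [(0, ["a", "b"]), (1, ["c", "d", "i", "j"]), (2, ["e", "f", "k", "l", "o", "p"])]
print(strip_integer_heads(merged))
# [['a', 'b'], ['c', 'd', 'i', 'j'], ['e', 'f', 'k', 'l', 'o', 'p']]

data = [(0, "a", "b"), (1, "c", "d"), (2, "e", "f"), (3, "g", "h"), (1, "i", "j"),
        (2, "k", "l"), (4, "m", "n"), (2, "o", "p"), (4, "q", "r"), (5, "s", "t")]
print(group_by_label(data))
```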

Now, all the above may seem like weird syntax. But actually it has a purely nested form.

For example, this whole expression:

Sort@mylist //. {f___, x_[a__], x_[b__], l___} -> {f, x[a, b], l}

is syntactically equivalent to this:

ReplaceRepeated[
  Sort[mylist],
  Rule[
    List[
       Pattern[f, BlankNullSequence[]],
       Pattern[x, Blank[]][Pattern[a, BlankSequence[]]],
       Pattern[x, Blank[]][Pattern[b, BlankSequence[]]],
       Pattern[l, BlankNullSequence[]]],
  List[f, x[a, b], l] ] ]

In a lisp form, it'd be:

(ReplaceRepeated
 (Sort mylist)
 (Rule
  (List
   (Pattern f (BlankNullSequence))
   ((Pattern x (Blank)) (Pattern a (BlankSequence)))
   ((Pattern x (Blank)) (Pattern b (BlankSequence)))
   (Pattern l (BlankNullSequence)))
  (List f (x a b) l) ) )

That's some of the power of a fully nested, REGULAR syntax.

Now, Qi supports pattern matching in Common Lisp. I wonder if the above can be translated to Qi.


Sourav Mukherjee supplied this purely functional solution in R5RS Scheme lisp: Sourav_Mukherjee_sourav.work_gmail.scm

Pirate Bay, Open Source, Free Software, Copyright

Perm url with updates: http://xahlee.org/Periodic_dosage_dir/pirate_bay.html

Xah Lee, 2010-09-25

Read: The Pirate Bay.

Basically, it's a website for illegally downloading stuff: music, movies, games, software, books.

The mob wants things free. So, there's a movement from the mob, especially from the poor and young who have nothing to lose (such as yourself), that says copyright and patents should be eliminated, and that digital goods such as software should be shared. The excuse for this greed is the good of society, such as the freedom to share, along with chanting about the “evil” and “greed” of “BIG corporations”.

The website, as an indirect medium for others to steal anonymously, of course quickly became one of the most popular, ranked 92nd in the world. It runs ads, and given its traffic, the site makes in the ballpark of 1 million USD a year (estimates vary). The people behind the site, of course, went political in defending their actions, claiming that they do it for “the goodness of humanity”. According to Wikipedia, it seems some politicians have donated large sums of money anonymously, for their own selfish motives.

I think copyright and patent laws certainly should be modified today for digital goods, fundamentally because digital goods are new. Unlike physical goods, digital goods can be copied without limit and effectively without cost. But if we completely ban copying, we also eliminate the opportunity for billions of other people who might benefit from the product but otherwise wouldn't buy it.

But writing a good book, thinking up and writing a good song, or producing a movie usually costs thousands or millions of dollars. A writer takes years of training, paying for college, etc. The same goes for software programmers, songwriters, and singers. Money is the primary incentive for us to produce. If producers can't make any money because whatever they produce can be freely obtained on piratebay, would we still have many good books, songs, or movies?

However, I think many of these file-sharing sites, and to some extent the Free Software and Open Source movements, are scumbags in society who just came along and seized an opportunity for their own “right” to get things for free. See: Anti-copyright.

This “free info” movement is what drove the Death Of Encarta.

See also:

The Problems of Open Source

An insightful article:

〈The Problems of Open Source〉 (2009). By Dr Mark Tarver. At: lambdassociates.org.

Mark Tarver is the creator of the Qi lisp language, a lisp variant built on Common Lisp with major functional-language features comparable to those of OCaml/F# and Haskell. (See also: Qi Language Logo)

Here's an excerpt from his article:

Free As in Free Speech becomes Free As in Free Beer


We need to get this one out of the way. “Free as in speech” does not mean “free as in beer”, but in fact if you look for GPL software it is almost always “free as in beer”. It's not hard to see why. If you try to sell GPL software, it is possible for your punter to buy it, write some trivial change and resell it under GPL for less, undercutting you. By parity of reasoning this can be repeated down the chain until the price tends to zero. So designers using the GPL nearly always make their work free.

Free Software proponents wouldn't admit that if “FSF Free” software were actually not $free$, it would immediately lose all its force and perhaps become obscure within a year. In other words, the $free$ is used to sell FSF's concept of “Software Freedom”, because otherwise this ideology has no place to go in the real world, much like the situation in communism (which in practice is forced upon people by dictators).

Note that the word “communism” in Chinese is “共产主义”. The character 共 means “share; public”, the character 产 means “properties; resources; production”, and the word 主义 means “school of thought; philosophy”.

Fredryk Phox (comedy video)

Perm url with updates: http://xahlee.org/funny/fredryk_phox.html

One of Fredryk Phox's videos.

Fredryk Phox seems to be an internet personality. You can read more about him here:


Google's “following” a blog vs subscribe

What's the difference between Google's “following” a blog and subscribing to it?

If you are using Google's services, you can “follow” a blog published on Google's Blogger. When you follow a blog that has a “follow widget”, you will show up in that widget (similar to Twitter's “followers”). Following also means that you are automatically subscribed to the blog in your Google Reader.

Here's Google Help on the topic: Source.

Note that “follow” is exclusively a Google thing. Blogs not hosted by Google don't have that concept. Google Reader also lets you “like” any particular blog article. A “like” is something like an “up vote” on many websites. Authors can also see how many times a particular article has been “liked”, if they are using Google's service.

lisp, python, lojban, tidbits

PLT Scheme lisp is now named Racket. See: http://racket-lang.org/new-name.html.

Also, here's a classic piece: The Fate of Lambda in Python 3000 and Scheme v300.

I discovered that Guy L Steele, most famous as an inventor of Scheme lisp, and Robert J Chassell, best known as the author of 《An Introduction to Programming in Emacs Lisp》, are both apparently lojban speakers! Yay! See: http://www.lojban.org/files/papers/4thtense.

What is lojban? See the intro at: Xah's lojban Tutorial.


3DXM new version


Mathematicians Richard Palais and Hermann Karcher have released a new version of their math visualization software, 3DXM. The main change is that it now has a button-like interface in place of menus, where each button is an icon of the surface or math subject. This makes it much more attractive and easier to use. Check it out.

3DXM screenshot.

Note: the new version is for Mac only. For Windows or Linux users, there's always the Java version at the same download location. However, the Java version has only about 50% of the surfaces and other math objects.

Difference Between Emacs's “(getenv PATH)” and “exec-path”

Perm url with updates: http://xahlee.org/emacs/emacs_env_var_paths.html

Xah Lee, 2009-08-04, 2010-09-23

This page explains the mechanisms for setting environment variables in emacs, which is especially useful if you have problems getting aspell or other unix utils to run in Windows emacs.

  • When you start emacs from a shell, emacs inherits the shell's environment variables. (True on Windows, Mac, Linux.)
  • On Windows, when you start emacs from the GUI, emacs also inherits environment variables, from the Registry.
  • On Mac OS X, when you start emacs from the GUI, emacs does not inherit environment variables from your shell, but does inherit the environment variables defined in 〔~/.MacOSX/environment.plist〕.
  • On Mac OS X, you can start GUI emacs from a shell, like this: 「nohup /Applications/Emacs.app/Contents/MacOS/Emacs &」.

If you are not familiar with environment variables or the Registry on Windows, see:

Setting Environment Variable Within Emacs

You can also set environment variables within emacs without actually setting them in the OS. To do so, use this sample code:

; show env var named path
(getenv "PATH")

; example of setting env var named “path”
; by prepending new paths to the existing path
(setenv "PATH"
  (concat
   "C:/cygwin/usr/local/bin" ";"
   "C:/cygwin/usr/bin" ";"
   "C:/cygwin/bin" ";"
   (getenv "PATH")))

In some situations, it's better to set some env var inside emacs for emacs only. This way, you are free to have a different env var value in your cmd.exe or cygwin shell, independent of emacs.
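The inheritance rule behind all of this (a child process gets a copy of its parent's environment, and changes never flow back up to the parent) is not specific to emacs. As a rough analogy, here is a Python sketch rather than emacs lisp; the directory /hypothetical/tools/bin is made up and is never actually searched.

```python
import os
import subprocess
import sys

# Prepend a (hypothetical) directory to PATH for this process only,
# much like calling setenv inside emacs.
os.environ["PATH"] = os.pathsep.join(["/hypothetical/tools/bin", os.environ.get("PATH", "")])

# A child process started afterwards inherits the modified environment,
# just as a shell started from inside emacs sees emacs's setenv changes.
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['PATH'].split(os.pathsep)[0])"],
    capture_output=True, text=True, check=True,
)
print(child.stdout.strip())  # /hypothetical/tools/bin

# The shell that launched this script is unaffected: environment changes
# only flow from parent to child.
```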

Emacs's “exec-path”

Emacs has a variable named “exec-path”. Its value is a list of dir paths. Emacs uses “exec-path” to find executable binary programs. For example, when spell checking, emacs will try to find ispell or aspell in exec-path. When you press 【Z】 to compress a file in dired, emacs will try to find gzip or gunzip in exec-path. When you type 【Alt+x diff】 or 【Alt+x grep】 or 【Alt+x shell】, emacs will try to find the program in exec-path too.

If emacs complains that it cannot find ispell, aspell, ftp, gzip, etc, the problem is probably with your “exec-path”.

By default, emacs initializes “exec-path” at startup from the value of 「(getenv "PATH")」, and appends its own “exec-directory” at the end. So, their values should be nearly identical.
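The relationship between the PATH string and a list like “exec-path” can be sketched in Python. This is only an analogy: emacs's own lookup function is executable-find, and the helper names below (path_to_list, find_program) are hypothetical. The sketch splits PATH on the OS path separator (“:” on unix, “;” on Windows), drops empty entries, and searches only the given directories.

```python
import os
import shutil

def path_to_list(path_value):
    """Split a PATH-style string into a list of directories, dropping empty entries."""
    return [d for d in path_value.split(os.pathsep) if d]

def find_program(name, dirs):
    """Look for an executable only in the given directories; None if not found."""
    return shutil.which(name, path=os.pathsep.join(dirs))

exec_path_like = path_to_list(os.environ.get("PATH", ""))
print(find_program("gzip", exec_path_like))  # a full path, or None if gzip isn't installed
```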

Difference between “exec-path” and “PATH”

The value of “PATH” is used by emacs when you are running a shell in emacs, similar to when you are using a shell in a terminal.

The “exec-path” is used by emacs itself to find programs it needs for its features, such as spell checking, file compression, compiling, grep, diff, etc.

If you set the PATH env var within emacs, you probably also want to adjust your “exec-path”. Here's an example of setting exec-path:

(when (string-equal system-type "windows-nt")
  ;; on Windows only, replace exec-path with this list of dirs
  (setq exec-path
        '("C:/Program Files (x86)/Emacs/emacs/bin/"
          "C:/Program Files (x86)/Emacs/EmacsW32/gnuwin32/bin/")))

The value of 「(getenv "PATH")」 and “exec-path” do not need to be the same.

As of today (2009-08-04), my emacs has this setup:

(when (string-equal system-type "windows-nt")
    ;; am using cygwin
    (setenv "PATH"
            (concat
             "/usr/local/bin" ":"
             "/usr/bin" ":"
             "/bin" ":"
             "/usr/X11R6/bin" ":"
             "/cygdrive/c/Program Files (x86)/PHP/" ":"
             "/cygdrive/c/Windows/system32" ":"
             "/cygdrive/c/Windows" ":"
             "/cygdrive/c/Windows/System32/Wbem"))
    (setq exec-path
          '("C:/Program Files (x86)/Emacs/emacs/bin/"
            "C:/Program Files (x86)/PHP/"
            "C:/Windows/system32/WindowsPowerShell/v1.0/")))

Note: the above example works for me in Windows Vista, but is not necessarily ideal.

For example, the path syntaxes used differ. Some use “/” while others use “\”, some contain the drive name while others don't, and some have a lowercase drive letter while others don't. Emacs has its own wrapper for OS path conventions, cygwin has its own conventions, and cygwin also has a drive-mapping mechanism. It is not clear to me which path convention is best used in elisp to set PATH or exec-path.
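One way to tame the mix of conventions is to convert everything to a single form. Cygwin ships a cygpath utility for this; the Python function below is a simplified, hypothetical stand-in that only handles drive-letter paths, turning something like C:\Windows\System32 into /cygdrive/c/Windows/System32 and leaving other paths untouched.

```python
import re

def windows_to_cygwin(path):
    """Hypothetical helper: convert a drive-letter Windows path to cygwin's
    /cygdrive/<letter>/... form. Non-drive paths are returned unchanged."""
    path = path.replace("\\", "/")             # unify separators first
    m = re.match(r"^([A-Za-z]):/?(.*)$", path)
    if not m:
        return path                            # e.g. already /usr/bin
    drive, rest = m.group(1).lower(), m.group(2)  # cygwin uses a lowercase drive letter
    return f"/cygdrive/{drive}/{rest}" if rest else f"/cygdrive/{drive}"

print(windows_to_cygwin("C:/Program Files (x86)/PHP/"))  # /cygdrive/c/Program Files (x86)/PHP/
```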

See also:

Chinese Pinyin Letter Frequency and Dvorak Layout

Perm url with updates: http://xahlee.org/Periodic_dosage_dir/bangu/pinyin_frequency.html

Xah Lee, 2005-09-01

The following is a table of letter frequencies in Chinese written in pinyin. The purpose of this study is to find out whether the Dvorak Keyboard Layout is also efficient for inputting Chinese with pinyin.

 tone  count
 4      9714
 2      7137
 1      6805
 3      5125
 5      1547

 letter  count
 i      12620
 n      11269
 a       9314
 u       7075
 g       6922
 e       6851
 h       6815
 o       5519
 z       3545
 d       3363
 s       2585
 y       2571
 j       2299
 l       1522
 b       1422
 x       1361
 c       1150
 w       1097
 r       1073
 m        930
 f        925
 t        881
 q        717
 k        448
 p        255
 v         12  (v stands for ü, as in nv 女 “woman”)

(The digits 1 to 5 are tone numbers in numbered pinyin; 5 is the neutral tone.)

This table was compiled by Dylan Sung, taken from his post in the newsgroup sci.lang of 2005-08-27, subject: “Letter frequency of Chinese pinyin”. (Source)
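A table like the one above can be produced by counting characters in a body of numbered pinyin text. Here is a minimal Python sketch, assuming the input uses tone digits 1 to 5 after each syllable and “v” for ü, as the table does; the function name is hypothetical, and this is not necessarily the method Dylan Sung used.

```python
from collections import Counter

def letter_frequencies(pinyin_text):
    """Count letters and tone digits in numbered pinyin, most frequent first.
    Spaces and punctuation are ignored."""
    counts = Counter(ch for ch in pinyin_text.lower() if ch.isalnum())
    return counts.most_common()

print(letter_frequencies("ni3 hao3"))
# [('3', 2), ('n', 1), ('i', 1), ('h', 1), ('a', 1), ('o', 1)]
```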

Originally, I was curious about the frequency of pinyin letters because I wondered whether the Dvorak keyboard is also more efficient than qwerty for typing pinyin.

Images: pinyin on the Dvorak keyboard; pinyin on QWERTY. Arrangement of the Dvorak Keyboard Layout and the traditional QWERTY layout. The red keys are the most frequent letters used in pinyin, followed by yellow, then green.

For the list of letter frequencies of English text, see Wikipedia: Letter frequencies.


The following data are from http://fatduck.org/dvorak/, accessed on 2010-09-22. The author is 潘永之.

The following tables show the letter distribution over the rows of the qwerty and dvorak layouts. The input is a 403-word Chinese blog post written in pinyin.


QWERTY (last figure in each row is the row total):

top row: q 0.56% w 2.01% e 6.51% r 0.32% t 1.77% y 2.81% u 7.40% i 12.70% o 6.51% p 0.16% (row total 40.76%)
home row: a 13.75% s 1.53% d 3.54% f 0.72% g 4.18% h 6.27% j 1.77% k 0.80% l 2.41% (row total 34.97%)
bottom row: z 1.93% x 1.61% c 3.22% v 0.00% b 2.01% n 8.52% m 1.45% , 1.61% . 3.94% (row total 24.28%)

Dvorak:

top row: ' 0.00% , 1.61% . 3.94% p 0.16% y 2.81% f 0.72% g 4.18% c 3.22% r 0.32% l 2.41% (row total 19.37%)
home row: a 13.75% o 6.51% e 6.51% u 7.40% i 12.70% d 3.54% h 6.27% t 1.77% n 8.52% s 1.53% (row total 68.49%)
bottom row: q 0.56% j 1.77% k 0.80% x 1.61% b 2.01% m 1.45% w 2.01% v 0.00% z 1.93% (row total 12.14%)
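The row distributions above amount to counting which keyboard row each character falls on. A minimal Python sketch, assuming the rows consist exactly of the characters listed in the tables (no semicolon or slash), with hypothetical names; this is not necessarily how 潘永之 computed them.

```python
# Key rows as listed in the tables above (assumption: only these characters count).
QWERTY_ROWS = {"top": "qwertyuiop", "home": "asdfghjkl", "bottom": "zxcvbnm,."}
DVORAK_ROWS = {"top": "',.pyfgcrl", "home": "aoeuidhtns", "bottom": "qjkxbmwvz"}

def row_distribution(text, rows):
    """Percentage of typed characters that fall on each row of a layout."""
    key_to_row = {ch: row for row, chars in rows.items() for ch in chars}
    counts = {row: 0 for row in rows}
    for ch in text.lower():
        row = key_to_row.get(ch)
        if row is not None:          # characters not on any listed row are ignored
            counts[row] += 1
    total = sum(counts.values())
    if total == 0:
        return counts
    return {row: 100.0 * n / total for row, n in counts.items()}

print(row_distribution("ni hao", QWERTY_ROWS))  # {'top': 40.0, 'home': 40.0, 'bottom': 20.0}
print(row_distribution("ni hao", DVORAK_ROWS))  # {'top': 0.0, 'home': 100.0, 'bottom': 0.0}
```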

The following is the distribution for qwerty and dvorak. The input file is all the characters in GB2312, a total of 6727 chars. (chinese_characters_GB2312.txt)


QWERTY (last figure in each row is the row total):

top row: q 1.54% w 0.97% e 4.94% r 0.53% t 1.29% y 2.78% u 9.94% i 13.26% o 6.11% p 1.10% (row total 42.46%)
home row: a 11.80% s 2.29% d 1.54% f 0.97% g 6.53% h 6.25% j 2.49% k 0.94% l 2.25% (row total 35.06%)
bottom row: z 2.63% x 1.93% c 2.06% v 0.12% b 1.52% n 12.88% m 1.35% (row total 22.48%)

Dvorak:

top row: ' 0.00% , 0.00% . 0.00% p 1.10% y 2.78% f 0.97% g 6.53% c 2.06% r 0.53% l 2.25% (row total 16.22%)
home row: a 11.80% o 6.11% e 4.94% u 9.94% i 13.26% d 1.54% h 6.25% t 1.29% n 12.88% s 2.29% (row total 70.30%)
bottom row: q 1.54% j 2.49% k 0.94% x 1.93% b 1.52% m 1.35% w 0.97% v 0.12% z 2.63% (row total 13.48%)

See also: Chinese Input with Dvorak Layout (Microsoft Pinyin IME).


clojure lisp and f# interview

Just watched a great video:

〈ELC 2010: Rich Hickey and Joe Pamer - Perspectives on Clojure and F#〉 (2010-08-09) http://channel9.msdn.com/blogs/charles/emerging-langs-clojure-and-f

It's 24 minutes long: an interview with Clojure inventor Rich Hickey and Microsoft's F# compiler writer Joe Pamer.

Also: Robin Milner has died (1934-2010). He is known as the father of ML (whose descendants include OCaml and F#).

speech recognition software

Today I started to use Microsoft's speech recognition software. It comes with Windows Vista and Windows 7. To use it, you will need a good microphone or headset. (See: Gaming Headset Reviews.)

To open it, just go to the “Control Panel”, then choose “Speech Recognition Options”. Then you might want to go through the various options and do the 30 minutes of training to learn how to use it.

You can use it to control windows and menus in some basic ways, but I'd use it mostly for dictation when writing essays and emails. Not sure how well it will work out, but I'll give it a shot.

HTML6, Your HTML/XML Simplified

Perm url with updates: http://xahlee.org/comp/html6.html

HTML6: Your JSON and SXML Simplified

Xah Lee, 2010-09-21, 2010-09-27, 2010-12-17

Tired of the standards bodies telling us what to do and changing their attitude? Tired of the SGML/HTML/XML/XHTML/HTML5 changes? Tire no more, here's a new proposal that will make life easier.

Introducing HTML6

HTML6 is based on HTML5, XML, and a rectified LISP syntax. It is inspired by JSON and SXML. HTML6 is 100% regular at the syntax level, and is not a valid javascript expression or lisp expression. The syntax can be specified by about 3 short lines of parsing expression grammar.

The aim is a very simple syntax: 100% regular, leaner, and trivial to parse in any language.

As in XML (in theory), no error should be accepted. If the source code has incorrect syntax, the application should report an error.

Syntax Example

Here's a standard ATOM webfeed XML file.

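<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:base="http://xahlee.org/emacs/">

 <title>Xah's Emacs Blog</title>
 <subtitle>Emacs, Emacs, Emacs</subtitle>
 <link rel="self" href="http://xahlee.org/emacs/blog.xml"/>
 <link rel="alternate" href="http://xahlee.org/emacs/blog.html"/>
 <updated>2010-09-19T14:53:08-07:00</updated>
 <author>
   <name>Xah Lee</name>
   <uri>http://xahlee.org/</uri>
 </author>
 <id>http://xahlee.org/emacs/blog.html</id>
 <icon>http://xahlee.org/ics/sum.png</icon>
 <rights>© 2009, 2010 Xah Lee</rights>

 <entry>
   <title>Using Emacs's Abbrev Mode for Abbreviation</title>
   <id>tag:xahlee.org,2010-09-19:215308</id>
   <updated>2010-09-19T14:53:08-07:00</updated>
   <summary>tutorial</summary>
   <link rel="alternate" href="http://xahlee.org/emacs/emacs_abbrev_mode.html"/>
 </entry>

</feed>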


Here's how it looks in html6:

〔?xml「version “1.0” encoding “utf-8”」〕
〔feed「xmlns “http://www.w3.org/2005/Atom” xml:base “http://xahlee.org/emacs/”」

  〔title Xah's Emacs Blog〕
  〔subtitle Emacs, Emacs, Emacs〕
  〔link「rel “self” href “http://xahlee.org/emacs/blog.xml”」〕
  〔link「rel “alternate” href “http://xahlee.org/emacs/blog.html”」〕
  〔updated 2010-09-19T14:53:08-07:00〕
  〔author
   〔name Xah Lee〕
   〔uri http://xahlee.org/〕
  〕
  〔id http://xahlee.org/emacs/blog.html〕
  〔icon http://xahlee.org/ics/sum.png〕
  〔rights © 2009, 2010 Xah Lee〕

  〔entry
   〔title Using Emacs's Abbrev Mode for Abbreviation〕
   〔id tag:xahlee.org,2010-09-19:215308〕
   〔updated 2010-09-19T14:53:08-07:00〕
   〔summary tutorial〕
   〔link「rel “alternate” href “http://xahlee.org/emacs/emacs_abbrev_mode.html”」〕
  〕
〕

Simple Matching Pairs For Tag Delimiters

The standard xml markup bracket is simplified using simple lisp-style matching pairs. For example, this code:

<h1>HTML6</h1>

is written as:

〔h1 HTML6〕

The delimiters used are:

Character  Unicode Code Point  Unicode Name
〔          U+3014              LEFT TORTOISE SHELL BRACKET
〕          U+3015              RIGHT TORTOISE SHELL BRACKET

Syntax for XML Attributes

In xml:

<h1 id="xyz" class="abc">HTML6</h1>

In html6:

〔h1「id “xyz” class “abc”」HTML6〕

The attributes are specified by matching corner brackets. Items inside are a sequence of pairs. The value must be quoted by curly double quotes.
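To make the attribute syntax concrete, here is a small hypothetical serializer in Python that writes one node in the form described above. It assumes, based on the examples on this page, a single space between the tag name and the content when there are no attributes, and no space after the closing corner bracket; it does not escape special characters in the content.

```python
def to_html6(tag, attrs=None, children=()):
    """Hypothetical helper: serialize one node as 〔tag「name “value” ...」children〕.
    Children are strings, which may themselves be serialized nodes."""
    head = tag
    if attrs:
        pairs = " ".join(f"{name} “{value}”" for name, value in attrs.items())
        head += f"「{pairs}」"
    else:
        head += " "                  # separator between a bare tag name and content
    return "〔" + head + "".join(children) + "〕"

print(to_html6("h1", {"id": "xyz", "class": "abc"}, ["HTML6"]))  # 〔h1「id “xyz” class “abc”」HTML6〕
print(to_html6("h1", None, ["HTML6"]))                            # 〔h1 HTML6〕
```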

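Character  Unicode Code Point  Unicode Name
「          U+300C              LEFT CORNER BRACKET
」          U+300D              RIGHT CORNER BRACKET
“          U+201C              LEFT DOUBLE QUOTATION MARK
”          U+201D              RIGHT DOUBLE QUOTATION MARK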

Escape Mechanisms

To include a literal tortoise shell bracket in data, use &#x3014; and &#x3015;; similarly for other special unicode chars.

Unicode; No More CDATA and Entities Like &amp;

There are no entities, except unicode characters in hexadecimal format, e.g. &#x3b1; for 「α」.

For example, &amp; is literal; it is not displayed as &.
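A sketch of the escaping rule in Python, assuming that the tortoise shell brackets, corner brackets, and curly double quotes are the characters to escape in data (the page explicitly names only the tortoise shell brackets). Because there are no named entities, only &#x...; references are decoded, and &amp; passes through unchanged. Function names are hypothetical.

```python
import re

SPECIAL = "〔〕「」“”"   # assumption: delimiter characters that must be escaped in data

def escape_html6(text):
    """Replace delimiter characters with hexadecimal references like &#x3014;."""
    return "".join(f"&#x{ord(ch):x};" if ch in SPECIAL else ch for ch in text)

def unescape_html6(text):
    """Turn &#xHEX; references back into characters; nothing else is special."""
    return re.sub(r"&#x([0-9A-Fa-f]+);", lambda m: chr(int(m.group(1), 16)), text)

print(escape_html6("a〔b〕"))           # a&#x3014;b&#x3015;
print(unescape_html6("&#x3b1; &amp;"))  # α &amp;
```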

Treatment of Whitespace

Identical to XML.

Char Encoding; UTF8 and UTF16 Only

Source code must be UTF8 or UTF16, only. Nothing else.

File Name Extension

File name extension is “.html6”.


The semantics should follow xhtml5.

Questions and Answers

What's wrong with xhtml/html5 exactly?

The politics of standards bodies change, and their attitude about what is correct also changes unpredictably. Around 2000, we were told that XML and XHTML would change society, or at least make the web correct and valid, and far easier and more flexible to develop for. Now it's a decade later. Sure, the web has improved, but as far as html/xhtml and browser rendering go, it's still syntax soup with extreme complexities. 99.99% of web pages are still not valid, and nobody cares. Google doesn't care. Apple doesn't care. In Google's hundreds of tips to webmasters, almost none mention html validation. Google Earth itself generates invalid KML. Some 99.9% of the html files produced by Google or Apple are not valid html. Major browsers still don't agree on their rendering behavior. Web dev is actually far more complex, involving tens or hundreds of technologies that hardly anyone knows all of (ajax, JSON, lots of xml variations). It's hard to say whether it is any better than the HTML3 days with “font” and “table” tags and gazillions of tricks. The best practical approach is still trial and error with browsers.

And now HTML5 comes along, from a newfangled hip group primarily from the current big corporations Google and Apple, with an attitude that validation is overrated. It's a slap in the face to the w3c's XML mantra, just when more and more sites were starting to have correct XHTML, and Microsoft's Internet Explorer was getting on track about correctness.

XML was a break from SGML, with many justifications for why it needed to be, and with some backward-compatibility trade-offs. Now HTML5 is a break from both SGML and XML.

For a personal story about how changes in standards-body attitude affect practical programming, see:

Why not just adopt SXML from the lisp world?

Lisp's SXML is not a stand-alone syntax for the needs of the web. SXML's syntax is designed to be compatible with the lisp language's existing syntax, tradition, and parsers. Lisp syntax (aka sexp) has several syntactic irregularities; it is not 100% nested parens of the form (a b c ...). SXML is easy for lispers to adopt, but harder for other languages and communities. (For details of lisp's syntax irregularities, see: Fundamental Problems of Lisp.)

The following explains how several lisp syntaxes for xml break the correspondence between tree structure and syntax that is inherent in XML.

XML, as a textual representation of a tree, has a quirk: each node has this special thing called “attributes” (aka “properties”). An “attribute” is not a node of the tree, but rather info attached to a node. Here's an example in html:

<h1 id="xyz" class="abc">A B C</h1>

The standard lisp syntax to represent attributes, adopted from lisp's similar concept of “properties” of lisp's “symbols”, is this:

(h1 :id "xyz" :class "abc" A B C)

This works by creating an extra rule on the first char of a name. If the name starts with :, then that name is considered the name of a property, and the next element is considered its value. This special rule breaks a fundamental principle of XML syntax: the lexical structure of the source code no longer corresponds to the semantic structure. The semantics of the source code changes depending on the first char of an atom.

Another way to represent xml's attribute, adopted in some lisp code based on lisp's “alist” (aka associative array) syntax, is this:

(h1 ((id . "xyz") (class . "abc")) A B C)

This too, has syntactical ambiguity.

The whole ((id . "xyz") (class . "abc")) can be interpreted as a node by itself, whose first element is again a node. Also, it uses lisp's special “cons” syntax (id . "xyz"), which is itself ambiguous at the syntax level. It can be considered as a node named “id” with 2 branches . and "xyz", like this:

  id
  ├─ .
  └─ "xyz"

or it can be considered as a node named “cons” with 2 branches id and "xyz", like this:

  cons
  ├─ id
  └─ "xyz"

Another common lisp syntax for attributes, from SXML, is this:

(h1 (@ (id "xyz") (class "abc")) A B C)

Here, a special rule is created for a name that is just 「@」. When the first element's name is just 「@」, that parenthesized expression is considered to be a property list, not a node.

So, in conceiving html6, I thought one solution for getting rid of the syntax ambiguity between nodes and attributes is to use a special bracket for the properties/attributes of a node, e.g. 〔h1「id “xyz” class “abc”」A B C〕.

Why use weird Unicode characters for matching pair?

Unicode is widely adopted today and is very practical. (See: Unicode Popularity On Web.) It is the default char set for many langs (e.g. Java, XML). Unicode also has a lot of proper matching pairs. (See: Matching Brackets in Unicode.) It seems today is the right time to adopt the wide range of proper symbols provided in unicode, instead of relying on the very limited set of ASCII characters from the 1960s.

The straight double quote character 「"」 (ascii 34) is not a matching pair, and in computer source code it has several problems. For example, it needs context to know which quote chars are paired. Also, it is difficult to recover from a missing quote (this problem is especially pronounced for syntax highlighting in text editors). A proper matching pair allows programs and editors to more easily and correctly determine the quoted string, and thus its position in the tree, which makes it easier to implement features such as navigating the tree in an editor.
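Part of the argument for matching pairs is that a program can locate a broken spot by looking at the brackets alone. A minimal Python sketch of that check, using a stack over the html6 delimiters; the function name is hypothetical. With straight quotes the same check is impossible, since an opening and a closing 「"」 look identical.

```python
PAIRS = {"〔": "〕", "「": "」", "“": "”"}
CLOSERS = {close: open_ for open_, close in PAIRS.items()}

def bracket_error(text):
    """Return the index of the first closing bracket that doesn't match,
    or of the innermost opening bracket left unclosed; None if balanced."""
    stack = []                                   # (opening char, index)
    for i, ch in enumerate(text):
        if ch in PAIRS:
            stack.append((ch, i))
        elif ch in CLOSERS:
            if not stack or stack[-1][0] != CLOSERS[ch]:
                return i                         # stray or mismatched closer
            stack.pop()
    return stack[-1][1] if stack else None

print(bracket_error("〔h1「id “xyz”」A〕"))  # None
print(bracket_error("〔h1「id “xyz」〕"))    # 11, the 」 that arrives while “ is still open
```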

The problem of inputting special unicode chars can be trivially solved by text editors. For example, Emacs, Mathematica, and Microsoft Word all have simple and efficient ways to enter commonly used special chars such as ™ © é ¶. (See: Emacs xmsi-mode for Math Symbols Input; How Mathematica does Unicode?; Designing a Math Symbols Input System.) Also, many special chars are part of the keyboard layout for fast input. (See: International Keyboard Layouts; Dvorak, Maltron, Colemak, NEO, Bépo, Turkish-F, Keyboard Layouts Fight!)
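
As a small illustration of the editor side, here is a sketch of an Emacs command that inserts the pair 〔〕 and leaves the cursor between them. The command name my-insert-tortoise-brackets and the F8 key are arbitrary choices made for this example.

```elisp
;; -*- lexical-binding: t; coding: utf-8 -*-
(defun my-insert-tortoise-brackets ()
  "Insert the bracket pair 〔〕 and put point between them.
Hypothetical command, for illustration."
  (interactive)
  (insert "〔〕")
  (backward-char 1))

;; Bind it to F8 (an arbitrary key chosen for this example).
(global-set-key (kbd "<f8>") #'my-insert-tortoise-brackets)
```

The same pattern works for 「」 or “”, so each matching pair takes a single keystroke.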

Possibly, the special bracketing chars could be replaced by () and [] for html6. Though, that also means a lot of ugly escaping would be needed in the content text, since any unescaped bracket in the text would make the syntax of the whole file incorrect.

The core idea of html6 is that the syntax is designed specifically as a 2-dimensional textual representation of a tree, with an attribute bracket that attaches a limited form of info (a sequence of pairs, for attributes) to any node, to fit the existing structure of XML.

The advantage of this is that it should be extremely easy to parse. The syntax can be specified in perhaps just 3 lines of parsing expression grammar (PEG), and PEG libraries exist for Perl, Python, Ruby, Lua, C, C#, Java, OCaml/F#, Clojure, ... A parser for html6 can also be trivially written without relying on PEG.
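
To show how small such a parser can be, here is a minimal recursive-descent sketch in Emacs Lisp. The syntax it accepts is assumed from the example 〔h1「id “xyz” class “abc”」A B C〕. A node is 〔, then a name, then an optional 「key “value” …」 block, then text and nested 〔…〕 nodes up to the matching 〕. All function names (my-html6-parse and its helpers) are hypothetical. Each node is returned as (NAME ATTRIBUTE-ALIST CHILD...), and text runs are trimmed of surrounding whitespace.

```elisp
;; -*- lexical-binding: t; coding: utf-8 -*-
(require 'subr-x)                       ; string-trim, string-empty-p

(defun my-html6--token ()
  "Read a bare name (tag name or attribute key) at point and return it."
  (let ((start (point)))
    (skip-chars-forward "^ \t\n「」〔〕“”")
    (when (= start (point))
      (error "Expected a name at position %d" start))
    (buffer-substring-no-properties start (point))))

(defun my-html6--attributes ()
  "Parse key “value” pairs after 「 up to 」 and return them as an alist."
  (let ((attrs nil))
    (skip-chars-forward " \t\n")
    (while (not (eq (char-after) ?」))
      (when (eobp) (error "Missing 」"))
      (let ((key (my-html6--token)))
        (skip-chars-forward " \t\n")
        (unless (eq (char-after) ?“)
          (error "Expected “ after attribute %s" key))
        (forward-char 1)
        (let ((start (point)))
          ;; The closing quote is a distinct character, so the value
          ;; simply ends at the first ”.
          (unless (search-forward "”" nil t)
            (error "Missing ” for attribute %s" key))
          (push (cons key (buffer-substring-no-properties start (1- (point))))
                attrs)))
      (skip-chars-forward " \t\n"))
    (forward-char 1)                    ; consume 」
    (nreverse attrs)))

(defun my-html6--node ()
  "Parse the node that starts at point (on 〔) and return it."
  (unless (eq (char-after) ?〔)
    (error "Expected 〔 at position %d" (point)))
  (forward-char 1)
  (let ((name (my-html6--token))
        (attrs nil)
        (children nil))
    (skip-chars-forward " \t\n")
    (when (eq (char-after) ?「)
      (forward-char 1)
      (setq attrs (my-html6--attributes)))
    ;; Content runs until the matching 〕: text runs and nested nodes.
    (while (not (eq (char-after) ?〕))
      (cond ((eobp) (error "Missing 〕 for node %s" name))
            ((eq (char-after) ?〔) (push (my-html6--node) children))
            (t (let ((start (point)))
                 (skip-chars-forward "^〔〕")
                 (let ((text (string-trim
                              (buffer-substring-no-properties start (point)))))
                   (unless (string-empty-p text)
                     (push text children)))))))
    (forward-char 1)                    ; consume 〕
    (append (list name attrs) (nreverse children))))

(defun my-html6-parse (string)
  "Parse STRING, which holds one html6-style node, into a nested list.
Each node becomes (NAME ATTRIBUTE-ALIST CHILD...)."
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    (skip-chars-forward " \t\n")
    (my-html6--node)))
```

For example, (my-html6-parse "〔h1「id “xyz” class “abc”」A B C〕") returns ("h1" (("id" . "xyz") ("class" . "abc")) "A B C"). Every delimiter is a distinct opening or closing char, so the parser never needs context to decide which quote or bracket it is looking at. A missing closer is reported as an error instead of silently absorbing the rest of the text.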

Any thoughts about flaws?

It is probably hopeless for browsers to adopt this. But if you are involved in the standard bodies of xml or html5, please consider this, and consider more about correctness and validation. XML is a step in the right direction, with huge consequences in the various xml languages and formats (XSLT, XSL, XQuery, o:XML, Microsoft Office Open XML, etc.). In theory, HTML5's features amount to simply an XML with a proper DTD. HTML5 was created in part to address w3c's slowness in responding to industrial changes, and in part to address the verbosity of XML syntax. HTML5 by itself does not introduce any new technical concepts. The force behind HTML5 is almost purely corporate adoption, and mostly existing practices from corporations. But the attitude it brought about seems to be a step backward: towards corporate-sponsored tags (much from Google) and technologies (e.g. much of canvas is from Apple, a low-level pixel-drawing garbage in comparison to SVG), odds-and-ends special tags, more special syntaxes, less focus on correctness, and another new syntax/format in the html/xml/xhtml/dtd-sniffing soup.

Was this page useful? If so, please do donate $3, thank you donors!


Calorie restriction diet

Perm url with updates: http://xahlee.org/Periodic_dosage_dir/pd.html

I've been aware of Calorie restriction since about 2004, and have been somewhat practicing it. Basically, “Calorie restriction” refers to a scientific discovery that started in the 1930s: many animals, when put on a diet of minimal food barely enough to sustain life (i.e. the animal is constantly in a hunger state) but with sufficient nutrients, live much longer and healthier. (e.g. 30% to 40% increase in lifespan.)

In the past 10 years, this has been tested on human animals, and the result so far is positive.

In the past about 5 years, i've been living on a $3 per day diet. (See: Diet of Xah Lee.)

Body Mass Index (BMI) is an easy way to tell if you are too fat or too thin: it is your weight divided by the square of your height, compared against standard ranges.
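
For reference, here is the standard BMI formula, with weight in kilograms and height in meters. The 70 kg, 1.75 m person in the worked example is made up for illustration.

```latex
\mathrm{BMI} = \frac{\text{weight (kg)}}{\text{height (m)}^2}
\qquad\text{e.g.}\qquad
\frac{70}{1.75^2} = \frac{70}{3.0625} \approx 22.9
```

The usual WHO ranges are as follows: below 18.5 is underweight, 18.5 to 24.9 is normal, 25 to 29.9 is overweight, and 30 or above is obese. So the example value falls in the normal range.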


“F Sharp”/OCaml Books and People

Perm url with updates: http://xahlee.org/comp/F_Sharp_OCaml_books_and_people.html

Xah Lee, 2010-09-20

This page is a short intro to F#/OCaml books and their authors as of 2010.

Discovered a new book on F#/OCaml.

  • 《The F# Survival Guide》 By John Puopolo et al. At: ctocorner.com

A look at Amazon shows there are quite a few books on F#/OCaml too.

  • 《Expert F# 2.0 (Expert's Voice in F#)》 (2010) By Don Syme, Adam Granicz, Antonio Cisternino. amazon

Don Syme designed F#. He has a website with lots of news on F# at msdn.com Don Syme. Apparently, a lot is going on.

  • 《Real World Functional Programming: With Examples in F# and C#》 (2009) By Tomas Petricek, Jon Skeet. amazon

Tomas Petricek is a master's student specializing in programing models, and interned at Microsoft under Don Syme. His home page is tomasp.net Tomas Petricek.

  • 《Beginning F#》 (2009) By Robert Pickering. amazon
  • 《Foundations of F# (Expert's Voice in .Net)》 (2007) By Robert Pickering. amazon

Robert Pickering seems to have 10 years of coding experience according to his resume. His blog is at: strangelights.com Robert Pickering.

  • 《F# for Scientists》 (2008) By Jon Harrop. amazon

Jon Harrop is well known online. I read the first chapter of his book in 2008, and it is by far one of the best short intros to OCaml online.

In online programing forums, he often taunts other languages. He's known as a troll. (me too) He has a blog at http://fsharpnews.blogspot.com/.

  • 《Programming F#: A comprehensive guide for writing simple code to solve complex problems》 (2009) By Chris Smith. amazon

Chris Smith seems to have 8 years of coding experience, at Microsoft. His blog is at blogs.msdn.com Chris Smith

There are apparently more F# books coming:

  • 《Professional F# 1.0》 By Ted Neward, Aaron Erickson, Talbott Crowell, Rick Minerich. amazon

Interestingly, many of them are also available in Kindle Edition. (See: What's Kindle, iPad, Android, and All That Jazz??.)

There's also 《Practical OCaml》 By Joshua B Smith, but on amazon it got very bad reviews.

Note that, practically speaking, F# (F Sharp) and OCaml are basically the same. F# is implemented on top of Microsoft's .NET, while OCaml is mostly from the unix world.

The history of OCaml is rather confusing. Basically, it all began as ML (programming language) in 1973. The “ML” stands for metalanguage; it was originally designed for theorem-proving related tasks. Thru the years, many variations came, including Standard ML, Caml, OCaml, Moscow ML, Alice, and F#. F# and OCaml are currently the 2 most popular, and are mostly compatible. Here's a Wikipedia quote:

ML is a general-purpose functional programming language developed by Robin Milner and others in the late 1970s at the University of Edinburgh,[1] whose syntax is inspired by ISWIM. Historically, ML stands for metalanguage: it was conceived to develop proof tactics in the LCF theorem prover (whose language, pplambda, a combination of the first-order predicate calculus and the simply typed polymorphic lambda-calculus, had ML as its metalanguage). It is known for its use of the Hindley–Milner type inference algorithm, which can automatically infer the types of most expressions without requiring explicit type annotations.

Shocking Asia (film)

Around 1987, i rented these movies on VHS. Quite interesting.

〈Shocking Asia〉 amazon

〈Shocking Asia 2〉 amazon

As a Chinese who grew up in Taiwan, i can tell you these are not made up. Here's a Wikipedia quote from Shocking Asia:

Shocking Asia is a 1974 documentary film written and directed by Rolf Olsen with Ingeborg Stein Steinbach. The film was banned in Finland due to its graphic content. A sequel titled Shocking Asia II: The Last Taboos was released in 1985.


Iron Man 2

Watched 〈Iron Man 2〉 amazon today. I think it is not as good as the first Iron Man, but for you tech lovers out there, it's fantastic.

Using Emacs's Abbrev Mode for Abbreviation

Perm url with updates: http://xahlee.org/emacs/emacs_abbrev_mode.html

Xah Lee, 2010-09-19

This page is a short tutorial on how to use emacs's abbrev mode for abbreviation.

Emacs has an abbrev feature that's really useful, and is usually not known to beginners. I have lived in emacs daily in a programing day job since 1998, but only learned and started to use abbrev mode in about 2007. For close to 10 years i didn't use the abbrev feature, and was too lazy to read about it.

Defining Your Abbrevs

Create a file with the following content:

; my personal abbreviations
(define-abbrev-table 'global-abbrev-table '(

    ;; math/unicode symbols
    ("tin" "∈" nil 0)
    ("tnin" "∉" nil 0)
    ("tinf" "∞" nil 0)
    ("tluv" "♥" nil 0)
    ("tsmly" "☺" nil 0)

    ;; email
    ("twdy" "wordy-english@yahoogroups.com" nil 0)

    ;; computing tech
    ("twp" "Wikipedia" nil 0)
    ("tms" "Microsoft" nil 0)
    ("tg" "Google" nil 0)
    ("tqt" "QuickTime" nil 0)
    ("tit" "IntelliType" nil 0)
    ("tmsw" "Microsoft Windows" nil 0)
    ("twin" "Windows" nil 0)
    ("tie" "Internet Explorer" nil 0)
    ("tahk" "AutoHotkey" nil 0)
    ("tpr" "POV-Ray" nil 0)
    ("tps" "PowerShell" nil 0)
    ("tmma" "Mathematica" nil 0)
    ("tjs" "javascript" nil 0)
    ("tvb" "Visual Basic" nil 0)
    ("tyt" "YouTube" nil 0)
    ("tge" "Google Earth" nil 0)
    ("tff" "FireFox" nil 0)
    ("tsl" "Second Life" nil 0)
    ("tll" "Linden Labs" nil 0)
    ("tcs" "Chthonic Syndicate" nil 0)
    ("tee" "ErgoEmacs" nil 0)

    ;; normal english words
    ("talt" "alternative" nil 0)
    ("tchar" "character" nil 0)
    ("tdef" "definition" nil 0)
    ("tbg" "background" nil 0)
    ("tkb" "keyboard" nil 0)
    ("tex" "example" nil 0)
    ("tkbd" "keybinding" nil 0)
    ("tenv" "environment" nil 0)
    ("tvar" "variable" nil 0)
    ("tev" "environment variable" nil 0)
    ("tcp" "computer" nil 0)

    ;; sig
    ("txl" "Xah Lee" nil 0)
    ("txs" " Xah ∑ xahlee.org ☄" nil 0)

    ;; url
    ("tuxl" "http://xahlee.org/" nil 0)
    ("tuxp" "http://xahporn.org/" nil 0)
    ("tuee" "http://ergoemacs.org/" nil 0)
    ("tuvmm" "http://VirtualMathMuseum.org/" nil 0)
    ("tu3dxm" "http://3D-XplorMath.org/" nil 0)

    ;; emacs regex
    ("tnum" "\\([0-9]+?\\)" nil 0)
    ("tstr" "\\([^\"]+?\\)\"" nil 0)
    ("tcurly" "“\\([^”]+?\\)”" nil 0)

    ;; shell commands
    ("tditto" "ditto -ck --sequesterRsrc --keepParent src dest" nil 0)
    ("tim" "convert -quality 85% " nil 0)
    ("tims" "convert -size  -quality 85% " nil 0)
    ("tim256" "convert +dither -colors 256 " nil 0)
    ("timf" "find . -name \"*png\" | xargs -l -i basename \"{}\" \".png\" | xargs -l -i  convert -quality 85% \"{}.png\" \"{}.jpg\"" nil 0)

    ("t0" "find . -type f -empty" nil 0)
    ("t00" "find . -type f -size 0 -exec rm {} ';'" nil 0)
    ("tchmod" "find . -type f -exec chmod 644 {} ';'" nil 0)
    ("tchmod2" "find . -type d -exec chmod 755 {} ';'" nil 0)

    ("tunison" "unison -servercmd /usr/bin/unison c:/Users/xah/web ssh://xah@example.com//Users/xah/web" nil 0)
    ("tsftp" "sftp xah@xahlee.org" nil 0)
    ("tssh" "ssh xah@xahlee.org" nil 0)
    ("trsync" "rsync -z -r -v -t --exclude=\"*~\" --exclude=\".DS_Store\" --exclude=\".bash_history\" --exclude=\"**/xx_xahlee_info/*\"  --exclude=\"*/_curves_robert_yates/*.png\" --exclude=\"logs/*\"  --exclude=\"xlogs/*\" --delete --rsh=\"ssh -l xah\" ~/web/ xah@example.com:~/" nil 0)

    ("trsync2" "rsync -r -v -t --delete --rsh=\"ssh -l xah\" ~/web/ xah@example.com:~/web/" nil 0)
    ("trsync3" "rsync -r -v -t --delete --exclude=\"**/My *\" --rsh=\"ssh -l xah\" ~/Documents/ xah@example.com:~/Documents/" nil 0)))

;; stop asking whether to save newly added abbrev when quitting emacs
(setq save-abbrevs nil)

Save the above file at 〔~/.emacs.d/my_emacs_abbrev.el〕. Then, in your emacs init file, put this line:

(load "~/.emacs.d/my_emacs_abbrev")

You can now restart emacs, or just select the line then call “eval-region”. That will load your abbrev definitions.

Now, type for example “t0” then press Space, then it'll expand to:

find . -type f -empty
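
The expansion above depends on one more step. Abbrevs only expand automatically in buffers where the minor mode abbrev-mode is on, and it is off by default in most modes. Below is a sketch of init-file lines that turn it on, assuming you want it in every buffer. The commented-out hook line is an alternative if you only want it in text buffers.

```elisp
;; Turn on abbrev-mode in all buffers (it is off by default).
(setq-default abbrev-mode t)

;; Alternative: turn it on only in text-mode and modes derived from it.
;; (add-hook 'text-mode-hook (lambda () (abbrev-mode 1)))
```

For a single buffer, you can also just call “abbrev-mode” to toggle it.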

I add the letter “t” in front of all my abbrevs. For example, my abbrev for “character” is “tchar” not “char”. This way, it is easier for me to avoid expanding when i don't want to.

The “t” is right under the right hand's middle finger on The Dvorak Keyboard Layout. If you are using QWERTY, you can use “k”, which is on that same key.

Uses for Abbreviation

You can use abbrev for inserting Math Symbols or other Unicode Characters (e.g. “∈ → ⇒ ∞ π”), normal long English words (e.g. “alternative”, “organization”, “PowerShell”, “Wikipedia”), your email address, signature, and urls you need often. Most useful to me is long unix shell commands. For example, i type “trsync” and it expands to: 「rsync -z -r -v -t --exclude="*~" --exclude=".DS_Store" --exclude=".bash_history" --exclude="**/xx_xahlee_info/*" --exclude="*/_curves_robert_yates/*.png" --exclude="logs/*" --exclude="xlogs/*" --delete --rsh="ssh -l xah" ~/web/ xah@example.com:~/」.

You can also use it to insert programing language's function templates, or xml templates, but a more powerful feature for that is Emacs Templates with YASnippet.

Adding and Removing Your Abbrevs

You may want to put your abbrev file in Emacs's Bookmark, so you can easily open it.

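To add a new abbrev, just create a new line of definition, then call “eval-buffer”. To remove one, just delete its line. However, evaluating the file only adds or redefines abbrevs, so a deleted abbrev stays active until you restart emacs.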
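
Here is a minimal sketch of a command that empties the global abbrev table and then reloads the file, so deletions take effect without a restart. The command name my-reload-abbrevs is hypothetical. The default file path assumes the location used above. This only works because the file defines everything in global-abbrev-table.

```elisp
;; -*- lexical-binding: t -*-
(defun my-reload-abbrevs (&optional file)
  "Empty `global-abbrev-table', then load FILE to redefine my abbrevs.
Hypothetical command.  FILE defaults to ~/.emacs.d/my_emacs_abbrev;
change that if you keep the file elsewhere."
  (interactive)
  (clear-abbrev-table global-abbrev-table)
  (load (or file "~/.emacs.d/my_emacs_abbrev")))
```

After editing the file, call “my-reload-abbrevs” instead of “eval-buffer”.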

Emacs's Way

Emacs has many shortcuts and over ten commands for adding, removing, listing, editing abbrevs. You can read the manual at: (info "(emacs) Abbrevs").

For example, to add a new abbrev, 【Ctrl+x a g】 will prompt you to enter an abbrev for the word before the cursor.
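
For comparison, here is a sketch of defining one global abbrev directly from lisp, for example with “eval-expression”. The abbrev “torg” and its expansion are just example values.

```elisp
;; Define one global abbrev: with abbrev-mode on, typing "torg"
;; then Space expands to "organization".
(define-abbrev global-abbrev-table "torg" "organization")
```

An abbrev defined this way lives only in the current session unless emacs saves it to its abbrev file.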

Emacs does not automatically save your abbrevs, but will ask you when you quit emacs. By default the abbrev file is saved at: 〔~/.emacs.d/abbrev_defs〕.

I didn't find the many extra add/remove/prevent/“mode specific” abbrev features useful. But if you like to use emacs's abbrev shortcuts and commands, just remember its file location.
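
If you do use emacs's own abbrev commands, below is a sketch of two settings for your init file. The first sets where those commands save abbrevs. The second saves them without the quit-time prompt. The path is just the usual default spelled out, so adjust it to taste.

```elisp
;; File where emacs's own abbrev commands save abbrevs
;; (the usual default location; change it if you like).
(setq abbrev-file-name "~/.emacs.d/abbrev_defs")

;; Save abbrevs without asking, whenever emacs saves files
;; (including when you quit).
(setq save-abbrevs 'silently)
```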

If you always add new abbrevs to the definition file yourself, then you may add the following code in your emacs init file:

;; stop asking whether to save newly added abbrev when quitting emacs
(setq save-abbrevs nil)

See also the elisp manual: (info "(elisp) Abbrevs")
