2009-01-24

Language, Purity, Cult, and Deception

perm url: http://xahlee.org/UnixResource_dir/writ/lang_purity_cult_deception.html

Xah Lee, 2009-01-24

[This essay is roughly a 10-year personal retrospective on some languages, in particular Scheme and Haskell.]

I learned far more OCaml in the past 2 days than in the fucking 2 months i spent trying to learn Haskell, after 10 years of “I WANT TO BELIEVE” in Haskell.

Haskell's problem is similar to that of Scheme lisp: being academic, with little industrial involvement. About 10 years ago, during the dot com era around 1999, when the scripting war was going on (Perl, Tcl, AppleScript, Userland Frontier, with Python, Ruby, Icon, Scheme in the corner, in the air of Java, HTML 3, CSS, CGI, JavaScript), i was sold a lie by Scheme lisp. Scheme has an aura of elegance and minimalism that's one hundred miles in radius. I have always been an advocate of functional programing, with a heart for formal methods. Scheme, being an academic lang, has such an association. At the time, Open Source and Linux had just arrived on the scene and were screaming the rounds in the industry, along with Apache & Perl. The Larry Wall scumbag and Eric Raymond motherfucker and Linus T moron and Richard Stallman often appeared in interviews in mainstream media. Richard Stallman's FSF, with its GNU, was quick to make sure he's not forgotten, by a campaign on naming Linux “GNU/Linux”. FSF announced that Scheme is its chosen scripting lang for the GNU system. The plan and vision for Guile, the new Scheme implementation, was that thanks to Scheme Lisp's power it would have lang conversion abilities on the fly, so programers can code in other langs if they want to, anywhere in the GNU platform. Around that time, i also wholeheartedly subscribed to FSF's Brave GNU World bulletin, with high expectations.

Now, it's 2009. Ten years have passed. Guile disappeared into oblivion. Scheme is tail recursing in some unknown desert. PHP, one of the ugly kludge pidgins, practically and quietly surpassed the motherfucking foghorn'd Perl in the early 2000s to become one of the top 5 languages. (remember? Larry Wall scumbag said P is for “Practical”. PHP's got two “P”s.) Python has surfaced to become mainstream. Ruby is the hip kid on the block. Where is Scheme? O, you can still hear these idiots fluttering tail recursions among themselves in newsgroups. Tail recursion! Tail recursion! And their standard, the R6RS of 2007, by their own consensus, is one fucked up shit.

In 2000, i was a fair expert of unix technologies: sys admin of several data centers' Solaris boxes, each costing some 40 grand. I was a master of Mathematica and Perl but didn't know much about any other lang, or langs in general. Today, i am an expert in about 5 languages and have working knowledge of ten or so diverse ones. There is nothing in Scheme i'd consider elegant, not remotely, even if we only consider R4RS.

Scheme, like other langs with a cult, sold me a lie that lasted 10 years. Similarly, Haskell fucked me with its “no assignment” lure. You can try to learn the lang for years and all you'll learn is that there's something called currying and monad. I regret that i learned Python too, in 2006. Perl is known for its intentional egregious lies, led by the demagogue Larry Wall (disclaimer: opinion only). It fell apart, unable to sustain its “post-modernistic” sophistry. Python always seems reasonable, until you walk in. Then you learn that the community is also culty, and is into certain grand visions of beauty & elegance, with its increasingly complex syntax soup and the backward incompatible Python 3.0. The python fuckheads sport the air of “computer science R us”; in reality they are idiots at about the same level as Perl mongers. (Schemers and Haskell people at least know what they are talking about. They just don't have the know-how of the industry.)

I think my story can teach tech geekers something. In my experience, the langs that are truly a joy to learn and use are those sans a cult. Mathematica, JavaScript, PHP are all a joy to use. Anything you want to do or learn how to do, in so far as the lang is suitable, can be done quickly. Their docs are to the point. And today i have to include OCaml. It's not about whether the lang is functional, or whether the lang is elegant, or what theoretical power it has. Also, langs with a strong academic background, such as Scheme and Haskell, are likely to stay there forever, regardless of the technical nature of the lang. The background of the community makes half of what the language is.

The above is not a terribly deep insight, but i suppose it should be useful for some application. Today, there's a huge number of languages, each screaming ME! To name a few that are bubbled up by tech geekers: Arc, Clojure, Scala, F#, Erlang, Ruby, Groovy, Python 3, Perl6. (for a big list, see: Proliferation of Computing Languages) So, if i want to learn another lang down the road, and wish it to be a joy to use, with usable docs, a large number of usable libraries, good support, and a community that doesn't loop into the estheticalities of tail recursion or monad every minute, then which one should i buy? With industrial background in mind, not culty, and lang beauty mattering not that much, i think Erlang and F# would be safe choices, while langs like Qi, Oz, Arc, Perl6 would be the most questionable.

Disclaimer: All mentions of real persons are opinion only.

dehtmlize source code in emacs lisp

perm url: http://xahlee.org/emacs/elisp_htmlize.html

Added elisp code to dehtmlize a block of htmlized source code.

DeHtmlize Text

2009-01-24

When source code in a html file is htmlized, the markup is usually unreadable. Suppose you want to modify the source code presented in html. Usually, you view it in a browser, then copy the source code. Then you create a new buffer and paste the code to edit it. When done, you copy the newly edited text, close the temp buffer, delete the htmlized version in your html file, paste the new one in, then htmlize it again. This process is painful.

It would be nice if you could press a button and have the htmlized source code in your html become plain, so you can modify it, then press a button again to have it htmlized again.

Here are 2 elisp commands to dehtmlize. The dehtmlize-span-region will dehtmlize a selected region. The dehtmlize-block will dehtmlize code inside a pre block of the form “<pre class="langName">”.

(defun dehtmlize-block ()
  "Delete span tags inside a <pre> block.
For example, if the cursor is somewhere inside the tag:

<pre class=\"code\">
codeXYZ...
</pre>

after calling, the span tags in the “codeXYZ...” block of text are removed.
`dehtmlize-block' is the reverse of `htmlize-block'."
  (interactive)
  (let (code-begin code-end)
    ;; Locate the text between the opening <pre class="..."> tag and </pre>.
    ;; Note: [A-z] would also match the characters [ \ ] ^ _ and backquote.
    (re-search-backward "<pre class=\"\\([A-Za-z-]+\\)\"")
    (setq code-begin (re-search-forward ">"))
    (re-search-forward "</pre>")
    (setq code-end (re-search-backward "<"))

    (let ((myStr (buffer-substring code-begin code-end)))
      (setq myStr (replace-regexp-in-string "<span class=\"[^\"]+\">" "" myStr))
      (setq myStr (replace-regexp-in-string "</span>" "" myStr))
      (setq myStr (replace-regexp-in-string "&lt;" "<" myStr))
      (setq myStr (replace-regexp-in-string "&gt;" ">" myStr))
      ;; Decode &amp; last, so that “&amp;lt;” becomes “&lt;” and not “<”.
      (setq myStr (replace-regexp-in-string "&amp;" "&" myStr))
      (delete-region code-begin code-end)
      (goto-char code-begin)
      (insert myStr))))

(defun dehtmlize-span-region (p1 p2)
  "Delete HTML “span” tags on a region.
Note: only span tags of the form <span class=\"...\"> are deleted."
  (interactive "r")

  (let (mystr)
    (setq mystr (buffer-substring p1 p2))

    (setq mystr
          (with-temp-buffer
            (insert mystr)

            (goto-char (point-min))
            (while (search-forward-regexp "<span class=\"[^\"]+\">" nil t) (replace-match ""))

            (goto-char (point-min))
            (while (search-forward "</span>" nil t) (replace-match ""))

            (goto-char (point-min))
            (while (search-forward "&lt;" nil t) (replace-match "<"))

            (goto-char (point-min))
            (while (search-forward "&gt;" nil t) (replace-match ">"))

            ;; Decode &amp; last, so that “&amp;lt;” becomes “&lt;” and not “<”.
            (goto-char (point-min))
            (while (search-forward "&amp;" nil t) (replace-match "&"))

            (buffer-string)))
    (delete-region p1 p2)
    (goto-char p1)
    (insert mystr)))
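
The same cleanup can also be written as a function on a string, which is easier to try out in the scratch buffer. The following is only a sketch: the name my-dehtmlize-string is made up for illustration and is not part of the code above. It decodes “&amp;” last, so that text such as “&amp;lt;” ends up as “&lt;” and is not decoded twice.

```elisp
(defun my-dehtmlize-string (str)
  "Return STR with <span class=\"...\"> tags removed and basic entities decoded.
Hypothetical helper for illustration only."
  (let ((result str))
    ;; Each pair is (REGEXP . REPLACEMENT). The order matters:
    ;; “&amp;” must be decoded after “&lt;” and “&gt;”.
    (dolist (pair '(("<span class=\"[^\"]+\">" . "")
                    ("</span>" . "")
                    ("&lt;" . "<")
                    ("&gt;" . ">")
                    ("&amp;" . "&")))
      ;; The two t arguments mean: keep case as is, treat replacement literally.
      (setq result
            (replace-regexp-in-string (car pair) (cdr pair) result t t)))
    result))

;; Example call:
;; (my-dehtmlize-string "<span class=\"keyword\">if</span> (a &lt; b)")
```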

2009-01-23

emacs command usage frequency

Here's my own usage of emacs commands, from 2008-08-30 to 2009-01-23. Here's a list of the 31 most used commands. Each line shows the call count, the percentage of all command calls, and the command name.
1119668   45.96%  self-insert-command
 203404    8.35%  next-line
 148571    6.10%  previous-line
 146318    6.01%  forward-word
 116557    4.78%  backward-word
  46370    1.90%  delete-backward-char
  44569    1.83%  isearch-printing-char
  41315    1.70%  forward-char
  36771    1.51%  backward-char
  36692    1.51%  backward-kill-word
  33890    1.39%  newline
  22912    0.94%  save-buffer
  22247    0.91%  yank
  18696    0.77%  mwheel-scroll
  18031    0.74%  kill-line
  16647    0.68%  close-current-buffer
  13485    0.55%  move-beginning-of-line
  13029    0.53%  scroll-up
  11420    0.47%  isearch-forward
  11380    0.47%  isearch-other-meta-char
  10840    0.44%  kill-word
  10363    0.43%  set-mark-command
  10349    0.42%  isearch-repeat-forward
  10256    0.42%  find-file
  10037    0.41%  execute-extended-command
   9328    0.38%  move-cursor-next-pane
   9012    0.37%  forward-paragraph
   8715    0.36%  scroll-down
   8612    0.35%  delete-char
   7776    0.32%  backward-paragraph
   7370    0.30%  undo
Compared to my previous stat compilation with 2 other emacs users http://xahlee.org/emacs/command-frequency.html (fewer data points), i think the top 10 most used commands are probably the same for the vast majority of emacs users, and the ordering is probably roughly the same too.
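
This entry does not say how the counts were collected. One possible way to gather such numbers, shown here only as a sketch, is to count the value of `this-command' in `pre-command-hook'. The names my-command-counts, my-count-command and my-command-counts-sorted are made up for illustration; this is not necessarily the tool used for the table above.

```elisp
(defvar my-command-counts (make-hash-table :test 'eq)
  "Hypothetical table mapping command symbols to call counts.")

(defun my-count-command ()
  "Increment the count for `this-command'."
  (when (and this-command (symbolp this-command))
    (puthash this-command
             (1+ (gethash this-command my-command-counts 0))
             my-command-counts)))

;; Run the counter before every command.
(add-hook 'pre-command-hook 'my-count-command)

(defun my-command-counts-sorted ()
  "Return a list of (COMMAND . COUNT) pairs, most used first."
  (let (result)
    (maphash (lambda (cmd count) (push (cons cmd count) result))
             my-command-counts)
    (sort result (lambda (a b) (> (cdr a) (cdr b))))))
```

The percentage column is then just each count divided by the sum of all counts.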

2009-01-21

Updating Atom/RSS with Elisp

Xah Lee, 2009-01-21

This page describes a real world example of using emacs to update a web syndication (RSS/Atom) page. If you don't know elisp, see: Emacs Lisp Basics.

The Problem

Summary

I want to write a command so that, when invoked, the currently selected text will be added as an entry in a RSS/Atom file.

This lesson will show you how to write a command that grabs the region text, switches buffer, searches for a string to locate the position for inserting text, inserts the text, and updates the date field in a file.

Detail

I run a website “xahlee.org”. The site is hosted by a website service provider. Typically, i create or edit my site on local disk. Then, i upload by switching to shell with “Alt+a sh”, then typing “trsync ”, which is automatically expanded to:

rsync -z -av --exclude="*~" --exclude=".DS_Store" --delete --rsh="ssh -l xyz" ~/web/ xyz@xahlee.org:~/

This will update my website on the server.

You can define your keyboard shortcut, alias, abbreviation, like this:

(global-set-key (kbd "M-a") 'execute-extended-command) ;; easier typing

(defalias 'sh 'shell) ;; shorter command name

;; save typing
(define-abbrev-table 'global-abbrev-table '(
    ("trsync" "rsync -z -av --rsh=\"ssh -l xyz\" ~/web/ xyz@xahlee.org:~/" nil 0)
    ))

One of my site's pages is a blog. I write the blog page and update my site daily using the above mechanism. But i also want to create a RSS feed so that readers don't have to keep visiting the blog just to see if there's a new entry. They can just subscribe to the feed and use a RSS reader in the browser, or other services that notify them of new entries or send them email.

I use Atom. Atom is a standardized web feed format (RFC 4287), an alternative to RSS.

Basically, an Atom file is a xml file, like this:

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:base="http://xahlee.org/Periodic_dosage_dir/">

 <title>Xah's Blog</title>
 <subtitle>Ethnology, Ethology, and Tech Geeking</subtitle>
 <link rel="self" href="http://xahlee.org/Periodic_dosage_dir/pd.xml"/>
 <link rel="alternate" href="http://xahlee.org/Periodic_dosage_dir/pd.html"/>
 <updated>2006-09-11T02:35:33-07:00</updated>

 <author>
   <name>Xah Lee</name>
   <uri>http://xahlee.org/</uri>
 </author>

 <id>http://xahlee.org/Periodic_dosage_dir/pd.html</id>
 <icon>http://xahlee.org/siteicon.png</icon>
 <rights>© 2006 Xah Lee</rights>

 <entry>
   <title>Batman thoughts</title>
   <id>tag:xahlee.org,2006-09-09:015218</id>
   <updated>2006-09-08T18:52:18-07:00</updated>
   <summary>Some notes after watching movie Batman.</summary>
   <content type="xhtml">
      <div xmlns="http://www.w3.org/1999/xhtml">
      <p>I watched Batman today ...</p>
      <!-- more xhtml here -->
      </div>
   </content>
  <link rel="alternate" href="pd.html"/>
 </entry>

</feed>

The file's header contains standard info such as: blog title, author info, copyright info, blog url, (unique) id for this blog. Then, the main body is made of several “entry” elements. Each entry has a title, id, timestamp, summary, perm link url, and full content (optional).

What i want my emacs script to do is to grab the currently selected text and insert it as an entry in the Atom file, and also update the “updated” tag in the header with a new time stamp.

Solution

Here's the solution. The following code will grab the current text selection, insert it as an entry in an Atom file at the right location, and update the Atom file's “updated” tag with a new timestamp.

(defun make-pdxml-entry (begin end)
  "Insert current region as an Atom entry in file “pd.xml”.

Detail: create a new Atom entry in pd.xml, with the current
region as its content, and update the Atom file's “updated” date."
  (interactive "r")
  (let ((meat (buffer-substring-no-properties begin end)))
    (find-file "~/web/Periodic_dosage_dir/pd.xml")
    ;; Insert a blank entry template before the first existing entry.
    (goto-char (point-min))
    (re-search-forward "<entry>" nil t)
    (move-beginning-of-line 1)
    (insert-atom-entry)
    ;; Put the region text inside the new entry's xhtml div.
    (re-search-backward "<div xmlns=\"http://www.w3.org/1999/xhtml\">" nil t)
    (re-search-forward ">" nil t)
    (insert meat)
    (update-pdxml-date)
    ;; Leave the cursor at the title placeholder “ttt”.
    (find-file "~/web/Periodic_dosage_dir/pd.xml")
    (goto-char (point-min))
    (re-search-forward ">ttt" nil t)))

The above code works by first grabbing the current text selection and saving it to the variable “meat”. Then it opens the atom file using “find-file”. It finds the location to insert a new entry by searching for the first “<entry>”. Then, it calls “insert-atom-entry” to insert a new entry template. Then, it moves the cursor to the div inside the new entry, where the content text goes. The line “(insert meat)” inserts the selected text. Then, “(update-pdxml-date)” is called to update the “updated” tag in the Atom header. Finally, the file is opened again (because update-pdxml-date might have closed it), and the cursor is moved to the right location for me to type a title or summary.
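
The “find the first entry, then insert before it” step can be tried on its own in a temp buffer. The following is a minimal sketch; the name my-insert-before-first-entry is made up for illustration and is not one of the functions used by make-pdxml-entry.

```elisp
(defun my-insert-before-first-entry (text)
  "Insert TEXT at the start of the line holding the first <entry> tag.
Work on the current buffer.  Return the position where TEXT was
inserted, or nil if the buffer has no <entry> tag.
Hypothetical helper for illustration only."
  (save-excursion
    (goto-char (point-min))
    (when (search-forward "<entry>" nil t)
      (beginning-of-line)
      (let ((pos (point)))
        (insert text)
        pos))))

;; Example: the new entry ends up before the old one.
;; (with-temp-buffer
;;   (insert "<feed>\n <entry>\n  <title>old</title>\n </entry>\n</feed>\n")
;;   (my-insert-before-first-entry " <entry>\n  <title>new</title>\n </entry>\n")
;;   (buffer-string))
```

Unlike make-pdxml-entry, this sketch does nothing when no “<entry>” is found, instead of inserting at whatever place the cursor happens to be.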

This code can be improved in many ways. For example, right now it is hard-coded to update one specific atom file. What if you have more than one Atom feed? Also, the title and summary tags are not automatically generated. What if you want a RSS format too? What if you want the feed automatically sent to the server? To fix these, you'll have to go into designing a general system for dealing with RSS, but right now it just works for me. When i need more flexibility, i can easily modify my code to adapt. This is the beauty of emacs.

The following are supplementary functions called by make-pdxml-entry.

(defun insert-atom-entry ()
  "Insert a blank Atom entry template."
  (interactive)
  (let ((zone (format-time-string "%z"))) ; local UTC offset, such as “-0700”
    (insert
     (concat " <entry>\n   <title>ttt</title>\n   <id>"
             ;; The id uses UTC time.  The third argument t requests UTC.
             (format-time-string
              "tag:xahlee.org,%Y-%m-%d:%H%M%S" (current-time) t)
             "</id>\n   <updated>"
             ;; Atom timestamps need a colon in the offset: “-07:00”.
             (format-time-string "%Y-%m-%dT%T")
             (substring zone 0 3) ":" (substring zone 3 5)
             "</updated>
   <summary>ttt</summary>
   <content type=\"xhtml\">
<div xmlns=\"http://www.w3.org/1999/xhtml\">
</div>
   </content>
  <link rel=\"alternate\" href=\"http://xahlee.org/Periodic_dosage_dir/pd.html\"/>
 </entry>\n\n"))))

Note that the Atom spec requires that each entry have a world-wide unique id string, and this string must be in URI format. There are several methods discussed on the web about how to generate such an id. The method i use is a combination of domain name and timestamp, adopted from an online suggestion. You can search the web using “atom, entry, id” for these suggestions.
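
The id scheme can be written as a small function that takes the domain as a parameter. This is only a sketch: the name my-atom-tag-id and the domain “example.org” are made up for illustration. It formats the time in UTC, so the id does not depend on the local time zone.

```elisp
(defun my-atom-tag-id (domain &optional time)
  "Return an id string of the form tag:DOMAIN,YYYY-MM-DD:HHMMSS.
TIME defaults to the current time and is formatted in UTC.
Hypothetical helper for illustration only."
  (concat "tag:" domain ","
          ;; The third argument t means: format as UTC.
          (format-time-string "%Y-%m-%d:%H%M%S" time t)))

;; Example call with the current time:
;; (my-atom-tag-id "example.org")
```

Two entries created within the same second would get the same id, so this scheme fits a blog updated by hand, not an automated feed.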

(defun update-pdxml-date ()
  "Update the Atom “updated” tag in pd.xml.
That is, the first occurrence of: <updated>2006-10-10T22:58:42-07:00</updated>"
  (interactive)
  (find-file "~/web/Periodic_dosage_dir/pd.xml")
  (goto-char (point-min))
  (let ((x1 (re-search-forward "<updated>" nil t)))
    ;; Do nothing if the file has no “updated” tag.
    (when x1
      ;; The old timestamp is assumed to be exactly 25 characters long.
      (delete-region x1 (+ x1 25))
      (insert-date-time))))

(defun insert-date-time ()
  "Insert current date-time string, such as 2006-10-10T22:58:42-07:00."
  (interactive)
  (let ((zone (format-time-string "%z"))) ; local UTC offset, such as “-0700”
    (insert
     (concat
      (format-time-string "%Y-%m-%dT%T")
      ;; Turn the offset “-0700” into “-07:00”.
      (substring zone 0 3) ":" (substring zone 3 5)))))
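
The timestamp format can also be built by a function that returns a string instead of inserting it, which makes it easy to check in the scratch buffer. A minimal sketch, where the name my-atom-timestamp is made up for illustration: it takes an optional time and an optional zone (nil for local time, t for UTC), and adds the colon that “%z” leaves out.

```elisp
(defun my-atom-timestamp (&optional time zone)
  "Return TIME as a string like 2006-10-10T22:58:42-07:00.
TIME defaults to now.  ZONE nil means local time, t means UTC.
Hypothetical helper for illustration only."
  (let ((stamp (format-time-string "%Y-%m-%dT%T" time zone))
        ;; %z gives the offset without a colon, such as “-0700”.
        (offset (format-time-string "%z" time zone)))
    (concat stamp
            (substring offset 0 3) ":" (substring offset 3 5))))

;; Example calls:
;; (my-atom-timestamp)        ; local time
;; (my-atom-timestamp nil t)  ; UTC
```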

With the above code, i write my blog html file as usual, select the region of text, then press “Alt+x make-pdxml-entry”, and i'm switched to the Atom file with the entry inserted. I edit the title, summary, and the perm url for that entry if any, and save the file. Then i'm done, and can do “Alt+x sh Enter trsync Enter”, and my web server is updated with blog and RSS.

Emacs is flexible!

For a simple intro to Atom, and links for Atom validation, tutorial, spec, sample Atom file, see: Intro to Atom.