2011-10-01

Emacs: Converting Decimal and Hexadecimal

Sometimes I need to convert between decimal and hexadecimal. Here's how to do that using Emacs's built-in calculator.

  • Call “calc” 【Alt+x】.
  • Type any number. For example, 10.
  • Type “d6” to turn the display into hexadecimal form.
  • Type “d0” to turn the display into decimal form.

To enter a hex number, prefix it with the radix: type “16#aa” for the hex “aa”.

Another way, which I find simpler, is to use elisp. Open a new file, then type the following:

(format "%x" 10)  ; decimal to hex. Returns 「a」
(format "%d" #xa) ; hex 「a」 to decimal. Returns 「10」.

Select the code, then call “eval-region” 【Alt+x】. Or, put the cursor right after the closing parenthesis, then call “eval-last-sexp” 【Ctrl+x Ctrl+e】. (See: Emacs: How to Eval Elisp Code, Find Functions, Search Documentation.)

To open a new file in ErgoEmacs, press 【Ctrl+n】. In GNU Emacs, call “switch-to-buffer” 【Ctrl+x b】 then type a new name.
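
When the hex value arrives as a string (for example, read from a file) rather than as a literal like #xa, “string-to-number” with a base argument does the conversion. A minimal sketch; the function names “my-hex-to-decimal” and “my-decimal-to-hex” are made up for illustration:

```elisp
(defun my-hex-to-decimal (hex-string)
  "Return the integer value of HEX-STRING, e.g. \"aa\" (no #x or 0x prefix)."
  (string-to-number hex-string 16))

(defun my-decimal-to-hex (n)
  "Return integer N as a lowercase hexadecimal string."
  (format "%x" n))

(my-hex-to-decimal "aa") ; returns 170
(my-decimal-to-hex 170)  ; returns "aa"
```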

Emacs Lisp: Grammar of the “if” Form

Emacs Lisp's “if” (strictly a special form, not a function) has an annoying grammar.

One would expect it to take 3 arguments, no more and no less, like this:

(if ‹test›
 ‹expression for true›
 ‹expression for false›
)

But it actually takes more arguments. Everything from the 3rd argument onward is an expression for false, and they are evaluated in order. Like this:

(if ‹test›
 ‹expression for true›
 ‹expression for false›
 ‹expression for false 2›
 ‹expression for false 3›
 …
)

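Here's some test code you can check:

(if nil
  (message "false")        ; expression for true: skipped, because the test is nil

  (message "true")         ; expressions for false: all three run, in order
  (message "so true")
  (message "yes really")
  )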

You can run the code by selecting it, then calling “eval-region” 【Alt+x】. You can switch to the “*Messages*” buffer with 【Ctrl+h e】.

Are Common Lisp and Scheme the same way?

I think I'd prefer the simpler, logical form: (if test true false).
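
A small sketch of how the extra arguments behave. The function name “my-if-demo” is hypothetical. The true branch needs “progn” to hold more than one expression, while the false branch takes any number of expressions directly. (For comparison, “if” in Common Lisp and in Scheme accepts at most one expression for false.)

```elisp
(defun my-if-demo (flag)
  "Return a list naming the branch expressions evaluated for FLAG."
  (let ((log nil))
    (if flag
        ;; Only one expression is allowed for true, so group with progn.
        (progn
          (push "true 1" log)
          (push "true 2" log))
      ;; Everything from here on belongs to the false branch.
      (push "false 1" log)
      (push "false 2" log))
    (nreverse log)))

(my-if-demo t)   ; returns ("true 1" "true 2")
(my-if-demo nil) ; returns ("false 1" "false 2")
```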

2011-09-29

Emacs Lisp: Command to Replace HTML Entities with Unicode Characters

Perm url with updates: http://xahlee.org/emacs/elisp_replace_html_entities_command.html

Xah Lee, 2011-09-27

This page shows you how to write an elisp command to replace HTML entities such as &eacute; with the corresponding Unicode character é.

The Problem

I have many HTML files from existing sources that contain many HTML entities. I want a command that automatically changes them to Unicode characters. Example:

  • &lsquo; → ‘
  • &rsquo; → ’
  • &ldquo; → “
  • &rdquo; → ”
  • &eacute; → é

(For more about HTML entities, see: Character Sets and Encoding in HTML, HTML/XML Entities List.)

The command should work on the current paragraph, or text selection.

Solution

This is easy to write. One of the basic elisp idioms is find & replace on a region, like this:

(defun replace-html-chars-region (start end)
  "Replace some html entities in region …."
  (interactive "r")
  (save-restriction 
    (narrow-to-region start end)

    (goto-char (point-min))
    (while (search-forward "&lsquo;" nil t) (replace-match "‘" nil t))

    (goto-char (point-min))
    (while (search-forward "&rsquo;" nil t) (replace-match "’" nil t))

    (goto-char (point-min))
    (while (search-forward "&ldquo;" nil t) (replace-match "“" nil t))

    (goto-char (point-min))
    (while (search-forward "&rdquo;" nil t) (replace-match "”" nil t))

    (goto-char (point-min))
    (while (search-forward "&eacute;" nil t) (replace-match "é" nil t))
    ;; more here
    )
  )

The (interactive "r") tells emacs that this is a command that can be called by “execute-extended-command” 【M-x】 and the "r" means emacs will feed the beginning and ending text selection positions to your function's parameters.

There are several problems with the above simple code.

① The code requires you to make a text selection first. It'd be better if it automatically worked on the text selection if there is one, and otherwise on the current paragraph.

② The elisp code above is too verbose. It'd be much better if we could write it like this:

(defun replace-html-named-entities ()
  "…"
  …
  (replace-pairs-in-string inputstr
    [
     ["&lsquo;" "‘"]
     ["&rsquo;" "’"]
     ["&ldquo;" "“"]
     ["&rdquo;" "”"]
     ["&eacute;" "é"]
     more here …
     ]
  ))

③ Replacing multiple pairs of strings one by one may create incorrect behavior.
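
For problem ①, one possible approach is to compute the boundaries first and then run the replacements between them. This is a rough sketch, not the code from xah_elisp_util.el; the name “my-region-or-paragraph-bounds” is made up, and a paragraph is assumed to be delimited by blank lines:

```elisp
(defun my-region-or-paragraph-bounds ()
  "Return (START . END) of the active region, else of the current paragraph.
A paragraph is assumed to be text delimited by blank lines."
  (if (use-region-p)
      (cons (region-beginning) (region-end))
    (save-excursion
      (let* ((start (progn
                      ;; Move to just after the previous blank line, if any.
                      (if (re-search-backward "\n[ \t]*\n" nil t)
                          (goto-char (match-end 0))
                        (goto-char (point-min)))
                      (point)))
             (end (progn
                    ;; Move to just before the next blank line, if any.
                    (if (re-search-forward "\n[ \t]*\n" nil t)
                        (goto-char (match-beginning 0))
                      (goto-char (point-max)))
                    (point))))
        (cons start end)))))
```

A command can then pass the two positions to “narrow-to-region”, as in the code above, without needing (interactive "r").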

Tricky Issue with Sequential Replacement of Multi-Pairs

Suppose you are working on an HTML tutorial, and the document contains the text: &amp;copy;. The intended display is &copy;. However, if you replace each entity sequentially, the &amp; part becomes &, and then the resulting &copy; becomes just ©.

When you have many pairs of replacements, doing them one by one, each time starting from the top of the document, may introduce unexpected changes. A solution is to first replace them with a set of unique intermediate values, then replace those with the final values.
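
The intermediate-value idea can be sketched as follows. This is an illustration only, not the code from xah_elisp_util.el: the names “my-placeholder” and “my-replace-pairs-two-pass” are made up, and the placeholders assume the private-use characters U+E000 and U+E001 never occur in the input.

```elisp
(defun my-placeholder (i)
  "Return the I-th intermediate value.
Assumes the private-use characters U+E000 and U+E001 never occur in the input."
  (format "\uE000%d\uE001" i))

(defun my-replace-pairs-two-pass (str pairs)
  "In STR, replace the car of each cons in PAIRS with its cdr, in two passes."
  (let ((i 0))
    ;; Pass 1: each find string becomes a unique placeholder.
    (dolist (pair pairs)
      (setq str (replace-regexp-in-string
                 (regexp-quote (car pair)) (my-placeholder i) str t t))
      (setq i (1+ i)))
    ;; Pass 2: each placeholder becomes its final value.
    (setq i 0)
    (dolist (pair pairs)
      (setq str (replace-regexp-in-string
                 (regexp-quote (my-placeholder i)) (cdr pair) str t t))
      (setq i (1+ i)))
    str))

(my-replace-pairs-two-pass
 "&amp;copy; and &copy;"
 '(("&amp;" . "&") ("&copy;" . "©")))
;; returns "&copy; and ©"
```

Because no final value is written until every find string has been turned into a placeholder, the & produced from &amp; can never combine with the following copy; to form a new entity.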

For the final code of “replace-html-named-entities” that fixes these problems, get it at xah_elisp_util.el.

You'll need to install 2 elisp libraries:

Emacs Regex Quirk: Matching beginning/end of line/string

Perm URL with updates: http://xahlee.org/emacs/emacs_regex_begin_end_line_string.html

Xah Lee, 2011-09-29, …, 2011-11-28

This page is a tutorial on emacs regex. Suppose you want to write a function that removes spaces in front of a string. You'd use a regex like this:

(replace-regexp-in-string "^ +" "" myString)

Here, the ^ means beginning of string, right?

WRONG!

In emacs regex, ^ matches the beginning of the string, but also the beginning of each line in the string. Try evaluating the following (place the cursor at the end, then call “eval-last-sexp”):

(replace-regexp-in-string "^ +" "•"
"
  like
    (1) this
    (2) that
")

Here's the result:

"
•like
•(1) this
•(2) that
"

To match just the beginning of a string, use \`. Like this:

;; Remove spaces, tabs, and newlines at the beginning of myStr
(replace-regexp-in-string "\\`[ \t\n]*" "" myStr)

Similarly, the $ matches the endings of {buffer, string, line}. To just match ending of {buffer, string}, use \'. In lisp code, you'll need to double the backslash.
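
Putting \` and \' together gives a trim function. A minimal sketch; the name “my-trim-string” is made up for illustration:

```elisp
(defun my-trim-string (str)
  "Return STR without whitespace at its very beginning and very end."
  (replace-regexp-in-string
   "[ \t\n]+\\'" ""
   (replace-regexp-in-string "\\`[ \t\n]+" "" str)))

(my-trim-string "  \n  like\n    this \n ")
;; returns "like\n    this"

;; By contrast, $ also matches before each newline inside the string:
(replace-regexp-in-string " +$" "" "a \nb ")
;; returns "a\nb"
```

Note that the indentation of the second line survives in the first call, because \` and \' only anchor at the two ends of the whole string.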

Summary

Special Regex Char    Matches
^                     beginning of {line, string, buffer}
$                     end of {line, string, buffer}
\`                    beginning of {string, buffer}
\'                    end of {string, buffer}

See also: Text Pattern Matching in Emacs (emacs regex tutorial).

2011-09-26

Rhythmic Gymnastics videos

Perm url with updates: http://xahlee.org/vofli_bolci/rhythmic_gymnastics.html

Rhythmic Gymnastics (video)

Xah Lee, 2011-09-26

Rhythmic Gymnastics.
Rhythmic gymnastic Beijing 2008

Emacs Lisp: Fixing Dead Links

Perm url with updates: http://xahlee.org/emacs/elisp_fix_dead_links.html

Xah Lee, 2011-09-25

This page shows you how to write an elisp script that checks thousands of HTML files and fixes dead links.

The Problem

Summary

I have 2 thousand HTML files that contain about 70 dead local links. I need to write an elisp script to change these links to non-links. For example, this is a dead link:

<a href="../widget/index.html#Top">Introduction</a>

I need it to be:

<span class="εlink" title="../widget/index.html#Top">Introduction</span>

The script should run in batch. And it should generate a report.

Detail

I have a copy of the emacs manuals, at:

These manuals sometimes have links to other info files that are not part of emacs. For example, the page Changing Files - GNU Emacs Lisp Reference Manual contains a link to GNU coreutils like this:

<a href="../coreutils/File-Permissions.html">File Permissions</a>

I need to change these links to non-links.

Solution

Here's an outline of the steps.

  • Open each file.
  • Search for “href=”.
  • Get the link url.
  • Check if the link is a local file and exists.
  • If not, change the entire link tag into a “span” tag.
  • Repeat the above, until no link found.

First, we start like this:

(setq inputDir "~/web/xahlee_org/emacs_manual/" )

(defun my-process-file (fpath)
  "process the file at fullpath FPATH …"
  …
)

;; traverse the directory on all html files
(require 'find-lisp)
(mapc 'my-process-file (find-lisp-find-files inputDir "\\.html$"))

The important part is the “my-process-file” function. Here's the basic code:

(defun my-process-file (fpath)
  "process the file at fullpath FPATH …"
  (let (…)

    ;; open file
    (setq mybuff (find-file fpath))

    (while
        ;; search local link
        (search-forward "href=\"../" nil t)

      ;; get the url string
      (setq urlStr (thing-at-point 'filename) )

      ;; if the url is a dead link
      (when (not (file-exists-p urlStr))
        (progn

          ;; set p1 and p2 to be the start/end of the link tag
          ;; and get the entire link string
          (sgml-skip-tag-backward 1)
          (setq p1 (point) ) ; start of link tag
          (sgml-skip-tag-forward 1)
          (setq p2 (point) ) ; end of link tag
          (setq wholeLinkStr (buffer-substring-no-properties p1 p2) )

          ;; get link text
          (search-backward "</a>")
          (setq p4 (point) ) ; end of link text
          (search-backward ">")
          (forward-char 1)
          (setq p3 (point) ) ; start of link text
          (setq linkText (buffer-substring-no-properties p3 p4) )

          ;; remove the link, replace it with a non-link span text.
          (delete-region p1 p2)
          (insert 
           "<span class=\"εlink\" title=\""
           urlStr
           "\">"
           linkText
           "</span>"
           )
          )
        )
      )

    ;; close the file if no changes made
    (when (not (buffer-modified-p mybuff)) (kill-buffer mybuff) )

    ) )
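
The basic code above passes the string from “thing-at-point” straight to “file-exists-p”. For a link such as ../widget/index.html#Top that string may still carry the #Top fragment, which the complete code below strips before the check. Here is a sketch of that check as a standalone predicate; the name “my-local-link-dead-p” and its BASE-DIR parameter are hypothetical:

```elisp
(defun my-local-link-dead-p (href base-dir)
  "Return non-nil if HREF points to no existing file.
HREF is a relative link found in a file that lives in BASE-DIR.
Any #fragment at the end of HREF is ignored."
  (let ((path (replace-regexp-in-string "#.*\\'" "" href)))
    (not (file-exists-p (expand-file-name path base-dir)))))

;; Example call, assuming the directory layout used in this article:
;; (my-local-link-dead-p "../widget/index.html#Top"
;;                       "~/web/xahlee_org/emacs_manual/elisp/")
```

Passing the directory explicitly avoids depending on “default-directory”, which “find-file” sets as a side effect when it opens each file.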

Complete Code

Here's the complete code.

;; -*- coding: utf-8 -*-
;; 2011-09-25
;; replace dead links in emacs manual on my website
;;
;; Example. This:
;; <a href="../widget/index.html#Top">Introduction</a>
;;
;; should become this
;;
;; <span class="εlink" title="../widget/index.html#Top">Introduction</span>
;;
;; do this for all files in a dir.

;; rough steps:
;; go thru each file
;; search for link
;; if the link is 「../xx/」 where the file doesn't exist, then replace the whole link tag.

(setq inputDir "~/web/xahlee_org/emacs_manual/" ) ; dir should end with a slash

(defun my-process-file (fpath)
  "process the file at fullpath FPATH …"
  (let (
        mybuff
        urlStr
        linkText
        wholeLinkStr
        p1 p2
        p3 p4
        )
    (setq mybuff (find-file fpath))
    (widen) ; in case it's open and narrowed
    (goto-char (point-max)) ; work from bottom, so that changes in point are preserved. (actually, doesn't really matter for this script)

    (while
        (search-backward "href=\"../" nil t)
      (forward-char 7)
      (setq urlStr (replace-regexp-in-string "\\.html#.+" ".html" (thing-at-point 'filename) ) )

      (when (not (file-exists-p urlStr))
        (progn
          (sgml-skip-tag-backward 1)
          (setq p1 (point) )                      ; start of link tag
          (sgml-skip-tag-forward 1)
          (setq p2 (point) )                      ; end of link tag

          (setq wholeLinkStr (buffer-substring-no-properties p1 p2) )

          (search-backward "</a>")
          (setq p4 (point) )                      ; end of link text
          (search-backward ">")
          (forward-char 1)
          (setq p3 (point) )                      ; start of link text

          (setq linkText (buffer-substring-no-properties p3 p4) )

          (princ (buffer-file-name))
          (princ "\n")
          (princ wholeLinkStr)
          (princ "\n")
          (princ "----------------------------\n")

          (delete-region p1 p2)
          (insert 
           "<span class=\"εlink\" title=\""
           urlStr
           "\">"
           linkText
           "</span>"
           )
          )
        )
      )
    
    (when (not (buffer-modified-p mybuff)) (kill-buffer mybuff) )

    ) )

(require 'find-lisp)

(font-lock-mode 0)

(with-output-to-temp-buffer "*xah elisp dead link replace output*"
    (mapc 'my-process-file (find-lisp-find-files inputDir "\\.html$"))
    (princ "Done deal!")
    )

(font-lock-mode 1)

Here are a few interesting parts.

Turn Syntax Coloring Off

We turn font-lock off with (font-lock-mode 0). With font-lock on, processing the 2 thousand HTML files takes ~50 minutes. With syntax coloring off, it takes 3 minutes.
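
A different technique from the one used in this script, offered only as a sketch: processing each file in a temporary buffer with “insert-file-contents” never turns on a major mode, so there is no syntax coloring to pay for. The trade-off is that changed files are written immediately instead of being left open for review. The function name “my-replace-in-file” is hypothetical:

```elisp
(defun my-replace-in-file (fpath from to)
  "Replace every literal FROM with TO in the file FPATH.
Return the number of replacements.  The file is rewritten only if
at least one replacement was made."
  (let ((count 0)
        (case-fold-search nil))         ; match FROM case-sensitively
    (with-temp-buffer
      (insert-file-contents fpath)
      (goto-char (point-min))
      (while (search-forward from nil t)
        (replace-match to t t)
        (setq count (1+ count)))
      (when (> count 0)
        (write-region (point-min) (point-max) fpath)))
    count))
```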

Leave Changed Files Open

If there are changes in the file, we leave it open. This way, we don't have to revert to backup files if there's a mistake. If we like the result, just call “ibuffer” and press 【* u】 to mark all un-saved, then S to save all. Then press D to close them all. If you do not want to save them, simply mark all unsaved 【* u】 then press D to close all.

This is extremely useful while you are still working on the code and doing some test runs. This interactive nature of emacs is what beats {perl, python, …} for text processing.

If you do want to save the file in the script, simply call (save-buffer) or (write-file (buffer-file-name)).

When the file is not modified, we close it. Like this: (when (not (buffer-modified-p mybuff)) (kill-buffer mybuff) ).

Use sgml-skip-tag-forward

The “sgml-skip-tag-forward” and “sgml-skip-tag-backward” commands are from “html-mode”. They move the cursor to the beginning or end of a tag. They are extremely useful: they save you a lot of time writing code to parse tags, especially when tags are nested. Here's how we used them.

Suppose there's this link in a file:

<a href="../widget/index.html#Top">Introduction</a>

After we did the search with

 (while
  (search-backward "href=\"../" nil t)
  …
 )

the cursor is on the “h”. While the cursor is inside the tag, we call:

 (sgml-skip-tag-backward 1)
 (setq p1 (point) ) ; start of link tag
 (sgml-skip-tag-forward 1)
 (setq p2 (point) ) ; end of link tag

 (setq wholeLinkStr (buffer-substring-no-properties p1 p2) )

This sets the value of wholeLinkStr to the entire anchor tag <a …>…</a>.
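
The same steps can be tried on a string in a temporary buffer, without touching any file. A small sketch; the function name “my-last-link-element” is made up, and “html-mode” is turned on explicitly to mirror what “find-file” does for an HTML file:

```elisp
(require 'sgml-mode) ; provides html-mode and the sgml-skip-tag commands

(defun my-last-link-element (html)
  "Return the whole <a …>…</a> element around the last href in HTML, or nil."
  (with-temp-buffer
    (insert html)
    (html-mode)
    (goto-char (point-max))
    (when (search-backward "href=\"" nil t)
      (sgml-skip-tag-backward 1)          ; move to the < of the <a tag
      (let ((p1 (point)))
        (sgml-skip-tag-forward 1)         ; move past the matching </a>
        (buffer-substring-no-properties p1 (point))))))

(my-last-link-element
 "<p>See <a href=\"../widget/index.html#Top\">Introduction</a>.</p>")
```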

Print Output to Your Own Buffer

Printing output is done here using “with-output-to-temp-buffer” and “princ”. Like this:

(with-output-to-temp-buffer "*xah elisp dead link replace output*"
    (mapc 'my-process-file (find-lisp-find-files inputDir "\\.html$"))
    (princ "Done deal!")
    )

Inside the “my-process-file” function, we write:

 (princ (buffer-file-name))
 (princ "\n")
 (princ wholeLinkStr)
 (princ "\n")
 (princ "----------------------------\n")

Here's the output from the script: elisp_fix_dead_links_output.txt. It lets me easily see if there are any errors. There are a total of 68 changes.

For detail about printing in elisp, see: Emacs Lisp: print, princ, prin1, format, message.
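
“princ” writes to whatever “standard-output” is currently bound to, which is what lets “with-output-to-temp-buffer” collect the report. The related macro “with-output-to-string” collects the same output into a string, which is handy for checking a report function in isolation. A minimal sketch; the function name “my-report-change” and the short separator are made up:

```elisp
(defun my-report-change (file link)
  "Print one report entry for LINK found in FILE to `standard-output'."
  (princ file)
  (princ "\n")
  (princ link)
  (princ "\n")
  (princ "----\n"))

(with-output-to-string
  (my-report-change "a.html" "<a href=\"../x/y.html\">y</a>"))
;; returns "a.html\n<a href=\"../x/y.html\">y</a>\n----\n"
```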

2011-09-25

Emacs Quiz of the Day: replace-html-entities

Write a function “replace-html-entities”. If there is a text selection, work on the selection. Else, work on the current paragraph (delimited by 2 line breaks).

Replace all named HTML entities such as &copy; with ©. (See the entity list here: HTML/XML Entities List.)

I'll post an answer on Monday.

If you are new to elisp, the following articles will be helpful. One of the articles basically gives away the solution.

Note: for those who know elisp well, your command should also replace all entities in decimal form (e.g. &#169;) or hexadecimal form (e.g. &#xa9;). There's a tricky part in this problem. Your code should not introduce extraneous transformations. For example, suppose the input file discusses the HTML language and has this text in it: &copy&#59;. It should not become ©.
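
For the numeric forms only, a single pass with a function as the replacement is one way to avoid a double transformation, because “replace-regexp-in-string” does not rescan text it has already replaced. This is a sketch under that assumption, with a made-up function name; the named entities are left for the quiz:

```elisp
(defun my-decode-numeric-entities (str)
  "Replace decimal (&#169;) and hexadecimal (&#xa9;) entities in STR."
  (let ((case-fold-search nil))
    (replace-regexp-in-string
     "&#\\(?:[xX][0-9a-fA-F]+\\|[0-9]+\\);"
     (lambda (m)
       ;; M is the whole matched entity; drop the leading &# and trailing ;
       (let ((body (substring m 2 -1)))
         (string (if (memq (aref body 0) '(?x ?X))
                     (string-to-number (substring body 1) 16)
                   (string-to-number body 10)))))
     str t t)))

(my-decode-numeric-entities "&#169; &#xa9;") ; returns "© ©"
(my-decode-numeric-entities "&copy&#59;")    ; returns "&copy;"
```

In the second call the result is the literal text &copy; and it stays that way, since the function makes only one pass over the string.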