2011-10-13

emacs lisp: replace-digits-by-subscript solutions

Perm url with updates: http://xahlee.org/emacs/elisp_replace_subscript.html

Emacs Lisp Exercise: replace-digits-by-subscript

2011-10-13, 2011-10-19

Here's an interesting elisp coding exercise. I have this elisp function:

(defun replace-digits-by-subscript (string)
  "Replace digits by Unicode subscript characters in STRING.
For example, 「103 and 42」 ⇒ 「₁₀₃ and ₄₂」."
  (let ((myStr string))
    (setq myStr (replace-regexp-in-string "0" "₀" myStr))
    (setq myStr (replace-regexp-in-string "1" "₁" myStr))
    (setq myStr (replace-regexp-in-string "2" "₂" myStr))
    (setq myStr (replace-regexp-in-string "3" "₃" myStr))
    (setq myStr (replace-regexp-in-string "4" "₄" myStr))
    (setq myStr (replace-regexp-in-string "5" "₅" myStr))
    (setq myStr (replace-regexp-in-string "6" "₆" myStr))
    (setq myStr (replace-regexp-in-string "7" "₇" myStr))
    (setq myStr (replace-regexp-in-string "8" "₈" myStr))
    (setq myStr (replace-regexp-in-string "9" "₉" myStr))
    myStr
    ))

You might think it's a bit verbose, or inefficient: it scans the string ten times, once per digit. But I can't think of a way to improve it. Can you come up with a better version?

See also: Semantics of Symbols: Use of Unicode Subscript Digit Characters @ http://xahlee.blogspot.com/2011/10/semantics-of-symbols-use-of-unicode.html

Solution

Rob Shinn suggested that the subscript chars can be computed from the digit chars by arithmetic on their code points. I implemented his idea like this:

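(defun replace-digits-by-subscript2 (string)
  "Replace digits by Unicode subscript characters in STRING.
Relies on each subscript digit being 8272 code points above its ASCII digit."
  (let ((myStr string) (ii 0))
    (while (< ii 10)
      ;; 48 is the code point of “0”; 48 + 8272 = 8320 is “₀”.
      (setq myStr (replace-regexp-in-string
                   (char-to-string (+ ii 48))
                   (char-to-string (+ ii 48 8272))
                   myStr))
      (setq ii (1+ ii)))
    myStr))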

This is a good solution, though a bit of a hack, because it depends on how the code points are laid out in the charset. It works here only because each digit and its subscript happen to have a constant difference in code point. The char “0” has Unicode code point 48, the char “1” has code point 49, etc. The char “₀” has code point 8320, the char “₁” has code point 8321, etc. They have a constant difference of 8272.
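The constant difference is easy to check in elisp itself. The following is just a sketch; the helper name “my-subscript-offsets” is made up for illustration and is not part of any solution above. It relies on the fact that “aref” on a string returns the char as an integer.

```elisp
(defun my-subscript-offsets ()
  "Return the code point difference between each subscript digit and its digit.
Hypothetical helper, for illustration only."
  (let ((digits "0123456789")
        (subs "₀₁₂₃₄₅₆₇₈₉")
        (result '()))
    (dotimes (i 10)
      ;; chars are integers in elisp, so they can be subtracted directly
      (push (- (aref subs i) (aref digits i)) result))
    (nreverse result)))

(my-subscript-offsets)
;; ⇒ (8272 8272 8272 8272 8272 8272 8272 8272 8272 8272)
```

If the two ranges were not laid out in parallel, the list would contain differing numbers and the arithmetic trick would not work.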

Independently, Jon Snader (aka jcs) gave the following solution (irreal.org), similar in idea but without the loop. Here's the code:

(defun replace-digits-by-subscript3 (string)
  (replace-regexp-in-string "[0-9]"
    (lambda (v) (format "%c" (+ (string-to-number v) 8320))) string) )

This code is an excellent use of “format”: the “%c” spec turns a code point into its char. But more importantly, new to me is that:

• The second argument to “replace-regexp-in-string” can be a function. Elisp will feed this function the matched text and use the function's return value as replacement string.
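Here's a small sketch of that feature on its own, unrelated to subscripts. The function name “my-double-numbers” is made up for this example. The lambda gets each matched substring and returns the text to put in its place.

```elisp
(defun my-double-numbers (string)
  "Return STRING with every run of digits replaced by twice its value.
Hypothetical example of passing a function as the replacement."
  (replace-regexp-in-string
   "[0-9]+"
   (lambda (match)
     ;; MATCH is the matched text, e.g. \"12\"; return the replacement string
     (number-to-string (* 2 (string-to-number match))))
   string))

(my-double-numbers "3 apples and 12 pears")
;; ⇒ "6 apples and 24 pears"
```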

Independently, Anonymous wrote this solution:

(defun replace-digits-by-subscript4 (string)
  (replace-regexp-in-string "[0-9]"
    (lambda (arg) (string (aref "₀₁₂₃₄₅₆₇₈₉" (string-to-number arg)))) string) )

This is an excellent solution, probably the best of all possible solutions. It is very clever, yet doesn't rely on how the charset arranges its code points. It relies on this fact:

• The subscript chars can be indexed by the corresponding digits. e.g. (string (aref "₀₁₂₃₄₅₆₇₈₉" 3)).

The use of “aref” is also new to me. Salute to Anonymous!

Note that “aref” is for extracting elements of an array (e.g. string, vector). “elt” is for extracting elements of any sequence (e.g. string, vector, list). “nth” is just for lists.
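A few sample calls, as a quick sketch of the difference. Note that an element of a string is a char, which elisp represents as an integer, and that “nth” takes the index first while “aref” and “elt” take it last.

```elisp
(aref "₀₁₂₃" 3)      ; ⇒ 8323, the char “₃”
(aref [a b c] 1)     ; ⇒ b
(elt "abc" 1)        ; ⇒ 98, the char “b”
(elt '(a b c) 1)     ; ⇒ b
(nth 1 '(a b c))     ; ⇒ b
```

Calling “aref” on a list, e.g. (aref '(a b c) 1), signals a wrong-type-argument error.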

(info "(elisp) Sequences Arrays Vectors")

See also: Emacs Lisp Tutorial: List & Vector.

2011-10-19 Extra thanks to Jon Snader for discussion.