How to Replace Multiple String Pairs in Emacs Lisp Buffer

Perm url with updates: http://xahlee.org/emacs/elisp_replace_string_region.html

Xah Lee, 2010-08-17

This article is a detailed tutorial for emacs lisp programmers on how to apply multiple find/replace string pairs to a given region in a buffer.

Problem

You have a region in a buffer, and you want to apply one or more find/replace string pairs to it. For example:

  • html entities
      & ↔ &amp;
      < ↔ &lt;
      > ↔ &gt;
  • url percentage encoding
      “ ” ↔ “%20”
      ~ ↔ %7e
      _ ↔ %5f
  • writing math
      alpha ↔ α
      beta ↔ β
      gamma ↔ γ
  • quote/unquote in elisp string
      " ↔ \"
      \ ↔ \\
      \" ↔ "
  • converting paths (Unix, Windows, UNC, URL, local)
      \ ↔ /
      \ ↔ \\
      C:\Users\mary ↔ ~/
      file:///C:/Users ↔ C:
      "../ ↔ "http://

Some of these tasks are general and well defined, so there may be existing lisp packages that handle them. Examples:

  • URL percentage encoding
  • HTML entities encoding
  • Windows/Unix path conversion
  • URL/UNC conversion

However, you usually don't want a generalized solution, because your input is not a well-defined format like XML, and you know exactly which replacements you need.

The normal idiom for find/replace in a region looks like this:

(defun replace-html-chars-region (start end)
  "Replace “<” to “&lt;” and some other chars in HTML.
This works on the current region."
  (interactive "r")
  (save-restriction 
    (narrow-to-region start end)
    (goto-char (point-min))
    (while (search-forward "&" nil t) (replace-match "&amp;" nil t))
    (goto-char (point-min))
    (while (search-forward "<" nil t) (replace-match "&lt;" nil t))
    (goto-char (point-min))
    (while (search-forward ">" nil t) (replace-match "&gt;" nil t))
    ) )

Basically, you narrow to the region, and for each pair you use a while loop. This is quite cumbersome.

It would be nicer if you could write it like this:

(defun replace-html-chars-region (start end)
  "Replace “<” to “&lt;” and some other chars in HTML.
This works on the current region."
  (interactive "r")
  (replace-pairs-region start end
 '(
 ["&" "&amp;"]
 ["<" "&lt;"]
 [">" "&gt;"]
 )
 ))

Solution

Here are several elisp functions that make this easy.

  • replace-pairs-in-string
  • replace-regexp-pairs-in-string
  • replace-pairs-region
  • replace-regexp-pairs-region

Each function comes in a plain text version and a regex version, because it is often painful and error-prone to use regex when all you need is fixed-string find/replace.

Each function also has a string version and a region version. The string version works on a given string; the region version works on a region in a buffer. This saves you time: when all you have is a string, you don't have to create a buffer just to do some find/replace and then convert the result back to a string. The same goes when you have a buffer to begin with.

The region versions call the string versions to do their work. This makes the code more manageable. That is:

  • “replace-regexp-pairs-region” prepares a string, calls “replace-regexp-pairs-in-string”, then puts the result back in the buffer.
  • “replace-pairs-region” prepares a string, calls “replace-pairs-in-string”, then puts the result back in the buffer.
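
The round trip described in the list above (region to string, transform, back to buffer) might be sketched as follows. This is only an illustration under assumptions, not the package's code: the name “my-transform-region” and its TRANSFORM argument are hypothetical, and any string-to-string function can stand in for the string version.

```elisp
;; Sketch only: `my-transform-region' is a hypothetical name,
;; not a function from the package.
(defun my-transform-region (start end transform)
  "Replace the text between START and END with (TRANSFORM text).
TRANSFORM is any function that takes a string and returns a string."
  (let ((new-text (funcall transform
                           (buffer-substring-no-properties start end))))
    (save-excursion
      (goto-char start)
      (delete-region start end)   ; remove the old text
      (insert new-text))))        ; put the transformed text in its place

;; Usage: upcase the first five characters of a scratch buffer.
(with-temp-buffer
  (insert "hello world")
  (my-transform-region 1 6 #'upcase)
  (buffer-string))   ; ⇒ "HELLO world"
```

Keeping the buffer handling in one place like this means the find/replace logic itself only ever has to deal with strings.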

Both the string versions call the builtin elisp function “replace-regexp-in-string” to do their work.

Note that emacs does not have a plain text version analogous to “replace-regexp-in-string”. So, when you want plain text find/replace, you wrap “regexp-quote” around your search string, then call “replace-regexp-in-string”.
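
As a minimal sketch of that wrapping, here is a plain text helper. The name “my-replace-plain-in-string” is hypothetical, not part of the package. The two trailing “t” arguments are FIXEDCASE and LITERAL, so the replacement is inserted exactly as given.

```elisp
;; Sketch only: `my-replace-plain-in-string' is a hypothetical name.
(defun my-replace-plain-in-string (find replacement str)
  "Replace every literal occurrence of FIND in STR by REPLACEMENT."
  (replace-regexp-in-string
   (regexp-quote find)   ; escape regex special chars such as "." or "*"
   replacement
   str
   t                     ; FIXEDCASE: do not adjust case of the replacement
   t))                   ; LITERAL: do not interpret "\\&" etc. in the replacement

;; The "." is matched literally, so "axb" is left alone.
(my-replace-plain-in-string "a.b" "X" "a.b axb")   ; ⇒ "X axb"
```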

Code

The code can be downloaded here: code.google.com.

The code is 130 lines (not counting comment header). Here's the main code that does the bulk of the work.

(defun replace-pairs-in-string (str pairs)
  "Replace string STR by find/replace PAIRS sequence.

Example:
 (replace-pairs-in-string \"abcdef\"
  '([\"a\" \"1\"] [\"b\" \"2\"] [\"c\" \"3\"]))  ⇒ “\"123def\"”.

The search strings are not case sensitive.
The replacement are literal and case sensitive.

If you want search strings to be case sensitive, set
case-fold-search to nil. Like this:

 (let ((case-fold-search nil))
   (replace-pairs-in-string ...))

Once a substring in the input string is replaced, that part is not changed again.
For example, if the input string is “abcd”, and the pairs are
a → c and c → d, then the result is “cbdd”, not “dbdd”.
See also `replace-pairs-in-string-recursive'.

This function calls `replace-regexp-in-string' to do its work.

See also `replace-regexp-pairs-in-string'."
  (let (ii (mystr str) (randomStrList '()))
    (random t) ; set a seed

    ;; generate a random string list for intermediate replacement
    (setq ii 0)
    (while (< ii (length pairs))
      ;; use rarely used unicode chars to prevent a match in the input string
      (setq randomStrList (cons
                    (concat "ㄓ" (number-to-string (random)) "ㄘ")
                    randomStrList ))
      (setq ii (1+ ii))
      )

    ;; replace each find string by corresponding item in random string list
    (setq ii 0)
    (while (< ii (length pairs))
      (setq mystr (replace-regexp-in-string
                   (regexp-quote (elt (elt pairs ii) 0))
                   (elt randomStrList ii)
                   mystr t t))
      (setq ii (1+ ii))
      )

    ;; replace each random string by corresponding replacement string
    (setq ii 0)
    (while (< ii (length pairs))
      (setq mystr (replace-regexp-in-string
                   (elt randomStrList ii)
                   (elt (elt pairs ii) 1)
                   mystr t t))
      (setq ii (1+ ii))
      )
    
    mystr))
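
The case sensitivity note in the docstring can be checked directly against “replace-regexp-in-string”, which the function calls. A small sketch, using a made-up input string:

```elisp
;; With case-fold-search non-nil, the search string "a" also matches "A".
(let ((case-fold-search t))
  (replace-regexp-in-string (regexp-quote "a") "1" "aA" t t))   ; ⇒ "11"

;; With case-fold-search nil, only the lowercase "a" is matched.
(let ((case-fold-search nil))
  (replace-regexp-in-string (regexp-quote "a") "1" "aA" t t))   ; ⇒ "1A"
```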

Find/Replace Feedback Loop Problem

One interesting issue with multiple find/replace is that the input string is replaced recursively: text produced by one pair can be matched and replaced again by a later pair, so you may end up with a result you did not intend.

For example, if the input string is “abcd”, and the pairs are “a → c” and “c → d”, then applying the pairs one after another gives “dbdd”, though most of the time you want “cbdd”.
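
The feedback effect is easy to reproduce with a naive loop that applies each pair in turn. This is a sketch for illustration only; “my-naive-replace-pairs” is a hypothetical name, not a function from the package.

```elisp
;; Sketch only: `my-naive-replace-pairs' is a hypothetical name.
(defun my-naive-replace-pairs (str pairs)
  "Apply each [FIND REPLACE] vector in the list PAIRS to STR, in order.
Later pairs see the output of earlier pairs, so feedback can occur."
  (let ((case-fold-search nil))
    (dolist (pair pairs str)
      (setq str (replace-regexp-in-string
                 (regexp-quote (elt pair 0))
                 (elt pair 1)
                 str t t)))))

;; "abcd" becomes "cbcd" after a → c, then "dbdd" after c → d.
(my-naive-replace-pairs "abcd" '(["a" "c"] ["c" "d"]))   ; ⇒ "dbdd"
```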

The function “replace-pairs-in-string” does not have this feedback loop. It guarantees that a replacement is done if and only if the original input string contains one of your find strings as a substring.

This is important when you do complex text processing such as transforming HTML4 to HTML5 or HTML to XHTML.

For a version that does feedback, use “replace-pairs-in-string-recursive”, also in the package.

To implement the non-feedback version, i first replace each find string by an intermediate random string. For example, suppose the input pairs are “a → b” and “c → d”. Then, the code will actually do this:

  • “a → randomString1”
  • “c → randomString2”
  • “randomString1 → b”
  • “randomString2 → d”

The random strings so generated should not occur in the input string. This is achieved by building each intermediate string from rarely used Unicode chars plus a random number.
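
For comparison, a different technique that also avoids feedback is to scan the string once with a single regexp that matches any of the find strings. The sketch below is not how the package does it; “my-replace-pairs-single-pass” is a hypothetical name. Since no placeholder is ever written into the string, a find string made of digits cannot accidentally match inside an intermediate string. One caveat: emacs tries the alternatives in the order given, so when one find string is a prefix of another, put the longer one first.

```elisp
;; Sketch only: `my-replace-pairs-single-pass' is a hypothetical name,
;; not a function from the package.
(defun my-replace-pairs-single-pass (str pairs)
  "Replace in STR each find string of PAIRS by its replacement, in one scan.
PAIRS is a list of [FIND REPLACE] vectors. Matching is case sensitive."
  (let ((case-fold-search nil)
        ;; lookup table from find string to replacement string
        (pair-table (mapcar (lambda (pair) (cons (elt pair 0) (elt pair 1)))
                            pairs))
        ;; one regexp matching any find string, e.g. "a\\|c"
        (any-find (mapconcat (lambda (pair) (regexp-quote (elt pair 0)))
                             pairs "\\|")))
    (replace-regexp-in-string
     any-find
     ;; called with the matched text; returns its replacement
     (lambda (found) (cdr (assoc found pair-table)))
     str t t)))

;; Each position of the input is matched at most once, so no feedback.
(my-replace-pairs-single-pass "abcd" '(["a" "c"] ["c" "d"]))   ; ⇒ "cbdd"
```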

Applications

Here are some commands i defined that make use of the replacement pair functions.

(defun space2underscore-region (start end)
  "Replace space by underscore in region."
  (interactive "r")
  (replace-pairs-region start end '([" " "_"])))

(defun underscore2space-region (start end)
  "Replace underscore by space in region."
  (interactive "r")
  (replace-pairs-region start end '(["_" " "])))

(defun replace-mathematica-symbols-region (start end)
  "Replace Mathematica's special char encoding to unicode of the same semantics.
For example:
 \\=\\[Infinity] ⇒ ∞
 \\=\\[Equal] ⇒ =="
  (interactive "r")
  (replace-pairs-region start end '(
 ["\\[Infinity]" "∞"]
 ["\\[Equal]" "=="])))

(defun replace-greek-region (start end)
  "Replace math symbols. e.g. alpha to α."
  (interactive "r")
  (replace-pairs-region start end '(
 ["alpha" "α"]
 ["beta" "β"]
 ["gamma" "γ"]
 ["theta" "θ"]
 ["lambda" "λ"]
 ["delta" "δ"]
 ["epsilon" "ε"]
 ["omega" "ω"]
 ["Pi" "π"])))

(defun replace-html-chars-region (start end)
  "Replace “<” to “&lt;” and some other chars in HTML.
This works on the current region."
  (interactive "r")
  (replace-pairs-region start end
 '(
 ["&" "&amp;"]
 ["<" "&lt;"]
 [">" "&gt;"]
 )
 ))

(defun escape-quotes-region (start end)
  "Replace \" by \\\" in region."
  (interactive "r")
  (replace-pairs-region start end '(["\"" "\\\""])))

(defun unescape-quotes-region (start end)
  "Replace \\\" by \" in region."
  (interactive "r")
  (replace-pairs-region start end '(["\\\"" "\""])))

(defun replace-curly-apostrophe-region (start end)
  "Replace some single curly quotes ‘ or ’ to '."
  (interactive "r")
  (replace-pairs-region start end '(
 ["‘tis" "'tis"]
 ["’s" "'s"]
 ["’d" "'d"]
 ["n’t" "n't"]
 ["’ve" "'ve"]
 ["’ll" "'ll"]
 ["’m" "'m"]
 ["’re" "'re"]
 ["s’ " "s' "])))

(defun replace-straight-quotes-region (p1 p2)
  "Replace straight double quotes to curly ones.
Also replace “--” by “—”."
  (interactive "r")
  (let (quoteReplaceMap)
    ;; a map that helps converting straight quotes to curly quotes in texts
    ;; (e.g. novels). Note: order is important since this is heuristic.
    (setq quoteReplaceMap
          '(
["--" " — "]
["  —  " " — "]
[">\"" ">“"]
["(\"" "(“"]
[" \"" " “"]
["\" " "” "]
["\"," "”,"]
["\"." "”."]
["\"?" "”?"]
["\";" "”;"]
["\":" "”:"]
["\")" "”)"]
["\"]" "”]"]
[".\"" ".”"]
[",\"" ",”"]
["!\"" "!”"]
["?\"" "?”"]
;; ";
["\n\"" "\n“"]
[">\'" ">‘"]
[" \'" " ‘"]
["\' " "’ "]
["\'," "’,"]
[".\'" ".’"]
["!\'" "!’"]
["?\'" "?’"]
["(\'" "(‘"]
["\')" "’)"]
["\']" "’]"]
[" ‘em" " 'em"]))

    (replace-pairs-region p1 p2 quoteReplaceMap)))

Emacs is beautiful!
