emacs: convert Unicode chars to ASCII (Zap Gremlins)

Perm URL with updates: http://ergoemacs.org/emacs/emacs_zap_gremlins.html

This page shows an Emacs Lisp command that changes a Unicode string into ASCII. For example, “passé” becomes “passe”, and “voilà” becomes “voila”.

Emacs Lisp Solution

Here's a solution.

(defun asciify-text (ξstring &optional ξfrom ξto)
"Change some Unicode characters into equivalent ASCII ones.
For example, “passé” becomes “passe”.

This function works on chars in European languages, and does not transcode arbitrary Unicode chars (such as Greek or math symbols). Untransformed Unicode chars remain in the string.

When called interactively, work on the text selection or the current paragraph.

When called in lisp code, if ξfrom is nil, return a changed string; else, change the text in the region between positions ξfrom and ξto."
  (interactive
   (if (region-active-p)
       (list nil (region-beginning) (region-end))
     (let ((bds (bounds-of-thing-at-point 'paragraph)))
       (list nil (car bds) (cdr bds)))))

  ;; Provides replace-regexp-pairs-in-string and replace-regexp-pairs-region.
  (require 'xfrp_find_replace_pairs)

  (let ((charChangeMap [
                        ["á\\|à\\|â\\|ä\\|ã\\|å" "a"]
                        ["é\\|è\\|ê\\|ë" "e"]
                        ["í\\|ì\\|î\\|ï" "i"]
                        ["ó\\|ò\\|ô\\|ö\\|õ\\|ø" "o"]
                        ["ú\\|ù\\|û\\|ü" "u"]
                        ["Ý\\|ý\\|ÿ" "y"]
                        ["ñ" "n"]
                        ["ç" "c"]
                        ["ð" "d"]
                        ["þ" "th"]
                        ["ß" "ss"]
                        ["æ" "ae"]
                        ])
        ;; Let the patterns match upper-case letters too.
        (case-fold-search t))
    (if ξfrom
        ;; Region mode: rewrite the text between ξfrom and ξto in place.
        (replace-regexp-pairs-region ξfrom ξto charChangeMap)
      ;; String mode: return a changed copy of ξstring.
      (replace-regexp-pairs-in-string ξstring charChangeMap))))

You'll need xfrp_find_replace_pairs.el, which provides replace-regexp-pairs-in-string and replace-regexp-pairs-region.
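If you'd rather not install that package, here is a minimal sketch of a stand-in for replace-regexp-pairs-in-string. It is hypothetical (the name my-replace-regexp-pairs-in-string is made up, and this is not the library's code) and assumes Emacs 25 or later for seq-reduce. It applies each [REGEXP REPLACEMENT] pair to the whole string in order, feeding each result into the next pass, and inserts replacements literally.

```elisp
(require 'seq)

(defun my-replace-regexp-pairs-in-string (string pairs)
  "Hypothetical stand-in for `replace-regexp-pairs-in-string'.
PAIRS is a vector of [REGEXP REPLACEMENT] vectors, applied to STRING
one after another; each pass works on the result of the previous one."
  (let ((case-fold-search t))           ; same setting asciify-text uses
    (seq-reduce
     (lambda (acc pair)
       ;; FIXEDCASE = t and LITERAL = t: insert REPLACEMENT exactly as written.
       (replace-regexp-in-string (aref pair 0) (aref pair 1) acc t t))
     pairs
     string)))
```

For example, (my-replace-regexp-pairs-in-string "voilà, passé" [["á\\|à\\|â" "a"] ["é\\|è\\|ê" "e"]]) returns "voila, passe". Passing the full charChangeMap vector from asciify-text works the same way.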


Accumulator vs Parallel Programming

This problem makes a good parallel programming exercise. See: Parallel Programming Exercise: asciify-string.
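One way to picture the contrast: a list of regexp pairs is typically applied as a sequence of whole-text passes, each pass consuming the previous result (an accumulator). A per-character map instead converts every character on its own, so in principle the characters could be processed in parallel. The sketch below is a hypothetical illustration of that second style, not the code from the linked exercise; the names my-asciify-char-table and my-asciify-string-by-char are made up, and the table copies charChangeMap from asciify-text but matches exact characters only (no case folding).

```elisp
(defconst my-asciify-char-table
  '((?á . "a") (?à . "a") (?â . "a") (?ä . "a") (?ã . "a") (?å . "a")
    (?é . "e") (?è . "e") (?ê . "e") (?ë . "e")
    (?í . "i") (?ì . "i") (?î . "i") (?ï . "i")
    (?ó . "o") (?ò . "o") (?ô . "o") (?ö . "o") (?õ . "o") (?ø . "o")
    (?ú . "u") (?ù . "u") (?û . "u") (?ü . "u")
    (?Ý . "y") (?ý . "y") (?ÿ . "y")
    (?ñ . "n") (?ç . "c") (?ð . "d") (?þ . "th") (?ß . "ss") (?æ . "ae"))
  "Hypothetical char-to-ASCII table, copied from asciify-text's charChangeMap.")

(defun my-asciify-string-by-char (string)
  "Convert each char of STRING independently using `my-asciify-char-table'.
Chars not in the table are kept unchanged."
  (mapconcat (lambda (c)
               ;; Each lookup depends only on C, not on any earlier result.
               (or (cdr (assq c my-asciify-char-table))
                   (string c)))
             string
             ""))
```

(my-asciify-string-by-char "passé voilà") returns "passe voila". Because each character is looked up on its own, the order of table entries doesn't matter, whereas with sequential regexp passes an earlier replacement can, in general, produce text that a later pattern then matches.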

Alternative Solution with “iconv” or Perl

Yuri Khan and Teemu Likonen suggested using the “iconv” shell command. Here's Teemu's code.

(defun asciify-string (string)
"Convert STRING to ASCII string.
For example:
“passé” becomes “passe”
Code originally by Teemu Likonen."
  (with-temp-buffer
    (insert string)
    ;; Pipe the buffer through iconv and replace it with the output
    ;; (DELETE = t, BUFFER = t).
    (call-process-region (point-min) (point-max) "iconv" t t nil "--to-code=ASCII//TRANSLIT")
    (buffer-substring-no-properties (point-min) (point-max))))

Julian Bradfield suggested Perl. Here's his one-liner; it removes only the accents.

perl -e 'use encoding utf8; use Unicode::Normalize; while ( <> ) { $_ = NFKD($_); s/\pM//g; print; }'

Source: groups.google.com

Still, it would be nice to have a pure elisp solution, because “iconv” is not available on Windows or Mac OS X as of .
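For a pure elisp route, here is a minimal sketch of the same idea as Julian's Perl one-liner: decompose with NFKD, then drop combining marks. It assumes Emacs 25 or later (for seq-remove) and uses the built-in ucs-normalize library; my-strip-accents is a hypothetical name, and this is not code from the thread.

```elisp
(require 'ucs-normalize)
(require 'seq)

(defun my-strip-accents (string)
  "Return STRING decomposed with NFKD and with all combining marks removed.
A hypothetical pure-elisp analogue of Julian Bradfield's Perl one-liner."
  (concat
   (seq-remove (lambda (c)
                 ;; Perl's \pM covers Unicode general categories Mn, Mc, Me.
                 (memq (get-char-code-property c 'general-category)
                       '(Mn Mc Me)))
               (ucs-normalize-NFKD-string string))))
```

(my-strip-accents "passé voilà") returns "passe voila". Letters with no decomposition, such as ß, æ, ø, þ and ð, come through unchanged, so unlike asciify-text this only strips accents. NFKD also folds compatibility characters, e.g. the “ﬁ” ligature becomes “fi”.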
