Emacs Lisp: a Function That Works on String or Region

Perm url with updates: http://xahlee.org/emacs/elisp_command_working_on_string_or_region.html

Xah Lee, 2011-10-02

This article shows you how to write an elisp text-transform function that can be used in 2 ways: ① change text in a buffer region. ② take a string argument and return a string.

Emacs lisp level: advanced.

Problem

Summary

For a function that transforms text, find a way to code it so that:

  • ① When called interactively: When there is a text selection, transform the selected text. Otherwise, use the current paragraph as input.
  • ② When called in elisp code, the function can take a string and return a string, or it can take buffer positions {ξfrom, ξto} and work on that region (i.e. replace the region with the result).

For example, suppose you have a command “remove-vowel” that works on a region, but you also want a version “remove-vowel-string” which just takes a string input and returns a string. The string version is very convenient in lisp code. But i don't want to keep 2 functions. I want just one single function.

Detail

Been coding elisp for 5 years now, perhaps about 2 hours a day. I have perhaps ~30 commands that do text transformation on text under cursor. For example:

  • changing URL into an HTML link
  • changing the filename under cursor into an HTML image link
  • asciify-string
  • transform date format
  • changing a region into a standard citation format
  • compact-css-region
  • change source code text to syntax colored html
  • … etc.

In the past year, i find that i often need 2 versions of a function. One version works in a buffer, while another version simply works on a string. The string version is very convenient and simple when used in elisp code.

This is becoming a problem, because for every text processing function i seem to need to write and maintain 2 versions. For example, let's say i have a function named “remove-vowel” that changes “something” to “smthng”. Typically, i'd write a “remove-vowel-string” that takes a string as argument and outputs a string. Then i write another version “remove-vowel” that is an interface wrapper, and calls “remove-vowel-string” to do the actual work.
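To make the two-version pattern concrete, here is a minimal sketch of what such a pair might look like. The function bodies are illustrative assumptions, not code from the author's own setup; only the two names come from the paragraph above.

```elisp
;; Sketch of the two-function pattern (bodies are assumptions for illustration).

(defun remove-vowel-string (string)
  "Return a copy of STRING with the letters a, e, i, o, u removed."
  (let ((case-fold-search t))           ; also remove uppercase vowels
    (replace-regexp-in-string "[aeiou]" "" string)))

(defun remove-vowel (from to)
  "Replace the text between FROM and TO with its vowel-free version.
This is only an interface wrapper; `remove-vowel-string' does the work."
  (interactive "r")
  (let ((output (remove-vowel-string
                 (buffer-substring-no-properties from to))))
    (save-excursion
      (delete-region from to)
      (goto-char from)
      (insert output))))
```

Every text transform written this way needs both definitions, which is the maintenance cost described above.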

Having 2 versions of every function is becoming annoying. So, today i thought about it and came up with a solution.

Solution

The solution is this: The function would take 1 argument, and 2 more optional arguments, like this:

(defun remove-vowel (ξstring &optional ξfrom ξto) …)
  • If “ξstring” is given, then the function takes that as input and returns a string.
  • If “ξstring” is nil, then the function takes {ξfrom ξto} positions and changes the text in the region.

When “remove-vowel” is called interactively, simply feed the function {nil, ξfrom, ξto}.

This way, the function can be used as a string manipulation function, or it can be used as a buffer text changing function, with no penalties or inefficiencies i can think of. Here's how it's done, using “remove-vowel” as an example:

(defun remove-vowel (ξstring &optional ξfrom ξto)
  "Remove the following letters: {a e i o u}.

When called interactively, work on current paragraph or text selection.

When called in lisp code, if ξstring is non-nil, returns a changed string.
If ξstring is nil, change the text in the region between positions ξfrom ξto."
  (interactive
   (if (region-active-p)
       (list nil (region-beginning) (region-end))
     (let ((bds (bounds-of-thing-at-point 'paragraph)) )
       (list nil (car bds) (cdr bds)) ) ) )

  (let (workOnStringP inputStr outputStr)
    (setq workOnStringP (if ξstring t nil))
    (setq inputStr (if workOnStringP ξstring (buffer-substring-no-properties ξfrom ξto)))
    (setq outputStr
          (let ((case-fold-search t))
            (replace-regexp-in-string "a\\|e\\|i\\|o\\|u" "" inputStr) )  )

    (if workOnStringP
        outputStr
      (save-excursion
        (delete-region ξfrom ξto)
        (goto-char ξfrom)
        (insert outputStr) )) ) )

The meat of this function is just (replace-regexp-in-string "a\\|e\\|i\\|o\\|u" "" inputStr). But let's see how the input/output is done.

Use of (interactive)

The “interactive” form is a declaration that lets emacs know how arguments are passed to the function when it is used interactively. For example, an argument can come from a prompt in the minibuffer, or from “universal-argument” 【Ctrl+u】. It also says how to interpret the input: as a string, number, buffer name, file name, etc.

When a function has (interactive) (usually placed right after the doc string), it means the function is a command (i.e. it can be called by “execute-extended-command” 【Alt+x】).

When a function has (interactive "r"), then emacs will take the {beginning, ending} positions of the region and feed them to the function as the first 2 arguments. The "r" is called the “interactive code”. See: (info "(elisp) Interactive Codes").
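As a small illustration of the "r" code, here is a sketch of a command that receives the region boundaries as its two arguments. The name “my-count-region-chars” is hypothetical and not part of emacs.

```elisp
;; Hypothetical command, for illustration only.
(defun my-count-region-chars (from to)
  "Return the number of characters between FROM and TO.
When called interactively, also show the count in the echo area."
  (interactive "r")                     ; emacs supplies region start and end
  (let ((n (- to from)))
    (when (called-interactively-p 'interactive)
      (message "Region has %d characters" n))
    n))
```

In lisp code the same function is called with explicit positions, for example (my-count-region-chars 3 10).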

Normally, the argument to “interactive” is a string, but it can be any other lisp expression. When it is a lisp expression, the return value of the expression must be a list, and the items are fed to the function as arguments.
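Here is a minimal sketch of the list form, separate from “remove-vowel”. The command name “my-insert-repeated” is made up for this example; the expression inside “interactive” builds a list of 2 items, one for each parameter.

```elisp
;; Hypothetical command, for illustration only.
(defun my-insert-repeated (text count)
  "Insert TEXT at point COUNT times."
  (interactive
   ;; The list items become the arguments TEXT and COUNT, in order.
   (list (read-string "Text to insert: ")
         (prefix-numeric-value current-prefix-arg)))
  (dotimes (_ count)
    (insert text)))
```

Called with 【Alt+x】, it prompts for the text and takes the count from the prefix argument (1 if none was given). Called from lisp code, the “interactive” form is ignored and the arguments are passed directly, as in (my-insert-repeated "ab" 3).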

So, in our case of “remove-vowel”, our argument to “interactive” is a lisp expression that returns a list of 3 items. Like this:

(defun remove-vowel (ξstring &optional ξfrom ξto)
 "…"
 (interactive
    (if (region-active-p)
        (list nil (region-beginning) (region-end))
      (let ((bds (bounds-of-thing-at-point 'paragraph)) )
        (list nil (car bds) (cdr bds)) ) ) )
…
)

If there's a text selection (region is active), it sets “ξstring” to nil and {ξfrom, ξto} to region {begin, end} positions.

If there's no text selection (region is not active), it sets “ξstring” to nil and {ξfrom, ξto} to paragraph's {begin, end} positions.

In both cases, the “ξstring” is set to nil, so the function will work on the region text.

(See: Using thing-at-point; What's Region, Active Region, transient-mark-mode?)
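One thing to be aware of: “bounds-of-thing-at-point” can return nil when it finds nothing at point, in which case (car bds) and (cdr bds) would both be nil. The following is a sketch of a more defensive way to build the argument list. The helper name “my-region-or-paragraph-args” is an assumption for illustration, and signaling an error is just one possible way to handle the nil case.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-region-or-paragraph-args ()
  "Return a list (nil FROM TO) for the active region or current paragraph.
Signal a `user-error' if neither can be determined."
  (if (region-active-p)
      (list nil (region-beginning) (region-end))
    (let ((bds (bounds-of-thing-at-point 'paragraph)))
      (unless bds
        (user-error "No paragraph at point"))
      (list nil (car bds) (cdr bds)))))
```

With such a helper, the “interactive” form of a command would shrink to (interactive (my-region-or-paragraph-args)).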

Rest of Code

The above takes care of interactive use of the function.

Now, remember that our function takes 3 arguments: {ξstring, ξfrom, ξto}. The {ξfrom, ξto} are optional. When “ξstring” is given (i.e. not nil), the function will take that as input and return a string. Otherwise, it takes {ξfrom, ξto} as region positions and transforms text in the buffer.

For clarity, first we set “workOnStringP”:

 (setq workOnStringP (if ξstring t nil))

then we set the “inputStr” like this:

 (setq inputStr (if workOnStringP ξstring (buffer-substring-no-properties ξfrom ξto)))

Now, it works on the string, like this:

(setq outputStr
 (let ((case-fold-search t))
  (replace-regexp-in-string "a\\|e\\|i\\|o\\|u" "" inputStr) ) )

Then, it either returns the outputStr or changes the region in the buffer, depending on whether “workOnStringP” is true, like this:

(if workOnStringP
    outputStr
  (save-excursion
    (delete-region ξfrom ξto)
    (goto-char ξfrom)
    (insert outputStr) ))
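To see both calling conventions side by side, here is a compact, self-contained sketch. It uses a hypothetical “my-remove-vowel” with plain ASCII parameter names so that it can be evaluated on its own; it follows the same logic as “remove-vowel” but is not the author's exact code.

```elisp
;; Hypothetical compact variant, for illustration only.
(defun my-remove-vowel (string &optional from to)
  "Remove vowels from STRING, or from the region FROM..TO if STRING is nil."
  (let* ((input (or string (buffer-substring-no-properties from to)))
         (output (let ((case-fold-search t))
                   (replace-regexp-in-string "[aeiou]" "" input))))
    (if string
        output
      (save-excursion
        (delete-region from to)
        (goto-char from)
        (insert output)))))

;; String mode: returns "smthng" and touches no buffer.
(my-remove-vowel "something")

;; Region mode: pass nil as the string, then the buffer positions.
;; Positions 1 to 10 cover the word "something".
(with-temp-buffer
  (insert "something else")
  (my-remove-vowel nil 1 10)
  (buffer-string))                      ; returns "smthng else"
```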

The weird ξ you see in my elisp code is Greek x. I use unicode char in variable name for experimental purposes. You can just ignore it. (See: Programing Style: Variable Naming: English Words Considered Harmful.)
