Emacs Lisp: Count Words, Count Chars, Count Region

Perm url with updates: http://xahlee.org/emacs/elisp_count-region.html

Xah Lee, 2010-03-23

A little elisp tip. Here's a short elisp function I have been using since about 2006. It reports the number of words and chars in a text selection.

(defun count-region (beginning end)
  "Print number of words and chars in region.
BEGINNING and END are the region's boundary positions."
  (interactive "r")
  (message "Counting ...")
  (save-excursion
    (let (wCnt charCnt)
      (setq wCnt 0)
      (setq charCnt (- end beginning))
      (goto-char beginning)
      ;; each match consumes one word plus any non-word chars after it
      (while (and (< (point) end)
                  (re-search-forward "\\w+\\W*" end t))
        (setq wCnt (1+ wCnt)))

      (message "Words: %d. Chars: %d." wCnt charCnt))))

This code is largely from An Introduction to Programming in Emacs Lisp by Robert J Chassell, which I was reading sometime in 2005. That tutorial is for people who have never programmed. It was quite frustrating to read, because for every sentence that teaches you something about emacs lisp, you have to scan some 20 pages of things you already know about programming, such as what variables, assignment, and syntax are. In the end, I didn't really read that book. This function is about the only thing I got out of it.

How It Works

Now let's see how this function works.

The function has this skeleton:

(defun count-region (beginning end)
  "..."
  (interactive "r")
  ; ...
  )

This means that when you call the function interactively (for example with M-x), emacs automatically passes the region's beginning position, as an integer, to the parameter “beginning”, and the region's end position to the parameter “end”. This is done by the line “(interactive "r")”.
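To see what “(interactive "r")” supplies, here is a minimal sketch. The function name my-region-bounds-demo is made up for this illustration and is not part of the original code. When called from lisp you pass the two positions yourself; when called with M-x, emacs fills them in from the region, smaller position first.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-region-bounds-demo (beginning end)
  "Return a list of BEGINNING, END, and the number of chars between them.
When called interactively, also show them in the echo area."
  (interactive "r")
  (let ((result (list beginning end (- end beginning))))
    (when (called-interactively-p 'interactive)
      (message "Region: %d to %d (%d chars)" beginning end (- end beginning)))
    result))

;; From lisp code, the positions are ordinary integer arguments:
(my-region-bounds-demo 1 6)
```

To try the interactive path, select some text and call M-x my-region-bounds-demo.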

The next part of the function is this:

(save-excursion
  (let (var1 var2 ...)
    (setq var1 ...)
    (setq var2 ...)
    ...))

The “let” is lisp's way to have a block of local variables. We are going to be doing some cursor moving and searching. However, when count-region finishes, the cursor should return to wherever it was when the user called our function. This is what “save-excursion” does. Quote from its inline doc:

(save-excursion &rest body)

Save point, mark, and current buffer; execute body; restore those
things.
...
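Here is a small sketch of that behavior, run in a temporary buffer so it does not touch your own text. The function name my-save-excursion-demo is made up for this illustration. It records the cursor position before, inside, and after a “save-excursion” block.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-save-excursion-demo ()
  "Return point before, inside, and after a `save-excursion' block."
  (with-temp-buffer
    (insert "one two three")
    (goto-char 5)                     ; put the cursor just before "two"
    (let ((before (point))
          inside)
      (save-excursion
        (goto-char (point-max))       ; wander off to the end of the buffer
        (setq inside (point)))
      ;; back outside save-excursion, point is where it started
      (list before inside (point)))))

(my-save-excursion-demo)  ; → (5 14 5)
```

The first and last numbers are the same, even though the cursor moved to the end of the buffer in between.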

Now, to count the chars: the count is just the difference between the ending and beginning positions of the region. So, it is simple, like this:

(setq charCnt (- end beginning))

Now, we move the cursor to the beginning of the region, like this: “(goto-char beginning)”. The next part counts the words, like this:

(while (and (< (point) end)
            (re-search-forward "\\w+\\W*" end t))
  (setq wCnt (1+ wCnt)))

The “(< (point) end)” checks that the cursor hasn't reached the end of the region yet.

The “(re-search-forward "\\w+\\W*" end t)” means: move the cursor forward by a regex search for a word pattern, that is, one or more word chars followed by any number of non-word chars. The “end” argument there means don't search beyond the end of the region. And the “t” there means return nil instead of signaling an error when nothing is found.

search-forward and re-search-forward are very important functions in elisp. I use them in almost all of my text processing scripts. If you are not familiar with them, look up their inline doc (use describe-function).

So, the above “while” block basically means: keep moving the cursor and counting words, until the cursor is at the end of the region.
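If you want to play with the loop outside of a region, here is a sketch that runs the same search over a string in a temporary buffer. The function name my-count-words-in-string is made up for this illustration. Note that what “\\w” treats as a word char depends on the buffer's syntax table, so counts can differ between modes; the temporary buffer here uses the default one.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-count-words-in-string (text)
  "Return the number of words in TEXT, using the same loop as count-region."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((end (point-max))
          (count 0))
      ;; each successful search moves point past one word
      (while (and (< (point) end)
                  (re-search-forward "\\w+\\W*" end t))
        (setq count (1+ count)))
      count)))

(my-count-words-in-string "The quick brown fox.")  ; → 4
```

An empty string gives 0, because the “(< (point) end)” test fails before any search is made.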

Finally, the program just prints out the result, by:

(message "Words: %d. Chars: %d." wCnt charCnt)

Exercise

Try to write a version so that, when there is a text selection, it counts the words and chars in the selection, but if there's no text selection, it counts just the current line. You might want to read Emacs Lisp Idioms to refresh your memory about the technical meaning of “region” and “active region” in emacs, and about transient-mark-mode.
