Inconsistency of Emacs Text-Searching Features

A few days ago, I wrote a tutorial on emacs text-searching commands. See: Emacs: Searching for Text in Files (grep, find). There are lots of them: {list-matching-lines, grep, rgrep, lgrep, grep-find, find-dired, dired-do-search, …}. However, they are inconsistent with each other, both in implementation and in interface.

list-matching-lines (alias of occur) is implemented in elisp, while most of the others rely on the unix utils {find, grep}.

The interface isn't consistent either. e.g. grep and grep-find (alias find-grep) both prompt you to enter the whole unix command in one shot. But find-dired, rgrep, lgrep each go thru several prompts, asking for things such as {search string, file extension, directory}. (Though, some of them still require the user to be familiar with the unix commands. e.g. when “find-dired” prompts “Run find (with args):”, you have to give -name "*html" or -type f etc.)

People who are not unix sys admins or unix programmers won't be able to search a folder. The unix find/grep parameters are quite complex, and the emacs documentation doesn't try to explain them. You have to know about “man pages” to read their documentation, and even then it's pretty much incomprehensible.

Also, occur shows its results with the search term highlighted. Nice. But the grep command doesn't (at least not on emacs 23.2 for Windows).

It seems to me they could all use elisp, with a single code engine and the same interface. The files to be searched can come from the buffer list or the marked files in dired, or be entered at a prompt as an emacs regex. The output can be a listing like occur's, or a dired list, or a buffer list. For example, you could have the command list-matching-lines, and then a “list-matching-lines-directory” with options to search nested dirs, options to filter the files to search by a pattern, options to take the marked files in dired, and also options to show the result as a file list in dired.
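To make the idea concrete, here is a minimal sketch of what such a “list-matching-lines-directory” might look like. Everything named my-… is hypothetical, not an existing emacs command. It assumes Emacs 25.1 or later for directory-files-recursively, and it borrows grep-mode only to make the FILE:LINE: output clickable; no external program is called. The options for dired marked files and dired output are left out.

```elisp
;; -*- lexical-binding: t -*-
(require 'grep) ; only for `grep-mode'; no external grep is run

(defun my-matching-lines-in-file (file regexp)
  "Return a list of (LINE-NUMBER . LINE-TEXT) for lines of FILE matching REGEXP.
Hypothetical helper for this sketch."
  (with-temp-buffer
    (insert-file-contents file)
    (goto-char (point-min))
    (let (hits)
      (while (and (not (eobp)) (re-search-forward regexp nil t))
        ;; Report the line where the match starts.
        (goto-char (match-beginning 0))
        (push (cons (line-number-at-pos)
                    (buffer-substring-no-properties
                     (line-beginning-position) (line-end-position)))
              hits)
        ;; Move on so each line is listed at most once.
        (forward-line 1))
      (nreverse hits))))

(defun my-list-matching-lines-directory (dir regexp &optional file-regexp)
  "Show lines matching REGEXP in files under DIR, including nested dirs.
Only files whose names match FILE-REGEXP are searched (default: all files).
Hypothetical command, a sketch of the proposal rather than a finished tool."
  (interactive "DDirectory: \nsList lines matching regexp: ")
  (let ((out (get-buffer-create "*my-matches*")))
    (with-current-buffer out
      (let ((inhibit-read-only t))
        (erase-buffer)
        (dolist (file (directory-files-recursively
                       (expand-file-name dir) (or file-regexp "")))
          (dolist (hit (my-matching-lines-in-file file regexp))
            (insert (format "%s:%d:%s\n" file (car hit) (cdr hit))))))
      ;; grep-mode turns each FILE:LINE: prefix into a jump target.
      (grep-mode)
      (goto-char (point-min)))
    (display-buffer out)))
```

For example, (my-list-matching-lines-directory "~/web/" "emacs" "\\.html\\'") would list every matching line in the html files under ~/web/ (a made-up path), using emacs regex syntax for both the search and the file filter.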

Calling external utils has lots of problems, especially under Windows. First of all, on Windows, the external utils may not be installed. This is a show stopper. Searching text is a critical feature of emacs. Then, there are the Cygwin, MinGW and other variants. Emacs has to go thru several layers to interface with them. On unix/linux, including Mac OS X, there are also complex issues of identifying which version of grep/find is in use, since they differ in options and behavior. I recall that rgrep didn't work on Mac OS X either. Then, emacs has to process the output to do syntax coloring. Also, if your search string contains Unicode, there's the extremely complex problem of setting some emacs variables or shell environment variables related to character encoding.

On the other hand, this kind of text-searching task is what elisp is designed to do, and is rather trivial to code. Doing it in elisp would eliminate the whole set of external-program problems.

The core pieces are already in emacs: occur, dired-do-query-replace-regexp, dired-do-search, and probably something in eshell too. They are just scattered and not unified.

By the way, doing it in elisp is not that slow. It's slower than utils written in C, but remember that emacs has to parse their output, and the communication overhead might just make up the difference. I've written a grep command in elisp (See: How to Write grep in Emacs Lisp.). My script is more or less equivalent to unix's grep -c. I just tested it on a dir with subdirs, 1592 files in total, using a search word that matches in over a thousand files. My script takes close to 3 seconds. So does the unix util when run inside emacs's shell (call shell in emacs, then give grep -c */*html). Calling grep -c */*html by emacs's grep command also takes about 3 seconds. Calling grep -c */*html by shell-command takes less than 1 second.
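For readers who want a feel for how little code this takes, below is a minimal sketch of a grep -c style counter in elisp. It is not the script linked above, just an illustration written for this post. The my-… names are hypothetical, and directory-files-recursively assumes Emacs 25.1 or later. Unlike grep -c, it leaves out files with a count of 0, and the pattern is an emacs regex, not a grep one.

```elisp
;; -*- lexical-binding: t -*-

(defun my-count-matching-lines (file regexp)
  "Return the number of lines in FILE that contain a match for REGEXP.
Hypothetical helper; roughly what grep -c reports for one file."
  (with-temp-buffer
    (insert-file-contents file)
    (goto-char (point-min))
    (let ((case-fold-search nil) ; case-sensitive, like plain grep
          (count 0))
      (while (and (not (eobp)) (re-search-forward regexp nil t))
        (setq count (1+ count))
        ;; Jump past the line where the match starts, so a line with
        ;; several matches is counted once.
        (goto-char (match-beginning 0))
        (forward-line 1))
      count)))

(defun my-grep-count (dir regexp file-regexp)
  "Return an alist of (FILE . COUNT) for files under DIR, searched recursively.
Only files whose names match FILE-REGEXP are read, and files with no
matching line are left out.  Hypothetical function for this sketch."
  (let (result)
    (dolist (file (directory-files-recursively dir file-regexp))
      (let ((n (my-count-matching-lines file regexp)))
        (when (> n 0)
          (push (cons file n) result))))
    (nreverse result)))
```

Something like (my-grep-count "~/web/" "emacs" "\\.html\\'") then plays the role of grep -c over the html files of a dir tree (the path is made up), and wrapping the call in benchmark-run is one way to redo the timing comparison on your own files.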

I think I'm going to polish my elisp grep script into a general emacs command for all things grep. To begin with, I shall improve it for my own use, to the degree that I never need to call unix find/grep directly or indirectly when in emacs.

Has anyone written a grep util in elisp, other than those mentioned above? What do you think about making emacs not rely on unix find/grep?

comp.emacs discussion
