Regex Limits, or, Should You Read Mastering Regular Expressions?

Perm url with updates: http://xahlee.org/UnixResource_dir/writ/regex.html

Xah Lee, 2010-05-06

On 2010-05, David wrote:

Go read O'Reilly's Mastering Regular Expressions by Jeffrey Friedl. ... good price, and explained a great deal.

I read the first edition in 1999. (see: Perl Book Reviews.)

Last i looked, the 3rd edition (2006) had dropped its coverage of emacs regex.

In general, i don't recommend the book if all you need is to master regex for practical coding. I recommend the book highly if regex research is part of your job, e.g. you need to implement a regex engine, or want an intro to its history, theory, and available implementations.

The book gives an intro to the history and a bit of the original theory, but the large part is a practical intro to regex engines such as those in unix grep, Perl, PHP, Java, “.NET”.

Regex is useful for matching simple words or phrases. When your need for text pattern matching is slightly more complex than phrases, such as parsing snippets of computer language source code, it quickly goes beyond what regex is capable of. For example: when your language contains nesting, as in lisp, html, or xml; when you frequently need to match a chunk of text that spans multiple lines; or when you need to CORRECTLY match a pattern with many variations, such as an email address.
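A minimal Python sketch of the nesting problem. The names ONE_LEVEL and is_balanced are made up for illustration. The pattern below is written to allow exactly one level of nested parentheses; with Python's re module, each extra level would need the pattern to be extended by hand. A few lines of ordinary code that count depth handle any level.

```python
import re

# Matches a parenthesized group that may contain groups nested ONE level deep.
# (Illustrative pattern; deeper nesting is outside what it describes.)
ONE_LEVEL = re.compile(r"\((?:[^()]|\([^()]*\))*\)")


def is_balanced(text):
    """Return True if every '(' in text has a matching ')'."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' appeared before its '('
                return False
    return depth == 0


if __name__ == "__main__":
    shallow = "(a (b) c)"
    deep = "(a (b (c)) d)"
    print(bool(ONE_LEVEL.fullmatch(shallow)))  # one level: pattern matches
    print(bool(ONE_LEVEL.fullmatch(deep)))     # two levels: pattern fails
    print(is_balanced(deep))                   # counting depth still works
```

The counter only checks balance; pulling out the nested structure itself takes a parser.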

I've also come across an article that heavily criticizes the book and shows another regex engine that's much faster. (i haven't verified it or read it in depth.) The article is Regular Expression Matching Can Be Simple And Fast (but is slow in Java, Perl, PHP, Python, Ruby, ...) (2007-01), by Russ Cox. Quote:

Finally, any discussion of regular expressions would be incomplete without mentioning Jeffrey Friedl's book Mastering Regular Expressions, perhaps the most popular reference among today's programmers. Friedl's book teaches programmers how best to use today's regular expression implementations, but not how best to implement them. What little text it devotes to implementation issues perpetuates the widespread belief that recursive backtracking is the only way to simulate an NFA. Friedl makes it clear that he neither understands nor respects the underlying theory.
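To make the contrast in that quote concrete, here is a toy Python sketch. It is not code from Cox's article or from any real regex engine. It handles only a tiny pattern language (a sequence of literal characters, each optionally marked as "may be skipped"), and all function names are made up for illustration. It matches n optional a's followed by n required a's against a string of n a's, which is the kind of input where the two strategies are said to diverge, once by recursive backtracking and once by tracking the set of reachable states, and counts the work done by each.

```python
def build(n):
    """Tokens for n optional 'a's followed by n required 'a's."""
    return [("a", True)] * n + [("a", False)] * n


def match_backtracking(tokens, text):
    """Try one choice at a time, backing up on failure. Returns (matched, calls)."""
    calls = 0

    def go(i, j):
        nonlocal calls
        calls += 1
        if i == len(tokens):
            return j == len(text)
        ch, optional = tokens[i]
        # First try to consume a character, then try skipping the token.
        if j < len(text) and text[j] == ch and go(i + 1, j + 1):
            return True
        return optional and go(i + 1, j)

    return go(0, 0), calls


def match_state_set(tokens, text):
    """Track every reachable token position at once. Returns (matched, steps)."""
    steps = 0

    def closure(states):
        # Add positions reachable by skipping optional tokens.
        nonlocal steps
        seen, stack = set(), list(states)
        while stack:
            i = stack.pop()
            if i in seen:
                continue
            seen.add(i)
            steps += 1
            if i < len(tokens) and tokens[i][1]:
                stack.append(i + 1)
        return seen

    current = closure({0})
    for ch in text:
        advanced = set()
        for i in current:
            steps += 1
            if i < len(tokens) and tokens[i][0] == ch:
                advanced.add(i + 1)
        current = closure(advanced)
    return len(tokens) in current, steps


if __name__ == "__main__":
    n = 12
    tokens, text = build(n), "a" * n
    print(match_backtracking(tokens, text))
    print(match_state_set(tokens, text))
```

Both functions agree on the answer. In this toy the backtracking call count roughly doubles with each increase of n, while the state-set step count grows about quadratically.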

Also, today there are lots of new techniques and tools for searching text patterns. One i recommend is Parsing Expression Grammar. There are 2 draft emacs implementations (on emacswiki.org), but both are hard to use and lack documentation. (The “regular expression” we know today, since unix grep of the 1990s or earlier, is derived by happenstance from 4-decade-old theory on parsing, based on the then so-called theories of so-called automata.)
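As a rough illustration of the Parsing Expression Grammar idea, here is a hand-written Python sketch for a made-up grammar of nested lists. It is not either of the emacs implementations mentioned above, and the grammar and function names are assumptions for this example. Each function corresponds to one grammar rule and returns (value, new_position) on success or None on failure. The "/" in the grammar is ordered choice: try the first alternative, and only if it fails try the next.

```python
# Hypothetical grammar:
#   expr <- list / atom
#   list <- '(' (ws expr)* ws ')'
#   atom <- [^()\s]+


def skip_ws(text, pos):
    """Advance pos past any whitespace."""
    while pos < len(text) and text[pos].isspace():
        pos += 1
    return pos


def parse_atom(text, pos):
    """atom <- one or more characters that are not parens or whitespace."""
    end = pos
    while end < len(text) and not text[end].isspace() and text[end] not in "()":
        end += 1
    if end == pos:
        return None
    return text[pos:end], end


def parse_list(text, pos):
    """list <- '(' followed by zero or more exprs, then ')'."""
    if pos >= len(text) or text[pos] != "(":
        return None
    pos += 1
    items = []
    while True:
        pos = skip_ws(text, pos)
        result = parse_expr(text, pos)
        if result is None:
            break
        value, pos = result
        items.append(value)
    if pos < len(text) and text[pos] == ")":
        return items, pos + 1
    return None


def parse_expr(text, pos):
    """expr <- list / atom (ordered choice: list first, then atom)."""
    result = parse_list(text, pos)
    if result is None:
        result = parse_atom(text, pos)
    return result


def parse(text):
    """Parse the whole string; return the nested structure or None."""
    result = parse_expr(text, skip_ws(text, 0))
    if result is None:
        return None
    value, pos = result
    return value if skip_ws(text, pos) == len(text) else None


if __name__ == "__main__":
    print(parse("(a (b (c)) d)"))
    print(parse("(a (b"))
```

Because the rules call each other recursively, nesting of any depth comes for free, and the result is the nested structure itself rather than a yes/no match.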

If you need to use regex in emacs frequently, i just recommend reading the emacs info page on its regex in detail.

Similarly, if you need to use regex well in Perl, Python, PHP, i recommend their documentation. I have rewritten the python one here: Python Regex Documentation: String Pattern Matching.

If you wish to know some basic history and theory for curiosity, i recommend Wikipedia: Regular expression.

For some discussion on the limits of regex, see: Pattern Matching vs Lexical Grammar Specification.
