2010-05-08

My Dick (song)

Post deleted. See here instead: http://xahlee.org/music/my_dick.html

2010-05-07

Emacs Line Return And Dos, Unix, Mac, All That ^M ^J

Perm url with updates: http://xahlee.org/emacs/emacs_line_ending_char.html

Emacs Line Return and Windows, Unix, Mac, All That ^M ^J ^L

Xah Lee, 2010-05-07

This page explains the line ending conventions of Windows, Unix, and Mac, and how to change them with emacs.

Here's a short table of the newline characters and how to input them:

Name             ASCII Code  String Notation  Caret Notation  Abbrev  Input Method
line feed        10          \n               ^J              LF      Ctrl+q Ctrl+j
carriage return  13          \r               ^M              CR      Ctrl+q Ctrl+m or Ctrl+q Enter

Following is the newline convention in different operating systems.

Operating System       Newline Convention  Notes
Unix, Linux, Mac OS X  ^J                  Mac OS X prefers ^J, but accepts the Mac OS Classic's ^M too.
Windows                ^M^J
Mac OS Classic         ^M

Why does emacs show ^M in a buffer?

The “^M” is the ASCII caret notation for the unprintable carriage return char (ASCII 13). If emacs shows that, it's probably because the file has a mix of ^M and ^J line endings, so emacs cannot interpret them consistently as newlines.

To fix it, call “set-buffer-file-coding-system”, then give one of: “mac”, “dos”, “unix”. Then, save the file. If that does not fix it, you can use find and replace to remove it manually.

How to delete ^M manually?

Call “query-replace” 【Alt+%】 (【Alt+5】 in ErgoEmacs), then type the ascii 13 char with 【Ctrl+q Ctrl+m】 for the find string. For the replacement string, just press Enter for none.
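The same deletion can be done without prompting. Here is a minimal sketch in emacs lisp; the command name “my-delete-carriage-returns” is made up for this example and is not a built-in command.

```elisp
;; Hypothetical helper (not part of emacs): delete every carriage
;; return (ASCII 13) in the current buffer, without asking.
(defun my-delete-carriage-returns ()
  "Delete all ^M (carriage return) characters in the current buffer."
  (interactive)
  (save-excursion
    (goto-char (point-min))
    ;; "\r" is the lisp string notation for ASCII 13.
    (while (search-forward "\r" nil t)
      (replace-match "" t t))))
```

Eval the code, then call it with 【Alt+x my-delete-carriage-returns】. It only removes the ^M chars that emacs actually displays in the buffer.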

What does 【Ctrl+q】 mean?

The 【Ctrl+q】 is the shortcut for the command “quoted-insert”. It will let you enter the next character literally. For example, to type a literal tab, press 【Ctrl+q】 then the Tab key.

Some characters do not have a representation on the keyboard. For example, the Enter key is either Line Feed or Carriage Return, depending on the application, but cannot be both at the same time.

To input unprintable ascii chars, you can press 【Ctrl+q】 first, then type the letter indicated by the char's caret notation. For example, the Carriage Return has caret notation of “^M”, so pressing 【Ctrl+q Ctrl+m】 will type it. Tab is “^I”, so 【Ctrl+q Ctrl+i】 inserts a tab.

For detail about the unprintable ASCII chars, their printable notations, input methods, see: The Confusion of Emacs's Keystroke Representation.

Can i change newline convention from Windows to unix by just deleting ^M?

Not really. When emacs opens a file, it represents all line endings in the buffer by ^J (ascii 10; line feed; unix convention), no matter what the actual newline convention in the file is. If emacs displays “^M” (ascii 13; carriage return), that's because the file has mixed line endings.

When you save a file, emacs automatically uses the right newline when writing the file to disk.

Also, emacs may automatically add a newline to the end of the file when you save it. Which character is added as newline depends on the buffer's current coding system. So, if you manually change the newline char, emacs may add one that is inconsistent with what you expect. The auto adding of newline is controlled by the variables “require-final-newline” and “mode-require-final-newline”.

If you want to convert line endings between operating systems, the best thing to do is call “set-buffer-file-coding-system” with a value of “unix”, “mac”, or “dos”. (On Mac OS X, use “unix”.)

How to know which newline convention is used by emacs for the current file?

Call “describe-variable” 【Ctrl+h v】, then “buffer-file-coding-system”.
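The value of “buffer-file-coding-system” can also be decoded from lisp. The following is a small sketch, assuming you just want a message in the echo area; “my-show-eol-type” is a made-up name, while “coding-system-eol-type” is a built-in function that returns 0, 1 or 2 for unix, dos, mac (or a vector when the convention is not yet decided).

```elisp
;; Hypothetical helper: report the newline convention of the
;; current buffer's coding system.
(defun my-show-eol-type ()
  "Show the newline convention of the current buffer's coding system."
  (interactive)
  (let ((eol (coding-system-eol-type buffer-file-coding-system)))
    (message "Newline convention: %s"
             (cond ((eq eol 0) "unix (^J)")
                   ((eq eol 1) "dos (^M^J)")
                   ((eq eol 2) "mac (^M)")
                   ;; a vector means emacs has not decided yet
                   (t "not decided yet")))))
```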

How to quickly find out what ASCII char are those ^M ^J ^L?

Place your cursor on it, then call “describe-char”.

How to change file line endings between Windows/Unix/Mac?

Open the file, then call “set-buffer-file-coding-system” 【Ctrl+x Enter f】. When it prompts you for a coding system, type one of: “mac”, “dos”, “unix”. Then, when you save the file, it'll be saved with the proper encoding for newlines.

To do it batch on a list of files, use the following lisp code:

(defun to-unix-eol (fpath)
  "Change line endings of the file at FPATH to the unix convention."
  ;; find-file opens the file, makes it the current buffer, and returns it
  (let ((mybuffer (find-file fpath)))
    (set-buffer-file-coding-system 'unix) ; or 'mac or 'dos
    (save-buffer)
    (kill-buffer mybuffer)))

(mapc 'to-unix-eol
      (list
       "~/jane/myfile1"
       "~/jane/myfile2"
       "~/jane/myfile3"
       ;; ...
       ))

To use the code, first edit the list of files above. Then, select all the code, type 【Alt+x eval-region】. That's it.

If you want the function to work on marked files in dired, then use the following code:

(defun to-unix-eol-on-marked-files ()
  "Change to unix line ending for marked (or next arg) files."
  (interactive)
  (mapc 'to-unix-eol (dired-get-marked-files)))

Select the code, then call “eval-region”, then 【Alt+x dired】, then press “m” to mark the files you want, then call “to-unix-eol-on-marked-files”.

Thanks to Stefan Monnier for a major tip on this newline issue in emacs.

2010-05-06

Regex Limits, or, Should You Read Mastering Regular Expressions?

Perm url with updates: http://xahlee.org/UnixResource_dir/writ/regex.html

Xah Lee, 2010-05-06

On 2010-05, David wrote:

Go read O'Reilly's Mastering Regular Expressions by Jeffrey Friedl. ... good price, and explained a great deal.

I read the first edition in 1999. (see: Perl Book Reviews.)

Last i looked, the 3rd edition (2006) dropped coverage of emacs regex.

In general, i don't recommend the book if all you need is to master a regex for practical coding. I recommend the book highly if regex research is part of your job. e.g. you need to implement a regex, or get a intro of its history, theory, and available implementations.

The book gives a intro to the history and a bit of its original theory, but the large part is practical intro to regex engines as in unix grep, Perl, PHP, Java, “.NET”.

Regex is useful for matching simple words or phrases. When your need for text pattern matching is slightly more complex than phrases, such as parsing snippets of computer language source code, it quickly goes beyond what regex is capable of. For example: when your language contains nesting, such as in lisp, html, or xml; or when you frequently need to match a chunk of text that spans multiple lines; or when you need to CORRECTLY search for a pattern with many variations, such as an email address.
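To make the nesting problem concrete, here is a small emacs lisp sketch. It assumes the default syntax table of a temp buffer, where “(” and “)” are paired delimiters; “my-balanced-paren-text” is a made-up helper name.

```elisp
;; A regex cannot count nesting depth: the non-greedy pattern below
;; stops at the first closing paren it sees.
(let ((text "(a (b c) d) e"))
  (string-match "(.*?)" text)
  (match-string 0 text))                 ; returns "(a (b c)"

;; Hypothetical helper: let the emacs parser find the balanced end.
(defun my-balanced-paren-text (str)
  "Return the balanced parenthesized expression at the start of STR.
Signals an error if the parens in STR are not balanced."
  (with-temp-buffer
    (insert str)
    (goto-char (point-min))
    (forward-sexp 1)                     ; moves over one balanced expression
    (buffer-substring (point-min) (point))))

(my-balanced-paren-text "(a (b c) d) e") ; returns "(a (b c) d)"
```

A greedy pattern does not help either: it would run to the last closing paren in the text, which is wrong as soon as there are two separate expressions.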

I've also come across a article that heavily criticizes the book, and shows another regex engine that's much faster. (i haven't verified it or read it in depth) The article is Regular Expression Matching Can Be Simple And Fast (but is slow in Java, Perl, PHP, Python, Ruby, ...) (2007-01), by Russ Cox. It says:

Finally, any discussion of regular expressions would be incomplete without mentioning Jeffrey Friedl's book Mastering Regular Expressions, perhaps the most popular reference among today's programmers. Friedl's book teaches programmers how best to use today's regular expression implementations, but not how best to implement them. What little text it devotes to implementation issues perpetuates the widespread belief that recursive backtracking is the only way to simulate an NFA. Friedl makes it clear that he neither understands nor respects the underlying theory.

Also, today there are lots of new techniques and tools for searching text patterns. One i recommend is Parsing Expression Grammar. There are 2 emacs draft implementations (on emacswiki.org), but both are hard to use and have little documentation. (the “regular expression” we know today since unix grep of 1990s or earlier, is derived by happenstance from 4 decade old theory on parsing, based on then so-called theories of so-called automata)

If you need to use regex in emacs frequently, i just recommend reading the emacs info page on its regex in detail.

Similarly, if you need to use regex well in Perl, Python, PHP, i recommend their documentation. I have rewritten the python one here: Python Regex Documentation: String Pattern Matching.

If you wish to know some basic history and theory for curiosity, i recommend Wikipedia: Regular expression.

For some discussion on the limits of regex, see: Pattern Matching vs Lexical Grammar Specification.

where and which emacs to download for windows and mac?

Perm url with updates: http://xahlee.org/emacs/which_emacs.html

Which Emacs to Download?

Xah Lee, 2010-05-06

This page is a guide on what emacs distributions are there, which one you should use, for Windows and Mac.

Windows

GNU Emacs

This is the official GNU Emacs, built for Windows.

http://ftp.gnu.org/pub/gnu/emacs/windows/

Just download, unzip, and use right there. No installation step needed.

NTEmacs

NTEmacs is the latest build of GNU Emacs for Windows. Plain GNU Emacs for Windows.

http://ntemacs.sourceforge.net/

EmacsW32+Emacs

This is a emacs distro built by Lennart Borgman. It is a emacs with Windows specific patches, with some extra elisp packages bundled.

Main feature includes the ability to use the Alt key to invoke menu like other Windows apps; fixes such as to make printing easier, makes emacs ftp work..., bundled nXhtml mode for mixed HTML/CSS/Javascript code, ...

Download at: ourcomments.org.

Download the patched version. Then, run the installer.

ErgoEmacs

Main feature includes a ergonomic based emacs keyboard shortcut set, fixes so that many emacs features work in Windows, bundles the whole unix command tool utilities (grep, find, diff, patch... from MinGW), and bundles many Windows specific language modes. For detail, see: ErgoEmacs Features.

Download at: ErgoEmacs.org.

Download and run the installer. (disclaimer: me and David Capello made this one)

You shouldn't worry that multiple versions of emacs might interfere with each other. I have all the above three installed.

Mac

GNU Emacs

EmacsForMacOSX.com. This is plain GNU Emacs, built for Mac OS X, by David Caldwell.

Alternatively, you can get a Mac OS X build by Ian Eure at http://atomized.org/wp-content/cocoa-emacs-nightly/

Aquamacs Emacs

Aquamacs Emacs is a emacs with complete Mac user interface as much as possible.

Aquamacs's interface is similar to BBEdit or TextMate or in fact any modern Mac software. It uses the same keyboard shortcuts Mac users are familiar with. It also uses tabs, multiple windows, pop up dialogs, Apple's help documentation system, etc., and bundles many extra packages, in particular LaTeX and AUCTeX support.

If you never used emacs before, or never used a text terminal or unix, this is a great choice.

Download at: aquamacs.org.

Carbon Emacs

Carbon Emacs is a emacs build designed for Mac. You get all options such as using Cmd or Opt key for Meta, drag and drop support, and anything specific for Mac, and with many bundled packages including AUCTeX, etc.

If you are a traditional emacs user, and want a emacs on the Mac, Carbon Emacs is a good choice.

Download at: Carbon Emacs.

My Recommendations

Personally, on Windows, i use my own distribution ErgoEmacs, of course. On the Mac, my choice is Carbon Emacs.

2010-05-05

mac os x mouse too slow

Perm url with updates: http://xahlee.org/comp/mac_osx_mouse_too_slow.html

Mouse Speed Too Slow in Mac OS X?

Xah Lee, 2010-05-05

The mouse on Mac OS X is often too slow, even if you have used the Preference pane to set the tracking speed to the fastest. You can fix this. Start Terminal.app, then type:

defaults read -g com.apple.mouse.scaling

The above will show your current scaling value. To make it faster, do:

defaults write -g com.apple.mouse.scaling 5

You need to re-login for this to take effect.

2010-05-04

List Matching Lines and Delete Matching Lines in Emacs

Perm url with updates: http://xahlee.org/emacs/elisp_list_matching_lines.html

Xah Lee, 2010-05-03

Emacs has a very useful command list-matching-lines. For example, open a file, then type “Alt+x list-matching-lines”. Then, give a word. Emacs will list all lines containing that word.

You can click on any matched line in the output, then emacs will put cursor at the position of the occurrence in your file.

There are also several other line processing commands for the current buffer that i use often:

list-matching-lines
delete-matching-lines
delete-non-matching-lines

sort-lines
sort-numeric-fields
reverse-region

Shortcuts and Aliases

If you use them often, you can give them a keyboard shortcut, like this:

(global-set-key (kbd "<f6>") 'list-matching-lines) ; F6 key
(global-set-key (kbd "M-8") 'list-matching-lines) ; Alt+8

For defining more complex key combos, see: How to Define Keyboard Shortcuts in Emacs.

My F key and Alt+num spots are already filled. So, i use a short command name alias instead. Define it like this:

(defalias 'lml 'list-matching-lines)
(defalias 'dml 'delete-matching-lines)

Delete Starts at Cursor Position or Text Selection

delete-matching-lines and delete-non-matching-lines start at the line your cursor is on. So, if you want deletion to happen for the whole file, you need to move to the beginning of the file first.

Also, if you have a text selection, the deletion happens in the text selection only.
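If you always want the whole buffer, regardless of cursor position or text selection, a thin wrapper will do. This is a sketch only; “my-delete-matching-lines-whole-buffer” is a made-up name. It relies on “flush-lines”, the built-in function behind delete-matching-lines, which accepts optional start and end positions.

```elisp
;; Hypothetical wrapper: delete matching lines in the whole buffer,
;; no matter where the cursor is.
(defun my-delete-matching-lines-whole-buffer (regexp)
  "Delete every line matching REGEXP in the whole buffer."
  (interactive "sDelete lines matching regexp: ")
  (save-excursion
    ;; explicit start/end positions override cursor and selection
    (flush-lines regexp (point-min) (point-max))))
```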

Regex

All these commands use regex to search. So, if you simply want to search plain words or phrases, and your phrase contains any regex characters, you need to escape them. Here are some commonly used regex characters that you'll need to replace:

your search contains  replace it with
[                     \[
]                     \]
\                     \\
+                     \+
*                     \*
?                     \?
.                     \.

See also: common patterns in emacs regex.
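In lisp code, the escaping can be left to the built-in function “regexp-quote”, which returns a regex that matches its argument literally. A small sketch follows; the helper “my-line-contains-literal-p” is made up for illustration.

```elisp
;; `regexp-quote' is built in. The result is shown in lisp string
;; syntax, where each backslash is written twice.
(regexp-quote "1+1=2?")                  ; returns "1\\+1=2\\?"

;; Hypothetical helper: test one line of text against a literal phrase.
(defun my-line-contains-literal-p (phrase line)
  "Return t if LINE contains PHRASE taken literally, not as regex."
  (and (string-match-p (regexp-quote phrase) line) t))

(my-line-contains-literal-p "1+1" "so 1+1 is two")    ; returns t
(my-line-contains-literal-p "1+1" "so 11 is eleven")  ; returns nil
```

Without the quoting, the second test would match, because as a regex “1+1” means one or more “1” followed by “1”.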

Letter Case Sensitivity

In all these commands, if your search word contains upper case letters, then the search is automatically case sensitive. Otherwise, it is not case sensitive.

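If you want the search to always be case sensitive (that is, literally what you gave), then you need to set the variable case-fold-search to “nil” (nil means false). The variable search-upper-case controls the automatic part: when it is “nil”, upper case letters in your search word no longer make the search case sensitive.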

You can see the current value of a variable by the command describe-variable.

You can change a variable's value by the command set-variable.
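How the two variables “case-fold-search” and “search-upper-case” interact can be checked with a small experiment. The sketch below uses a made-up helper, “my-flush-demo”, that runs “flush-lines” (the function behind delete-matching-lines) on two sample lines under given settings. The results in the comments are what i'd expect from current GNU Emacs.

```elisp
;; Hypothetical helper for experimenting with case sensitivity.
(defun my-flush-demo (regexp fold upper)
  "Delete lines matching REGEXP from two sample lines; return the rest.
FOLD and UPPER are the values used for `case-fold-search' and
`search-upper-case' during the deletion."
  (with-temp-buffer
    (insert "Emacs\nemacs\n")
    (goto-char (point-min))
    (let ((case-fold-search fold)
          (search-upper-case upper))
      (flush-lines regexp))
    (buffer-string)))

(my-flush-demo "emacs" t t)     ; returns "" (case ignored)
(my-flush-demo "Emacs" t t)     ; returns "emacs\n" (upper case: sensitive)
(my-flush-demo "Emacs" t nil)   ; returns "" (upper case no longer counts)
(my-flush-demo "emacs" nil t)   ; returns "Emacs\n" (always sensitive)
```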

Elisp Exercise

In recent months, i have used list-matching-lines and delete-matching-lines often, in processing chat logs from Second Life and my own processed web logs. Of course, i can use unix shell tools such as “grep myPhrase myFile > outFile”, even inside emacs directly, or i can easily write a Perl or Python script. However, the tasks that need to be done are spontaneous, and require interactive feedback. For example, when i see the matching lines i want, i may need to call delete-matching-lines on the result to narrow down the lines i want. And this process may repeat. Calling shell to generate output requires some ten or twenty extra key presses each time i need to do this. This is why emacs is so useful. (see: Text-Soup Situation and Lumberjack-Tasks.)

Here are some ideas of commands related to list-matching-lines that would make emacs even more useful in my situation. They are good elisp exercises. Each of the following will take me 5 to 20 minutes to write. I'll probably be writing them soon.

Write a list-non-matching-lines.

Often, you want to list lines by word or phrase, not regex. If your search text often contains regex chars, it'll take you extra ~3 seconds to escape them. Write a version of list-matching-lines that does not use regex. (hint: write a wrapper that calls list-matching-lines, using regexp-quote to quote the input.)

When using list-matching-lines, it would be nice if the current word under cursor will be the default search text. Or, if there's a text selection, use the text selection as default search phrase. This will save you 5 or more keystrokes or few seconds to mark and copy and paste. (hint: Emacs Lisp Idioms)

Often, i need to see what lines contain a certain word, then delete those lines. Effectively, i call list-matching-lines, then call delete-matching-lines. When i do this many times, the repetition in keystrokes gets painful. It'd be nice to have something like split-buffer-by-matching-lines, so that it deletes matching lines and shows the deleted lines in a different buffer.
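As a starting point for the last idea, here is one possible sketch, not a finished command. The name “my-split-buffer-by-matching-lines” and the output buffer name “*deleted lines*” are made up. It assumes each match stays within one line, and it follows case-fold-search directly instead of the automatic case rule described earlier.

```elisp
;; Hypothetical sketch: move matching lines into another buffer.
(defun my-split-buffer-by-matching-lines (regexp)
  "Delete lines matching REGEXP from cursor position to end of buffer.
The deleted lines are shown in the buffer *deleted lines*."
  (interactive "sSplit off lines matching regexp: ")
  (let ((out (get-buffer-create "*deleted lines*"))
        (source (current-buffer)))
    (with-current-buffer out (erase-buffer))
    (save-excursion
      ;; the eobp test guards against looping forever on an empty match
      (while (and (not (eobp))
                  (re-search-forward regexp nil t))
        (let ((beg (line-beginning-position))
              ;; include the newline, except on an unterminated last line
              (end (min (point-max) (1+ (line-end-position)))))
          (with-current-buffer out
            (insert-buffer-substring source beg end))
          (delete-region beg end)
          (goto-char beg))))
    (display-buffer out)))
```

Like delete-matching-lines, it starts at the cursor position, so move to the beginning of the buffer first if you want to process the whole file.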

2010-05-02

Elements of Style in English

Perm url with updates: http://xahlee.org/Periodic_dosage_dir/bangu/elements_of_style.html

Xah Lee, 2010-05-02

Was reading Wikipedia on The Elements of Style. Here's a interesting quote:

Edinburgh University linguistics professor Geoffrey Pullum has criticized The Elements of Style, saying:

The book’s toxic mix of purism, atavism, and personal eccentricity is not underpinned by a proper grounding in English grammar. It is often so misguided that the authors appear not to notice their own egregious flouting of its own rules . . . It’s sad. Several generations of college students learned their grammar from the uninformed bossiness of Strunk and White, and the result is a nation of educated people who know they feel vaguely anxious and insecure whenever they write 'however' or 'than me' or 'was' or 'which,' but can’t tell you why.[9]

Specifically, Pullum says Strunk and White were misguided in identifying the passive voice as incorrect, and in proscribing established usages such as the split infinitive and the use of "which" in a restrictive relative clause.[9] He also frequently criticizes Elements on Language Log, a linguists' blog focusing on portrayals of language in the popular media, for promoting linguistic prescriptivism and hypercorrection among English speakers,[10] referring to it as "the book that ate America's brain".[11]

The Boston Globe's review of the 2005 illustrated edition describes it as an "aging zombie of a book ... a hodgepodge, its now-antiquated pet peeves jostling for space with 1970s taboos and 1990s computer advice."[12]

Quite funny, and i'd agree. Much of the mouthings of the writing establishment is shit.

But also, from this i learned the word Atavism. Also, the term Hypercorrection. It is great to know the word hypercorrection, because that gives me another embellished artillery against the grammarian and pedant sophomorons.

Also, from Wikipedia's citation and references, i learned of Language Log. Yay. A blog dedicated to fucking with pedantic idiots, which i've been doing for the past decade. The blog itself is here: http://languagelog.ldc.upenn.edu/nll/.

I kept on reading a bit on Wikipedia about the various style guides. Fowler's Modern English Usage seems like one i can endorse. There's also The Chicago Manual of Style, AP Stylebook, MLA. Actually, i think most of these so-called style “guides” are much ado about nothing. The only firm advice i can give about writing, besides knowing basic grammar and spelling, is: Study logic and critical thinking, obtain a analytical mind. This will improve your writing far more than a writing “style” per se. As to a writing guide, the only one i can firmly recommend is: Simplified English. This is far more effective than any established style guide. Of course, all this style-talk about how to form words and punctuation into cogent sentences is in the context of formal writing, in science, journals, reports, documentation, tutorials, textbooks, as opposed to literary tomfoolery as in essaying, novels, poetry, of which, pigs fly.