Emacs: How to Associate a File with a Major Mode

Perm url with updates: http://xahlee.org/emacs/emacs_auto-activate_a_major-mode.html

Xah Lee, 2008-07, 2011-07-29

This page explains how emacs determines which major mode to load when you open a file, and how you can change that setup.

Note that some packages contain the file association code themselves, while others ask you to put a few lines in your emacs init file when you install them.

How Emacs Determines Which Major Mode to Load

Emacs determines which mode to activate by the following mechanisms, in order. Once a match is found, the process stops.

  1. Look for a special emacs-specific syntax in the file. For example: if the first line in the file contains -*- mode: xyz-*-, emacs will load “xyz-mode”. This is part of a general mechanism that lets a file set emacs variables. (See: (info "(emacs) File Variables").) This has the top priority, but it is not the usual way for programming language files to be associated with a major mode.
  2. Check the first line in the file for unix “shebang” syntax (e.g. #!/usr/bin/perl) and match it against interpreter-mode-alist.
  3. Try to match the first line's text against magic-mode-alist. (As of emacs 23.2.1, this list is empty by default.)
  4. Match the file name against auto-mode-alist.

(info "(emacs) Choosing Modes")

If you are installing a new package, or want to change which mode one of your own files opens in, the most practical way is to add entries to “magic-mode-alist” or “auto-mode-alist”.

magic-mode-alist for First Line

The “magic-mode-alist” is for associating the first line of a file with a mode (when that line isn't a unix shebang #!… or an embedded emacs file variable). Use it like this:

;; if first line of file matches, activate nxml-mode
(add-to-list 'magic-mode-alist '("<!DOCTYPE html .+DTD XHTML .+>" . nxml-mode) )

The “magic-mode-alist” is a variable. Its value is a list of pairs. (See: (info "(elisp) Association Lists").) In each pair, the first element is a regex string and the second is a mode name (of type symbol). Emacs tries to match the first line of a file against the regexes in “magic-mode-alist”. If there's a match, it sets the buffer to that mode.
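
To check a regex before adding it to “magic-mode-alist”, you can test it against a sample first line. The following is a minimal sketch, not how emacs itself is implemented. The helper name my-first-line-matches-p and the sample DOCTYPE line are made up for illustration.

```elisp
;; Hypothetical helper, not part of emacs.
(defun my-first-line-matches-p (regex text)
  "Return non-nil if REGEX matches at the very start of TEXT."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    ;; looking-at matches only at point, which is now the start of the buffer
    (looking-at regex)))

;; A sample XHTML first line (made up for this check).
(my-first-line-matches-p
 "<!DOCTYPE html .+DTD XHTML .+>"
 "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">")
;; returns t

(my-first-line-matches-p "<!DOCTYPE html .+DTD XHTML .+>" "<html>")
;; returns nil
```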

auto-mode-alist for File Name

“auto-mode-alist” is for matching file name. Use it like this:

;; setup files ending in “.js” to open in js2-mode
(add-to-list 'auto-mode-alist '("\\.js\\'" . js2-mode))

“auto-mode-alist” is a variable. Its value is a list of pairs. First element is a regex string. The second element is a mode name.

Note: in the elisp code above, the double backslashes in the string "\\.js\\'" escape the backslash for the elisp string reader, so the regex engine just gets \.js\'. The \. matches a literal period. The \' is one of emacs's special regex constructs; it matches the end of the string.

See also: emacs regex tutorial.

(info "(elisp) Regexp Backslash")
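
You can try the regex in a scratch buffer with “string-match”, which returns the position where the match starts, or nil if there's no match. The file names below are made up for illustration.

```elisp
(string-match "\\.js\\'" "main.js")      ; returns 4 (the match starts at the period)
(string-match "\\.js\\'" "main.js.bak")  ; returns nil, because .js is not at the end
(string-match "\\.js\\'" "mainjs")       ; returns nil, because there is no period
```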

You can see the current values of “magic-mode-alist” and “auto-mode-alist” by calling “describe-variable”.
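
If you just want to know which entry of “auto-mode-alist” a given file name would hit, a small helper saves you from reading the whole list. This is a sketch, and the function name my-auto-mode-for-file-name is hypothetical. It only looks at “auto-mode-alist”, so it ignores the higher-priority mechanisms described earlier, and it skips the extra file name handling emacs does when it really opens a file.

```elisp
;; Hypothetical helper, not part of emacs.
(defun my-auto-mode-for-file-name (file-name)
  "Return what `auto-mode-alist' associates with FILE-NAME, or nil.
The result is the cdr of the first entry whose regex matches,
which is usually a mode symbol."
  (assoc-default file-name auto-mode-alist #'string-match))

;; Example call. The result depends on your own auto-mode-alist.
(my-auto-mode-for-file-name "init.el")
```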



Perl, Unicode, Unicode 6 Fonts

3 Bleeding-edge perl articles on unicode.

  • Perl Unicode Essentials (2011-07-26) By Tom Christiansen. @ training.perl.com
  • Unicode in Perl Regexes (2011-07-26) By Tom Christiansen. @ training.perl.com
  • Unicode Support Shootout: The Good, The Bad, & the (mostly) Ugly (2011-07-26) By Tom Christiansen. @ training.perl.com

According to one of the articles above, at least one of them was inspired by this stackoverflow question: Why does modern Perl avoid UTF-8 by default? Source stackoverflow.com

Also, check out

It contains fonts for some unicode 6 glyphs and lots of others that won't be in normal so-called unicode fonts. For examples and explanation of some of these glyphs, see: Unicode 6 Emoticons. For overall best unicode fonts, see: Best Fonts for Unicode.

2011-07-29 Thanks to Andrew Kirkpatrick.

Portishead - Machine Gun (Song)

Perm url with updates: http://xahlee.org/music/portishead_machine_gun.html

Xah Lee, 2011-07-29

A fantastic song by Portishead: Machine Gun.

“Portishead - Machine Gun”

It is industrial.

I saw a saviour
a saviour come my way
I thought I'd see it
at the cold light of day
but now I realise that I'm 
Only for me

if only I could see
You turn myself to me
and recognise the poison in my heart
there is no other place
no one else I face
remedy, we'll agree, is how I feel
here in my reflecting
What more can I say?
for I am guilty
for the voice that I obey
too scared to sacrifice a choice
chosen for me

if only I could see
You turn myself to me
recognise the poison in my heart
there is no other place
no one else I face
The remedy, to agree, is how I feel

Google Logo Boobs

Perm url with updates: http://xahlee.org/funny/Google_image.html

Xah Lee, 2011-07-29

google boobs
Google Boobs?

Original source unknown. If you know the original, or background info, please comment. Are they Google employees? Am thinking the guy on the right looks like Vic Gundotra. Vic Gundotra photos on g+

Emacs Tip for YASnippet: Expand Input with Hyphen

Perm url with updates: http://xahlee.org/emacs/emacs_tip_yasnippet_expand_whole_hyphenated_word.html

Xah Lee, 2011-07-29

In YASnippet, you can define your own templates. For example:

(buffer-substring-no-properties START▮ END)

But sometimes the word you type contains a hyphen, and on expansion YASnippet uses only part of the word as input. For example, you want:

(buffer-substring START▮ END)

But you get:

buffer-(substring STRING▮ FROM &optional TO)

How to fix this?

Put the following in your init file:

;; 2011-07-29 yasnippet. Make “yas/minor-mode” take the whole word, including hyphens, as input for expansion.
(setq yas/key-syntaxes '("w_" "w_." "^ ")) ; default is '("w" "w_" "w_." "^ ") as of 2011-07-29

Thanks to João Távora. Source groups.google.com.
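
The strings in “yas/key-syntaxes” are emacs syntax class codes: “w” means word characters, and “_” means symbol characters, which in lisp modes include the hyphen. My understanding (an assumption, not checked against the YASnippet source) is that YASnippet scans backward from the cursor over characters of those classes to find the text to expand. The sketch below imitates such a scan with “skip-syntax-backward”. The helper name my-key-before-point is made up.

```elisp
;; Hypothetical helper, not part of emacs or YASnippet.
(defun my-key-before-point (text syntax)
  "Insert TEXT in a temp buffer and return the trailing part of it
whose characters all belong to the syntax classes in SYNTAX."
  (with-temp-buffer
    (with-syntax-table emacs-lisp-mode-syntax-table
      (insert text)
      (let ((end (point)))
        ;; move backward over chars of the given syntax classes
        (skip-syntax-backward syntax)
        (buffer-substring-no-properties (point) end)))))

(my-key-before-point "(buffer-substring" "w")   ; returns "substring"
(my-key-before-point "(buffer-substring" "w_")  ; returns "buffer-substring"
```

With “w” alone the scan stops at the hyphen, which would explain why only “substring” got expanded.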

Girl of Split Tongue

Perm url with updates: http://xahlee.org/funny/split_tongue.html

Xah Lee, 2011-07-28

“Split tongue tricks- new and improved” (dropaheart37)


emacs list-non-matching-lines

You know about the following functions?

  • list-matching-lines
  • delete-matching-lines
  • delete-non-matching-lines

How about

  • list-matching-lines-all-buffers
  • list-non-matching-lines
  • list-matching-lines-no-regex
  • split-buffer-by-matching-lines

See update: List Matching Lines and Delete Matching Lines in Emacs for detail.

Eating Live Squid

Perm url with updates: http://xahlee.org/funny/eating_live_squid.html

Xah Lee, 2011-07-28

A very creepy video.

“Dancing squid bowl dish in Hakodate” (richayanami)

After watching that, the question from commenters is whether the squid is alive, and thus whether this is “cruelty to animals”. Now, watch the following video. The question becomes moot.

“How To Eat Live Squid” (MichaelDola)


Emacs Lisp: Chinese character Reference Linkify

Perm url with updates: http://xahlee.org/emacs/elisp_chinese_char_linkify.html

Xah Lee, 2011-07-27

Another quick elisp that enhances my productivity 10-fold!


I write many blogs. In one of them, i often need to create links to several dictionaries. For example, if the cursor is on the Chinese char 魂, then on pressing a button i want it to become this:

<div class="cdict">
<a href="http://en.wiktionary.org/wiki/魂">Wiktionary 魂</a> ◇
<a href="http://translate.google.com/#zh-CN|en|魂">Google 魂</a> ◇
<a href="http://www.chineseetymology.org/CharacterEtymology.aspx?submitButton1=Etymology&amp;characterInput=魂">history 魂</a>
</div>

Which appears in browser like this:

Wiktionary 魂 ◇ Google 魂 ◇ history 魂

I've already written ~20 elisp functions like this over the past years. (See: Emacs Lisp Power! Transform Text Under Cursor.) So, today's problem is simply a 5-minute job of copy-pasting.

Here's the code for today's case:

(defun chinese-linkify ()
  "Make the current Chinese character into several Chinese dictionary links.
If there's a text selection, use that for input."
  (interactive)
  (let (ξchar p1 p2 templateStr resultStr)

    ;; use the text selection if there is one, else the single char after the cursor
    (if (region-active-p)
        (progn
          (setq p1 (region-beginning) )
          (setq p2 (region-end) ) )
      (setq p1 (point) )
      (setq p2 (1+ (point)) ) )

    (setq ξchar (buffer-substring-no-properties p1 p2))

    ;; 獵 is a placeholder char. It gets replaced by the input char below.
    (setq templateStr
          "<div class=\"cdict\">
<a href=\"http://en.wiktionary.org/wiki/獵\">Wiktionary 獵</a> ◇
<a href=\"http://translate.google.com/#zh-CN|en|獵\">Google 獵</a> ◇
<a href=\"http://www.chineseetymology.org/CharacterEtymology.aspx?submitButton1=Etymology&amp;characterInput=獵\">history 獵</a>
</div>")

    ;; the last two t mean: keep case as is, and insert ξchar literally
    (setq resultStr (replace-regexp-in-string "獵" ξchar templateStr t t))

    (delete-region p1 p2)
    (insert resultStr) ))

The code is pretty simple. If you are not familiar, see: Emacs Lisp Idioms (for writing interactive commands).

This is truly a time saver. Before, i had to go to the website, type in the Chinese char to search, copy the url, go back to emacs and paste, then call an emacs function to make the link. Do this for 3 other sites. Now, i just press one button in emacs.
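
To get the “one button”, bind the command to a key in your init file. The key F8 here is an arbitrary choice for illustration, and this assumes “chinese-linkify” is defined as an interactive command.

```elisp
;; F8 is an arbitrary key picked for this example
(global-set-key (kbd "<f8>") 'chinese-linkify)
```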

The added advantage is that the links will have a consistent form. So if later on i want to add one more dictionary to my dictionary list, or use a different HTML tag/format, i can use a truly regular regular expression in find & replace to do it site-wide.

Note: technically, by spec, you should not have Chinese chars in a URL. They should be percent-encoded (See: URL Percent Encoding and Unicode) using the bytes of the char's UTF-8 encoding. For example, the char 魂's UTF-8 encoding is the following 3 bytes in hexadecimal: E9 AD 82. So, this url:


should be:


However, i think the situation of percent encoding is an abomination. (See: Problems of Symbol Congestion in Computer Languages (ASCII Jam; Unicode; Fortress).) I decided to not botch the Chinese chars in my URLs. Today's browsers will automatically do the encoding for you when the user clicks on the link.
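
If you do want the percent-encoded form, emacs can compute it. A minimal sketch, assuming the “url-util” library that comes with emacs. The wiktionary URL is the one used in the link template.

```elisp
(require 'url-util)

;; url-hexify-string turns a multibyte string into its UTF-8 bytes,
;; then percent-encodes each byte that isn't allowed as is
(url-hexify-string "魂")
;; returns "%E9%AD%82"

(concat "http://en.wiktionary.org/wiki/" (url-hexify-string "魂"))
;; returns "http://en.wiktionary.org/wiki/%E9%AD%82"
```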

The weird ξ you see in my elisp code is Greek x. I use unicode char in variable name for experimental purposes. You can just ignore it. (See: Programing Style: Variable Naming: English Words Considered Harmful.)

Most of the time my work in emacs is about HTML. So, the vast majority of my elisp examples are processing HTML. Though, i think not that many people deal with raw HTML these days. I think most of you who use emacs do C, C++, PHP, Python, etc in your day jobs. I've been thinking of writing some elisp tutorials for the needs of working with these langs, but since i don't work with them that much these days, it's hard to come up with examples. What are some applications of interactive elisp code that might work for you? I'd love to hear them.


A Interloper in alt.usage.english Theater

ok. There's fancyman, old hats, interloper, Nero fiddling, waffly, miasma, and the Gippers.

do you know their meanings, allusions, connotations, etymology? For my own good, i produced:

A Interloper in alt.usage.english Theater

Particularly interesting is the etymology of waffle. There doesn't seem to be a sure answer. Also, miasma is a good one to dig up. Nero fiddling is one for history, and the Gipper one for American slang.


How to Download All Your Emails in Gmail?

Perm url with updates: http://xahlee.org/mswin/download_gmail_to_disk.html

Xah Lee, 2011-07-25

For those who fear Google killing their Gmail account, do this:

• Go to your Gmail. Click the options (gear icon) on the upper right. Choose the 〖Forwarding and POP/IMAP〗 link at the top.

• In the 〖IMAP Access〗 section, choose 〖Enable IMAP〗.

• Now, download Mozilla Thunderbird, at http://www.mozilla.org/en-US/thunderbird/. Install it. Start it.

• Type in your name, Gmail address, and Google password. e.g.

name: xah lee
email: xahlee@gmail.com
password: your Gmail password

• Now, Thunderbird will start to download all your emails and store them on your local computer, starting with your inbox.

• In Thunderbird, click the 〖All Mail〗 folder on the left pane, and Thunderbird will start to download all emails in that folder too. Do the same for any other folders you have.

• Now, you can read or write email from either a web browser or Thunderbird. When you delete an email in one, it'll be deleted in the other too. (i.e. they are automatically kept in sync.)

If Google kills your Gmail account, your email will still be available on your harddrive.

Still Paranoid?

Now, technically there's still a remote chance that Google can wipe the emails on your harddrive, if they really want to be nasty. (Let's suppose there's a high-powered employee there who hates you.)

Remember that your Thunderbird is synced with Gmail? So, technically speaking, Google can delete all your emails on their server and disable your login to Gmail from the web, yet still let you log in to your Gmail account from an email client. So, when you wake up and start Thunderbird, it starts to sync, and since all your emails on the server are deleted, it'll delete all the emails on your harddrive too. By the time you realize what's going on, it might be too late.

If you are really that paranoid, you can prevent Google from deleting your emails in an absolute way.

Instead of enabling IMAP in one of the steps above, enable POP.

POP basically means that the email will download just one way. That is, from Google's email server to your machine. It also means your Gmail and the copy in Thunderbird won't be synced. You essentially have your own local copy.

— love n share n care, —Xah

Robot Dance & B-boying Dance (Breakdancing) Videos

Perm url with updates: http://xahlee.org/vofli_bolci/b-boying_breakdance.html

Xah Lee, 2011-07-25

Various nice dance videos.

Robot Dance

Robot (dance)

“LOUDER|DUBSTEP”. The dancers are (in order) {Marquese “nonstop” Scott, Julius “iglide” Chisolm, Cyrus “glitch” Spencer}. Videography by Jason Locklear. Song name: “DJ Fresh - Louder (Doctor P & Flux Pavilion Remix)”.

In the video above, the dance starts at 00:50.


B-boying Dance

The acrobatics here are amazing.

“Extreme Crew B-Boying”

Wikipedia has extensive detail. B-boying.

Emacs: Jump to Previous Marked Position

Often, you need to go back to a previous position in a buffer. For example, you are editing a big source code file that's a few hundred lines long. You want to look at some function's definition, but then you want to return to where you were.

Emacs has a buffer mark ring and a global mark ring that record mark positions and let you jump back to them.

  • Press 【Ctrl+Space Ctrl+Space】 to push the current position onto the mark ring.
  • Press 【Ctrl+u Ctrl+Space】 to go back to the previous position in the current buffer.
  • Press 【Ctrl+x Ctrl+Space】 (“pop-global-mark”) to go back to the previous position, which may be in another buffer.
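
A few variables adjust this behavior. The sketch below is optional init file config, and the values are just examples.

```elisp
;; After one 【Ctrl+u Ctrl+Space】, let a plain 【Ctrl+Space】 keep jumping to earlier marks.
(setq set-mark-command-repeat-pop t)

;; Remember more positions per buffer and globally (the default for both is 16).
(setq mark-ring-max 32)
(setq global-mark-ring-max 32)
```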


Abomination of Simplified Chinese Characters 灾=災

This is an abomination of simplified chinese characters: 兽=獸. (See: 獸 etymology.)

According to chineseetymology.org, such simplifications are one-off. (i.e. idiosyncratic, just for that particular char.)


The following two chars are ones i also had a hard time understanding, but their simplifications are based on phonetics.

See also: 简体/繁體 字表 (Simplified/Traditional Chinese Characters List)

Thanks to Lew Perin (babelcarp).

Why Does HTML5 Kill the ‹big› Tag?

Am quite annoyed that HTML5 axed the <big> tag, yet it retained the <small> tag. How idiotic.

Instead of using “big”, i suppose that you are supposed to use “em” or “strong” or “b”. But that doesn't cut it. “big” carries a semantic that's not one of emphasis, strong (e.g. warning), or bold. If you are going to argue that html should be used for semantic markup, then html5 shouldn't have “b”, “i”, “u”.

I need to use big for several reasons. One is for some unicode symbols and chinese chars. I want the char to be larger so the user can see the detail of such complex glyphs. For example:

鬍 ☫ ☬ ⚜ ☸ ☠ ❇ ❈ ❄ ❅ ❆ ✿ ❀ ❁ ✾

(more examples at: Dingbats and Cultural Symbols in Unicode and 简体/繁體 字表 (Simplified/Traditional Chinese Characters List))

Another reason is that sometimes i want to make a title larger, but semantically i don't want it to be a header such as {h1, h2, h3}. For example, in an index page that lists articles, i want to make some links larger, to signify their relative importance or quality. For example, look at this page: Computing & its People.

Sure one can resort to css e.g. <span style="font-size:3cm">☠</span>, but “big” would carry with it a precise and semantic info. After all, “small” and “b” are there too.

When you get deep into a technology, you often find much odd shit, whose reasoning is buried in obscurity, or is simply idiotic. Does anyone know why HTML5 axed “big”?

See also: HTML5 Tags.