2011-02-12

emacs math symbol input and misc news

some emacs news on my site.

New version: Emacs Math Symbols Input Mode (xmsi-mode)

I'm creating an Ask Emacs page. There, you can ask all sorts of questions about emacs or elisp.

major corps attack on wikileaks; Jane Hamsher

Perm url with updates: http://xahlee.org/Periodic_dosage_dir/Wikileaks_Bank_of_America_Anonymous_espionage.html

Wikileaks, Bank of America, Anonymous, US Government, Espionage

Xah Lee, 2011-02-12, 2011-02-16

Discovered this political video, via wikileaks's tweet:

“EXPOSED: Attack on Wikileaks”

I was wondering, who is the blonde woman in the video? From “firedoglake.com”?? What kinda name is that? How does she seem to know so much? Can we trust her?

That blonde woman in the video is Jane Hamsher. She was born in 1959, so, aged 51. Ten years older than me. Amazing. She's very cute. I thought she was 30ish. She must have had a very healthy lifestyle. You know how some people just have this youngish feel? It's all over the way they talk, their smiles. Camille Paglia is also like that. (See: Camille Paglia: Modern Sex Goddess.)

Here's a quote about her from Wikipedia:

Jane Hamsher (born July 25, 1959) is an American film producer, author, and blogger best known as the author of Killer Instinct, a memoir about co-producing the 1994 movie Natural Born Killers with Don Murphy and others, and as the founder and publisher of the politically progressive blog FireDogLake (2004 – the present). With Murphy, she also co-produced the subsequent films Apt Pupil (1998), Permanent Midnight (1998), and From Hell (2001). A contributor to The Huffington Post, she posts also in other liberal Websites and political magazines, such as AlterNet and The American Prospect.

Funny, she is the co-producer of the movie Natural Born Killers (1994). That film is a satirical critique of the media. I loved it. (It's not for the faint of heart.)

Wikileaks, Bank of America, Anonymous, US Government, Espionage

Now, if you are wondering what they are talking about in the video... it's a very complex political thing. The following article gives a summary.

More facts emerge about the leaked smear campaigns (2011-02-15) By Glenn Greenwald. @ Source www.salon.com

Read the first 6 paragraphs, if you are so inclined.

It took me about 4 hours to read this and ~10 other related articles. If you don't follow politics, a basic summary is that Bank of America and the US Government are taking illegal actions to smear Wikileaks and its supporters, including hiring “security firms” (i.e. hackers), with tactics such as submitting false documents, creating false identities on social media, data mining social networks, etc.

Anyway, if you are a programming geek, there's quite a bit of interesting stuff going on in this incident. The guy who started it all is Aaron Barr. He's a top executive of the security firm HBGary Federal. He tried to find out the identities of members of Anonymous (the group) thru hacking activities. (Remember that Anonymous has made major news in the past few years, including fighting with Scientology.) So, Anonymous fought back. They hacked into the HBGary Federal site, defaced its front page, got access to some 50k emails, and put them on the web. Now, here's the important part. These 50k emails, it turns out, contain critical info, among them that Bank of America, a few other companies, and the US Government are plotting against Wikileaks using questionable means. Several news sites dug into the emails and uncovered various things. (e.g. The New York Times, Forbes, The Huffington Post, Reason, and Salon are all reporting it as it unfolds.) (Many of the involved parties are making public responses. e.g. Bank of America denies it, some other major security firms severed their relationship with HBGary Federal, etc.)

As a programming geek, the interesting part for me is the fight between Aaron Barr and Anonymous. It's quite amazing how quickly Anonymous was able to break into the site and get the emails. The fight involves lots of things: data mining facebook and twitter, fake accounts, etc. This stuff is better than an espionage thriller. A thorough account is in the following article, from arstechnica.com:

How one man tracked down Anonymous—and paid a heavy price (2011-02-10) By Nate Anderson. @ Source arstechnica.com

Following is a screenshot of the hacked website left by Anonymous.

Screenshot of hacked site. The date is around 2011-02-05. The site's domain seems to be hbgaryfederal.com.

2011-02-10

Net neutrality??

Discovered a book, by Tim Wu. Quote:

Tim Wu (traditional Chinese: 吳修銘) is a professor at Columbia Law School, the chair of media reform group Free Press, and a writer for Slate Magazine.[1] He is best known for coining the phrase network neutrality in his paper Network Neutrality, Broadband Discrimination, and popularizing the concept thereafter, leading in part to the 2010 passage of a federal Net Neutrality rule.[2][3][4] Wu has also made significant contributions to wireless communications policy, most notably with his "Carterfone" proposal.[5]

Wu's academic specialties are copyright and telecommunications policy. For his work in this area, Professor Wu was named one of Scientific American's 50 people of the year in 2006. In 2007 Wu was named one of Harvard University's 100 most influential graduates by 02138 magazine.[1] His book The Master Switch was named among the best books of 2010 by the New Yorker Magazine,[6] Fortune Magazine,[7] Publisher's Weekly,[8] and other publications.

On February 8, 2011 Columbia Law School announced that Professor Wu "[had] been named senior advisor to the Federal Trade Commission (FTC) for consumer protection and competition issues that affect the Internet and mobile markets."[9] He is scheduled to begin his new position, on February 14, at the FTC's Office of Policy Planning.[10] Professor Wu will take a leave of absence from Columbia.[11]

Wu's 2010 book The Master Switch: The Rise and Fall of Information Empires described a long "cycle" whereby open information systems become consolidated and closed over time, reopening only after disruptive innovation. The book was named one of the best books of 2010 by the New Yorker Magazine,[6] Fortune Magazine,[7] Amazon.com,[26] the Washington Post,[27] Publisher's Weekly,[8] and others.

The Master Switch: The Rise and Fall of Information Empires (2010) By Tim Wu. amazon

2011-02-09

One Language to Rule Them All?

Perm url with updates: http://xahlee.org/comp/what_lang_to_use_for_find_replace.html

Xah Lee, 2011-02-08

This is my personal account of a struggle in choosing languages and trying to maintain some concept of efficiency by using one single system instead of a mishmash of components. This essay was originally a post to comp.lang.lisp Source groups.google.com.

Prologue: How to Write grep in Emacs Lisp.

Or, What Language to Use for Find Replace?

... never really got into bash for shell scripting... sometimes tried but the ratio of power/syntax isn't tolerable. Knowing perl well pretty much killed any possible incentive left.

... in the late 1990s, my thought was that i'd just learn perl well and never need to learn another lang or shell for any text processing and sys admin tasks for personal use. The thinking was that it'd be efficient in the sense of not having to waste time learning multiple langs for doing the same thing. (not counting job requirements in a company) So i wrote a lot of perl scripts for find & replace and file management stuff and tried to make them as general as possible. lol. But what turned out is that, over the years, for one reason or another, i just learned python, php, then in 2007 elisp. Maybe the love for languages inevitably won over my one-language efficiency obsession. But also, i ended up rewriting many of my text processing scripts in each lang. I guess part of it is exercise when learning a new lang.

... anyway, i guess i'm randomly babbling, but one thing i learned is that for misc text processing scripts, the idea of writing a generic, flexible, powerful one once and for all just doesn't work, because the coverage is too wide and the tasks that need to be done at any one time are too specific. (and i think this makes sense, because the idea of one language or one generic script for all stems from ideology, not from real world practice. If we look at the real world, it's almost always a disparate mess of components and systems.)

my text processing scripts ended up being a mess. There are several versions in different langs. A few are general, but most were basically used once or in a particular year only. (many are branched off from a generic one but customized for specific needs, used, and thrown away). When i need to do some particular task, i find it easier to just write a new one in whatever lang is currently in my brain memory than to spend time fishing out and revisiting old scripts.

some concrete examples...

e.g. i wrote this general script in 2000, intended to be one-stop for all find/replace needs. See: Perl: Find & Replace on Multiple Files.

in 2005, while i was learning python, i wrote (several) versions in python. e.g. Python: Find & Replace Strings in Unicode Files.

it's not a port of the perl code. The python version doesn't have as many features as the perl one. But for some reason, i have stopped using the perl version. I didn't need all the features of the perl version, and when i do need them, i have several other scripts that each address a particular need. (e.g. one for searching unicode encoded files, one for changing Windows/unix line endings, one for converting file encoding, one for multiple pairs of find/replace in one shot, one for regex, one for plain text, one for find only, one for find+replace, several for find/replace only if a particular condition is met (e.g. if the file contains a particular string, or the search string is inside a particular tag), etc.)

then in 2006, i fell into the emacs lisp hole. In the process, i realized that elisp for text processing is more powerful than perl or python. Not due to lisp the lang, but more due to emacs the text-editing environment and system. I tried to explain this in a few places but mostly here: Text Processing: Emacs Lisp vs Perl.

so, all my new scripts for text processing are in elisp. A few of my perl and python scripts i still use, but almost everything is now in elisp.

also, sometime in 2008, i grew a shell script that processes my weblog using the usual unix bag of cat, grep, awk, sort, uniq. It's about 100 lines. You can see it here: weblog_process.sh.

at one time i wondered, why do i have this 100-line shell script? Where did my idea go that perl should replace all shell scripts? I gave it a little thought, and my conclusion is that for this task, the shell script is actually more efficient and simpler to write. Possibly, if i had started with perl for this task, i might have ended up with well structured code that's not necessarily less efficient... but you know, things in life aren't all planned. It began when i just needed a few lines of grep to see something in my web log. Then, over the years, i added another line, another line, then another, all need-based. If at any of those times i had thought “let's scratch this and restart with perl”, that'd have been wasting time. Besides that, i have some doubt that perl would do a better job for this. With shell tools, each line just does one simple thing with piping. To do it in perl, one'd have to read in the huge log file, then maintain some data structure and try to parse it... too much memory and thinking would be involved. If i code perl by emulating the shell code line-by-line, then it makes no sense to do it in perl, since it's just shell bag in perl.

Also note, this shell script can't be replaced by elisp, because elisp is not suitable when the file size is large.

well, that's my story — extempore! ☺

王菲 - 彼岸花 (flower of paradise)

Perm url with updates: http://xahlee.org/Periodic_dosage_dir/sanga_pemci/flower_of_paradise.html

Xah Lee, 2011-02-08

Discovered an exceedingly beautiful song, by Faye Wong. Listened to it continuously for 10 hours yesterday.

王菲 - 彼岸花

Title: 彼岸花
Date: 2000
Singer: 王菲
Lyrics: 林夕
Music: 王菲, 張亞東

看見的 熄滅了 消失的 記住了
我站在 海角天涯 聽見 土壤萌芽
等待 曇花再開 把芬芳 留給年華
彼岸 沒有燈塔 我依然 張望著
天黑 刷白了頭髮 緊握著 我火把
他來 我對自己說 我不害怕 我很愛他

Here's an excellent translation by Kevin Wei (Aezura).

What was seen, went out like a flame
What has disappeared, will be remembered

I stand, at the corner of the sea, the edge of the sky
I hear, sprouting from the ground

Waiting, for the Queen of the Night to bloom
Saving such fragrance, for the best to come

On the opposite shore, there is no light
But I still yearn

Night, bleached my hair
I grasp the torch tightly

He comes, I say to myself
I am not afraid I love him dearly

Note: He translates 曇花 as “queen of the night”. It's an excellent translation. “Queen of the Night” refers to various night-blooming cacti. See: Nightblooming cereus. The English name for 曇花 is Epiphyllum, aka orchid cacti, one of the night-blooming cacti. It bears large, strongly fragrant flowers that bloom for a single night only.

What a beautiful poem. The title of the song is 彼岸花. Literally, it means “flower of the other shore”. What flower is that? Quote from Wikipedia 石蒜:

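Legend: Manjusaka (曼珠沙华), the blood-red flower of the other shore. It is generally believed to be the guiding flower that grows beside the Sanzu River (三途河). Its fragrance is said to have magic power, able to awaken the dead's memories of their lives. Generally speaking, only the color of blood befits the manjusaka, so what we usually call manjusaka refers to the red-flowered lycoris (红花石蒜).

石蒜 (scientific name: Lycoris radiata), also known as 红花石蒜 (red-flowered lycoris), 龙爪花 (dragon-claw flower), and 山乌毒; commonly called 蟑螂花 (cockroach flower) and 老鸦蒜 (old-crow garlic); with the elegant names 曼珠沙华 (from Sanskrit Mañjusaka), 彼岸花 (flower of the other shore), 莉可莉絲 (a transliteration of Lycoris), and others. It is a perennial herb native to the Yangtze River basin of China, and is now widely distributed across East Asia. The specific epithet radiata means “radiating”.

Manjusaka in the Buddhist sutras

Having preached this sutra, the Buddha sat cross-legged and entered the samadhi of the place of immeasurable meanings, his body and mind motionless. At that time heavenly flowers fell in profusion, four kinds of flowers: the heavens rained mandarava flowers, maha-mandarava flowers, manjusaka flowers, and maha-manjusaka flowers, scattering them over the Buddha and the whole assembly. (Lotus Sutra 《法华经》, scroll 1) It also says:

What is the mandarava flower?
A white round flower, like the datura (风茄) flower.
What is the manjusaka flower?
A red clustered flower. (《妙法莲华经决疑》)

Manjusaka and mandarava are heavenly flowers described in the Buddhist sutras. The names manjusaka, maha-manjusaka, mandarava, maha-mandarava, pundarika, maha-pundarika, and so on derive from Sanskrit Buddhist scriptures, and are recorded in the Mahayana Lotus Sutra (《大乘妙法莲华经》). Maha (摩诃) means “great”. The Sanskrit pronunciation of Mahayana (大乘) is 摩诃衍那, where yana (衍那) means “to carry”, and 华 in classical Chinese means “flower”. These words appear in ancient Sanskrit Buddhist scriptures, meaning flowers on the ground.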

彼岸花 (red spider lily) Source upload.wikimedia.org

Here are some selected quotes from the English Wikipedia article Lycoris radiata:

Lycoris radiata (red spider lily) is a plant in the amaryllis family, Amaryllidaceae.

The red spider lily has many aliases; they are Spider Lily, Naked Lily, and Red Spider Lily. The scientific name of a red spider lily is Lycoris radiata. It can not reproduce sexually. The plant has both sexes and when they produce they just randomly assort their chromosomes so that there is no order to the assignment to each plant. Since there is no order in the chromosomes assortment that means the plant is sterile. The only way for the plant to reproduce is by bulb division.[1]

Cannot reproduce sexually! How fitting.

The red spider lily should be planted in a full-sun environment. It will bloom in late summer and be around 60–70 cm tall.[2] The leaves remain green all winter long and at the first hint of warm weather the following spring, they will die.[3] The Lycoris radiata is a perennial plant. The red spider lily is a monocot.[4] The best type of soil for this plant is sandy and some clay.[5]

This explains the word 沙 in 沙华 in its chinese name.

The Red Spider Lily does not like heat. It prefers a warm environment. When the summer heat becomes unbearable for the Spider lily it becomes dormant. It will then return when the weather has become cooler.

The bulbs of Lycoris radiata are very poisonous. These are mostly used in Japan, and they are used to surround their paddies and houses to keep the pest and mice away. That is why most of them grow close to rivers now.[1] In Japan the Red Spider Lily signals the arrival of fall. Many Buddhist will use it to celebrate the arrival of fall with a ceremony at the tomb of one of their ancestors. They plant them on graves because it shows a tribute to the dead. People believe that since the Red Spider Lily is mostly associated with death that one should never give a bouquet of these flowers.[3]

Since these scarlet flowers usually bloom near cemeteries around the autumnal equinox, they are described in Chinese and Japanese translations of the Lotus Sutra as ominous flowers that grow in Diyu (also known as Hell), or Huángquán (黄泉), and guide the dead into the next reincarnation.

Note, according to the Chinese Wikipedia, the association with death is by Japanese only.

When the flowers of lycoris bloom, their leaves would have fallen; when their leaves grow, the flowers would have wilted. This habit gave rise to various legends. A famous one is the legend of two elves: Mañju (曼珠), who guarded the flower, and Saka (沙華), who guarded the leaves. Out of curiosity, they defied their fate of guarding the herb alone, and managed to meet each other. At first sight, they fell in love with each other. Amaterasu, exasperated by their waywardness, separated the miserable couple, and laid a curse on them as a punishment: the flowers of Mañju shall never meet the leaves of Saka again. It was said that when the couple met after death in Diyu, they vowed to meet each other after reincarnation. However, neither of them could keep their words. In commemoration of the couple, some call the herbs 'Mañjusaka' (曼珠沙華), a mixture of 'Mañju' and 'Saka', instead of their scientific name. The same name is used in Japanese, in which it is pronounced manju-shage.

Some other legends have it that when you see someone that you may never meet again, these flowers, also called red spider lilies, would bloom along the path. Perhaps because of these sorrowful legends, Japanese people often used these flowers in funerals. The popular Japanese name Higanbana (彼岸花, higanbana) for lycoris radiata literally means higan (the other or that shore of sanzu river) flower, decorate and enjoyable, flower of afterlife in gokuraku jyōdo (極楽浄土, gokuraku jyōdo).

In English, the title of the song is “flower of paradise”. Literally, it should be “the flower of the other shore”. Thematically, it should be “flower of the other side” or “flower of hell”. But “hell” gives the wrong impression. The “hell” here is the hell of Buddhist mythology. It shares with the Christian hell the ideas of afterlife and judgement, but doesn't have a primary sense of “evil”. It's not associated with the “devil”, nor does it stand as a strong opposite of “paradise”. The Buddhist hell here is more in sync with the Underworld of Greek mythology. Note that Buddhism originates from India, which is closer to Greece. I wonder to what degree the mythologies are connected.

彼岸花 (red spider lily) Source en.wikipedia.org

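Here's some user-compiled info from Source zhidao.baidu.com

The legendary flower that guides souls, the only flower of the underworld: the flower of the other shore (彼岸花), supremely beautiful. Manjusaka (曼珠沙华) is also called the flower of the other shore. It is generally believed to be the guiding flower that grows beside the Sanzu River (三途河). Its fragrance is said to have magic power, able to awaken the dead's memories of their lives.

The three days before and after the spring equinox are called the spring higan (春彼岸), and the three days before and after the autumn equinox are called the autumn higan (秋彼岸). These are days for visiting graves. The flower blooms during the autumn higan, very punctually, which is why it is called the higan flower (彼岸花).

The flower of the other shore blooms on the other shore. When it flowers, no leaves can be seen; when it has leaves, no flowers can be seen. Flower and leaf never meet, missing each other life after life. Legend has it that this flower blooms only in the Yellow Springs (黄泉, the underworld), and is the only scenery on the road there.

The flower of the other shore blooms along the road to the Yellow Springs. There it blooms in great masses. From afar it looks like a carpet laid out in blood, and because it is red as fire, the road is called the “fire-lit road” (火照之路). It is the only scenery and color on this long road. Following the flower's guidance, people walk on to the prison of the netherworld.

The flower of the other shore, also named manjusaka, is also called Red Spider Lily. It mostly grows along field paths, riverside walkways, and in graveyards, so another name for it is the dead man's flower (死人花). When autumn comes, it bursts into blooms of an eerie, rich color close to red-black. A whole field of them looks a startling scarlet, like fire, like blood, blazing in profusion.

The flower of the other shore belongs to the family 石蒜科 (Lycoris Herb). The genus name is the name of a sea goddess in Greek mythology. A characteristic of the lycoris plants is that they first send up a flower scape and bloom, then put out leaves at the end of flowering or after the flowers have withered. Some other species first put out leaves, then send up a scape and bloom after the leaves have withered. Hence the saying: “The flower of the other shore blooms on the other shore; one sees only the flower, never the leaf.”

The beauty of the manjusaka is an ominous beauty of the uncanny, of disaster, death, and separation. Perhaps because its deep, vivid red brings blood to mind, or perhaps because its bulb is highly poisonous, in literary works its image is usually linked to ideas such as “madness” and “gore”. In 《真皓き残响》, the encounter chapter (邂逅篇) of Mirage of Blaze (炎之蜃气楼), Mizuna Kuwabara (桑原水菜) writes that at the moment of Kagetora's (景虎) suicide, the spurting blood looked like a mass of flowers of the other shore in full bloom.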

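The flower of the other shore blooms
When it blooms on the other shore
There is only a blaze of red
Flowers bloom without leaves
Leaves grow without flowers
Longing for and cherishing each other, yet never able to meet
Alone on the road of the other shore

That night
We met in a dream
You were a white rootless lotus
I was a red flower of the other shore
You were pale as snow
I was bewitching red as blood

You, aloof, by the mirror pool of Tianshan, its waters swirling
I, lonely, in the netherworld, the road of the Yellow Springs stretching on
In that moment
I fell in love with you
A calamity written in fate
No road of escape
Nowhere to escape

I will keep waiting
Three thousand days, the stars wheel and shift
You at last grow old
I am still lost

You come to the ferry crossing
Ahead, the dark river's black water murmurs
You cast me a faint smile
The bowl of Meng Po's soup is already empty

You step onto the Naihe Bridge
Heart calm as water
Heart heavy as stone
I close my tangled flowering branches
Heart aching, shattered
Heart dead, without hope

The lingering tenderness of my flower's fragrance
Is no match for the forgetting in that bitter, thin soup
I am still alive
With no soul, only a body
Yet I persist in loving you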

2011-02-08

Eurythmics - I Need A Man

Perm url with updates: http://xahlee.org/Periodic_dosage_dir/sanga_pemci/i_need_a_man.html

Xah Lee, 2011-02-08

I Need a Man, a great song by Eurythmics.

“Eurythmics - I Need A Man”

Title: I Need A Man
Date: 1987
Singer: Annie Lennox (Eurythmics)
Lyrics: Annie Lennox, David A Stewart
Music: Annie Lennox, David A Stewart

 Hey, is this my turn?
 You want me to sing now?
 Okay

I don't care if you won't talk to me
You know I'm not that kind of girl
And I don't care if you won't walk with me
It don't give me such a thrill

And I don't care about the way you look
You should know I'm not impressed
'Cause there's just one thing that I'm looking for
And he don't wear a dress

I need a man ...

Baby, baby, baby don't you shave your legs
Don't you double comb your hair
Don't powder puff just leave it rough
I like your fingers bare

When the night comes down
I can turn it round, I can take you anywhere
I don't need love forget that stuff
You know that I don't care

I need a man ...

I don't need a heartbreaker, fifty-faced trouble maker
Two timing time taker, dirty little money maker
Muscle bound cheap skate, low down woman hater
Triple crossing, double dater, yella bellied alligator

...

I need a man; leave me alone
I need a man; don't take me home
I need a man; baby you are just all the blonde

Hey boy c'mon
I'll take you anytime
Baby, baby, baby

What a great song. Y'know? In modern society we have the homosexual, the metrosexual, the drag queen, emo men, in part brewed by gender feminism, where a man cannot be a man anymore.

The music video is directed by Sophie Muller. Great video.

Annie Lennox is pretty ugly, doesn't have a beautiful voice, but i really love her. She's got the talent, intelligence, independence, spunk, unfettered by worldly views, and i love her songs.

For females, if you don't have the looks or a beautiful voice, you don't have much to sell. But when you buy Eurythmics's songs, you are buying the work of their talent.

“I Need A Man” amazon

This song, in its callous style, reminds me of Lady Gaga's Bad Romance.

2011-02-07

How to Write grep in Emacs Lisp

Perm url with updates: http://xahlee.org/emacs/elisp_grep_script.html

Xah Lee, 2011-02-07

This page shows a real-world example of an emacs lisp script that searches files, similar to unix grep. If you don't know elisp, first take a look at Emacs Lisp Basics.

The Problem

Summary

I want to write an elisp script that reports files in a dir that contain a string n times. The script is expected to search thru 5 thousand files.

Detail

Why can't i just use grep? Because:

• Often, my search string is long, containing 300 chars or more. (e.g. a snippet of HTML that contains javascript and spans multiple lines.) You could put your search string in a file for grep, but it is not convenient. Here's an example of a string i need to search:

<div class="chtk"><script type="text/javascript">ch_client="polyglut";ch_width=550;ch_height=90;ch_type="mpu";ch_sid="Chitika Default";ch_backfill=1;ch_color_site_link="#00C";ch_color_title="#00C";ch_color_border="#FFF";ch_color_text="#000";ch_color_bg="#FFF";</script><script src="http://scripts.chitika.net/eminimalls/amm.js" type="text/javascript"></script></div>

• Unix grep is not very robust with unicode. Especially so if you are calling it inside emacs on Windows, because it has to go thru 2 layers of interface: ① the ported unix grep program. ② the Windows OS. In the process, the char encoding in the stream can be messed up. My search string usually has unicode chars. (e.g. Sample Unicode Characters.) For example, grep fails when searching for “│” (U+2502). This is calling cygwin grep from emacs on Windows. It's too complex to figure out exactly why it fails.

• grep isn't robust with various encodings. You have to deal with “locale” and it's a headache. With emacs, i don't have to think about file encoding at all. The emacs environment automatically detects a file's encoding.

• grep can't really deal with directories recursively in a convenient way. (there's -r, but then specifying a file pattern (e.g. *\.html) is awkward. It is possible, with GNU grep's --include option, shell file globs, or “find ... -exec” and xargs, but i find it quite frustrating to go thru the trial-and-error man-page loop with unix tools.)

• Sometimes you need to work on a list of files, sometimes by a pattern (e.g. *\.html), sometimes you want to exclude some files by list or by pattern, sometimes a combination of the above in a specific order. Some unix tools provide these features, sometimes by combination of tools (e.g. find/xargs), but their order and syntax are complex and tool specific. With a script in perl, python, elisp, it's much easier to control.

• There are too many versions and varieties of grep. The primary 2 are BSD vs GNU. Mac OS X by default mostly has BSD versions of the unix tools, but some are GNU versions. This makes it very painful. Linuxes typically have all GNU versions. The different versions accept different options. Also, GNU grep for example supports a variety of regex flavors (“--basic-regexp”, “--extended-regexp”, “--perl-regexp”). It's too painful to figure them out and remember them.

• unix grep and associated tools (sort, wc, uniq, pipe, sed, awk, …) are not flexible. When your need is slightly more complex, unix shell tools can't handle it. For example, suppose you need to find a string in a HTML file only if the string happens inside another tag. (extending the limit of unix tools is how Perl was born in 1987.)
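
To make the last point concrete, here is a minimal elisp sketch of "find a string only if it happens inside another tag". It is not from my actual scripts. The function name my-count-inside-tag is made up for illustration, and it assumes the tags are not nested and the markup is well formed.

```elisp
;; Sketch only: `my-count-inside-tag' is a hypothetical helper.
;; It counts occurrences of STR that fall between <TAG ...> and </TAG>.
;; Assumes TAG elements are not nested and every open tag has a close tag.

(defun my-count-inside-tag (text str tag)
  "Return how many times STR occurs inside TAG elements of TEXT."
  (let ((open-re (concat "<" (regexp-quote tag) "\\(?:[ \t\n][^>]*\\)?>"))
        (close-str (concat "</" tag ">"))
        (count 0))
    (with-temp-buffer
      (insert text)
      (goto-char (point-min))
      ;; bind inside the temp buffer so the search is case sensitive there
      (let ((case-fold-search nil))
        ;; find each opening tag, then its closing tag
        (while (re-search-forward open-re nil t)
          (let ((start (point))
                (end (and (search-forward close-str nil t)
                          (match-beginning 0))))
            (when end
              (goto-char start)
              ;; count only the matches that lie inside the element
              (while (search-forward str end t)
                (setq count (1+ count)))
              (goto-char end))))))
    count))
```

For example, (my-count-inside-tag "<p>foo</p><div>foo</div>" "foo" "div") looks only at the text inside the div. A real HTML parser would be more robust. This is just to show that the condition takes a dozen lines when the text is in a buffer.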

When writing a script in perl or python, you can always write it so the script works as a command line script that takes options like unix command line tools. Or, you can leave the script without a command line interface. When you need to run the script, you open it with an editor, modify the parameters, save, then run it.

I always prefer the latter. Because, that way i can edit the options much more comfortably, in an editor with full view instead of the command line. I can also view whatever doc the script has in the header, instead of doing some confusing “-help” or “-h”, “--help” or “man ...” in the command line. And with emacs, i can run the script by a press of a key, and get many other conveniences. Basically, a command line is nice if you are using other's code, because it's a blackbox with a (somewhat) standardized command line interface. But for my custom text processing needs, i find that if i'm writing my own, i prefer not to add a command line interface, but use it together with emacs.

So, with my own script for grep (be it elisp or perl or python), i can make the script do exactly what i need and work everywhere with emacs.

(See: Python: Find & Replace; Perl: Find & Replace.)
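
As a rough illustration of picking files by pattern and excluding some of them, here is a small elisp sketch. The name my-select-files and its arguments are made up for this example. It assumes Emacs 25 or later, for directory-files-recursively and seq.el, which did not exist when this was first written.

```elisp
;; Sketch only: `my-select-files' and its arguments are hypothetical names.
;; Assumes Emacs 25 or later (`directory-files-recursively', seq.el).
(require 'seq)

(defun my-select-files (dir include-re exclude-res)
  "Return files under DIR whose file name matches INCLUDE-RE.
Files whose full path matches any regex in the list EXCLUDE-RES are dropped."
  (seq-remove
   (lambda (fpath)
     ;; non-nil when any exclude regex matches the full path
     (seq-some (lambda (re) (string-match-p re fpath)) exclude-res))
   ;; the regex here is matched against the file name, not the full path
   (directory-files-recursively dir include-re)))

;; example call: all .html files, skipping any dir whose name starts with xx
;; (my-select-files "~/web/" "\\.html\\'" '("/xx"))
```

The include and exclude rules are just lisp data, so changing their order or adding a new rule is an edit to the script, not a hunt thru man pages.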

Solution

The solution is quite simple actually. Here's a script i've been using for close to a year. I use it almost every day, on 5 thousand files.

Typically, i press one button to open the script. Edit the parameters i want to search. (the input dir, file extension filter, search string, plain text or regex, number of occurrences, etc.) Then, save the script. Press another button to run it.

;; -*- coding: utf-8 -*-
;; 2010-03-27
;; print file names of files that have n occurrences of a string, of a given dir

;; input dir
(setq inputDir "~/web/xahlee_org/" )

;; add a ending slash if not there
;; in elisp, dir path should end with a slash
(when (not (string= "/" (substring inputDir -1) ))
  (setq inputDir (concat inputDir "/") )
  )

(defun my-process-file (fpath)
  "process the file at fullpath fpath ..."
  (let (mybuffer p1 p2 (ii 0) searchStr)

    (when t
      ;; (and (not (string-match "/xx" fpath)) ) ; exclude some dir

      ;; create a temp buffer. Work in temp buffer. Faster.
      (setq mybuffer (get-buffer-create " myTemp"))
      (set-buffer mybuffer)
      (insert-file-contents fpath nil nil nil t)

      (setq searchStr "(2) " )          ; search string here

      (goto-char 1)
      (while (search-forward searchStr nil t)
        (setq ii (1+ ii))
        )

      ;; report the file if the string occurs at least once
      ;; (to report only files with exactly n occurrences, test (= ii n) instead)
      (if (not (= ii 0))
          (princ (format "this many: %d %s\n" ii fpath))
        )

      (kill-buffer mybuffer)
      )
    ))

;; traverse the dir

(require 'find-lisp)

(let (outputBuffer)
  (setq outputBuffer "*xah occur output*" )
  (with-output-to-temp-buffer outputBuffer 
    (mapc 'my-process-file (find-lisp-find-files inputDir "\\.html$"))
    (princ "Done deal!")
    )
  )

The code is pretty simple. At the bottom, the code visits every file in a dir. For each file, it calls (my-process-file fpath). The “my-process-file” function creates a temp buffer, inserts the file content into it, then searches inside the temp buffer. We do this because it's faster. (With a temp buffer, emacs doesn't do font-locking (which is rather resource intensive), keeps no “undo”, and skips the other things emacs normally does when opening a file for interactive edit.)
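
The same temp buffer idea can also be packed into small functions that return a count instead of printing. The following is only a sketch, not the script i actually run. The names my-count-in-buffer, my-count-in-file, and my-files-with-n are made up. It uses the standard with-temp-buffer macro, and it assumes that a regex search string never matches the empty string (otherwise the loop would not end).

```elisp
;; Sketch only: all three function names are hypothetical.
;; Unlike the script above, the search here is forced to be case sensitive.

(defun my-count-in-buffer (search-str regex-p)
  "Count matches of SEARCH-STR in the current buffer.
If REGEX-P is non-nil, treat SEARCH-STR as a regex (must not match empty)."
  (let ((case-fold-search nil)
        (count 0))
    (save-excursion
      (goto-char (point-min))
      (while (if regex-p
                 (re-search-forward search-str nil t)
               (search-forward search-str nil t))
        (setq count (1+ count))))
    count))

(defun my-count-in-file (fpath search-str &optional regex-p)
  "Count matches of SEARCH-STR in the file at FPATH, using a temp buffer."
  (with-temp-buffer
    (insert-file-contents fpath)
    (my-count-in-buffer search-str regex-p)))

(defun my-files-with-n (files search-str n &optional regex-p)
  "Return those FILES that contain SEARCH-STR exactly N times."
  (let (result)
    (dolist (f files (nreverse result))
      (when (= n (my-count-in-file f search-str regex-p))
        (push f result)))))
```

With these, the "n times" requirement from the problem summary becomes one call, such as (my-files-with-n (find-lisp-find-files inputDir "\\.html$") "(2) " 1), and plain text vs regex is a single optional argument.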

To run the file, you can call “eval-buffer” or “load-file”. (i have “eval-buffer” aliased to just “eb”: (defalias 'eb 'eval-buffer). Actually, i just press a button to run the current file. See: Emacs Lisp: a Command to Execute/Compile Current File.)

The elisp idioms used in this script have been explained a few times in different places on this site. If you are not familiar with them, please review: Text Processing with Emacs Lisp Batch Style.

On 5k files, the script takes 30 seconds on my machine.

Emacs is fantastic!

scientist confirm the possibility of seeing the future

Recently, in the academic psychology community, there's been hot discussion about a psychologist (Daryl Bem) who ran experiments that seem to confirm the possibility of some parapsychological phenomena. e.g. seeing the future, knowing the past, reading others' minds, etc. I'd say, don't pay attention to it. But if you do have an interest in this matter, i recommend reading this article:

Back from the Future: Parapsychology and the Bem Affair (2011-01-06) By James Alcock. @ Source www.csicop.org

2011-02-06

Google YouTube fixes invalid embed code

2 weeks ago i reported that the embed video code handed out by YouTube contains an invalid attribute type="text/html". I wrote about it here: HTML Validation, Google, Amazon, and also asked about it at stackoverflow.com, and posted the question to the YouTube forum at Source www.google.com.

Amazingly, Google fixed it! Now the embed code no longer contains type="text/html". Yay!

Lady Gaga - Bad Romance & 王彩樺 - 鋩鋩角角

“Lady Gaga - Bad Romance” amazon
王彩樺 鋩鋩角角 熱舞版

for more info, lyrics, see http://xahlee.org/music/bad_romance.html

unicode support in langs

major update: Unicode Support in Ruby, Perl, Python, javascript, Java, Emacs Lisp, Mathematica