2011-07-09

Yaksha (夜叉) in “Journey to the West” (西游记)

Perm url with updates: http://xahlee.org/lit/yaksha_Journey_West.html

Xah Lee, 2011-07-09

Yaksha (夜叉) appear many times in 西游记 (Journey to the West). Here's a list of all occurrences.

西游记,第三回 (页1):

好猴王,跳至桥头,使一个闭水法,捻着诀,扑的钻入波中,分开水路,径入东洋海底。正行间,忽见一个巡海的夜叉,挡住问道:『那推水来的,是何神圣?说个明白,好通报迎接。』悟空道:『吾乃花果山天生圣人孙悟空,是你老龙王的紧邻,为何不识?』那夜叉听说,急转水晶宫传报道:『大王,外面有个花果山天生圣人孙悟空,口称是大王紧邻,将到宫也。』

西游记,第九回 (页1):

这正是路上说话,草里有人。原来这泾河水府有一个巡水的夜叉,听见了百下百着之言,急转水晶宫,慌忙报与龙王道:『祸事了!祸事了!』龙王问:『有甚祸事?』夜叉道:『臣巡水去到河边,只听得两个渔樵攀话。相别时,言语甚是利害。那渔翁说:长安城里西门街上,有个卖卦先生,算得最准。他每日送他鲤鱼一尾,他就袖传一课,教他百下百着。若依此等算准,却不将水族尽情打了?何以壮观水府,何以跃浪翻波辅助大王威力?』龙王甚怒,急提了剑就要上长安城,诛灭这卖卦的。旁边闪过龙子龙孙、虾臣蟹士、鲥军师鳜少卿鲤太宰,一齐启奏道

西游记,第十八回 (页2):

行者却弄神通,摇身一变,变得就如那女子一般,独自个坐在房里等那妖精。不多时,一阵风来,真个是走石飞砂。好风:起初时微微荡荡,向后来渺渺茫茫。微微荡荡乾坤大,渺渺茫茫无阻碍。凋花折柳胜揌麻,倒树摧林如拔菜。翻江搅海鬼神愁,裂石崩山天地怪。衔花糜鹿失来踪,摘果猿猴迷在外。七层铁塔侵佛头,八面幢幡伤宝盖。金梁玉柱起根摇,房上瓦飞如燕块。举棹梢公许愿心,开船忙把猪羊赛。当坊土地弃祠堂,四海龙王朝上拜。海边撞损夜叉船,长城刮倒半边塞。

西游记,第二十一回 (页1):

王母正去赴蟠桃,一风吹断裙腰钏。二郎迷失灌州城,哪吒难取匣中剑。天王不见手心塔,鲁班吊了金头钻。雷音宝阙倒三层,赵州石桥崩两断。一轮红日荡无光,满天星斗皆昏乱。南山鸟往北山飞,东湖水向西湖漫。雌雄拆对不相呼,子母分离难叫唤。龙王遍海找夜叉,雷公到处寻闪电。十代阎王觅判官,地府牛头追马面。这风吹倒普陀山,卷起观音经一卷。白莲花卸海边飞,欢倒菩萨十二院。盘古至今曾见风,不似这风来不善。唿喇喇,乾坤险不炸崩开,万里江山都是颤!那妖怪使出这阵狂风,就把孙大圣毫毛变的小行者刮得在那半空中,却似纺车儿一般乱转,莫想轮得棒,如何拢得身?慌得行者将毫毛一抖,收上身来,独自个举着铁棒,上前来打,又被那怪劈脸喷了一口黄风,把两只火眼金睛,刮得紧紧闭合,莫能睁开,因此难使铁棒,遂败下阵来。那妖收风回洞不题。

西游记,第二十四回 (页1):

却说那三人穿林入里,只见那呆子绷在树上,声声叫喊,痛苦难禁。行者上前笑道:『好女婿呀!这早晚还不起来谢亲,又不到师父处报喜,还在这里卖解儿耍子哩!咄!你娘呢?你老婆呢?好个绷巴吊拷的女婿呀!』那呆子见他来抢白着羞,咬着牙,忍着疼,不敢叫喊。沙僧见了老大不忍,放下行李,上前解了绳索救下。呆子对他们只是磕头礼拜,其实羞耻难当,有《西江月》为证:色乃伤身之剑,贪之必定遭殃。佳人二八好容妆,更比夜叉凶壮。只有一个原本,再无微利添囊。好将资本谨收藏,坚守休教放荡。那八戒撮土焚香,望空礼拜。

西游记,第二十八回 (页2):

小小洞门,虽到不得那阿鼻地狱;楞楞妖怪,却就是一个牛头夜叉

西游记,第三十八回 (页2):

八戒正叙话处,早有一个巡水的夜叉,开了门,看见他的模样,急抽身进去报道:『大王,祸事了!井上落一个长嘴大耳的和尚来了!赤淋淋的,衣服全无,还不死,逼法说话哩。』那井龙王忽闻此言,心中大惊道:『这是天蓬元帅来也。昨夜夜游神奉上敕旨,来取乌鸡国王魂灵去拜见唐僧,请齐天大圣降妖。

西游记,第四十一回 (页1):

好大圣,纵云离此地,顷刻到东洋,却也无心看玩海景,使个逼水法,分开波浪。正行时,见一个巡海夜叉相撞,看见是孙大圣,急回到水晶宫里,报知那老龙王。敖广即率龙子、龙孙、虾兵、蟹卒一齐出门迎接,请里面坐。

西游记,第四十三回 (页2):

『这厮却把供状先递与老孙也!』正才袖了帖子,往前再行。早有一个探海的夜叉望见行者,急抽身撞上水晶宫报大王:『齐天大圣孙爷爷来了!』那龙王敖顺即领众水族出宫迎接道:『大圣,请入小宫少座,献茶。』行者道:『我还不曾吃你的茶,你倒先吃了我的酒也!』龙王笑道:『大圣一向皈依佛门,不动荤酒,却几时请我吃酒来?』行者道:『你便不曾去吃酒,只是惹下一个吃酒的罪名了。』敖顺大惊道:『小龙为何有罪?』行者袖中取出简帖儿,递与龙王。

西游记,第五十六回 (页2):

『你贵处到我这里,程途迢递,怎么涉水登山,独自到此?』三藏道:『贫僧还有三个徒弟同来。』老者问:『高徒何在?』三藏用手指道:『那大路旁立的便是。』老者猛抬头,看见他们面貌丑陋,急回身往里就走,被三藏扯住道:『老施主,千万慈悲,告借一宿!』老者战兢兢钳口难言,摇着头,摆着手道:『不不不不象人模样!是是是几个妖精!』三藏陪笑道:『施主切休恐惧,我徒弟生得是这等相貌,不是妖精!』老者道:『爷爷呀,一个夜叉,一个马面,一个雷公!』行者闻言,厉声高叫道:『雷公是我孙子,夜叉是我重孙,马面是我玄孙哩!』那老者听见,魄散魂飞,面容失色,只要进去。三藏搀住他,同到草堂,陪笑道:『老施主,不要怕他。他都是这等粗鲁,不会说话。』

西游记,第五十六回 (页2):

那婆婆真个丢了孩儿,入里面捧出二锺茶来。茶罢,三藏却转下来,对婆婆作礼道:『贫僧是东土大唐差往西天取经的,才到贵处,拜求尊府借宿,因是我三个徒弟貌丑,老家长见了虚惊也。』婆婆道:『见貌丑的就这等虚惊,若见了老虎豺狼,却怎么好?』老者道:『妈妈呀,人面丑陋还可,只是言语一发吓人。我说他象夜叉马面雷公,他吆喝道,雷公是他孙子,夜叉是他重孙,马面是他玄孙。我听此言,故然悚惧。』唐僧道:『不是不是,象雷公的是我大徒孙悟空,象马面的是我二徒猪悟能,象夜叉的是我三徒沙悟净。他们虽是丑陋,却也秉教沙门,皈依善果,不是甚么恶魔毒怪,怕他怎么!』公婆两个,闻说他名号皈正沙门之言,却才定性回惊,教:『请来,请来。』长老出门叫来,又吩咐道:『适才这老者甚恶你等,今进去相见,切勿抗礼,各要尊重些。』八戒道:『我俊秀,我斯文,不比师兄撒泼。』

西游记,第九十二回 (页2):

却说西海中有个探海的夜叉,巡海的介士,远见犀牛分开水势,又认得孙大圣与二天星,即赴水晶宫对龙王慌慌张张报道:『大王!有三只犀牛,被齐天大圣和二位天星赶来也!』老龙王敖顺听言,即唤太子摩昂:『快点水兵,想是犀牛精辟寒、辟暑、辟尘儿三个惹了孙行者。今既至海,快快拔刀相助。』敖摩昂得令,即忙点兵。顷刻间,龟鳖鼋鼍,鯾鱼白鳜鲤,与虾兵蟹卒等,各执枪刀,一齐呐喊,腾出水晶宫外,挡住犀牛精。犀牛精不能前进,急退后,又有井、角二星并大圣拦阻,慌得他失了群,各各逃生,四散奔走,早把个辟尘儿被老龙王领兵围住。孙大圣见了心欢,叫道:『消停消停!捉活的,不要死的。』摩昂听令,一拥上前,将辟尘儿扳翻在地,用铁钩子穿了鼻,攒蹄捆倒。

西游记,第九十三回 (页2):

是雷公,夜叉?』行者道:『那官儿,有话不说,为何沉吟?』那官儿慌得战战兢兢的,双手举着圣旨,口里乱道:『我公主有请会亲,我主公会亲有请!』八戒道:『我这里没刑具,不打你,你慢慢说,不要怕。』行者道:『莫成道怕你打?怕你那脸哩!快收拾挑担牵马进朝,见师父议事去也!』这正是:路逢狭道难回避,定教恩爱反为仇。毕竟不知见了国王有何话说,且听下回分解。

西游记 (Monkey King) 附录:

却说刘洪杀死的家僮尸首,顺水流去,惟有陈光蕊的尸首,沉在水底不动。有洪江口巡海夜叉见了,星飞报入龙宫,正值龙王升殿,夜叉报道:『今洪江口不知甚人把一个读书士子打死,将尸撇在水底。』龙王叫将尸抬来,放在面前,仔细一看道:『此人正是救我的恩人,如何被人谋死?常言道,恩将恩报。

我今日须索救他性命,以报日前之恩。』即写下牒文一道,差夜叉径往洪州城隍土地处投下,要取秀才魂魄来,救他的性命。

城隍土地遂唤小鬼把陈光蕊的魂魄交付与夜叉去,夜叉带了魂魄到水晶宫,禀见了龙王。龙王问道:『你这秀才,姓甚名谁?何方人氏?因甚到此,被人打死?』

三人望江痛哭,早已惊动水府。有巡海夜叉,将祭文呈与龙王。龙王看罢,就差鳖无帅去请光蕊来到,道:『先生,恭喜!恭喜!今有先生夫人,公子同岳丈俱在江边祭你,我今送你还魂去也。再有如意珠一颗,走盘珠二颗,绞绡十端,明珠玉带一条奉送。你今日便可夫妻子母相会也。』光蕊再三拜谢。龙王就令夜叉将光蕊身尸送出江口还魂,夜叉领命而去。

Note that all the Yaksha (夜叉) here seem to be male, as opposed to the female Yakshini. Also, they play only minor roles, such as patrol guards. For pictures, see: Yaksha, 夜叉, Yakshini

食蔓鬼 Wreath-eating Ghost

Perm url with updates: http://xahlee.org/dinju/wreath-eating_ghost.html

Xah Lee, 2011-07-09

[Image: 食蔓鬼 (shi man gui) sculpture.]

This is one of the sculptures outside of 丰都鬼城 (Fengdu Ghost City).

What does 食蔓鬼 mean? 食 = eat. 蔓 means 蔓菁, aka 蕪菁 or 大頭菜, which means turnip. So, read literally, it means “turnip-eating ghost”.

蔓菁 Turnip

[Images: 蔓菁 (大頭菜, turnip) plant; turnip flower; turnip roots.]

食蔓鬼 Wreath-eating Ghost

But what does it really mean? According to this page: 鬼的种类及业因 By 陈咏明 @ www.longyuan.net, quote:

鬼的种类极多,正法念经卷十六中列举出三十六种鬼:

十一、食蔓鬼 有人用鲜花进行祭祀时,此鬼便于此时食花,虽身常饥渴,但不能吃别的东西。因其前生曾盗用装饰佛像的“华蔓”来打扮自己。“华蔓”指用鲜花编织成串的装饰物。

Quick translation:

There are many types of ghosts. The book 正法念经 [an old Buddhist scripture], section 16, lists 36 types of ghosts:

11. Wreath-eating Ghost. When people make offerings of fresh flowers, this ghost eats the flowers. Although she is constantly hungry and thirsty, she can't eat anything else. This is because, in her previous life, she stole the “华蔓” that decorated Buddha statues to adorn herself. “华蔓” is an ornamental garland woven from fresh flowers.

[Image: head of the wreath-eating ghost.]

Ok, but why is she suckling a deer?

According to this blog post: 九州风情之二 丰都鬼城 (2007-04-28) @ blog.sina.com.cn, quote:

鬼门关外道路两侧有神态各异十六罗刹鬼,最美丽的是食蔓鬼,也是我最喜欢的。食蔓鬼,传说中因为她在世的时候太臭美,把献给佛像的“华蔓”偷来打扮自己,变成鬼后只能吃祭祀的鲜花,虽身常饥渴,但不能吃别的东西,还要用乳汁来喂养小鹿。唉唉,每个鬼鬼也都有自己的故事呀。

Quick translation:

On both sides of the road outside Hell's Gate, there are 16 sculptures of rakshasa ghosts, each with a different look. The most beautiful is the wreath-eating ghost, which is also my favorite. By folklore, when she was a human in her previous life, she was too vain, and stole the “华蔓” offered to Buddha statues to adorn herself. So when she became a ghost [due to reincarnation], she could eat only the fresh flowers from offerings and nothing else, even though she is constantly hungry and thirsty. She also must feed the little deer with her breast milk. =sigh=, every ghost has its own tale to tell.

HTML5 {meter, progress} Tags

Perm url with updates: http://xahlee.org/js/html5_meter_tag.html

Xah Lee, 2011-07-08

This page shows examples of the “meter” and “progress” tags.

Meter Tag

The “meter” tag is an inline element. It is used to indicate a measure within a given range. e.g. disk usage, percentage. Here's an example:

Example 1

HTML Code:

<p><meter value="0.7">70%</meter></p>

Here's what your browser shows:

70%

Example 2

<p><meter value="3" min="0" max="5">★★★</meter></p>

★★★

Meter Tag Attributes

  • value. Required.
  • min and max. Specify the range of possible values for the “value” attribute. If they are not specified, they default to 0 and 1 respectively.
  • low. A number indicating that values below or equal to it are considered low. (must be within “min” and “max”.)
  • high. A number indicating that values above or equal to it are considered high. (must be within “min” and “max”.)
  • optimum. A number indicating the optimal value. e.g. if it is equal to “max”, then it means higher is better.

The “meter” tag should not be used to indicate progress (as in a progress bar).
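
To make the relation between “value”, “min” and “max” concrete, here's a minimal Python sketch of how a gauge's filled fraction could be worked out. The function name “meter_fraction” is made up for illustration, and clamping an out-of-range value into the range is an assumption based on my reading of the spec, not a promise about what any particular browser does.

```python
def meter_fraction(value, min_=0.0, max_=1.0):
    """Return the filled fraction (0.0 to 1.0) of a meter-like gauge.

    Hypothetical helper for illustration only. The defaults mirror the
    "meter" tag's default range of 0 to 1.
    """
    if max_ < min_:
        # Assumption: an inverted range is treated as an empty range.
        max_ = min_
    # Assumption: a value outside the range is clamped into [min, max].
    clamped = max(min_, min(value, max_))
    if max_ == min_:
        return 0.0  # avoid dividing by zero on an empty range
    return (clamped - min_) / (max_ - min_)


print(meter_fraction(0.7))      # Example 1: value="0.7"
print(meter_fraction(3, 0, 5))  # Example 2: value="3" min="0" max="5"
```

So Example 2 fills three fifths of the gauge, the same proportion as the three stars in its fallback text.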

Meter Tag Browser Support

As of today (2011-07), it is supported by Google Chrome, Opera.

It is currently not supported by Firefox, IE9, Safari.

http://dev.w3.org/html5/spec-author-view/the-meter-element.html#the-meter-element

“progress” Tag

The “progress” tag is used for a progress bar. e.g. download completion, etc.

Example 1

<p><progress value="0.3"></progress></p>

Example 2

<p><progress value="4" max="10"></progress></p>

Attributes

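  • value. Optional. Indicates how much has been completed. If omitted, the progress bar is indeterminate (it shows that something is going on, but not how much is done).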
  • max. Optional. Indicates the max value, if known.
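
Here's a small Python sketch of how “value” and “max” combine into a completion fraction. The name “progress_fraction” is hypothetical. It assumes, following the HTML5 draft, that “max” defaults to 1 and that a missing “value” means the amount done is unknown.

```python
def progress_fraction(value=None, max_=1.0):
    """Return completion as a fraction of max, or None if unknown.

    Hypothetical helper for illustration only.
    """
    if value is None:
        # Assumption: no "value" means an indeterminate progress bar.
        return None
    if max_ <= 0:
        raise ValueError("max must be greater than zero")
    # Assumption: a value outside the range is clamped into [0, max].
    return max(0.0, min(value, max_)) / max_


print(progress_fraction(0.3))    # Example 1: value="0.3"
print(progress_fraction(4, 10))  # Example 2: value="4" max="10"
print(progress_fraction())       # no value given: indeterminate
```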

Progress Tag Browser Support

As of today (2011-07), it is supported by Google Chrome, Opera.

It is currently not supported by Firefox, IE9, Safari.

http://dev.w3.org/html5/spec-author-view/the-progress-element.html#the-progress-element

2011-07-08

Perl Program to Delete Duplicate Files

Perm url with updates: http://xahlee.org/perl-python/delete_dup_files.html

Xah Lee, 2005-03-20, 2011-07-08

Suppose you have 30 thousand files in many directories. Some of these files are identical, but you don't know which ones are identical to which. Here's a perl script that solves the problem.

How to Use It

perl del_dup.pl --help

To find dup files in a dir:
 perl del_dup.pl dirpath

To find dup files in several dir:
 perl del_dup.pl dirpath1 dirpath2 dirpath3 ...

To delete dup files:
 perl del_dup.pl --delete dirpath
or
 perl del_dup.pl --delete dirpath1 dirpath2 ...

When there are duplicate files, the first one found (in the order the dir is given) is preserved, the others are deleted.

To see this help again:
 perl del_dup.pl --help

Note: the options --help and --delete must be first argument.

A file is considered a duplicate only if its content is exactly identical to another file's. If you have 2 images, and one is a scaled version of the other, they are not considered identical.

I have used this script regularly on 30 thousand image files over the years.
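
For readers who want to see the general technique, here's a minimal Python sketch of finding files with identical content. It is not the Perl script described here, whose internals are not shown. The function name “find_duplicates” is made up, and the approach (group by file size first, then compare SHA-256 digests within each size group) is an assumption about one reasonable way to do it.

```python
import hashlib
import os
from collections import defaultdict


def find_duplicates(dirs):
    """Return lists of paths whose contents are byte-for-byte identical.

    In each list, the file found first (in the order DIRS is given)
    comes first. A deletion pass would keep that one.
    """
    # Pass 1: group every file by its size. Cheap, no file is read.
    by_size = defaultdict(list)
    for d in dirs:
        for root, subdirs, files in os.walk(d):
            subdirs.sort()  # make the traversal order predictable
            for name in sorted(files):
                path = os.path.join(root, name)
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: within each size group, group again by content digest.
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            with open(path, "rb") as f:
                # Reads the whole file into memory. Fine for a sketch.
                digest = hashlib.sha256(f.read()).hexdigest()
            by_hash[digest].append(path)
        groups.extend(g for g in by_hash.values() if len(g) > 1)
    return groups
```

Grouping by size first means most files never need to be read at all, which matters when there are tens of thousands of them.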

Sample Output

perl del_dup.pl --delete C:\Users\h3\Pictures\keyboard "C:\Users\h3\Pictures\keyboard - Copy"

Adding dir to check: C:\Users\h3\Pictures\keyboard
Adding dir to check: C:\Users\h3\Pictures\keyboard - Copy
There are a total of 32 files examed.
==============================
There are 16 unique file size.
==============================
---------------------
These following files are identical:
C:\Users\h3\Pictures\keyboard/windowslogo.gif
C:\Users\h3\Pictures\keyboard - Copy/windowslogo.gif

C:\Users\h3\Pictures\keyboard/DSC_1108.jpg
C:\Users\h3\Pictures\keyboard - Copy/DSC_1108.jpg

C:\Users\h3\Pictures\keyboard/ms-sidewinder-x6-gaming-keyboard-full.jpg
C:\Users\h3\Pictures\keyboard - Copy/ms-sidewinder-x6-gaming-keyboard-full.jpg

C:\Users\h3\Pictures\keyboard/g510.jpg
C:\Users\h3\Pictures\keyboard - Copy/g510.jpg

C:\Users\h3\Pictures\keyboard/71Uvd2tZOZL._AA1500_.jpg
C:\Users\h3\Pictures\keyboard - Copy/71Uvd2tZOZL._AA1500_.jpg

C:\Users\h3\Pictures\keyboard/g510 red.jpg
C:\Users\h3\Pictures\keyboard - Copy/g510 red.jpg

C:\Users\h3\Pictures\keyboard/ms x4.jpg
C:\Users\h3\Pictures\keyboard - Copy/ms x4.jpg

C:\Users\h3\Pictures\keyboard/81fuOEG-2lL._AA1500_.jpg
C:\Users\h3\Pictures\keyboard - Copy/81fuOEG-2lL._AA1500_.jpg

C:\Users\h3\Pictures\keyboard/g110.jpg
C:\Users\h3\Pictures\keyboard - Copy/g110.jpg

C:\Users\h3\Pictures\keyboard - Copy/81hTgnd037L._AA1500_.jpg
C:\Users\h3\Pictures\keyboard/81hTgnd037L._AA1500_ - Copy.jpg
C:\Users\h3\Pictures\keyboard - Copy/81hTgnd037L._AA1500_ - Copy.jpg
C:\Users\h3\Pictures\keyboard/81hTgnd037L._AA1500_.jpg

C:\Users\h3\Pictures\keyboard/lenovo_thinkpad_usb_trackpoint_keyboard-2.jpg
C:\Users\h3\Pictures\keyboard - Copy/lenovo_thinkpad_usb_trackpoint_keyboard-2.jpg

C:\Users\h3\Pictures\keyboard/g19.jpg
C:\Users\h3\Pictures\keyboard - Copy/g19.jpg

C:\Users\h3\Pictures\keyboard/g510 yellow - Copy.jpg
C:\Users\h3\Pictures\keyboard - Copy/g510 yellow - Copy.jpg

C:\Users\h3\Pictures\keyboard/g510 green.jpg
C:\Users\h3\Pictures\keyboard - Copy/g510 green.jpg

==============================
There are 16 redundant files, totaling 2396674 bytes.
The following files (if any) will be deleted (if you used the “--delete” option):
C:\Users\h3\Pictures\keyboard - Copy/71Uvd2tZOZL._AA1500_.jpg
C:\Users\h3\Pictures\keyboard - Copy/81fuOEG-2lL._AA1500_.jpg
C:\Users\h3\Pictures\keyboard - Copy/81hTgnd037L._AA1500_ - Copy.jpg
C:\Users\h3\Pictures\keyboard - Copy/81hTgnd037L._AA1500_.jpg
C:\Users\h3\Pictures\keyboard - Copy/DSC_1108.jpg
C:\Users\h3\Pictures\keyboard - Copy/g110.jpg
C:\Users\h3\Pictures\keyboard - Copy/g19.jpg
C:\Users\h3\Pictures\keyboard - Copy/g510 green.jpg
C:\Users\h3\Pictures\keyboard - Copy/g510 red.jpg
C:\Users\h3\Pictures\keyboard - Copy/g510 yellow - Copy.jpg
C:\Users\h3\Pictures\keyboard - Copy/g510.jpg
C:\Users\h3\Pictures\keyboard - Copy/lenovo_thinkpad_usb_trackpoint_keyboard-2.jpg
C:\Users\h3\Pictures\keyboard - Copy/ms x4.jpg
C:\Users\h3\Pictures\keyboard - Copy/ms-sidewinder-x6-gaming-keyboard-full.jpg
C:\Users\h3\Pictures\keyboard - Copy/windowslogo.gif
C:\Users\h3\Pictures\keyboard/81hTgnd037L._AA1500_.jpg
File deletion done (if any)!

Buy

Use the paypal button below, pay $5. In the comment field, put “perl delete dup”. I'll email you the program. Make sure your email address is included and correct.

Legal disclaimer: this software is sold as is. I'm not responsible for any damages caused by this software.

2011-07-07

HTML Table Examples with colgroup and col

Perm url with updates: http://xahlee.org/js/html_table_colgroup.html

Xah Lee, 2011-07-07

HTML table has a “colgroup” tag. It is used to indicate that several columns form a group. By itself it does not change the rendering of the table. However, it is convenient, because you can apply CSS to just one tag instead of adding a class to every cell (“td” or “th”) in those columns. Example:

1,1 1,2 1,3 1,4 1,5 1,6 1,7 1,8 1,9
2,1 2,2 2,3 2,4 2,5 2,6 2,7 2,8 2,9

Here's the source code:

<table border="1">
<colgroup span="1" style="background-color:blue"></colgroup>
<colgroup span="3" style="background-color:pink"></colgroup>
<colgroup span="2" style="background-color:yellow"></colgroup>
<colgroup span="3" style="background-color:gray"></colgroup>
<tr>
<td>1,1</td>
<td>1,2</td>
<td>1,3</td>
<td>1,4</td>
<td>1,5</td>
<td>1,6</td>
<td>1,7</td>
<td>1,8</td>
<td>1,9</td>
</tr>

<tr>
<td>2,1</td>
<td>2,2</td>
<td>2,3</td>
<td>2,4</td>
<td>2,5</td>
<td>2,6</td>
<td>2,7</td>
<td>2,8</td>
<td>2,9</td>
</tr>
</table>

The “colgroup” tag must come before any {tr, thead, tbody, tfoot}.
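
To see how the “span” values map onto the 9 columns in the example, here's a tiny Python sketch. The function name “column_styles” is made up for illustration. It only models the bookkeeping (each group covers the next “span” columns, left to right), not what a browser does internally.

```python
def column_styles(groups):
    """Expand (span, style) pairs into one style per table column."""
    styles = []
    for span, style in groups:
        # Each colgroup covers the next SPAN columns, left to right.
        styles.extend([style] * span)
    return styles


groups = [(1, "blue"), (3, "pink"), (2, "yellow"), (3, "gray")]
for number, style in enumerate(column_styles(groups), start=1):
    print(number, style)
```

The spans add up to 1 + 3 + 2 + 3 = 9, one style for each column in the table.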

Using “col” tag

Alternatively, you can also use the “col” tag instead of {colgroup with span}.

1,1 1,2 1,3 1,4 1,5 1,6 1,7 1,8 1,9
2,1 2,2 2,3 2,4 2,5 2,6 2,7 2,8 2,9

Here's the relevant source code:

<colgroup style="background-color:blue"><col span="1"></colgroup>
<colgroup style="background-color:pink"><col span="3"></colgroup>
<colgroup style="background-color:yellow"><col span="2"></colgroup>
<colgroup style="background-color:gray"><col span="3"></colgroup>

The “col” tag must always be used inside “colgroup”.

Alternatively, you can just repeat the “col” tag instead of using the “span” attribute. For example, write:

<colgroup><col><col></colgroup>

instead of:

<colgroup><col span="2"></colgroup>

Browser Support

The {colgroup, col} tags are supported in all major browsers as of 2011-07.

2011-07-04

HTML5 “ruby” Tag

Perm url with updates: http://xahlee.org/js/html5_ruby_tag.html

HTML5 has {ruby, rt, rp} tags. These are used for pronunciation markup for Asian languages (mostly Japanese, sometimes Chinese). This page shows examples of how to use them.

Following is HTML code for ruby annotation of the Chinese characters 漢字, with the Japanese pronunciation {かん,じ}.

<ruby>漢<rt>かん</rt>字<rt>じ </rt></ruby>

Here's what your browser shows: 漢字 (かんじ)

If your browser supports ruby annotation, the pronunciation should be rendered in small font above the Chinese characters.

The following are 2 more examples. The first one uses the pinyin pronunciation system, the second uses zhuyin. (See: Zhuyin (bopomofo), Pinyin, IPA, Comparison.)

<ruby>汉<rt>hàn</rt>字<rt>zì </rt></ruby>
<ruby>漢<rt>ㄏㄢˋ</rt>字<rt>ㄗˋ </rt></ruby>

What your browser shows: 汉字 (hàn zì), 漢字 (ㄏㄢˋ ㄗˋ)

The “rp” Tag

The “rp” tag is used to add parentheses around the pronunciation symbols for browsers that do not support ruby.

  • If the browser does support ruby, then “rp” and its content is ignored.
  • If the browser does NOT understand any of the ruby tags, then normally it'll just ignore the tag but display the tag's text content, which results in showing pronunciation inside parenthesis.

Example:

<ruby>
漢 <rp>(</rp><rt>かん</rt><rp>)</rp>
字 <rp>(</rp><rt>じ</rt><rp>)</rp>
</ruby>

What your browser shows: 漢 (かん) 字 (じ)
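
If you have many words to annotate, typing the {rt, rp} tags by hand gets tedious. Here's a minimal Python sketch that builds the markup from pairs of base text and pronunciation. The function name “ruby_markup” and its “fallback” parameter are made up for illustration.

```python
from html import escape


def ruby_markup(pairs, fallback=True):
    """Build a <ruby> element from (base text, pronunciation) pairs.

    With FALLBACK true, each pronunciation is wrapped in <rp> parentheses,
    which only show up in browsers that do not support ruby.
    """
    parts = ["<ruby>"]
    for base, reading in pairs:
        parts.append(escape(base))
        if fallback:
            parts.append("<rp>(</rp>")
        parts.append("<rt>" + escape(reading) + "</rt>")
        if fallback:
            parts.append("<rp>)</rp>")
    parts.append("</ruby>")
    return "".join(parts)


print(ruby_markup([("漢", "かん"), ("字", "じ")]))
```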

Browser Support

As of today (2011-07), the browsers that support ruby are: {Google Chrome (v12), Safari (v5.0.5), IE9}. Firefox 5 and Opera 11.50 do not support it.

2011-07-03

Emacs Lisp: Processing HTML: Transform Tags to HTML5 “figure” and “figcaption” Tags

Perm url with updates: http://xahlee.org/emacs/elisp_batch_html5_tag_transform.html

Xah Lee, 2011-07-03

Another triumph of using elisp for text processing over {perl,python}.

The Problem

Summary

I want to batch transform the image tags in 5 thousand html files to use HTML5's new “figure” and “figcaption” tags.

I want to be able to view each change interactively, with the option to give it a “go ahead” to do the whole job in batch.

Interactive eye-ball verification on many cases lets me be reasonably sure the transform is done correctly. It also lets me see whether i want to push forward with this change.

Detail

HTML5 has the following new tags: “figure” and “figcaption”. They are used like this:

<figure>
<img src="cat.jpg" alt="my cat" width="167" height="106">
<figcaption>my cat!</figcaption>
</figure>

(For detail, see: HTML5 “figure” & “figcaption” Tags Browser Support)

On my website, i used a similar structure. They look like this:

<div class="img">
<img src="cat.jpg" alt="my cat" width="167" height="106">
<p class="cpt">my cat!</p>
</div>

So, i want to replace them with the HTML5's new tags. This can be done with a regex. Here's the “find” regex:

<div class="img">
?<img src="http://xahlee.org/emacs/\([^"]+?\)" alt="\([^"]+?\)" width="\([0-9]+?\)" height="\([0-9]+?\)">?
<p class="cpt">\([^<]+?\)</p>
?</div>

Here's the replacement string:

<figure>
<img src="http://xahlee.org/emacs/\1" alt="\2" width="\3" height="\4">
<figcaption>\5</figcaption>
</figure>

Then, you can use “find-dired” and dired's “dired-do-query-replace-regexp” to work on your 5 thousand pages. Nice. (See: Emacs: Interactively Find & Replace String Patterns on Multiple Files.)

However, the problem here is more complicated. There may be more than one image per group. Also, the caption part may contain complicated html. Here are some examples:

<div class="img">
<img src="cat1.jpg" alt="my cat" width="200" height="200">
<img src="turtle.jpg" alt="my turtle" width="200" height="200">
<p class="cpt">my cat and my turtle</p>
</div>
<div class="img">
<img src="jamie_cat.jpg" alt="jamie's cat" width="167" height="106">
<p class="cpt">jamie's cat! Her blog is <a href="http://example.com/jamie/">http://example.com/jamie/</a></p>
</div>

So, a solution by regex is out.

Solution

The solution is pretty simple. Here are the major steps:

  • Use “find-lisp-find-files” to traverse a dir.
  • For each file, open it.
  • Search for the string <div class="img">
  • Use “sgml-skip-tag-forward” to jump to its closing tag.
  • Save the begin/end positions of these tags.
  • Ask user if she wants to replace. If so, do it. (using “delete-region” and “insert”)
  • Repeat.

Here's the code:

;; -*- coding: utf-8 -*-
;; 2011-07-03
;; replace image tags to use html5's “figure”  and “figcaption” tags.

;; Example. This:
;; <div class="img">…</div>
;; should become this
;; <figure>…</figure>

;; do this for all files in a dir.

;; rough steps:
;; find the <div class="img">
;; use sgml-skip-tag-forward to move to the ending tag.
;; save their positions.
;; ask user whether to replace, if so, delete them and insert new string

(defun my-process-file (fpath)
  "process the file at fullpath FPATH ..."
  (let (mybuff p1 p2 p3 p4 )
    (setq mybuff (find-file fpath))

    (widen)
    (goto-char 0) ;; in case buffer already open

    (while (search-forward "<div class=\"img\">" nil t)
      (progn
        (setq p2 (point) )
        (backward-char 17) ; beginning of “div” tag
        (setq p1 (point) )

        (forward-char 1)
        (sgml-skip-tag-forward 1) ; move to the closing tag
        (setq p4 (point) )
        (backward-char 6) ; beginning of the closing div tag
        (setq p3 (point) )
        (narrow-to-region p1 p4) 

        (when (y-or-n-p "replace?")
          (progn 
            (delete-region p3 p4 )
            (goto-char p3)
            (insert "</figure>")

            (delete-region p1 p2 )
            (goto-char p1)
            (insert "<figure>") ) )
        ;; always widen, even on “n”, so the next search sees the whole buffer
        (widen) ) )

    (when (not (buffer-modified-p mybuff)) (kill-buffer mybuff) )

    ) )

(require 'find-lisp)


(let (outputBuffer)
  (setq outputBuffer "*xah img/figure replace output*" )
  (with-output-to-temp-buffer outputBuffer 
    (mapc 'my-process-file (find-lisp-find-files "~/web/xahlee_org/emacs/" "\\.html$"))
    (princ "Done deal!")
    ) )

Seems pretty simple right?

The “p1” and “p2” variables are the start/end positions of <div class="img">. The “p3” and “p4” variables are the start/end positions of its closing tag </div>.

We also used a little trick with “widen” and “narrow-to-region”. It lets me see just the part that i'm interested in. It narrows to the beginning/end of the div.img. This makes eye-balling a bit easier.

The real time-saver is the “sgml-skip-tag-forward” function from “html-mode”. Without that, one'd have to write a mini-parser to deal with html's nested ways to be able to locate the proper ending tag.
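
To give an idea of what such a mini-parser involves, here's a rough Python sketch that finds the closing tag matching a given opening tag by counting nesting depth. The function name “find_closing_tag” is made up. This is not how “sgml-skip-tag-forward” is implemented, and it ignores complications a real parser must handle, such as comments and a “>” inside an attribute value.

```python
import re


def find_closing_tag(html, start, tag="div"):
    """Return the index just past the closing tag that matches the
    opening tag beginning at index START, or -1 if there is none.

    Assumes START points at an opening tag of the given TAG name.
    """
    pattern = re.compile(r"<(/?)%s\b[^>]*>" % re.escape(tag), re.IGNORECASE)
    depth = 0
    for m in pattern.finditer(html, start):
        if m.group(1):  # a closing tag such as </div>
            depth -= 1
            if depth == 0:
                return m.end()
        else:           # an opening tag such as <div class="img">
            depth += 1
    return -1


sample = '<div class="img"><div><img src="a.jpg"></div><p class="cpt">x</p></div><div>y</div>'
end = find_closing_tag(sample, 0)
print(sample[:end])
```

Even this toy version needs the depth counter, because the first </div> it meets belongs to the inner div, not the outer one.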

Using the above code, i can comfortably eye-ball and press “y” at the rate of about 5 per second. That makes 300 replacements per minute. I have 5000+ files. If we assume there are 6k replacements to be made, then at 5 per second that means 20 minutes sitting there pressing “y”. Quite tiresome.

So, now, the next step is simply to remove the asking (y-or-n-p "replace?"). Or, if i'm absolutely paranoid, i can make emacs write into a log buffer for every replacement it makes (together with the file path). When the batch replacement is done (probably takes 1 or 2 minutes), i can simply scan thru the log to see if any replacement went wrong. For an example of that, see: Emacs Lisp: Multi-Pair String Replacement with Report.

Also note that i left each changed file unsaved in emacs. If i decided i didn't want to commit the changes, i can exit emacs without saving. Or, i can go to “ibuffer” and press 3 keys to save and close them all 【*uS】. But if you want them saved with elisp, you can just add (save-buffer). Note that emacs automatically makes a backup~ of the original files if you haven't turned that off.

But what about replacing <p class="cpt">…</p> with <figcaption>…</figcaption>?

I simply copy-pasted the above code into a new file, and made changes in 4 places. So, the figcaption replacement is done in a separate second batch job. Of course, one could spend an extra hour to make the code do both in one pass, but that extra time of thinking & coding isn't worthwhile for this one-time job.

I ♥ Emacs, do you?

Change in Current Buffer

Here's the code that changes both {div.img, p.cpt} to {figure, figcaption} in one shot, on the current buffer. It outputs the changes to a temp buffer, so you can scan them.

(defun xah-fix-wrap-img-figure ()
  "Change current buffer's <div class=\"img\"> to <figure> and <p class=\"cpt\"> to <figcaption>."
  (interactive)

  (save-excursion 
    (let (p1 p2 p3 p4 
             mystr
             ξchanges
             (changedItems '())
             (mybuff (current-buffer))
             )

      (goto-char (point-min)) ;; in case buffer already open
      (while (search-forward "<div class=\"img\">" nil t)
        (progn
          (setq p2 (point) )
          (backward-char 17)
          (setq p1 (point) )

          (forward-char 1)
          (sgml-skip-tag-forward 1)
          (setq p4 (point) )
          (backward-char 6)
          (setq p3 (point) )

          (when t
            (setq mystr (buffer-substring-no-properties p1 p4))
            (setq changedItems (cons mystr changedItems ) )
            
            (progn 
              (delete-region p3 p4 )
              (goto-char p3)
              (insert "</figure>")

              (delete-region p1 p2 )
              (goto-char p1)
              (insert "<figure>")
              (widen) )
            ) ) )

      (goto-char (point-min)) ;; in case buffer already open
      (while (search-forward "<p class=\"cpt\">" nil t)
        (progn
          (setq p2 (point) )
          (backward-char 15)
          (setq p1 (point) )

          (forward-char 1)
          (sgml-skip-tag-forward 1)
          (setq p4 (point) )
          (backward-char 4)
          (setq p3 (point) )

          (when t
            (setq mystr (buffer-substring-no-properties p1 p4))
            (setq changedItems (cons mystr changedItems ) )
            
            (progn 
              (delete-region p3 p4 )
              (goto-char p3)
              (insert "</figcaption>")

              (delete-region p1 p2 )
              (goto-char p1)
              (insert "<figcaption>")
              (widen) )
            ) ) )

      (with-output-to-temp-buffer "*changed items*" 
        (mapc (lambda ( ξchanges) (princ ξchanges) (princ "\n\n") ) changedItems)
        (set-buffer "*changed items*")
        (funcall 'html-mode)
        (set-buffer mybuff)
        ) )) )

Are You Intelligent Enough to Understand HTML5?

Perm url with updates: http://xahlee.org/UnixResource_dir/writ/html5_vs_intelligence.html

Xah Lee, 2011-07-03

Are you intelligent?

Check. I have SAT score of {600/verbal, 660/math} to prove it. Was a member of Mensa in ~1992. (I exited Mensa because i didn't want to pay annual membership fee.)

Do you have years of experience working with HTML?

Check. My website xahlee.org is 5 thousand pages of hand-crafted html, started in 1997. More than three thousand of those pages are written, typed, word by word, tag by tag, in a text editor. They are strictly correct, passing W3C's HTML validator.

Do you have good understanding of mathematical logic?

Check. I have studied math for over a decade, and am especially interested in formal logic. I've written several articles about it. By luck, I've also done a stint as a lecturer for graduate math students on math visualization programing. (See: Math Notations, Computer Languages, and the “Form” in Formalism.)

So, you are reading HTML5 spec — the EZ edition for web authors. For example, there's this subsection on content models: HTML5 Content models @ dev.w3.org. Do you understand it?

No.

[Image: HTML5 content models categories diagram.]

Do you read Slashdot, Reddit, and Hacker News regularly, are you well acquainted with bleeding-edge practices of programing, and do your peers respectfully refer to you as a hacker? Do you often help others with stern advice on HTML semantics vs representation, on the big O of algorithms and the Turing-completeness of languages, on proper software engineering practices such as {design patterns, unit testing, eXtreme Programing}, and encourage others to read docs such as RFCs and wisdom such as the unix philosophy, for a world of better software?

No, YOU are half-assed moron, the mud in a puddle, the foam in a pisspot, the shit in cesspit. You are a scumbag — a bag that gathers scum, and won't stop at that.

Go shove your unit testing, your “patterns”, your UML, your eXtreme Programing, your big O turing-complete verbiage into ya ass. Shove your perl sigil more than one way. Pythonic your face. Unix philosophy your pipe. Insert your lisp cons in your colon, tail recursion your hindquarters. Roll your emacs vi command-line interface gospels in your scrotum. Go spit your drivels in your mom's pussy. Fuck you. That is: “F” “U” “C” “K” “Y” “O” “U” — Fuck YOU!

HTML5 “time” Tag

Perm url with updates: http://xahlee.org/js/html5_time_tag.html

HTML5 “time” Tag

Xah Lee, 2011-07-03

The “time” tag is used to represent a date or a date/time combination. Examples:

<footer>Published <time pubdate datetime="2011-07-03">07/03</time>.</footer>
I need this <time datetime="2011-07-03T12:28:57-07:00">now</time>!
Captain's log, date <time>2011-07-03T12:51:02-07:00</time>.

Note: if you do not include the “datetime” attribute, then your “time” tag's content must follow the same format used by the “datetime” attribute.

“datetime” Attribute

“datetime” attribute encodes the precise date or date/time. The format must be exact, and must include at least “yyyy-mm-dd”. (time info is optional) Examples:

… mom's birthday <time datetime="2011-07-03">July 3rd</time> …
… had piano lesson at <time datetime="2011-07-03T13:00">1 pm</time> …
… the bomb went off at <time datetime="2011-07-03T12:46:03-07:00">12:46:03 PDT</time> …

The “datetime” attribute is optional. But if it is not present, then the “time” tag's content must use the same format used by the “datetime” attribute. e.g. yyyy-mm-ddThh:mm:ss-hh:mm.

Sample Correct Formats

  • 2011-07-03
  • 2011-07-03T12:58
  • 2011-07-03T12:58:46

global datetime stamp (with UTC offset)

  • 2011-07-03T12:58-07:00
  • 2011-07-03T12:58:46-07:00
  • 2011-07-03T12:58:46.31-07:00

Wrong Examples

  • Wrong! <time>today</time>
  • Wrong! <time>July 3</time>
  • Wrong! <time>07/03/11</time>
  • Wrong! <time>Sun Jul 03 13:20:16 2011</time> (incorrect format)
  • Wrong! <time>12:50</time> (requires “yyyy-mm-dd” at least)
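
Here's a small Python sketch that checks whether a string has the shape described on this page. The names “DATETIME_RE” and “looks_like_datetime” are made up. The pattern only encodes the formats listed above (as of the 2011 draft), so treat it as an assumption, not as the spec's grammar. It checks the shape only: an impossible date such as 2011-13-45 would still pass.

```python
import re

# Shape of the "datetime" format as described on this page:
# a required date, then an optional time, then an optional UTC offset.
DATETIME_RE = re.compile(
    r"\d{4}-\d{2}-\d{2}"      # date: yyyy-mm-dd (required)
    r"(T\d{2}:\d{2}"          # optional time: Thh:mm
    r"(:\d{2}(\.\d+)?)?"      # optional seconds, with optional fraction
    r"([+-]\d{2}:\d{2})?"     # optional UTC offset such as -07:00
    r")?"
)


def looks_like_datetime(text):
    """Return True if TEXT has the date or date/time shape shown above."""
    return DATETIME_RE.fullmatch(text) is not None


for sample in ["2011-07-03", "2011-07-03T12:58:46.31-07:00", "July 3", "12:50"]:
    print(sample, looks_like_datetime(sample))
```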

“pubdate” Attribute

The “pubdate” boolean attribute can be included. If present, it means your “time” tag represents the time your article was published. Example:

<time pubdate>2011-07-03T13:05:28-07:00</time>
<time pubdate datetime="2011-07-03T13:05:28-07:00">July 03</time>

Note that “pubdate” does not take any values. It is wrong to say pubdate="…".

July4th Flag Chicks

[Image: Gisele Bundchen with flag.]

Tomorrow is July 4th. Remember to take photos of any flag-clad cuties, and any flag things. Send them to me. See: 〈Banners & Damsels & Mores〉 @ http://xahlee.org/Periodic_dosage_dir/lanci/lanci.html