html microformat

perm url: HTML Microformat


Xah Lee, 2009-01-17

Learned about the term Microformat↗. Basically, you just use html's “class” attribute and tag structure to represent structured data, so that it can be parsed and manipulated easily. It is more or less a home-cooked way of using html/xhtml to achieve the purposes of specialized XML. (The concept of microformat is similar to the ad hoc line-based text file formats of many programs, e.g. unix config files.)

I've been using microformat for my English Vocabulary project. For example, see the source code for this page: Vocabulary Study: Hyphenated Wonders. Effectively, i created a microformat for vocabulary citation: each entry is a word entry, with containers for usage examples, cited sources, and the definition. Here's an example:

<div class="ent">
 <p class="wd">‹a word›</p>
 <div class="ex">
  <div class="bdy">‹some example usage involving the word›</div>
  <div class="src">‹source of the above example usage›</div>
 </div>
 <div class="def">‹a word› = ‹word's definition›</div>
</div>
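To show why such a structure is easy to machine-parse, here is a minimal Ruby sketch that pulls entries out of the format above with plain regexes. It is an illustration under assumptions, not the code used on the site: the names VocabEntry and parse_entries are hypothetical, each leaf element (wd, bdy, src, def) is assumed to hold plain text with no nested tags, and the sample entry is made up.

```ruby
# Minimal sketch: extract entries written in the ent/wd/ex/bdy/src/def
# microformat. Regex-based, so it assumes each leaf element holds plain
# text with no nested tags. VocabEntry and parse_entries are hypothetical.
VocabEntry = Struct.new(:word, :examples, :definition)

def parse_entries(html)
  # Each chunk is the text following one <div class="ent"> opener.
  html.split('<div class="ent">').drop(1).map do |chunk|
    word = chunk[%r{<p class="wd">(.*?)</p>}m, 1]
    bodies = chunk.scan(%r{<div class="bdy">(.*?)</div>}m).flatten
    sources = chunk.scan(%r{<div class="src">(.*?)</div>}m).flatten
    definition = chunk[%r{<div class="def">(.*?)</div>}m, 1]
    # Pair each usage example with its cited source.
    VocabEntry.new(word, bodies.zip(sources), definition)
  end
end

# A made-up sample entry in the format above.
sample = <<~HTML
  <div class="ent">
   <p class="wd">sesquipedalian</p>
   <div class="ex">
    <div class="bdy">He gave a sesquipedalian lecture.</div>
    <div class="src">(a made-up source)</div>
   </div>
   <div class="def">sesquipedalian = given to using long words</div>
  </div>
HTML

entries = parse_entries(sample)
entries.each { |e| puts "#{e.word}: #{e.definition}" }
```

Because every piece of data sits in a predictably named container, the extraction needs no knowledge of the page's layout, only of the class names.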

I've also been using microformat for annotating the literature of my World Literature Classics project. For example, see the source code of: What Desires Are Politically Important?. A loose microformat is basically just structured html/xhtml. When done in a strict way, it effectively makes the page ready for the semantic web, and the page can be machine parsed and transformed easily.

In the past year, i've been gradually cleaning up the 3500+ pages of my website toward a more structured use of html, with the eventual goal that they can be validated by a grammar checker (besides being lexically valid html/xhtml). This work has been done haphazardly, bit by bit. Part of it is designing bits of microformat for my website's diverse projects. For example, you'll need different microformats for projects that are math expositions, programming tutorials, annotated literature, art/photo galleries, and commentary/essays. Part of the job is converting snippets of existing pages to the newly formed microformats, using a combination of emacs/elisp, perl, and python, primarily based on regex. Part of it is the ongoing coding of bits of a grammar validator in elisp. (In elisp, so that i can interactively validate as i write new pages.) This work also got me started studying parsers, especially the promising Parsing Expression Grammar, for these purposes. See Pattern Matching vs Lexical Grammar Specification.
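A toy version of such a grammar check, written as a Ruby sketch rather than the elisp validator described above: reduce a page to the sequence of class names it uses, then match that sequence against a regex written like a grammar rule. The rule `ent wd (ex bdy src)+ def` is an assumption based on the vocabulary format, and the function names are hypothetical.

```ruby
# Hypothetical sketch of a tiny "grammar check" for the vocabulary
# microformat (not the elisp validator described in the text).
# Assumptions: one class per element, and a page is one or more entries
# of the form  ent wd (ex bdy src)+ def.
ENTRY_GRAMMAR = /\A(?:ent wd (?:ex bdy src )+def )+\z/

# Reduce a page to the space-separated sequence of class names it uses.
def class_sequence(html)
  html.scan(/class="([^"]+)"/).flatten.map { |c| "#{c} " }.join
end

def valid_vocab_page?(html)
  ENTRY_GRAMMAR.match?(class_sequence(html))
end

page = '<div class="ent"><p class="wd">w</p><div class="ex">' \
       '<div class="bdy">b</div><div class="src">s</div></div>' \
       '<div class="def">d</div></div>'
p valid_vocab_page?(page)                                     # → true
p valid_vocab_page?(page.sub('<div class="def">d</div>', '')) # → false (no def)
```

This only checks the order of classes, not how elements nest; a real grammar-based parser (for example one built from a PEG) would check nesting as well.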


Suggestions on Emacs's Inline Doc

perm url: http://xahlee.org/emacs/modernization_inline_doc.html


Xah Lee, 2009-01-17

In emacs, you can press “Ctrl+h f ‹function name›” to see any elisp function's inline documentation, and if the cursor is on a function, it defaults to looking up that function. This integrated facility is extremely convenient. Some other scripting languages such as Perl, Python, Ruby, and Javascript do provide such lookup too, often thru their command-line interface, but none is nearly as convenient as in the emacs environment.

However, some improvement can be made. Here are some suggestions:

  • (1) make elisp-index-search use the symbol under the cursor as its default input. This seems useful and consistent with other emacs lookup commands.
  • (2) make describe-function display a link to the elisp manual's node on that function.
  • (3) make describe-function display related functions as in “See also: ...”

For (1), a few people have suggested implementations here: http://groups.google.com/group/gnu.emacs.help/browse_frm/thread/f248ae0258c1b37a

Here's one that works for me:

(defadvice elisp-index-search (before interactive-default activate)
  "Provide the symbol at point as the default when reading TOPIC interactively."
  ;; replace the command's own interactive spec with one that offers
  ;; the symbol under the cursor as the default topic
  (interactive
   (let ((symbol-at-point (thing-at-point 'symbol)))
     (list (read-string (if symbol-at-point
                            (format "Topic (%s): " symbol-at-point)
                          "Topic: ")
                        nil nil symbol-at-point)))))

For (2), showing a link to the pertinent page of the elisp manual would be convenient, because sometimes the inline doc is not detailed enough or doesn't provide context.

For (3), listing similar functions is a practical need. For example, when looking up goto-line, it might say “See also: goto-char, move-to-column, ...”.

Note that listing related functions in a function's doc is common in many programming language manuals, e.g. Mathematica, Microsoft's JScript, PHP. Such lists are quite useful. Those who are not yet experts in a language (which is the majority) often don't know about similar functions, or whether there's a manual page that lists them, and are often confused about the differences among functions that seem the same. With a list of similar functions, a coder can easily locate the right function for his task.

Note: some of the above suggestions have been reported to the emacs developers as bugs #575 and #1119.

A Ruby Illustration of Lisp Problems

perm url: http://xahlee.org/UnixResource_dir/writ/lisp_problems_by_ruby.html


Xah Lee, 2009-01-17

Here's an interesting toy problem posted by Drew Krause to comp.lang.lisp:

OK, I want to create a nested list in Lisp (always of only integers) from a text file, such that each line in the text file would be represented as a sublist in the 'imported' list.

Example of input:

3 10 2
4 1
11 18

Example of output:

((3 10 2) (4 1) (11 18))

Here's an emacs lisp version:

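(defun read-lines (file)
  "Return a list of lines in FILE."
  (with-temp-buffer
    (insert-file-contents file)
    ;; split the whole buffer on newlines, dropping empty lines
    (split-string
     (buffer-substring-no-properties 1 (point-max)) "\n" t)))

;; turn each line into a list of numbers
(mapcar
 (lambda (x)
   (mapcar
    (lambda (y) (string-to-number y))
    (split-string x)))
 (read-lines "xxblob.txt"))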

The above coding style is typical of maintainable elisp.

In a show-off context, it can be shortened by about 50%, but it would still be far more verbose than ruby or, say, perl (which take 1 or 2 lines; python would take 3 to 5).

w_a_x_...@yahoo.com and William James gave a ruby solution:

IO.readlines("blob.txt").map{|line| line.split.map{|s| s.to_i }}
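To check the one-liner against the sample input from the problem, a self-contained variant can first write that input to a temporary file. Tempfile is part of Ruby's standard library; the file name prefix is arbitrary.

```ruby
require 'tempfile'

# Write the example input from the problem statement to a temporary file.
file = Tempfile.new('blob')
file.write("3 10 2\n4 1\n11 18\n")
file.close

# The one-liner: read lines, split each on whitespace, convert to integers.
nested = IO.readlines(file.path).map { |line| line.split.map { |s| s.to_i } }
p nested   # → [[3, 10, 2], [4, 1], [11, 18]]

file.unlink
```

`split` with no argument splits on runs of whitespace and ignores the trailing newline, so each line becomes a clean list of numbers.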

That's really the beauty of Ruby.

This problem and the ruby code illustrate 2 fundamental problems of lisp, namely the cons problem and the pain of nested syntax. Both are practically unfixable.

Lisp's cons fundamentally makes nested lists a pain to work with. Lisp's nested syntax makes functional sequencing cumbersome.

In the ruby code, the postfix sequential notation (a side effect of its OOP notation) brings out the beauty of the functional sequencing paradigm (sometimes known as functional chaining, sequencing, filtering, or unix piping).

Ruby's lists, like those of all modern high-level langs such as perl, php, python, and javascript, don't have lisp's cons problem. The cons destroys the usability of lists up front, until you have at least 2 full-time years of coding lisp and can use cons properly. (And even after that, it is still a pain to work with, and all you gain is a bit of speed optimization in rare cases that involve largish data, most of which have better solutions such as a database.)
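To make the cons discussion concrete, here is a rough Ruby model of a lisp list as a chain of two-slot cells. It is an illustration only, not how any particular lisp is implemented, and the names Cons, list, nth and cons_to_array are hypothetical. The nested example output becomes cells inside cells, and reaching an element means walking cdr links.

```ruby
# Illustrative model only (not any particular lisp's implementation):
# a list is a chain of two-slot cells, car = element, cdr = rest of list.
Cons = Struct.new(:car, :cdr)

# Build a nil-terminated list from the given items.
def list(*items)
  items.reverse.inject(nil) { |tail, x| Cons.new(x, tail) }
end

# Reaching the nth element means following n cdr links.
def nth(n, cell)
  n.times { cell = cell.cdr }
  cell.car
end

# Convert back to a Ruby array, recursing into nested lists.
def cons_to_array(cell)
  out = []
  while cell
    out << (cell.car.is_a?(Cons) ? cons_to_array(cell.car) : cell.car)
    cell = cell.cdr
  end
  out
end

# The example output ((3 10 2) (4 1) (11 18)) as cons cells:
cells = list(list(3, 10, 2), list(4, 1), list(11, 18))
p nth(1, nth(0, cells))   # → 10
p cons_to_array(cells)    # → [[3, 10, 2], [4, 1], [11, 18]]
```

In the Ruby array `[[3, 10, 2], [4, 1], [11, 18]]`, the same element is simply `a[0][1]`.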

I've published articles on both of these problems. For more detail on the cons problem, see the section “The Cons Business” at Fundamental Problems of Lisp.

For more detail on the nested syntax problem for function chaining, see the section “How Purely Nested Notation Limits The Language's Utility” at The Concepts and Confusions of Prefix, Infix, Postfix and Fully Nested Notations.


Neal Stephenson at Google Talk

perm url: http://xahlee.org/Periodic_dosage_dir/Neal_Stephenson.html


Xah Lee, 2009-01-16

Was chatting on freenode's irc #rcirc channel out of boredom. I asked out in the open for suggestions on some sci-fi movies to watch. Sabetts (Shawn Betts, author of Ratpoison↗ and Stumpwm↗) mentioned that Neal Stephenson↗ had given a talk at Google.

Neal Stephenson talk at Google on 2008-09-12.

I watched the entire 58 min of it. For the first 5 or 10 min, you see this boring guy, a humorless, self-absorbed, absent-minded nerd, going on monotonously. The entire talk is an emotionless monotone, somewhat demeaning and self-abasing too, entirely devoid of any high points or energy, with him constantly letting out a subdued sigh. I couldn't find a single gleam of a smile on his face thru the entire talk.

I had of course heard of him, first in 1998 thru a colleague (Jon Frisby↗), who named his coding projects after Stephenson's books. Neal is a sci-fi novelist, famous for titles like Snow Crash and Cryptonomicon, some kind of celebrity god among tech geekers. I watched to see what he had to say; after all, he was giving a talk at Google.

It turns out i found him to be extremely intelligent. When he was asked what he thinks of Wikipedia, my ears perked up. I'm a Wikipedia expert, as far as what it is, its quality, its relation to tech geekers, and to humanity at large, so his answer would be a good basis for judging him. And behold, what quality of observation he has, delivered in such an unspectacular manner. It was disappointing, though, when he was asked about Second Life: his answer was that he had basically never tried it, so he didn't know enough to comment, despite the fact that he partly originated the metaverse idea and in fact supported its development by creating a wiki, metaweb.com, in the mid 2000s. (In fact, my name and my article on trolling (On Ignoring Trolls) were mentioned on that wiki while it existed. The wiki went defunct a few years back, and metaweb.com is now some company's site.)


2 php tutorials

Two PHP tutorials: How To Send HTML Mail With PHP, How To Send Mail with Attachment in PHP

How To Send Mail with Attachment in PHP

Xah Lee, 2009-01-14

This page shows you how to send email with attached file, using PHP.

Sending email in php is extremely easy. All you have to do is call the “mail” function. But how do you send out email with attachment?

There are PHP packages that allow you to do that; however, they often need to be installed, and if you are using a web hosting service provider, that is sometimes not possible. Luckily, it is not difficult to write simple code that does it. All you have to do is encode your mail payload as multipart MIME↗.

Here's a simple working example of sending html mail with an attached file:


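<?php
$fromAddr = 'staff@example.com'; // the address to show in From field.
$recipientAddr = 'jane@example.org';
$subjectStr = 'Thank you';

// escape the submitted form values before putting them into the HTML body
$login = htmlspecialchars($_POST['login'] ?? '');
$password = htmlspecialchars($_POST['password'] ?? '');

$mailBodyText = <<<END89283
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">
<title>Thank You</title>
<b>Login:</b> {$login}<br>
<b>Password:</b> {$password}<br>
END89283;

$filePath = 'uploaded_files/great_house.jpg';
$fileName = basename($filePath);
$fileType = 'image/jpeg';
/* to find out what string to use for type, see a list of Internet media
   (MIME) types, or use $_FILES['attachment']['type'] for an uploaded file */

/* encode the email content */

// a boundary string that won't appear anywhere else in the message
$mimeBoundary = '==Multipart_Boundary_x' . md5(uniqid('', true)) . 'x';

$headers = <<<HHHHHHHHHHHHHHHHHHHHH
From: $fromAddr
MIME-Version: 1.0
Content-Type: multipart/mixed;
 boundary="{$mimeBoundary}"
HHHHHHHHHHHHHHHHHHHHH;

// encode the HTML so it matches the declared quoted-printable encoding
$htmlPart = quoted_printable_encode($mailBodyText);

// Add a multipart boundary above the plain message 
$mailBodyEncodedText = <<<TTTTTTTTTTTTTTTTT
This is a multi-part message in MIME format.

--{$mimeBoundary}
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

{$htmlPart}

TTTTTTTTTTTTTTTTT;

// read the file and base64-encode it, split into 76-character lines
$data = chunk_split(base64_encode(file_get_contents($filePath)));

// file attachment part, followed by the closing boundary
$mailBodyEncodedText .= <<<FFFFFFFFFFFFFFFFFFFFF
--{$mimeBoundary}
Content-Type: $fileType;
 name="$fileName"
Content-Disposition: attachment;
 filename="$fileName"
Content-Transfer-Encoding: base64

$data
--{$mimeBoundary}--
FFFFFFFFFFFFFFFFFFFFF;

if (
mail( $recipientAddr , $subjectStr , $mailBodyEncodedText, $headers )
) {
  echo '<p>Send successfully!</p>';
} else {
  echo '<p>Bah!</p>';
}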
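For comparison, the multipart/mixed layout used in the PHP example can be assembled in a few lines of Ruby. This is a sketch under assumptions: build_multipart is a hypothetical helper, it only builds the message text and does not send it, and it sends the HTML part as 8bit rather than quoted-printable to stay short, which assumes the receiving mail server accepts 8-bit text.

```ruby
require 'securerandom'

# Hypothetical helper: build a multipart/mixed body with an HTML part and
# one base64-encoded attachment. It only assembles text; it does not send.
def build_multipart(html, file_name, file_type, file_bytes, boundary)
  # pack('m') is core Ruby base64 (60-char lines); MIME wants CRLF endings.
  encoded = [file_bytes].pack('m').chomp.gsub("\n", "\r\n")
  [
    'This is a multi-part message in MIME format.',
    '',
    "--#{boundary}",                    # start of the HTML part
    'Content-Type: text/html; charset=UTF-8',
    'Content-Transfer-Encoding: 8bit',  # assumption: server accepts 8-bit text
    '',
    html,
    "--#{boundary}",                    # start of the attachment part
    "Content-Type: #{file_type}; name=\"#{file_name}\"",
    "Content-Disposition: attachment; filename=\"#{file_name}\"",
    'Content-Transfer-Encoding: base64',
    '',
    encoded,
    "--#{boundary}--",                  # closing delimiter
    ''
  ].join("\r\n")
end

boundary = "==Multipart_Boundary_x#{SecureRandom.hex(16)}x"
headers = "MIME-Version: 1.0\r\nContent-Type: multipart/mixed; boundary=\"#{boundary}\""
body = build_multipart('<p>Thank you</p>', 'note.txt', 'text/plain', 'hello', boundary)
puts headers, '', body
```

The key points match the PHP code: the boundary is declared once in the Content-Type header, each part starts with `--boundary`, and the message ends with `--boundary--`.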



using USB Flash Drive on mac

Perm url: http://xahlee.org/Periodic_dosage_dir/t1/zip_mother-son.html

USB Flash Drive

I just bought an 18GB usb flash drive. It seems to be 20 times slower than my 8-year-old external 20GB firewire drive (which is 10 times as bulky). USB 2 should be about as fast as Firewire, so my initial guess was that the flash drive's format is not native to the Mac. Wikipedia comes to the rescue:

Some points of personal interest:

  • Because flash drives are solid-state storage devices, fragmentation doesn't slow them down the way it slows magnetic-disk-based mechanical drives, which have to seek. (They do have their own issues.)
  • Flash drives have a limited life span, both in the number of write operations and in how long they retain data.
  • Flash drives use file systems just like normal drives. In fact, a flash drive is just a storage device with a usb interface.
  • Most flash drives are pre-formatted with FAT32. If you use it with a Mac exclusively, you might want to reformat it to a Mac native format (e.g. HFS+, using Disk Utility). I haven't tested in detail, but HFS+ seems to improve speed by as much as 2 times.

.DS_Store

I've always wondered what Mac OS X's “.DS_Store” files store. Wikipedia comes to the rescue: .DS_Store↗. Windows has the annoying Thumbs.db↗ files too.

Emacs Should Adopt HTML To Replace Texinfo

Perm url: http://xahlee.org/emacs/modernization_html_vs_info.html


Xah Lee, 2009-01-12

Dan Davison wrote:

What does this syntax mean? “See Info node `(viper)Top'.”

Is there some way of using it to immediately access the info node referred to?

Lennart Borgman wrote:

M-: (info "(viper) Top")

Note: the “M-:” above means “Alt+:”. See: Emacs's M-‹key› Notation vs Alt+‹key› Notation.

It'd be much better if emacs adopted html as its standard doc format.

It would then just be: “http://gnu.org/doc/emacs/viper/top.html”

In this format, every programmer understands what it is. As for “(info "(viper)Top")” or “(viper)Top”, maybe 0.001% of programmers know what it means. Even counting only emacs users who have used emacs for at least 1 year, the percentage is perhaps 10%.

Personally, i have used emacs daily since 1998, staying in emacs most of the time i'm on a computer, and i used text-terminal-based emacs exclusively from 1998 to 2005. I didn't know what “(info "...")” meant until 2005 or so, thru chatting in freenode's emacs irc channel.

Adopting html as the standard doc format is easy to do; in fact it is mostly just a political gesture. Texinfo can already be converted to html (e.g. with makeinfo --html), and most if not all of GNU's docs are already presented in html on GNU's site.
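Since Texinfo already produces per-node html, an Info reference maps onto a url almost mechanically. A rough Ruby sketch under stated assumptions: the base url below is assumed to be where gnu.org hosts the Emacs-bundled manuals (such as viper), the “Top” node is assumed to become index.html, and other node names are assumed to become file names by turning whitespace into “-”. Real makeinfo output also escapes other punctuation, which this sketch does not attempt; info_ref_to_url is a hypothetical name.

```ruby
# Hypothetical helper: turn an Info reference such as "(viper)Top" into a
# url for the per-node html version of the manual.
# Assumptions: manuals live under BASE_URL, "Top" maps to index.html, and
# node names map to file names by collapsing whitespace to "-". Real
# makeinfo output also escapes other punctuation; this sketch does not.
BASE_URL = 'https://www.gnu.org/software/emacs/manual/html_node/'

def info_ref_to_url(ref)
  m = /\A\(([^)]+)\)\s*(.*)\z/.match(ref.strip)
  raise ArgumentError, "not an Info reference: #{ref}" unless m
  manual = m[1]
  node = m[2].strip
  file = (node.empty? || node == 'Top') ? 'index' : node.gsub(/\s+/, '-')
  "#{BASE_URL}#{manual}/#{file}.html"
end

puts info_ref_to_url('(viper)Top')
puts info_ref_to_url('(viper) Top')   # the spelling used in the reply above
```

Both calls print the same url, which is the kind of link people could paste into a forum post instead of an info node name.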

With html adopted, people will naturally cite docs by url instead of “info xyz”. This will help understanding and consequently help spread emacs. For example, in a discussion on some programming forum, someone might say “look up (info xyz) in emacs”. The vast majority of readers wouldn't understand what that is and will simply ignore it. But if html doc is official, the citation would be “http://gnu.org/doc/xyz.html”, and those who see it are very likely to click it.

This wouldn't affect emacs much, since emacs can and should still use info docs internally as an integrated system. But down the road, say in 5 years, emacs will eventually need to deprecate texinfo. The HTML/XHTML/CSS/JavaScript world has literally a few million more users and developers. Its tools, technical power, extensibility, adoption... in every area, are a few orders of magnitude better than texinfo's. In fact, it isn't surprising that a modern browser such as Firefox actually renders an html doc faster than emacs can parse a texinfo file.

Adopting html now would pave the way for emacs to transition to html/xhtml as its integrated doc component. For example, currently there's w3m for reading html. However, it's some 5 times slower than Firefox, and some 5 times slower than info reading texinfo. This can be improved, though. One could build an html/xml parser into elisp as C code (or borrow the rendering engine from Firefox), so that reading html docs in emacs is as acceptably fast as reading info is now.

The integrated nature of info in emacs is really a joy to use, especially when programming in elisp. You can look up any function or keyword in the lang easily, and everything is cross-referenced by clickable links. However, if the lang is not elisp but perl, python, php, etc, then it's not so easy, because you often have to download and install an info version of their docs (if it exists at all), and it depends on whether the person who implemented your lang's mode took the fancy to implement info doc lookup. (10 years ago, some modes would still support info docs. Today, as far as i know, nobody bothers with info versions of docs.)

If emacs accepted html docs, the integrated doc feature would automatically apply to all langs, such as java, perl, python, ruby, php, javascript..., since their official docs are all html.