2011-08-20

Xah's Programing Language Tutorials

post from g+.

For those of you who are programers: i write tutorials for several computer languages. Usually i cover only the basics, with lots of examples, and without any “engineering” or “computer science” talk. I want it that way so that programers can quickly learn the language as it is. Like, if you type THIS, then THAT will happen on your computer.

A few decades ago, a programer could know it all. But today, like most sciences, programing has branched into hundreds of specialized fields and tens of general purpose computer languages, all widely used.

If you want to learn one of these langs, please give my tutorial a shot. Let me know what you think.

Of the following tutorials, the Emacs Lisp one is the best. It is the most in-depth and comprehensive, and no commercial book comes close in either aspect, except the Lisp Manual.

Xah Emacs Lisp Tutorial

In Perl i am quite an expert, but my tutorial on it really doesn't cover that much. It covers more for Python, even though i haven't coded Python professionally.

Xah's Perl & Python Tutorial

Xah's Javascript Tutorial

Xah's Java Tutorial

The HTML and CSS tutorial is a good one to pick up. The languages are really trivial, so the tutorial is mostly about tips and tricks.

Xah's Web Dev Tutorial (HTML, CSS, javascript)

I haven't really picked up OCaml, but this tutorial in my opinion gives you a functional understanding (in the approach i mentioned before), better than other tutorials i know of, because they always talk about currying and other jargon in some computer science way, half of which they don't really understand.

2011-08-19

Third Person Writing for Author Profile

On Google+ i do hate those who write their own profile in third person. Like:

Dr. Xah Lee is professor of philology at University of Bovine. He is a renowned programer and philosopher. He has helped tens of thousand people to better themselves. He has won several awards, and is a recipient of Noble Prize. He is also SEO of Grandiloquence International. He is nominated as the Savior of the Year 2020.

Rule: If you write it yourself, don't third person it.

In case anyone doesn't know: the vast majority of such profiles (actually i think all of them) on book covers, in journal article intros, etc, are written by the person himself. I haven't researched the history of this practice per se, but this convention was established precisely to make the profile sound neutral and true, and they do that to sell the journal/book. In short, it's a marketing gimmick!

The Google+ Rap

“The Google+ Rap”

Added to What is Google Plus and Google+ Songs (humor).

2011-08-18

Google Chrome reports Malware Site

Here's a Google Chrome screenshot, reporting that a site has been hacked.

Google Chrome warning on the site 〔stuckincustoms.com〕, captured on 〔2011-08-18T15:07:21-07:00〕.

The site belongs to the popular photographer Trey Ratcliff (Trey Ratcliff on g+).

If you need an invite to g+, click on this link: plus.google.com

If you don't know what g+ is, see: What is Google Plus? (humor).

2011-08-17

Emacs Lisp: HTML Processing: Split Annotation

Perm url with updates: http://xahlee.org/emacs/elisp_text_processing_split_annotation.html

Xah Lee, 2011-08-16

This page shows an example of emacs lisp for processing HTML. The HTML files are classical novels. The annotation markup needs to change from one format into another. There are hundreds of such pages that need to be processed.

Problem

Summary

For all HTML files in a directory, find any annotation markup containing the bullet “•” symbol:

<div class="x-note">A ⇒ … • B ⇒ … • C ⇒ …</div>

Split the annotation into multiple markups, like this:

<div class="x-note">A ⇒ … </div>
<div class="x-note">B ⇒ … </div>
<div class="x-note">C ⇒ … </div>
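The transformation on a single annotation can be sketched as a pure string function. This is only an illustration, not the script used on the site: the name “my-split-xnote-string” is made up for this example, and it assumes the input is the text found between the opening and closing x-note tags.

```elisp
;; Sketch only. `my-split-xnote-string' is a hypothetical helper name.
(defun my-split-xnote-string (note-text)
  "Return NOTE-TEXT, the inner text of one x-note, as a list of x-note tags.
Entries are separated by the bullet character.  Whitespace and <br>
around each entry are trimmed, and empty entries are dropped."
  (mapcar
   (lambda (entry)
     (concat "<div class=\"x-note\">" entry "</div>"))
   ;; 3rd arg t omits empty pieces, 4th arg is the regexp to trim
   (split-string note-text "•" t "\\(?:<br>\\|[ \t\n]\\)+")))

;; one string per entry, each wrapped in its own x-note tag
(my-split-xnote-string "A ⇒ … • B ⇒ … • C ⇒ …")
```

The TRIM argument of “split-string” needs emacs 24.4 or later.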

Detail

If you are a contract web dev programer, then you know that 99.99% of websites are a messy text soup. They are created by hundreds of tools or languages: word processors, HTML generators, tens of lightweight markup langs, different frameworks from different languages (PHP, Perl, Python), from different web eras, from different programers in the past. Even emacs has several modes that generate HTML. They are not in any consistent form. Often, they have missing tags too.

It is in these situations that emacs shines thru, because emacs's powerful embedded language lisp, together with its interactive nature, lets you maximize automation. You work interactively while you are still feeling out the pattern, then use Keyboard Macro or emacs lisp for the parts that can be automated.

For my website, i take the time to make sure that all my HTML is consistent. But still, the pages were written over a span of 15 years. Periodically i take the time to improve the markup, for example, when a new version of CSS or HTML is widely adopted by web browsers. (CSS1 to 2 to 3, HTML 3 to 4 to HTML5.)

I have hundreds of pages of classic novels as HTML documents. These documents contain annotations in special HTML markup. For example, here's a sample annotation from Titus Andronicus: Act 1:

• short ⇒ rudely brief. (AHD)
• sharp ⇒ Fierce, impetuous, harsh, severe… (AHD)
SATURNINUS. 'Tis good, sir. You are very short with us;
  But if we live we'll be as sharp with you.

Here's the raw HTML:

<div class="x-note">• short ⇒ rudely brief. (AHD)<br>
• sharp ⇒ Fierce, impetuous, harsh, severe… (AHD)</div>

<pre class="tx">SATURNINUS. 'Tis good, sir. You are very <span class="xnt">short</span> with us;
  But if we live we'll be as <span class="xnt">sharp</span> with you.
</pre>

Here's how the tag works. Each <span class="xnt"> marks up a word in the main text. When a word is marked by “span.xnt”, that means it has a sidebar annotation. The sidebar section is marked by <div class="x-note">. Inside the “div.x-note”, there may be more than one entry. Each entry starts with the bullet symbol “•”. For example, in the above, the words “short” and “sharp” are both entries inside a “div.x-note” sidebar.

But recently, i think it is better to have one entry per sidebar. This makes the logic simpler, and makes it much easier if i want to add Javascript functionality. For example, when the mouse hovers over a word in the main text, the corresponding annotation would be highlighted.

So, i want to write an elisp script to process all my files. If you simply read the spec for this job, of splitting a markup by a particular character, you may think it's trivial and can be done in any lang in 10 minutes. Why then the elaborate discussion about the text soup situation?

The important thing is that i DID NOT know what needed to be done to begin with. Only after having used emacs's power, together with lisp scripts i wrote before, to look at and check my existing markup in hundreds of files, did i know what state they were in and could decide what i want to do. Also, this change must be done with the ability to visually check that all changes are done correctly, because the input may not be in the format i expect. (it might be missing the bullet “•”.)
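As an illustration of that kind of checking, here is a minimal sketch that counts the x-note tags in the current buffer that lack a bullet. The function name “my-count-xnote-without-bullet” is hypothetical, and the sketch assumes an x-note never contains a nested div, so the first closing div tag after the opening tag ends the note.

```elisp
;; Sketch only. Hypothetical helper, assumes no nested <div> inside an x-note.
(defun my-count-xnote-without-bullet ()
  "Return the number of x-note tags in the current buffer that have no bullet."
  (let ((count 0) p1)
    (save-excursion
      (goto-char (point-min))
      (while (search-forward "<div class=\"x-note\">" nil t)
        ;; p1 is the start of the note text
        (setq p1 (point))
        (when (search-forward "</div>" nil t)
          ;; (match-beginning 0) is the start of the closing tag
          (unless (string-match-p
                   "•"
                   (buffer-substring-no-properties p1 (match-beginning 0)))
            (setq count (1+ count))))))
    count))
```

Calling it with “M-:” in a buffer visiting one of the HTML files gives a quick number before deciding whether the file needs a closer look.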

For those Scheme Lisp academic computer science folks, you might wonder, when i started with these annotations, why didn't i “design” it well to begin with. The reason is that, when i write a blog article, or work on my literature annotation project, i really want to focus on the writing first, the content, and get it done, rather than get distracted by the CSS/HTML markup design. (one thing i do make sure of is that whatever CSS/HTML i devise can be easily changed systematically later by simple parsing.) I devote a significantly larger percentage of time to design than most people, but many factors necessitate change. For example, you may not have known CSS as well before, and the thinking on HTML semantics is quite complex. (e.g. see: Are You Intelligent Enough to Understand HTML5?.) Browsers change, standards change (e.g. HTML → XHTML → HTML5. See: HTML5 Doctype, Validation, X-UA-Compatible, and Why Do I Hate Hackers.), thoughts on best practices change, and my needs for the annotation also changed throughout the years.

Solution

Here's the outline of steps:

  • Open the file. Search for the tag we want.
  • Check if the tag contains a bullet “•”.
  • If so, replace the bullet char with new end tag and beginning tag. e.g. </div> <div>
  • Do this for all files in a dir. (or a given list of files)

Here's the code:

;; -*- coding: utf-8 -*-
;; 2011-08-13
;; process all files in a dir.
;; split any markup like this:
;; <div class="x-note">… • … • …</div>
;; by the bullet •
;; into several x-note tags

(setq inputDir "~/web/xahlee_org/p/" )

;; add a ending slash if not there
(when (not (string= "/" (substring inputDir -1) )) (setq inputDir (concat inputDir "/") ) )

;; files to process
(setq fileList 
[
"~/web/xahlee_org/p/arabian_nights/aladdin/aladdin4_1.html"
"~/web/xahlee_org/p/arabian_nights/aladdin/aladdin3.html"
]
)

(defun my-process-file-xnote (fpath)
  "process the file at fullpath FPATH …"
  (let (myBuffer (ξcounter 0) p1 p2 ξmeat
                 ξmeatNew
                 (changedItems '())
                 (tagBegin "<div class=\"x-note\">" )
                 (tagEnd "</div>" )
                 )

    (require 'sgml-mode)
    (when t

      (setq myBuffer (find-file fpath))
      (goto-char 1)
      (while (search-forward "<div class=\"x-note\">" nil t)

        ;; capture the x-note tag text
        (setq p1 (point))
        (backward-char 1)
        (sgml-skip-tag-forward 1)
        (backward-char 6)
        (setq p2 (point))
        (setq ξmeat (buffer-substring-no-properties p1 p2))

        ;; if it contains a bullet
        (when (string-match "•" ξmeat)
          (setq ξcounter (1+ ξcounter))

          ;; clean the text. Remove some newline and <br> that's no longer needed
          (setq ξmeat (replace-regexp-in-string "\n*• *" "•" ξmeat t t ) )
          (setq ξmeat (replace-regexp-in-string "\n$" "" ξmeat t t ) ) ; delete ending eol
          (setq ξmeat (replace-regexp-in-string "<br>•" "•" ξmeat t t ) )

          ;; add the new entries to a list, for later reporting
          ;; (append, so that entries from earlier x-notes in this file are kept)
          (setq changedItems (append changedItems (split-string ξmeat  "•" t)))

          ;; break the bullet into new end/begin tags
          (setq ξmeatNew (replace-regexp-in-string "•" (concat tagEnd "\n" tagBegin) ξmeat t t ) )

          (goto-char p1)
          (delete-region p1 p2)
          (insert ξmeatNew)

          ;; remove the newline before end tag
          (when (looking-back "\n") (delete-backward-char 1))
          )
        )

      ;; report if any x-note in this file was split
      (when (not (= ξcounter 0))
          (princ "-------------------------------------------\n")
          (princ (format "%d %s\n\n" ξcounter fpath))

          (mapc (lambda (ξx) (princ (format "%s\n\n" ξx)) ) changedItems)
        )

        ;; close buffer if there's no change. Else leave it open.
        (when (not (buffer-modified-p myBuffer)) (kill-buffer myBuffer) )
      )
    ))

(require 'find-lisp)

(let (outputBuffer)
  (setq outputBuffer "*xah x-note output*" )
  (with-output-to-temp-buffer outputBuffer 
    ;; (mapc 'my-process-file-xnote fileList)
    (mapc 'my-process-file-xnote (find-lisp-find-files inputDir "\\.html$"))
    (princ "Done deal!")
    )
  )

Here's a sample output: elisp_text_processing_split_annotation.txt

I've put lots of comments in the code. It should be easy to understand. If there's any part you don't understand, ask me. If you are new to elisp, check out the first few sections of Emacs Lisp Tutorial.

The weird ξ you see in my elisp code is Greek x. I use unicode char in variable name for experimental purposes. You can just ignore it. (See: Programing Style: Variable Naming: English Words Considered Harmful.)

I ♥ emacs.

2011-08-16

TV Has Arrived on The Web

Perm url with updates: http://xahlee.org/w/TV_arrived_on_the_web.html

Xah Lee, 2011-08-16

it is disgusting.

y'know how these days there's lots of web2.0 shows on YouTube? like, they talk about latest gadgets, latest web stuff such as g+, and Apple stuff, and the like.

i watched a couple this week, from links in g+ circles. These shows are becoming disgusting. At first, about 5 years back, it started with the so-called podcast, then the videocast, which they call webcast (these web fashion idiots with their jargon). Back then, it was relatively new, and the shows were done by creative people. I haven't used them much, but the couple i've listened to or watched were pretty interesting, precisely because they were from non-commercial, creative individuals, whose say is usually interesting.

But today, when you watch these on youtube, they are just like TV. With intro commercials, sparkling 3D logos flying about, 10 secs of spiffy roll telling you who “brought them by”. Attractive chicks as hosts. Scripted text from the lips of these attractive chicks. Spontaneous music with drumbeats.

Fuck. Fuck it

i stopped watching TV in 2000. It looks like TV has arrived on YouTube.

Google Webmaster Advices Hurt Quality Writers

Google's Matt Cutts just put out a new Google Webmaster Video:

“Underscores vs dashes in URLs”

this really sucks, and is a prime example of how Google giving SEO advice really hurts real content creators who are not familiar with SEO.

because Google is doing this, as shown in this video, it gives an edge to companies and spammers who have lots of money and time to fine tune their websites for ranking higher, while the vast majority of other high quality content producers, e.g. professors who blog occasionally, won't know nor care about these things. So, their high quality writing goes down in ranking.

this annoys me personally too, because of the logic of the choice. I have been writing for the web since 1996, and have had a domain since 2000. I am a programer. When writing for a website, i choose underscore _ as the space separator for file names, not dash. Because, when forced between these 2 choices, underscore is the better choice to stand for space, since dash has significance in english words. For example, look at these file names:

  • “seashell/pink-mouthed_murex”
  • “emacs/emacs_kill-ring.html”
  • “emacs_lisp_make-citation.html”
  • “ms_keyboard/f-lock_key_problem.html”
  • “blog_past_2011-01.html”
  • “ClassicalMusic_dir/midi/chopin/etude/Op25_dir/ch_25-04.mid”

you can see that hyphen and underscore have distinct meanings. Info would be lost if you replace them all with just hyphen, or just underscore.
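To make the point concrete, here is a small sketch of how a file name splits into words under each convention. The helper name “my-file-name-to-words” is made up for this illustration.

```elisp
;; Sketch only. `my-file-name-to-words' is a hypothetical helper name.
(defun my-file-name-to-words (path)
  "Split the base name of PATH into words at each underscore.
Hyphens are left alone, so hyphenated words stay intact."
  (split-string
   (file-name-sans-extension (file-name-nondirectory path))
   "_" t))

(my-file-name-to-words "emacs/emacs_kill-ring.html") ; ("emacs" "kill-ring")
(my-file-name-to-words "blog_past_2011-01.html")     ; ("blog" "past" "2011-01")

;; with hyphen as the only separator, the word boundary info is gone
(split-string "emacs-kill-ring" "-" t)               ; ("emacs" "kill" "ring")
```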

but mostly i'm just concerned that Google giving SEO advice is really a major force shaping all these little formatting, wording, title placement, link placement choices, etc, most of which have really nothing to do with quality of content. And because Google gives this SEO advice and encourages people to follow their guide, it worsens the SEO/spammer war. Those original, quality writers who are not web2.0 hipsters just fall by the wayside.

See also: Why Does Google Give SEO Advice?.

What is Google Plus? (humor)

Perm url with updates: http://xahlee.org/funny/whats_is_googleplus.html

Xah Lee, 2011-08-16

xkcd comics on Google+

Google+ video starring Ashley Pitman

the Google+ Song

Here are the lyrics.

Check my email, got an invite
To a website I don't know
Looked like googlebuzz at first sight,
But my friend said that's a no.
Why did we need, another social network?
Doesn't Facebook work alright?
A new thing for me to learn
This could take all night, Look out! 

What is this google +
I don't Need google +
There's another +1 and another +1
What is this google +
Hey, Why's this red thing here?
Go away google +
Sophie's choice ⇒ a 1979 novel written by William Styron which depicts a mother at wit's end faced with a forced decision in which any and all options have equally negative outcomes. Sophie's Choice (novel).
You want me to put my friends in circles
But circles are for squares
Everday's like Sophie's choice
Trying to choose which friend goes where
But No Parents, or Ex-boyfriends...
Can get in without invites
And when I drunkenly post that he's cute
I can edit it later that night - whoa

I kinda like google +
Can't believe I'm on google +
Oh another hangout, and another hang out
I dig my google +
Hey, old high school friend
You can't join my google +

Friends add me, without me adding them
That feature's really nice
Randos
Uglies
my pot dealer
and all these friend's I've never liked
No one knows, my circle names,
So even good friend's get handpicked
Do you make it into my main feed
or do I add you to my circle of pricks

I'm in love with google plus
I'm judgemental on goggle +
and Another douchebag and another loser
You didn't make my google plus
Hey, Now I'm a facist pig
Thank you google plus!

If you need a g+ invite, click on this link: https://plus.google.com/_/notifications/ngemlink?path=%2F%3Fgpinv%3DGgVtaJi7mSY%3ACjIEwxuY0CA

2011-08-15

Emacs Lisp: Writing a make-citation Command

Perm url with updates: http://xahlee.org/emacs/elisp_make-citation.html

Xah Lee, 2011-08-15

This page shows you how to write an emacs lisp command that transforms a text block under cursor into a specific citation format.

Problem

Summary

Write an elisp command so that, when it is called with the cursor somewhere in a text like this:

Defective C++
By Yossi Kreinin
2007
http://yosefk.com/c++fqa/defective.html

It becomes this:

<cite>Defective C++</cite> (2007) By Yossi Kreinin. @ <a class="sorc" href="http://yosefk.com/c++fqa/defective.html" title="accessed:2011-08-15">Source yosefk.com</a>

Detail

I write many blogs. When i make a link, i like to also include the article title, author, date. This helps with the link rot problem. (when a link is dead, at least the reader still knows the title, author, date.) For example, here's a typical link:

<a href="http://yosefk.com/c++fqa/defective.html">http://yosefk.com/c++fqa/defective.html</a>

I would like it to be like this:

<cite>Defective C++</cite> (2007) By Yossi Kreinin. @ <a class="sorc" href="http://yosefk.com/c++fqa/defective.html" title="accessed:2011-08-15">Source yosefk.com</a>

With proper CSS, it is rendered in browsers like this:

Defective C++ (2007) By Yossi Kreinin. @ Source yosefk.com

It is quite tedious to get the title, author, date, from a site. But once i have gotten this info manually, i can automate the formatting part. So, i start with this text:

Defective C++
By Yossi Kreinin
2007
http://yosefk.com/c++fqa/defective.html

Then, at the press of a button, the text is transformed into the desired format.

Solution

Here's the outline of steps:

  • Get the input text. (get their boundary positions)
  • Split the input text by line break.
  • Process each line into proper format.
  • Delete the input text.
  • Insert new text.

Here's the code:

(defun make-citation ()
  "Reformat current text block or selection into a canonical citation format.

For example, place cursor somewhere in the following block:

Circus Maximalist
By PAUL GRAY
Monday, Sep. 12, 1994
http://www.time.com/time/magazine/article/0,9171,981408,00.html

After execution, the lines will become

<cite>Circus Maximalist</cite> (1994-09-12) By Paul Gray. @ <a href=\"http://www.time.com/time/magazine/article/0,9171,981408,00.html\">Source www.time.com</a>

If there's a text selection, use it for input, otherwise the input is a text block between empty lines."
  (interactive)
  (let (bds p1 p2 ξmeat mylist ξtitle ξauthor ξdate ξurl )

    (setq bds (get-selection-or-unit 'block))
    (setq ξmeat (elt bds 0) )
    (setq p1 (elt bds 1) )
    (setq p2 (elt bds 2) )

    (setq mylist (split-string ξmeat " *\n *" t) )

    (setq ξtitle (elt mylist 0))
    (setq ξauthor (elt mylist 1))
    (setq ξdate (elt mylist 2))
    (setq ξurl (elt mylist 3))

    (setq ξauthor (replace-regexp-in-string "\\. " " " ξauthor)) ; remove period in initials
    (setq ξauthor (replace-regexp-in-string "By +" "" ξauthor))
    (setq ξauthor (upcase-initials (downcase ξauthor)))
    (setq ξdate (fix-timestamp-string ξdate))

    (setq ξurl (with-temp-buffer (insert ξurl) (source-linkify) (buffer-string)))

    (delete-region p1 p2 )
    (insert (concat "<cite>" ξtitle "</cite>") " " "(" ξdate ")"  " By " ξauthor ". @ " ξurl)
    ))

The code is pretty simple. Grabbing the text is done by:

    (setq bds (get-selection-or-unit 'block))
    (setq ξmeat (elt bds 0) )
    (setq p1 (elt bds 1) )
    (setq p2 (elt bds 2) )

The “get-selection-or-unit” is my custom function as a replacement for elisp's “thing-at-point”. It returns a vector [‹text› ‹begin boundary› ‹end boundary›]. (See: Emacs Lisp: Using thing-at-point for detail.)
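Since “get-selection-or-unit” is not part of emacs, here is a minimal stand-in sketch built only on standard functions, for readers who want to try the command without it. The name “my-get-selection-or-block” is hypothetical, it covers only the 'block case, and it is not the actual implementation. It assumes a text block is delimited by blank lines.

```elisp
;; Sketch only. Hypothetical stand-in, not the real `get-selection-or-unit'.
(defun my-get-selection-or-block ()
  "Return a vector [TEXT BEGIN END] for the active region or current text block.
A text block here means the lines around point, delimited by blank lines
or by the beginning or end of the buffer."
  (let (p1 p2)
    (if (use-region-p)
        (setq p1 (region-beginning)
              p2 (region-end))
      ;; block begins after the nearest blank line before point
      (save-excursion
        (setq p1 (if (re-search-backward "\n[ \t]*\n" nil t)
                     (match-end 0)
                   (point-min))))
      ;; block ends before the nearest blank line after point
      (save-excursion
        (setq p2 (if (re-search-forward "\n[ \t]*\n" nil t)
                     (match-beginning 0)
                   (point-max)))))
    (vector (buffer-substring-no-properties p1 p2) p1 p2)))
```

The vector layout matches the one described above, so “(elt bds 0)”, “(elt bds 1)” and “(elt bds 2)” work the same way on its result.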

Then, we split the text into lines:

    (setq mylist (split-string ξmeat " *\n *" t) )

    (setq ξtitle (elt mylist 0))
    (setq ξauthor (elt mylist 1))
    (setq ξdate (elt mylist 2))
    (setq ξurl (elt mylist 3))

Process each line:

    (setq ξauthor (replace-regexp-in-string "\\. " " " ξauthor)) ; remove period in initials
    (setq ξauthor (replace-regexp-in-string "By +" "" ξauthor))
    (setq ξauthor (upcase-initials (downcase ξauthor))) ; some site has author name in all caps
    (setq ξdate (fix-timestamp-string ξdate)) ; transform the date format to yyyy-mm-dd

The “fix-timestamp-string” transforms arbitrary datetime format into a canonical form yyyy-mm-dd. (ISO 8601) For example:

  • Sat, 23 Jul 2011 08:13:51 +0100 ⇒ 2011-07-23
  • Jul 23, 2011 ⇒ 2011-07-23
  • 7/23/2011 ⇒ 2011-07-23

Try to write that yourself. I'll post a solution in 2 days.
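For readers who want something to compare notes with, here is one possible sketch. It is not the real “fix-timestamp-string”: the names “my-month-alist”, “my-month-number” and “my-fix-timestamp-string” are made up, it handles only the three sample formats above (with English month names, and month/day/year order for the slash form), and it returns any other input unchanged.

```elisp
;; Sketch only. Hypothetical names, covers just the three sample formats.
(defconst my-month-alist
  '(("jan" . 1) ("feb" . 2) ("mar" . 3) ("apr" . 4)
    ("may" . 5) ("jun" . 6) ("jul" . 7) ("aug" . 8)
    ("sep" . 9) ("oct" . 10) ("nov" . 11) ("dec" . 12))
  "Map from 3-letter lowercase English month abbreviation to month number.")

(defun my-month-number (abbrev)
  "Return the month number for the 3-letter ABBREV, or nil if unknown."
  (cdr (assoc (downcase abbrev) my-month-alist)))

(defun my-fix-timestamp-string (input)
  "Return the date in INPUT as yyyy-mm-dd, or INPUT unchanged if not recognized."
  (let ((case-fold-search t) year month day)
    (cond
     ;; 7/23/2011 (assumed to be month/day/year)
     ((string-match
       "\\`\\([0-9]\\{1,2\\}\\)/\\([0-9]\\{1,2\\}\\)/\\([0-9]\\{4\\}\\)\\'"
       input)
      (setq month (string-to-number (match-string 1 input))
            day (string-to-number (match-string 2 input))
            year (match-string 3 input)))
     ;; Jul 23, 2011
     ((string-match
       "\\([a-z]\\{3\\}\\)[a-z]*\\.? +\\([0-9]\\{1,2\\}\\), +\\([0-9]\\{4\\}\\)"
       input)
      (setq month (my-month-number (match-string 1 input))
            day (string-to-number (match-string 2 input))
            year (match-string 3 input)))
     ;; Sat, 23 Jul 2011 08:13:51 +0100
     ((string-match
       "\\([0-9]\\{1,2\\}\\) +\\([a-z]\\{3\\}\\)[a-z]* +\\([0-9]\\{4\\}\\)"
       input)
      (setq day (string-to-number (match-string 1 input))
            month (my-month-number (match-string 2 input))
            year (match-string 3 input))))
    (if (and year month day)
        (format "%s-%02d-%02d" year month day)
      input)))
```

The order of the clauses matters: the fully anchored slash form is tried first, and the “month day, year” form is tried before the “day month year” form so that a day number is never mistaken for the start of a year.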

Now, we change the url into a link:

    (setq ξurl (with-temp-buffer (insert ξurl) (source-linkify) (buffer-string)))

The “source-linkify” is a command i wrote to change a url into a link in a special format for my own blogs. For example, it changes this:

http://yosefk.com/c++fqa/defective.html

into this:

<a class="sorc" href="http://yosefk.com/c++fqa/defective.html" title="accessed:2011-08-16">Source yosefk.com</a>

For detail of “source-linkify”, see: Emacs Lisp: Writing a url-linkify Command.
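As a rough idea of what such a transformation involves, here is a string-only sketch. It is not the real “source-linkify” (which is a command that works on buffer text): the function name “my-source-link-string” is hypothetical, and it assumes the url needs no HTML escaping (for example, that it contains no “&”).

```elisp
;; Sketch only. Hypothetical helper, not the real `source-linkify'.
(defun my-source-link-string (url)
  "Return an HTML link for URL, with the domain name as part of the link text.
The title attribute records today's date as the access date."
  (let ((domain (if (string-match "\\`[a-z]+://\\([^/]+\\)" url)
                    (match-string 1 url)
                  url)))
    (format "<a class=\"sorc\" href=\"%s\" title=\"accessed:%s\">Source %s</a>"
            url
            (format-time-string "%Y-%m-%d")
            domain)))

;; the link text of the result is "Source yosefk.com"
(my-source-link-string "http://yosefk.com/c++fqa/defective.html")
```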

Finally, the code deletes the input text, and inserts the new text:

    (delete-region p1 p2 )
    (insert (concat "<cite>" ξtitle "</cite>") " " "(" ξdate ")"  " By " ξauthor ". @ " ξurl)

The weird ξ you see in my elisp code is Greek x. I use unicode char in variable name for experimental purposes. You can just ignore it. (See: Programing Style: Variable Naming: English Words Considered Harmful.)

Emacs Lisp is fantastic!