Emacs Lisp: writing a url-linkify

Perm url with updates: http://xahlee.org/emacs/elisp_html-linkify.html

Xah Lee, 2010-12-03

This page is a short Emacs Lisp tutorial: an example of writing a function that transforms the text under the cursor on the fly. If you are not familiar with elisp, see: Emacs Lisp Basics.

Problem

I need to write an elisp command so that, when I press a button, the url under the cursor, such as:

http://some.example.com/xyz.html

becomes this:

<a class="sorc" href="http://some.example.com/xyz.html"
title="accessed:2010-12-03">Source some.example.com</a>

And on pressing another button, the link becomes this:

<a class="sorcdd" href="#" 
title="accessed:2010-12-03; defunct:2010-12-03; http://some.example.com/xyz.html">Source some.example.com</a>

Detail

When writing blogs, you often need to cite links. The links may be to other blogs, news sites, or some random site. Many such urls are ephemeral. They exist today, but may be dead links a few months later. Typically, if the url doesn't have its own domain but is hosted on a blog service site, it is more likely to go bad sooner.

I write many blogs on xahlee.org, so I have hundreds of links. When you update your pages years later, you find dead links like 〔http://someRandomBlog.org/importantToday.html〕, and may not remember what that link was about. No author, no title, no idea when the link was active or when it became dead. Sometimes the link is still good but the owner of the domain name has changed, so the linked page may have become a porn site or been bought by domain squatters.

One partial solution is to add an access date together with the link.

<p>blab blab news!
See <a href="http://some.example.com/xyz.html">here</a>!
(Accessed on 2010-12-03)</p>

With an access date, at least you know when the link was good. If the link goes bad, you or your readers can at least try to view it through a web archive site such as the Internet Archive.

However, this requires manual insertion of the date. It would be better if the access date were embedded in the link itself in some uniform format. Neither HTML4 nor HTML5 has a way to embed an access date in a link. So I decided to use the “title” attribute, like this:

<a class="sorc" href="..." title="accessed:2010-12-03">...</a>

This is not an ideal solution, because the “title” attribute is supposed to hold a title, not a date stamp. But in practice, I decided it's ok for me to adopt this solution.

I prefer this embedded access date approach because adding the access date beside the link text is distracting. If you have a paragraph with 3 or more links, and each one says “accessed on ...”, that's very annoying.

Later on, if I find that a link is dead, I can press a button, and emacs will change the link to this format:

<a class="sorcdd" href="#" title="accessed:2010-12-03; defunct:2010-12-03; http://some.example.com/xyz.html">Source some.example.com</a>

Notice that the class value has changed from “sorc” to “sorcdd”, the “href” now points to “#”, and the original url is kept in the “title” together with the defunct date. With proper css, the link text “Source some.example.com” will be shown crossed out.

A uniform format for embedding the access date is good because, if HTML6 or some HTML microformat later has a way to add access dates to links, I can easily write a script that changes all my thousands of external links to the new format.
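To illustrate why the uniform format helps, here is a minimal sketch of the first half of such a script: collecting every “sorc” link in the current buffer together with its access date. The function name my-collect-sorc-links is made up for this example, and the regex assumes the attributes appear in exactly the order shown above (class, href, title).

```elisp
;; Hypothetical helper, not part of the original code on this page.
;; Assumes links are written exactly as: <a class="sorc" href="..." title="accessed:YYYY-MM-DD">
(defun my-collect-sorc-links ()
  "Return a list of (URL . ACCESS-DATE) for each \"sorc\" link in the current buffer."
  (let (result)
    (save-excursion
      (goto-char (point-min))
      ;; [ \n]+ allows the title attribute to sit on the next line
      (while (re-search-forward
              "<a class=\"sorc\" href=\"\\([^\"]+\\)\"[ \n]+title=\"accessed:\\([-0-9]+\\)\""
              nil t)
        (push (cons (match-string-no-properties 1)
                    (match-string-no-properties 2))
              result)))
    ;; push builds the list backwards, so reverse it into buffer order
    (nreverse result)))
```

A converter to some future format would loop over this list and rewrite each link, which is only practical because every link follows the same pattern.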

Solution

Here's the code:

(defun source-linkify ()
  "Make url at cursor point into a html link.
If there's a text selection, use the text selection as input.

Example: http://example.com/xyz.htm
becomes
<a class=\"sorc\" href=\"http://example.com/xyz.htm\" title=\"accessed:2008-12-25\">Source example.com</a>"
  (interactive)
  (let (url resultLinkStr bds p1 p2 domainName)

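    ;; get the boundary of url or text selection, as a (START . END) pair.
    ;; (cons is needed here: with a list, (cdr bds) would be a list, not a position)
    (setq bds (if (region-active-p)
                  (cons (region-beginning) (region-end))
                (bounds-of-thing-at-point 'url)))
    (unless bds (user-error "No url under cursor"))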

    ;; set url
    (setq p1 (car bds))
    (setq p2 (cdr bds))
    (setq url (buffer-substring-no-properties p1 p2))

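    ;; get the domainName: the text between "://" and the next "/" (or end of url).
    ;; if the text has no "://", fall back to using the whole text.
    (setq domainName
          (if (string-match "://\\([^/]+\\)" url)
              (match-string 1 url)
            url))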

    (setq url (replace-regexp-in-string "&" "&amp;" url))
    (setq resultLinkStr
          (concat "<a class=\"sorc\" href=\"" url "\""
                  " title=\"accessed:" (format-time-string "%Y-%m-%d")
                  "\""
                  ">" 
                  "Source " domainName
                  "</a>"))

    ;; delete url and insert the link
    (delete-region p1 p2)
    (insert resultLinkStr)))

The code is easy to understand. If you find it difficult, try reading this page: Emacs Lisp: Writing a Wrap-URL Function, which has more explanation.
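The only slightly tricky step is pulling the domain name out of the url. Here is a minimal sketch of that step in isolation. The helper name my-url-domain is made up for this example, and its regex is a bit looser than a pattern that insists on a “/” after the host, so it also handles a url with no path.

```elisp
;; Hypothetical helper for illustration; not part of the original command.
(defun my-url-domain (url)
  "Return the host part of URL, or nil if URL contains no \"://\"."
  ;; capture everything after "://" up to the next "/" or the end of the string
  (when (string-match "://\\([^/]+\\)" url)
    (match-string 1 url)))

(my-url-domain "http://some.example.com/xyz.html") ; returns "some.example.com"
(my-url-domain "http://example.com")               ; returns "example.com"
```

Note that a port number, if present, would be returned as part of the host. For link text on a blog that is probably acceptable.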

You can assign a hotkey for this command.
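For example, something like the following in your emacs init file would do it. The F8 key is an arbitrary choice for this sketch; use any key that is free in your setup.

```elisp
;; F8 is an assumption for illustration, not a key the original setup prescribes.
(global-set-key (kbd "<f8>") 'source-linkify)
```

The dead link command given below can be bound to another key the same way.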

The following is the code to turn a link into the dead link format.

(defun defunct ()
  "Change the html link under cursor to a defunct form.
Example:
If cursor is on this line
<a class=\"sorc\" href=\"http://example.com/\" title=\"accessed:2008-12-26\">...</a>
 (and inside the opening tag.)

It becomes:
<a class=\"sorcdd\" href=\"#\" title=\"accessed:2008-12-26; defunct:2008-12-26; http://example.com/\">...</a>"
  (interactive)
  (let (p1 p2 linkStr url)
    (save-excursion

      ;; get the boundary of opening tag
      (search-backward "<a " ) (setq p1 (point) )
      (search-forward "\">") (setq p2 (point) )

      ;; get linkStr
      (setq linkStr (buffer-substring-no-properties p1 p2))

      ;; change the “class” attribute value
      (setq linkStr (replace-regexp-in-string "class=\"sorc\"" "class=\"sorcdd\"" linkStr t t)) 

      ;; (setq linkStr (replace-regexp-in-string "href=\"\\([^\"]+\\)\" +title=\"accessed:\\([^;]+\\)\""
      ;; (concat  "href=\"#\" title=\"accessed:\1; defunct:" (format-time-string "%Y-%m-%d") ";\2\"" )
      ;;  linkStr t t))

      ;; insert defunct date and url etc
      (with-temp-buffer
        (insert linkStr)
        (goto-char 1)
        (search-forward-regexp  "href=\"\\([^\"]+\\)\"")
        (setq url (match-string 1))
        (replace-match "href=\"#\"")
        (search-forward "\"")
        (search-forward "\"")
        (backward-char 1)
        (insert "; defunct:" (format-time-string "%Y-%m-%d") "; " url)
        (setq linkStr (buffer-string))))

    (delete-region p1 p2)
    (insert linkStr)))
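The temp buffer approach above is not the only way. As an alternative, here is a minimal sketch that does the same transformation on a string with a single regex match. The function name my-defunct-link-string and its arguments are made up for this example, and it assumes the attributes appear in the order class, href, title. The date is passed in as an argument so the result is predictable.

```elisp
;; Hypothetical string-only variant; not the method used in the command above.
(defun my-defunct-link-string (link-str date)
  "Return LINK-STR, an opening <a class=\"sorc\" ...> tag, in defunct form.
DATE is the defunct date as a string, such as \"2010-12-03\".
If LINK-STR does not look like a \"sorc\" link, return it unchanged."
  (let ((case-fold-search nil))
    (if (string-match
         "class=\"sorc\" href=\"\\([^\"]+\\)\"[ \n]+title=\"accessed:\\([^\";]+\\)\""
         link-str)
        (let ((url (match-string 1 link-str))
              (accessed (match-string 2 link-str)))
          ;; LITERAL is t so that any backslash in the url is inserted as-is
          (replace-match
           (concat "class=\"sorcdd\" href=\"#\" title=\"accessed:" accessed
                   "; defunct:" date "; " url "\"")
           t t link-str))
      link-str)))

(my-defunct-link-string
 "<a class=\"sorc\" href=\"http://example.com/\" title=\"accessed:2008-12-26\">"
 (format-time-string "%Y-%m-%d"))
```

When writing the replacement as a template instead, remember that a back reference in an elisp string is written "\\1", not "\1", and that it is only expanded when the LITERAL argument is nil.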

Here's the css for the dead link:

a.sorcdd:link:active, a.sorcdd:link:hover, a.sorcdd:visited:hover, a.sorcdd:visited, a.sorcdd:link
{color:black; cursor:text; text-decoration:line-through}
