Emacs Lisp: Writing a Date Time String Parsing Function

Perm url with updates: http://xahlee.org/emacs/elisp_parse_time.html

Xah Lee, 2011-09-02

This page shows an example of writing an Emacs Lisp function that parses a date-time string.

The Problem

Write an elisp function. The function will take a string argument in any of the common date-time formats, e.g.

  • 2011-09-02T05:29:26-07:00 (ISO 8601)
  • 2011-09-02 (ISO 8601)
  • Fri, 2 Sep 2011 11:14:11 +0200 (unixy)
  • 09/02/2011 (USA)
  • Sep 2, 2011
  • 2 Sep, 2011
  • 2 September, 2011

and output a canonical form 2011-09-02.

Solution

If you've worked with elisp for a while, or done a web search, you'll know there's a time parsing function “parse-time-string” in the file 〔parse-time.el〕, with feature name 'parse-time (that is, you call (require 'parse-time) to load it). (See: Emacs Lisp's Library System: What's require, load, load-file, autoload, feature?.)

Here's its inline doc:

parse-time-string is a compiled Lisp function in `parse-time.el'.

(parse-time-string STRING)

Parse the time-string STRING into (SEC MIN HOUR DAY MON YEAR DOW DST TZ).
The values are identical to those of `decode-time', but any values that are
unknown are returned as nil.
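
To make the shape of that return value concrete, here is a minimal sketch that pulls the year, month, and day out of the list with “nth”. The helper name my-date-fields is made up for this illustration; it is not part of parse-time.el.

```elisp
(require 'parse-time)

(defun my-date-fields (string)
  "Return (YEAR MONTH DAY) as parsed from STRING by `parse-time-string'.
Hypothetical helper for illustration.  Any field may be nil."
  (let ((parsed (parse-time-string string)))
    ;; parsed has the shape (SEC MIN HOUR DAY MON YEAR DOW DST TZ),
    ;; so DAY is element 3, MON is element 4, YEAR is element 5.
    (list (nth 5 parsed) (nth 4 parsed) (nth 3 parsed))))

(my-date-fields "Mon, 01 Aug 2011 12:24:51 -0400") ; ⇒ (2011 8 1)
```

The fields are numbers, not strings, so they still need formatting before they can be joined into yyyy-mm-dd.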

However, a little test shows that this function doesn't parse some common date formats. In particular, it doesn't understand ISO 8601 with a time part, the yyyy-mm form, or the USA custom of mm/dd/yyyy.

;; testing for supported formats for “parse-time-string”
;; As of 2011-08-15 GNU Emacs 23.2.1

(require 'parse-time)

;; unixy formats
(parse-time-string "Date: Mon, 01 Aug 2011 12:24:51 -0400") ; yes
(parse-time-string "Local: Mon, Aug 1 2011 9:24 am")        ; yes

(parse-time-string "2007, August 1")                        ; yes
(parse-time-string "August 1, 2007")                        ; yes
(parse-time-string "august 1, 2007")                        ; yes. Lowercase ok.
(parse-time-string "August 1st, 2007")                      ; no. The date is nil.
(parse-time-string "aug 1, 2007")                           ; yes. Month abbr OK.
(parse-time-string "1 aug, 2007")                           ; yes

(parse-time-string "8/1/2007")     ; no. Takes the 8 as date, 1 as nil
(parse-time-string "08/01/2007")   ; no. Takes the 8 as date, 1 as nil
(parse-time-string "8,1,2007")     ; no
(parse-time-string "2007-08-01")   ; yes
(parse-time-string "2007")         ; yes
(parse-time-string "2007-08")      ; no
(parse-time-string "2011-08-01")   ; yes
(parse-time-string "2011-08-01T11:55:37-07:00") ; no. got nothing

For me, I need it to understand the USA customary format 8/1/2007, interpreted as month/day/year. I also need it to understand formats such as August 1st, 2007, and ISO 8601 formats such as yyyy-mm, yyyy-mm-dd, yyyy-mm-ddThh:mm:ss-07:00.

The simplest solution is to do a regex match on each form. I don't need the time info, which makes the problem slightly simpler. Here's my code:

(defun fix-timestamp-string (dateStr)
  "Return yyyy-mm-dd format of DATESTR.

For examples:
 「Nov. 28, 1994」 ⇒ 「1994-11-28」
 「November 28, 1994」 ⇒ 「1994-11-28」
 「11/28/1994」 ⇒ 「1994-11-28」

Any “day of week”, or “time” info, or any other parts of the string, are discarded.

Code detail: URL `http://xahlee.org/emacs/elisp_parse_time.html'"
  (let (dateList ξyear ξmonth ξdate yyyy mm dd)
    (require 'parse-time)

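    ;; remove leading and trailing whitespace
    ;; (a greedy “.+” followed by “ *$” would leave trailing spaces in place)
    (setq dateStr (replace-regexp-in-string "\\`[ \t\n]+\\|[ \t\n]+\\'" "" dateStr))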

    (cond

     ;; USA convention of mm/dd/yyyy
     ((string-match "^\\([0-9][0-9]\\)/\\([0-9][0-9]\\)/\\([0-9][0-9][0-9][0-9]\\)$" dateStr)
      (concat (match-string 3 dateStr) "-" (match-string 1 dateStr) "-" (match-string 2 dateStr))
      )
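     ;; one-digit month or day, e.g. 8/1/2007: zero-pad both
     ((string-match "^\\([0-9][0-9]?\\)/\\([0-9][0-9]?\\)/\\([0-9][0-9][0-9][0-9]\\)$" dateStr)
      (format "%s-%02d-%02d" (match-string 3 dateStr)
              (string-to-number (match-string 1 dateStr))
              (string-to-number (match-string 2 dateStr)))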
      )

     ;; some ISO 8601. yyyy-mm-dd
     ((string-match "^\\([0-9][0-9][0-9][0-9]\\)-\\([0-9][0-9]\\)-\\([0-9][0-9]\\)T[0-9][0-9]:[0-9][0-9]" dateStr)
      (concat (match-string 1 dateStr) "-" (match-string 2 dateStr) "-" (match-string 3 dateStr))
      )
     ((string-match "^\\([0-9][0-9][0-9][0-9]\\)-\\([0-9][0-9]\\)-\\([0-9][0-9]\\)$" dateStr)
      (concat (match-string 1 dateStr) "-" (match-string 2 dateStr) "-" (match-string 3 dateStr))
      )
     ((string-match "^\\([0-9][0-9][0-9][0-9]\\)-\\([0-9][0-9]\\)$" dateStr)
      (concat (match-string 1 dateStr) "-" (match-string 2 dateStr))
      )
     ((string-match "^\\([0-9][0-9][0-9][0-9]\\)$" dateStr)
      (match-string 1 dateStr)
      )

     ;; else
     (t
      (progn
        (setq dateStr (replace-regexp-in-string "January " "Jan. " dateStr))
        (setq dateStr (replace-regexp-in-string "February " "Feb. " dateStr))
        (setq dateStr (replace-regexp-in-string "March " "Mar. " dateStr))
        (setq dateStr (replace-regexp-in-string "April " "Apr. " dateStr))
        (setq dateStr (replace-regexp-in-string "May " "May. " dateStr))
        (setq dateStr (replace-regexp-in-string "June " "Jun. " dateStr))
        (setq dateStr (replace-regexp-in-string "July " "Jul. " dateStr))
        (setq dateStr (replace-regexp-in-string "August " "Aug. " dateStr))
        (setq dateStr (replace-regexp-in-string "September " "Sep. " dateStr))
        (setq dateStr (replace-regexp-in-string "October " "Oct. " dateStr))
        (setq dateStr (replace-regexp-in-string "November " "Nov. " dateStr))
        (setq dateStr (replace-regexp-in-string "December " "Dec. " dateStr))

        ;; strip ordinal suffixes: 1st → 1, 22nd → 22, 3rd → 3, 4th → 4
        (setq dateStr (replace-regexp-in-string "\\([0-9]\\)\\(st\\|nd\\|rd\\|th\\)\\b" "\\1" dateStr))

        (setq dateList (parse-time-string dateStr))
        (setq ξyear (nth 5 dateList))
        (setq ξmonth (nth 4 dateList))
        (setq ξdate (nth 3 dateList))

        (setq yyyy (if ξyear (number-to-string ξyear) "")) ; year is nil when it can't be parsed
        (setq mm (if ξmonth (format "%02d" ξmonth) "" ) )
        (setq dd (if ξdate (format "%02d" ξdate) "" ) )
        (concat yyyy "-" mm "-" dd)

        ) ) ) ))

This code is easy to understand. The function takes a string, and returns a string.

The whole code is just one giant multi-branch conditional test, similar to an “if … else if” chain, “case”, or “switch” in other languages. An elisp conditional takes this form:

(cond
 (TEST1 BODY)
 (TEST2 BODY)
 …
 )

Each TEST is either true (not “nil”) or false (“nil”). Emacs goes through them in sequence. For the first test that's non-nil, its body is executed and its value becomes the value of the whole conditional; the remaining tests are skipped.
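
As a small illustration of that first-match-wins behavior, here is a sketch with simplified patterns. The function name my-classify-date-string is hypothetical, made up for this example only.

```elisp
(defun my-classify-date-string (s)
  "Return a symbol naming the first pattern that S matches.
Hypothetical helper, for illustration only."
  (cond
   ;; Tests are tried top to bottom; only the first matching clause runs.
   ((string-match "\\`[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]\\'" s) 'usa)
   ((string-match "\\`[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]\\'" s) 'iso-date)
   ((string-match "\\`[0-9][0-9][0-9][0-9]\\'" s) 'year)
   ;; t is always true, so this clause is the fallback.
   (t 'other)))

(my-classify-date-string "09/02/2011")  ; ⇒ usa
(my-classify-date-string "2011-09-02")  ; ⇒ iso-date
(my-classify-date-string "Sep 2, 2011") ; ⇒ other
```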

In my code, the first few tests are regex matches of forms like nn/nn/nnnn, where each “n” is a digit. When any of these matches, I basically have what I want, and the code exits. Here's one example:

 ;; USA convention of mm/dd/yyyy
 ((string-match "^\\([0-9][0-9]\\)/\\([0-9][0-9]\\)/\\([0-9][0-9][0-9][0-9]\\)$" dateStr)
  (concat (match-string 3 dateStr) "-" (match-string 1 dateStr) "-" (match-string 2 dateStr))
  )
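
To see what the capture groups hold after such a match, here is a minimal sketch. The helper name my-usa-date-groups is made up for illustration. Note that “string-match” returns the index of the match (here 0), which counts as true because only nil is false in elisp, and that “match-string” needs the original string as its second argument when the match was done on a string rather than a buffer.

```elisp
(defun my-usa-date-groups (s)
  "Return (WHOLE MONTH DAY YEAR) if S looks like mm/dd/yyyy, else nil.
Hypothetical helper, for illustration only."
  (when (string-match
         "^\\([0-9][0-9]\\)/\\([0-9][0-9]\\)/\\([0-9][0-9][0-9][0-9]\\)$" s)
    ;; Group 0 is the whole match; groups 1 to 3 are the parenthesized parts.
    (list (match-string 0 s)
          (match-string 1 s)
          (match-string 2 s)
          (match-string 3 s))))

(my-usa-date-groups "09/02/2011") ; ⇒ ("09/02/2011" "09" "02" "2011")
(my-usa-date-groups "2011-09-02") ; ⇒ nil
```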

When none of these matches, it goes to the last clause (t BODY), where the “t” is always true, and runs a giant BODY. In the BODY, I first replace each fully spelled month name with its abbreviation using “replace-regexp-in-string”, e.g.

(setq dateStr (replace-regexp-in-string "January " "Jan. " dateStr))

This is done because in Emacs 22, “parse-time-string” didn't understand fully spelled month names. (This has since been fixed.)

Then, I also replace {1st, 2nd, nth} etc. with {1, 2, n}. Then, I simply feed the string to “parse-time-string” and get the parsed date time as a list. After that, I just extract the elements from the list and reformat them the way I want.
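
Those last steps (strip the ordinal suffix, parse, reformat) can be sketched in isolation like this. This is a simplified illustration, not the code from the function above: the names my-strip-ordinal-suffix and my-date-to-iso are hypothetical, and the sketch returns nil when any of year, month, or day is missing.

```elisp
(require 'parse-time)

(defun my-strip-ordinal-suffix (s)
  "Return S with ordinal suffixes after digits removed, e.g. 1st ⇒ 1.
Hypothetical helper, for illustration only."
  (replace-regexp-in-string "\\([0-9]\\)\\(st\\|nd\\|rd\\|th\\)\\b" "\\1" s))

(defun my-date-to-iso (s)
  "Return the date in S as a yyyy-mm-dd string, or nil if it can't be parsed.
Hypothetical helper, for illustration only."
  (let* ((parsed (parse-time-string (my-strip-ordinal-suffix s)))
         (year (nth 5 parsed))
         (month (nth 4 parsed))
         (day (nth 3 parsed)))
    ;; Only format when all three fields were recognized.
    (when (and year month day)
      (format "%04d-%02d-%02d" year month day))))

(my-strip-ordinal-suffix "August 1st, 2007") ; ⇒ "August 1, 2007"
(my-date-to-iso "August 1st, 2007")          ; ⇒ "2007-08-01"
```

Returning nil for an incomplete date is a design choice of this sketch; it lets the caller decide what to do instead of getting a string with empty parts.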

Now, remember that my function takes a string and returns a string. It is not an interactive command. What I actually want is an interactive command, so that I can press a key and have the date on the current line transformed to the format I want. Here's the interactive command wrapper, which calls my “fix-timestamp-string” function to do the work:

(defun fix-timestamp ()
  "Change timestamp under cursor into a yyyy-mm-dd format.
If there's a text selection, use that as input, else use current line.
All other text in input are discarded.
For example:
TUESDAY, FEB 15, 2011 05:16 ET
becomes
2011-02-15
.
See `fix-timestamp-string' for detail."
  (interactive)
  (let (bds p3 p4 inputstr)
    (setq bds (get-selection-or-unit 'line))
    (setq inputstr (elt bds 0) )
    (setq p3 (elt bds 1) )
    (setq p4 (elt bds 2) )
    (delete-region p3 p4)
    (insert (fix-timestamp-string inputstr)) ))

The “get-selection-or-unit” is my custom function as replacement for “thing-at-point” function. See: Emacs Lisp: get-selection-or-unit.
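
If you don't have that function, a rough stand-in can be written with built-in functions only. The sketch below is an assumption about the shape that the wrapper needs (a vector of text, start position, end position), not the real “get-selection-or-unit”; the name my-selection-or-line is hypothetical.

```elisp
(defun my-selection-or-line ()
  "Return a vector [TEXT START END] for the active region, else the current line.
Hypothetical stand-in for illustration, built only from standard functions."
  (let ((start (if (use-region-p) (region-beginning) (line-beginning-position)))
        (end (if (use-region-p) (region-end) (line-end-position))))
    (vector (buffer-substring-no-properties start end) start end)))

;; Usage: read the line under the cursor in a scratch buffer.
(with-temp-buffer
  (insert "first line\nTUESDAY, FEB 15, 2011 05:16 ET\nlast line")
  (goto-char (point-min))
  (forward-line 1)
  (elt (my-selection-or-line) 0)) ; ⇒ "TUESDAY, FEB 15, 2011 05:16 ET"
```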

The weird ξ you see in my elisp code is the Greek letter xi. I use Unicode chars in variable names for experimental purposes. You can just ignore it. (See: Programing Style: Variable Naming: English Words Considered Harmful.)

I ♥ Emacs.
