Perm url with updates: http://xahlee.org/emacs/elisp_parse_time.html
Emacs Lisp: Writing a Date Time String Parsing Function
Xah Lee, 2011-09-02
This page shows an example of writing an Emacs Lisp function that parses a date time string.
The Problem
Write an elisp function. The function takes a string argument in any of the common date time formats, e.g.
2011-09-02T05:29:26-07:00 (ISO 8601)
2011-09-02 (ISO 8601)
Fri, 2 Sep 2011 11:14:11 +0200 (unixy)
09/02/2011 (USA)
Sep 2, 2011
2 Sep, 2011
2 September, 2011
and outputs the canonical form 2011-09-02.
Solution
If you've worked with elisp for a while, or done a web search, you'll know there's a time parsing function “parse-time-string”, from the file 〔parse-time.el〕, with feature name 'parse-time (that is, you call (require 'parse-time) to load it).
(See: Emacs Lisp's Library System: What's require, load, load-file, autoload, feature?.)
Here's its inline doc:
parse-time-string is a compiled Lisp function in `parse-time.el'.
(parse-time-string STRING)
Parse the time-string STRING into (SEC MIN HOUR DAY MON YEAR DOW DST TZ).
The values are identical to those of `decode-time', but any values that are
unknown are returned as nil.
However, a little test shows that this function doesn't parse some common date formats. In particular, it understands neither ISO 8601 nor the USA custom of mm/dd/yyyy.
(require 'parse-time)
(parse-time-string "Date: Mon, 01 Aug 2011 12:24:51 -0400")
(parse-time-string "Local: Mon, Aug 1 2011 9:24 am")
(parse-time-string "2007, August 1")
(parse-time-string "August 1, 2007")
(parse-time-string "august 1, 2007")
(parse-time-string "August 1st, 2007")
(parse-time-string "aug 1, 2007")
(parse-time-string "1 aug, 2007")
(parse-time-string "8/1/2007")
(parse-time-string "08/01/2007")
(parse-time-string "8,1,2007")
(parse-time-string "2007-08-01")
(parse-time-string "2007")
(parse-time-string "2007-08")
(parse-time-string "2011-08-01")
(parse-time-string "2011-08-01T11:55:37-07:00")
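To make the shape of the return value easier to see, here is a minimal sketch that labels the fields by name. The helper name “my-parsed-date-fields” is made up for this illustration and is not part of parse-time.el. It only relies on the documented order (SEC MIN HOUR DAY MON YEAR DOW DST TZ).

```elisp
(require 'parse-time)

(defun my-parsed-date-fields (str)
  "Return the date and time fields that `parse-time-string' finds in STR.
Hypothetical helper for illustration.  Fields the parser cannot
determine come back as nil."
  (let ((parsed (parse-time-string str)))
    ;; PARSED is (SEC MIN HOUR DAY MON YEAR DOW DST TZ), so index by position.
    (list :year (nth 5 parsed)
          :month (nth 4 parsed)
          :day (nth 3 parsed)
          :hour (nth 2 parsed)
          :minute (nth 1 parsed)
          :second (nth 0 parsed))))

(my-parsed-date-fields "Mon, 01 Aug 2011 12:24:51 -0400")
;; ⇒ (:year 2011 :month 8 :day 1 :hour 12 :minute 24 :second 51)
```

Running the same helper on the other test strings is a quick way to see which fields come back as nil.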
For me, i need it to understand the USA customary format 8/1/2007, interpreted as month/day/year. I also need it to understand formats such as August 1st, 2007. And i also need it to understand ISO 8601 formats such as
yyyy-mm,
yyyy-mm-dd,
yyyy-mm-ddThh:mm:ss-07:00.
The simplest solution is to just do a regex match on the form.
I don't need the time info, so it makes the problem slightly simpler. Here's my code:
(defun fix-timestamp-string (dateStr)
"Returns yyyy-mm-dd format of dateStr
For examples:
「Nov. 28, 1994」 ⇒ 「1994-11-28」
「November 28, 1994」 ⇒ 「1994-11-28」
「11/28/1994」 ⇒ 「1994-11-28」
Any “day of week”, or “time” info, or any other parts of the string, are discarded.
Code detail: URL `http://xahlee.org/emacs/elisp_parse_time.html'"
(let (dateList ξyear ξmonth ξdate yyyy mm dd)
(require 'parse-time)
;; trim surrounding spaces; the non-greedy .+? lets the trailing spaces be dropped too
(setq dateStr (replace-regexp-in-string "^ *\\(.+?\\) *$" "\\1" dateStr))
(cond
((string-match "^\\([0-9][0-9]\\)/\\([0-9][0-9]\\)/\\([0-9][0-9][0-9][0-9]\\)$" dateStr)
(concat (match-string 3 dateStr) "-" (match-string 1 dateStr) "-" (match-string 2 dateStr))
)
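((string-match "^\\([0-9][0-9]?\\)/\\([0-9][0-9]?\\)/\\([0-9][0-9][0-9][0-9]\\)$" dateStr)
;; one-digit month or day, e.g. 8/1/2007: zero-pad both to two digits
(format "%s-%02d-%02d" (match-string 3 dateStr) (string-to-number (match-string 1 dateStr)) (string-to-number (match-string 2 dateStr)))
)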
((string-match "^\\([0-9][0-9][0-9][0-9]\\)-\\([0-9][0-9]\\)-\\([0-9][0-9]\\)T[0-9][0-9]:[0-9][0-9]" dateStr)
(concat (match-string 1 dateStr) "-" (match-string 2 dateStr) "-" (match-string 3 dateStr))
)
((string-match "^\\([0-9][0-9][0-9][0-9]\\)-\\([0-9][0-9]\\)-\\([0-9][0-9]\\)$" dateStr)
(concat (match-string 1 dateStr) "-" (match-string 2 dateStr) "-" (match-string 3 dateStr))
)
((string-match "^\\([0-9][0-9][0-9][0-9]\\)-\\([0-9][0-9]\\)$" dateStr)
(concat (match-string 1 dateStr) "-" (match-string 2 dateStr))
)
((string-match "^\\([0-9][0-9][0-9][0-9]\\)$" dateStr)
(match-string 1 dateStr)
)
(t
(progn
(setq dateStr (replace-regexp-in-string "January " "Jan. " dateStr))
(setq dateStr (replace-regexp-in-string "February " "Feb. " dateStr))
(setq dateStr (replace-regexp-in-string "March " "Mar. " dateStr))
(setq dateStr (replace-regexp-in-string "April " "Apr. " dateStr))
(setq dateStr (replace-regexp-in-string "May " "May. " dateStr))
(setq dateStr (replace-regexp-in-string "June " "Jun. " dateStr))
(setq dateStr (replace-regexp-in-string "July " "Jul. " dateStr))
(setq dateStr (replace-regexp-in-string "August " "Aug. " dateStr))
(setq dateStr (replace-regexp-in-string "September " "Sep. " dateStr))
(setq dateStr (replace-regexp-in-string "October " "Oct. " dateStr))
(setq dateStr (replace-regexp-in-string "November " "Nov. " dateStr))
(setq dateStr (replace-regexp-in-string "December " "Dec. " dateStr))
;; strip ordinal suffixes, so 1st, 2nd, 3rd, 4th, 21st etc become plain numbers
(setq dateStr (replace-regexp-in-string "\\([0-9]\\)\\(st\\|nd\\|rd\\|th\\)\\b" "\\1" dateStr))
(setq dateList (parse-time-string dateStr))
(setq ξyear (nth 5 dateList))
(setq ξmonth (nth 4 dateList))
(setq ξdate (nth 3 dateList))
(setq yyyy (number-to-string ξyear))
(setq mm (if ξmonth (format "%02d" ξmonth) "" ) )
(setq dd (if ξdate (format "%02d" ξdate) "" ) )
(concat yyyy "-" mm "-" dd)
) ) ) ))
This code is easy to understand. The function takes a string, and returns a string.
The whole code is just one giant multi-branch conditional test, known in other languages as “case” or “switch”. Elisp conditional takes this form:
(cond
(TEST1 BODY)
(TEST2 BODY)
…
)
Each TEST is either true (not “nil”) or false (“nil”). Emacs goes thru them in sequence. When it finds the first test that's non-nil, it executes that test's body, then exits the conditional.
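As a small illustration of that first-match-wins behavior, here is a sketch that classifies a date string by its shape. The function name “my-date-kind” and the symbols it returns are made up for this example.

```elisp
(defun my-date-kind (str)
  "Return a symbol naming the rough shape of the date string STR.
Hypothetical helper for illustration only."
  (cond
   ;; yyyy-mm-dd
   ((string-match-p "\\`[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]\\'" str)
    'iso-date)
   ;; mm/dd/yyyy, with one or two digits for month and day
   ((string-match-p "\\`[0-9][0-9]?/[0-9][0-9]?/[0-9][0-9][0-9][0-9]\\'" str)
    'usa-date)
   ;; t is always true, so this clause catches everything else
   (t 'unknown)))

(my-date-kind "2011-09-02")  ; ⇒ iso-date
(my-date-kind "09/02/2011")  ; ⇒ usa-date
(my-date-kind "Sep 2, 2011") ; ⇒ unknown
```

Because the clauses are tried in order, the more specific patterns go first and the catch-all goes last.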
In my code, the first few tests are regex matches of forms like nn/nn/nnnn where each “n” is a digit. When any of these match, then basically i got what i want, and the code exits. Here's one example:
((string-match "^\\([0-9][0-9]\\)/\\([0-9][0-9]\\)/\\([0-9][0-9][0-9][0-9]\\)$" dateStr)
(concat (match-string 3 dateStr) "-" (match-string 1 dateStr) "-" (match-string 2 dateStr))
)
When none of these match, then it goes to the last clause (t BODY), where the “t” is always true, and runs a giant BODY. In the BODY, first i replace each full spelling of a month name by its abbreviation using “replace-regexp-in-string”, e.g.
(setq dateStr (replace-regexp-in-string "January " "Jan. " dateStr))
This is done because in emacs 22, “parse-time-string” doesn't understand fully spelled month names. (This has since been fixed.)
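The same month-name replacement can also be driven by a table instead of twelve separate calls. This is only a sketch of an alternative, not the code used on this page. The function name “my-abbreviate-month-names” and the table are made up for illustration, and May is left out because its abbreviation is the same word.

```elisp
(defun my-abbreviate-month-names (str)
  "Return STR with fully spelled English month names abbreviated.
Hypothetical helper for illustration only."
  (let ((case-fold-search nil)) ; match the capitalized names exactly
    (dolist (pair '(("January" . "Jan") ("February" . "Feb")
                    ("March" . "Mar") ("April" . "Apr")
                    ("June" . "Jun") ("July" . "Jul")
                    ("August" . "Aug") ("September" . "Sep")
                    ("October" . "Oct") ("November" . "Nov")
                    ("December" . "Dec"))
                  str) ; the value of the whole dolist form is the final STR
      ;; \\b keeps the match to whole words; the two t arguments ask for
      ;; a fixed-case, literal replacement
      (setq str (replace-regexp-in-string
                 (concat "\\b" (car pair) "\\b") (cdr pair) str t t)))))

(my-abbreviate-month-names "November 28, 1994") ; ⇒ "Nov 28, 1994"
(my-abbreviate-month-names "2 September, 2011") ; ⇒ "2 Sep, 2011"
```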
Then, i also replace {1st, 2nd, nth} etc by {1, 2, n}. Then, i simply feed it to “parse-time-string” and get a parsed date time as a list. After that, just extract the elements from the list and reformat the way i want.
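Those last steps (strip the ordinal suffix, parse, pick out the fields, reformat) can be sketched on their own like this. The two function names are made up for illustration. Unlike the code on this page, the sketch returns nil when the year, month, or day is missing, which is a design choice made only for this example.

```elisp
(require 'parse-time)

(defun my-strip-ordinal-suffixes (str)
  "Return STR with 1st, 2nd, 3rd, 4th etc turned into plain numbers.
Hypothetical helper for illustration only."
  (replace-regexp-in-string
   "\\b\\([0-9]+\\)\\(st\\|nd\\|rd\\|th\\)\\b" "\\1" str))

(defun my-date-string-to-iso (str)
  "Return the date found in STR as a yyyy-mm-dd string, or nil.
Hypothetical helper for illustration only."
  (let* ((parsed (parse-time-string (my-strip-ordinal-suffixes str)))
         (year (nth 5 parsed))
         (month (nth 4 parsed))
         (day (nth 3 parsed)))
    ;; only format when all three fields were found
    (when (and year month day)
      (format "%04d-%02d-%02d" year month day))))

(my-strip-ordinal-suffixes "Aug 1st, 2007") ; ⇒ "Aug 1, 2007"
(my-date-string-to-iso "Aug 1st, 2007")     ; ⇒ "2007-08-01"
(my-date-string-to-iso "Sep 22nd, 2011")    ; ⇒ "2011-09-22"
```

Using “format” with %02d does the zero padding in one step, so no separate string variables are needed.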
Now, remember that my function takes a string and returns a string. It is not an interactive command. What i actually want is an interactive command, so that i can press a button, then the date on the current line will be transformed to the format i want. Here's the interactive command wrapper, which calls my “fix-timestamp-string” function to do the work:
(defun fix-timestamp ()
"Change timestamp under cursor into a yyyy-mm-dd format.
If there's a text selection, use that as input, else use current line.
All other text in input are discarded.
For example:
TUESDAY, FEB 15, 2011 05:16 ET
becomes
2011-02-15
.
See `fix-timestamp-string' for detail."
(interactive)
(let (bds p3 p4 inputstr)
(setq bds (get-selection-or-unit 'line))
(setq inputstr (elt bds 0) )
(setq p3 (elt bds 1) )
(setq p4 (elt bds 2) )
(delete-region p3 p4)
(insert (fix-timestamp-string inputstr)) ))
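If you don't have “get-selection-or-unit”, roughly the same wrapper can be written with built-in functions only. This is a sketch under that assumption, not the command used on this page. The names “my-replace-line-or-region-with” and “my-fix-timestamp” are made up for illustration, and the second function assumes “fix-timestamp-string” is already defined.

```elisp
(defun my-replace-line-or-region-with (fn)
  "Replace the active region, or else the current line, with FN of its text.
FN takes a string and returns a string.  Hypothetical helper for
illustration only."
  (let* ((use-region (use-region-p))
         (start (if use-region (region-beginning) (line-beginning-position)))
         (end (if use-region (region-end) (line-end-position)))
         (input (buffer-substring-no-properties start end)))
    (delete-region start end)
    (goto-char start)
    (insert (funcall fn input))))

(defun my-fix-timestamp ()
  "Rewrite the date in the region or on the current line as yyyy-mm-dd.
Assumes `fix-timestamp-string' from this page is already defined."
  (interactive)
  (my-replace-line-or-region-with #'fix-timestamp-string))
```

Keeping the buffer editing in a helper that takes any string-to-string function makes it easy to try out with something simple such as “upcase” before wiring in the date conversion.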
“get-selection-or-unit” is my custom function, a replacement for the “thing-at-point” function. See: Emacs Lisp: get-selection-or-unit.
The weird ξ you see in my elisp code is the Greek letter xi. I use unicode chars in variable names for experimental purposes. You can just ignore it. (See: Programing Style: Variable Naming: English Words Considered Harmful.)
I ♥ Emacs.