Emacs Lisp Batch Processing: Grep Find Replace Variations

Perm url with updates: http://xahlee.org/emacs/elisp_report_string_position.html

Xah Lee, 2011-03-21

This page shows emacs lisp scripts that do variations of grep/find/replace on strings, applied to a few thousand files. For example: report the position of a given string, or replace an HTML page's “TITLE” tag text with its “H1” tag text. If you don't know elisp, first take a look at Emacs Lisp Basics.

Problem: Report String Position

I need to know whether a particular string occurs near the beginning of a file or near the end. I need to know this for about 5k files in a dir.

Solution

;; -*- coding: utf-8 -*-
;; 2011-03-21
;; report the position (line number) of each occurrence of a string, for all files in a given dir

(setq inputDir "~/web/xahlee_org/" )

;; add a ending slash if not there
(when (not (string= "/" (substring inputDir -1) ))
  (setq inputDir (concat inputDir "/") )
  )

(defun my-process-file (fpath)
  "process the file at fullpath fpath ..."
  (let (mybuffer (ii 0) searchStr)

    (when (not (string-match "/xx" fpath))

      (setq mybuffer (get-buffer-create " myTemp"))
      (set-buffer mybuffer)
      (insert-file-contents fpath nil nil nil t)

      (setq case-fold-search nil) ; NOTE: remember to set case sensitivity here

      (setq searchStr "<div class=\"amz728x90\">" )

      (goto-char 1)
      (while (search-forward searchStr nil t) ; NOTE: for regex, use re-search-forward
        (princ (format "line %d: %s\n" (line-number-at-pos (point)) fpath))
        )
      
      (kill-buffer mybuffer)
      )
    ))

(require 'find-lisp)

(let (outputBuffer)
  (setq outputBuffer "*xah occur output*" )
  (with-output-to-temp-buffer outputBuffer 
    (mapc 'my-process-file (find-lisp-find-files inputDir "\\.html$"))
    (princ "Done deal!")
    )
  )

You can modify the “inputDir” and “searchStr” above and test it on your own machine.
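If you change “inputDir” often, the manual ending-slash check can be replaced by the built-in function “file-name-as-directory”. Here is a minimal sketch. The variable name “my-input-dir” is made up for this example and is not part of the script above.

```elisp
;; Sketch only: `my-input-dir' is a hypothetical name for this example.
;; `file-name-as-directory' appends the directory separator when it is missing.
(defvar my-input-dir nil "Directory to process, always ending with a slash.")

(setq my-input-dir (file-name-as-directory "~/web/xahlee_org"))

;; A path that already ends with a slash comes back unchanged.
(file-name-as-directory "~/web/xahlee_org/")
```

Unlike the “substring” test, this also does not signal an error when the path is an empty string.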

For explanation of this code, see: How to Write grep in Emacs Lisp.
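The script reports a raw line number, so you still have to know how long each file is to judge “beginning” versus “near the end”. One possible refinement is to also report the line number as a fraction of the total line count. The sketch below is an assumption of mine, not part of the original script. The function name “my-relative-line-positions” is hypothetical, and it works on a string so it can be tried without touching any files.

```elisp
;; Hypothetical helper, not part of the original script.
(defun my-relative-line-positions (text search-str)
  "Return a list of (LINE . FRACTION) for each occurrence of SEARCH-STR in TEXT.
LINE is the 1-based line number where the match ends.  FRACTION is LINE
divided by the total number of lines, so a value near 0 means near the
top and a value near 1 means near the bottom.
SEARCH-STR must not be empty."
  (with-temp-buffer
    (insert text)
    (let ((case-fold-search nil)        ; case-sensitive, like the script above
          (total (max 1 (count-lines (point-min) (point-max))))
          (result '()))
      (goto-char (point-min))
      (while (search-forward search-str nil t)
        (let ((line (line-number-at-pos (point))))
          (push (cons line (/ (float line) total)) result)))
      (nreverse result))))
```

For example, (my-relative-line-positions "a\nfoo\nb\nc\n" "foo") returns ((2 . 0.5)): the match is on line 2 of 4. To use it on files, you could pass the file's content, or adapt the body to run inside the buffer that “my-process-file” already creates.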

Problem 2: Fix HTML “TITLE” & “H1” Tags

Today, while i was working on my website, i noticed some HTML files are missing a “H1” header tag. In another directory, i want to replace each file's “TITLE” tag content with the text from its “H1” tag.

So, i need a script that fixes the text of these tags.

Solution

Here's a function that gets a file's “title” tag text. I wrote this about a year ago.

(defun get-html-file-title (fname)
  "Return FNAME <title> tag's text.
Assumes that the file contains the string
“<title>...</title>”."
  (let (x1 x2)

    (with-temp-buffer
      (insert-file-contents fname nil nil nil t)
      (goto-char 1)

      ;; x1 is the position right after “<title>”
      (setq x1 (search-forward "<title>"))
      ;; x2 is the position right before “</title>”
      (search-forward "</title>")
      (setq x2 (search-backward "<"))
      (buffer-substring-no-properties x1 x2)
      )
    ))

I also need to get the “H1” tag text. So i quickly did some copy-paste coding:

(defun get-html-file-h1-text (fname)
  "Return FNAME <h1> tag's text.
Assumes that the file contains the string
“<h1>...</h1>”."
  (let (x1 x2)

    (with-temp-buffer
      (insert-file-contents fname nil nil nil t)
      (goto-char 1)

      ;; x1 is the position right after “<h1>”
      (setq x1 (search-forward "<h1>"))
      ;; x2 is the position right before “</h1>”
      (search-forward "</h1>")
      (setq x2 (search-backward "<"))
      (buffer-substring-no-properties x1 x2)
      )
    ))

It's not efficient to open each file twice to get the “title” and “h1” texts, but that's ok, because the whole script finishes running in a few seconds anyway and this is for one-time use.
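If the double read ever matters, both texts can be pulled from a single buffer. The following is a rough sketch under my own assumptions, not the code i actually ran. The names “my-tag-text-in-buffer” and “my-html-title-and-h1” are hypothetical. It assumes the opening tags have no attributes, and it returns nil for a missing tag instead of signaling an error.

```elisp
;; Hypothetical helpers, not part of the original script.
(defun my-tag-text-in-buffer (tag)
  "Return the text between <TAG> and </TAG> in the current buffer, or nil.
Assumes the opening tag is written without attributes."
  (save-excursion
    (goto-char (point-min))
    (let ((case-fold-search nil) start)
      (when (search-forward (concat "<" tag ">") nil t)
        (setq start (point))
        (when (search-forward (concat "</" tag ">") nil t)
          ;; (match-beginning 0) is where the closing tag starts
          (buffer-substring-no-properties start (match-beginning 0)))))))

(defun my-html-title-and-h1 (fname)
  "Return a cons (TITLE . H1) for file FNAME, reading the file only once.
Either element is nil when the tag is not found."
  (with-temp-buffer
    (insert-file-contents fname)
    (cons (my-tag-text-in-buffer "title")
          (my-tag-text-in-buffer "h1"))))
```

A caller would then compare (car result) and (cdr result) instead of calling the two separate functions. Returning nil is handy here, since some of my files have no “h1” tag at all.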

Now, here's the code i wrote quickly to fix the tags:

;; -*- coding: utf-8 -*-
;; 2011-03-20
;; change title to h1 tag's text in “Time Machine” pages
;; 
;; for each html page in 〔~/web/xahlee_org/p/time_machine/〕
;; if the title tag and h1 tag text differ, make the title use h1's text

(setq inputDir "~/web/xahlee_org/p/time_machine/" ) ; dir must end with a slash

(defun my-process-file (fpath)
  "process the file at fullpath fpath ..."
  (let ( titleText h1Text p1 p2)

    (setq h1Text (get-html-file-h1-text fpath))
    (setq titleText (get-html-file-title fpath))

    (if (equal h1Text titleText)
        nil
      (progn 
        (find-file fpath )
        (goto-char 1)
        (search-forward "<title>" )
        (setq p1 (point) )

        (search-forward "</title>" )
        (backward-char 8)
        (setq p2 (point) )

        (delete-region p1 p2 )
        (insert h1Text)
        (print fpath)
        ))
    ))

(require 'find-lisp)

(let (outputBuffer)
  (setq outputBuffer "*process time machine output*" )
  (with-output-to-temp-buffer outputBuffer 
    (mapc 'my-process-file (find-lisp-find-files inputDir "\\.html$"))
    (princ "Done deal!")
    )
  )

Again, all the scripts above are variations of find/replace. For code detail, see: How to Write grep in Emacs Lisp and Emacs Lisp: Find String Inside HTML Tag.

In this script, i didn't include code to save the changed files. This way, i can do some manual verification after the script has run. When i want them all saved, i just go to ibuffer and type 3 keys 【* u S】 to save them all, and 【D y】 to close them all.
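If you trust the change and don't need the manual check, the edit can be saved from inside the script. Below is a minimal sketch of that variation, not the script i used. The function name “my-set-title-and-save” is hypothetical. It edits in a temp buffer and writes back with “write-region”, so no file buffers are left open. It assumes the file's encoding is detected correctly when read.

```elisp
;; Hypothetical variant, not the original script: saves each change at once.
(defun my-set-title-and-save (fpath new-title)
  "Replace the text inside <title>...</title> in file FPATH with NEW-TITLE.
Write the result back to FPATH.  Return t if the file was rewritten,
nil if no <title>...</title> pair was found."
  (with-temp-buffer
    (insert-file-contents fpath)
    (goto-char (point-min))
    (let ((case-fold-search nil) p1)
      (when (search-forward "<title>" nil t)
        (setq p1 (point))
        (when (search-forward "</title>" nil t)
          ;; delete the old title text, then insert the new one in its place
          (delete-region p1 (match-beginning 0))
          (goto-char p1)
          (insert new-title)
          (write-region (point-min) (point-max) fpath)
          t)))))
```

Since this overwrites files with no undo, it is wise to run it on a copy of the directory first.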

What might you use this script for in your work?
