dehtmlize source code in emacs lisp

Added elisp code to dehtmlize a block of htmlized source code.

DeHtmlize Text


When source code in an HTML file is htmlized, the markup is usually unreadable. Suppose you want to modify source code presented in HTML. Usually, you view the page in a browser, copy the source code, create a new buffer, and paste the code there to edit it. When done, you copy the edited text, close the temp buffer, delete the htmlized version in your HTML file, paste the new text in, then htmlize it again. This process is painful.

It would be nice if you could press a button and have the htmlized source code in your HTML file become plain text, so you can modify it, then press a button again to htmlize it.

Here are 2 elisp functions to dehtmlize. dehtmlize-span-region dehtmlizes a selected region. dehtmlize-block dehtmlizes code inside a pre block of the form “<pre class="langName">”.

(defun dehtmlize-block ()
  "Delete span tags inside a <pre> block.
For example, if the cursor is somewhere inside the block:

<pre class=\"code\">
codeXYZ...
</pre>

after calling, the span tags in the “codeXYZ...” text are removed.
`dehtmlize-block' is the reverse of `htmlize-block'."
  (interactive)
  (let (code-begin code-end myStr)
    (save-excursion
      ;; Locate the text between the opening <pre class="..."> tag and </pre>.
      (re-search-backward "<pre class=\"[A-Za-z-]+\"")
      (setq code-begin (re-search-forward ">"))
      (re-search-forward "</pre>")
      (setq code-end (match-beginning 0))
      (setq myStr (buffer-substring-no-properties code-begin code-end))
      ;; Strip the span markup, then decode entities. "&amp;" goes last so
      ;; that text such as "&amp;lt;" decodes to "&lt;", not "<".
      (setq myStr (replace-regexp-in-string "<span class=\"[^\"]+\">" "" myStr))
      (setq myStr (replace-regexp-in-string "</span>" "" myStr))
      (setq myStr (replace-regexp-in-string "&lt;" "<" myStr))
      (setq myStr (replace-regexp-in-string "&gt;" ">" myStr))
      (setq myStr (replace-regexp-in-string "&amp;" "&" myStr))
      (delete-region code-begin code-end)
      (goto-char code-begin)
      (insert myStr))))

(defun dehtmlize-span-region (p1 p2)
  "Delete HTML “span” tags on a region.
Note: only span tags of the form <span class=\"...\"> are deleted."
  (interactive "r")

  (let (mystr)
    (setq mystr (buffer-substring-no-properties p1 p2))

    ;; Do the replacements in a temp buffer, then take the result as a string.
    (setq mystr
          (with-temp-buffer
            (insert mystr)
            (goto-char (point-min))
            (while (search-forward-regexp "<span class=\"[^\"]+\">" nil t) (replace-match ""))

            (goto-char (point-min))
            (while (search-forward "</span>" nil t) (replace-match ""))

            (goto-char (point-min))
            (while (search-forward "&lt;" nil t) (replace-match "<"))

            (goto-char (point-min))
            (while (search-forward "&gt;" nil t) (replace-match ">"))

            ;; Decode "&amp;" last so that "&amp;lt;" becomes "&lt;", not "<".
            (goto-char (point-min))
            (while (search-forward "&amp;" nil t) (replace-match "&"))

            (buffer-string)))

    (delete-region p1 p2)
    (goto-char p1)
    (insert mystr)))
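
To see what the transformation does on a small input, here is a minimal, self-contained sketch that works on a plain string instead of a buffer region. The helper name my-dehtmlize-string and the sample text are assumptions made for illustration only. It strips span tags and decodes the three basic entities, decoding “&amp;” last.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-dehtmlize-string (html)
  "Return HTML with <span class=...> markup removed and basic entities decoded."
  (let ((s html))
    (setq s (replace-regexp-in-string "<span class=\"[^\"]+\">" "" s))
    (setq s (replace-regexp-in-string "</span>" "" s))
    (setq s (replace-regexp-in-string "&lt;" "<" s))
    (setq s (replace-regexp-in-string "&gt;" ">" s))
    ;; Decode "&amp;" last, so an escaped "&amp;lt;" ends up as "&lt;".
    (replace-regexp-in-string "&amp;" "&" s)))

(my-dehtmlize-string
 "(<span class=\"keyword\">if</span> (&lt; x 1) (message \"a &amp; b\"))")
;; returns "(if (< x 1) (message \"a & b\"))"
```

Working on a string makes the behavior easy to check in the scratch buffer with M-x eval-buffer before trying the commands on a real HTML file.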
