2013-05-11

is XML Syntax Regular?

Even XML, whose syntax is more regular than lisp, cannot escape irregularities.

Here's a sample of valid XML, from an Atom webfeed.

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:base="http://xahlee.info/comp/">

  <title>…</title>
  <subtitle>…</subtitle>
  <link rel="self" href="blog.xml"/>
  <link rel="alternate" href="blog.html"/>
  <updated>…</updated>
  <author>
    <name>…</name>
    <uri>…</uri>
  </author>
  <id>…</id>
  <icon>…</icon>
  <rights>…</rights>

  <entry>
    <title>…</title>
    <id>…</id>
    <updated>…</updated>
    <summary>…</summary>
    <content type="xhtml">
      <div xmlns="http://www.w3.org/1999/xhtml">
        <p><a href="…">…</a></p>
        <p>…</p>
        <p>…</p>
      </div>
    </content>
    <link rel="alternate" href="…"/>
  </entry>

</feed>

Can you spot the syntax irregularity?

For more, see Programing Language Design: Syntax Sugar Problem: Irregularity vs Convenience

emacs: convert Unicode chars to ASCII (Zap Gremlins)

Perm URL with updates: http://ergoemacs.org/emacs/emacs_zap_gremlins.html

This page shows an emacs lisp command that changes a Unicode string into ASCII. For example, “passé” becomes “passe”, and “voilà” becomes “voila”.

Emacs Lisp Solution

Here's a solution.

(defun asciify-text (ξstring &optional ξfrom ξto)
"Change some Unicode characters into equivalent ASCII ones.
For example, “passé” becomes “passe”.

This function works on chars in European languages, and does not transcode arbitrary Unicode chars (such as Greek, math symbols).  Un-transformed unicode char remains in the string.

When called interactively, work on text selection or current block.

When called in lisp code, if ξfrom is nil, returns a changed string, else, change text in the region between positions ξfrom ξto."
  (interactive
   (if (region-active-p)
       (list nil (region-beginning) (region-end))
     (let ((bds (bounds-of-thing-at-point 'paragraph)) )
       (list nil (car bds) (cdr bds)) ) ) )

  (require 'xfrp_find_replace_pairs)

  (let (workOnStringP
        inputStr
        (charChangeMap [
                        ["á\\|à\\|â\\|ä\\|ã\\|å" "a"]
                        ["é\\|è\\|ê\\|ë" "e"]
                        ["í\\|ì\\|î\\|ï" "i"]
                        ["ó\\|ò\\|ô\\|ö\\|õ\\|ø" "o"]
                        ["ú\\|ù\\|û\\|ü"     "u"]
                        ["Ý\\|ý\\|ÿ"     "y"]
                        ["ñ" "n"]
                        ["ç" "c"]
                        ["ð" "d"]
                        ["þ" "th"]
                        ["ß" "ss"]
                        ["æ" "ae"]
                        ])
        )
    (setq workOnStringP (if ξfrom nil t))
    (setq inputStr (if workOnStringP ξstring (buffer-substring-no-properties ξfrom ξto)))
    (if workOnStringP
        (let ((case-fold-search t)) (replace-regexp-pairs-in-string inputStr charChangeMap) )
      (let ((case-fold-search t)) (replace-regexp-pairs-region ξfrom ξto charChangeMap) )) ) )

You'll need xfrp_find_replace_pairs.el
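For comparison, here's a minimal Python sketch of the same table-driven approach: a list of (regex, replacement) pairs applied one after another. The table copies charChangeMap above. The name asciify_text is made up for illustration. Unlike the elisp version, which binds case-fold-search to t, this sketch matches case-sensitively.

```python
import re

# (regex, replacement) pairs, copied from charChangeMap in the elisp version.
CHAR_CHANGE_MAP = [
    ("á|à|â|ä|ã|å", "a"),
    ("é|è|ê|ë", "e"),
    ("í|ì|î|ï", "i"),
    ("ó|ò|ô|ö|õ|ø", "o"),
    ("ú|ù|û|ü", "u"),
    ("Ý|ý|ÿ", "y"),
    ("ñ", "n"),
    ("ç", "c"),
    ("ð", "d"),
    ("þ", "th"),
    ("ß", "ss"),
    ("æ", "ae"),
]

def asciify_text(text):
    """Replace some European accented letters with ASCII equivalents.

    Characters not in the table (Greek, math symbols, ...) are left as-is.
    Matching is case-sensitive, unlike the elisp version.
    """
    for pattern, replacement in CHAR_CHANGE_MAP:
        text = re.sub(pattern, replacement, text)
    return text

print(asciify_text("passé voilà straße"))  # passe voila strasse
```

Applying the pairs in sequence is safe here, because no replacement produces text that a later pattern would match.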

TODO

Accumulator vs Parallel Programing

This problem makes a good parallel programing exercise. See: Parallel Programing Exercise: asciify-string.

Alternative Solution with “iconv” or perl

Yuri Khan and Teemu Likonen suggested using the “iconv” shell command. Here's Teemu's code.

(defun asciify-string (string)
"Convert STRING to ASCII string.
For example:
“passé” becomes “passe”
Code originally by Teemu Likonen."
  (with-temp-buffer
    (insert string)
    (call-process-region (point-min) (point-max) "iconv" t t nil "--to-code=ASCII//TRANSLIT")
    (buffer-substring-no-properties (point-min) (point-max))))

Julian Bradfield suggested Perl. Here's his one-liner; it removes only accents.

perl -e 'use encoding utf8; use Unicode::Normalize; while ( <> ) { $_ = NFKD($_); s/\pM//g; print; }'

Source: groups.google.com
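The Perl idea translates to Python's standard unicodedata module roughly like this. First decompose the text with NFKD, then drop every character whose general category starts with M, which is what Perl's \pM matches. This is a sketch only, and strip_accents is a hypothetical name. Like the Perl one-liner, it removes just accents, so letters with no decomposition (such as ß, æ, ø) stay as they are.

```python
import unicodedata

def strip_accents(text):
    """Remove accents: NFKD-decompose, then drop every mark (category M*)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed
                   if not unicodedata.category(ch).startswith("M"))

print(strip_accents("passé voilà"))  # passe voila
print(strip_accents("straße"))       # straße  (ß has no decomposition)
```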

Though, it would be nice to have a pure elisp solution, because “iconv” is not in Windows or Mac OS X as of .

Emacs: Change {Round, Square, Curly} Brackets in Text

When you have data in different languages, sometimes you need to convert round brackets to square brackets or curly brackets.

Doing that by hand is tedious. You have to do the replacement twice, once for the left bracket, then again for the right bracket. Here's a command that helps:

(defun change-bracket-pairs (fromType toType)
  "Change bracket pairs from one type to another on text selection or text block.
For example, change all parenthesis () to square brackets [].

In lisp code, fromType is a string of a bracket pair. ⁖ \"()\", likewise for toType."
  (interactive
   (let (
         (bracketTypes '("[]" "()" "{}" "〈〉" "《》" "「」" "『』" "【】" "〖〗"))
         )
     (list
      (ido-completing-read "Replace this:" bracketTypes)
      (ido-completing-read "To:" bracketTypes) ) ) )

  (let* (
         (bds (get-selection-or-unit 'block))
         (p1 (elt bds 1))
         (p2 (elt bds 2))
         (changePairs (vector
                 (vector (char-to-string (elt fromType 0)) (char-to-string (elt toType 0)))
                 (vector (char-to-string (elt fromType 1)) (char-to-string (elt toType 1)))
                 ))
         )
    (replace-pairs-region p1 p2 changePairs) ) )

You'll need get-selection-or-unit and xfrp_find_replace_pairs.el
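Here's the core trick sketched in Python, for illustration only; it is not the elisp command itself. A 2-character from/to pair maps naturally onto str.maketrans, and str.translate then changes both brackets in one pass. The name change_bracket_pairs is borrowed from the command above, but the Python code is hypothetical.

```python
def change_bracket_pairs(text, from_type, to_type):
    """Change every bracket of pair from_type into the matching one in to_type.

    from_type and to_type are 2-char strings such as "()" or "[]".
    str.translate changes both brackets in a single pass.
    """
    if len(from_type) != 2 or len(to_type) != 2:
        raise ValueError("bracket types must be 2-character strings")
    return text.translate(str.maketrans(from_type, to_type))

print(change_bracket_pairs("f(x) + g(y)", "()", "[]"))  # f[x] + g[y]
print(change_bracket_pairs("「a」", "「」", "【】"))       # 【a】
```

Every character is mapped in the same pass, so one replacement can't clobber another. Even swapping "()" to ")(" works.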

Perl: Sort List, Matrix, Object

Perm URL with updates: http://xahlee.info/perl/perl_sort.html

This page shows you how to sort in Perl

Here's an example of sort (Perl 5.14):

#-*- coding: utf-8 -*-
# perl

# sort a list

@li = (1,9,2,3);

@li2 = sort {$a <=> $b} @li; # original list is not changed

print join(' ', @li2); # 1 2 3 9

In Perl, sort is a function. It returns the sorted result as another list.

“sort” takes the form sort {…} @myList. Inside the braces is the body of the ordering function, where the variables 「$a」 and 「$b」 are predefined by the language to represent two elements of the list. The operator <=> returns -1 if the left operand is less than the right operand, 0 if they are equal, and 1 otherwise. It is equivalent to Python 2's “cmp” function (removed in Python 3).

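Another form of sort is sort orderFunctionName @list, which uses a function name in place of the comparison block. The function reads the two elements from 「$a」 and 「$b」 (or, if it is declared with the prototype ($$), from @_), and returns a negative number, 0, or a positive number, typically one of {-1, 0, 1}.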
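Python 3 has no built-in cmp. The standard functools.cmp_to_key wraps a three-way comparison function (one returning negative, zero, or positive) so that sorted can use it. Here's a small sketch; spaceship is a hypothetical helper that imitates Perl's <=>.

```python
from functools import cmp_to_key

def spaceship(x, y):
    """Three-way compare like Perl's <=>: returns -1, 0, or 1."""
    return (x > y) - (x < y)

li = [1, 9, 2, 3]
li2 = sorted(li, key=cmp_to_key(spaceship))                      # like sort {$a <=> $b} @li
li3 = sorted(li, key=cmp_to_key(lambda a, b: spaceship(b, a)))  # like sort {$b <=> $a} @li

print(li2)  # [1, 2, 3, 9]
print(li3)  # [9, 3, 2, 1]
```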

Compare as Number or as String

Perl has 2 comparison operators.

  • ‹x› <=> ‹y› compare ‹x› ‹y› as numbers.
  • ‹x› cmp ‹y› compare ‹x› ‹y› as strings.

Example:

# -*- coding: utf-8 -*-
# perl

print "3" <=> "20"; # prints -1

print "\n";

print "3" cmp "20"; # prints 1

In Perl, numbers and strings are automatically converted to each other as needed, so it's the operator, not the data, that decides whether a comparison is numeric or textual.
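Python, by contrast, does not convert between strings and numbers automatically. The numeric-vs-string choice is made through the sort key instead. A small illustration:

```python
nums_as_strings = ["3", "20", "100"]

by_string = sorted(nums_as_strings)           # character by character, like Perl's cmp
by_number = sorted(nums_as_strings, key=int)  # as numbers, like Perl's <=>

print(by_string)  # ['100', '20', '3']
print(by_number)  # ['3', '20', '100']
```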

Sort Matrix

# -*- coding: utf-8 -*-
# perl

# sort a matrix

use Data::Dumper;
use strict;

my @li1 = ([2,6],[1,3],[5,4]);

my @li2 = sort { $a->[1] <=> $b->[1] } @li1;

print Dumper(\@li2);            #  [[1, 3], [5, 4], [2, 6]]

The ([2,6],[1,3],[5,4]) is the syntax for a nested list. The inner square brackets create array references.

The $a->[1] is the syntax to get the element at index 1 of an array reference.

The \@li2 in Dumper(\@li2) gets the reference to the array @li2.

Reverse Sort Order

To reverse sort order, all you have to do is to reverse the placement of $a and $b in your comparison. Example: sort {$b <=> $a} @li

Or, you can use the reverse function afterward, if you don't mind doing extra computation.

# -*- coding: utf-8 -*-
# perl

use Data::Dumper;

@aa = (3, 4, 5);

@bb = reverse(@aa);

print Dumper(\@bb);

Sort Complex Objects

Here's a more complex example of sort.

Suppose you have a list of strings.

'my283.jpg'
'my23i.jpg'
'web7-s.jpg'
'fris88large.jpg'
…

You want to sort them by the number embedded in them.

You need to define an ordering function and pass it to sort. The function should take two strings and compare the integers inside them. Here's the solution:

#-*- coding: utf-8 -*-
#  perl

@li = (
'my283.jpg',
'my23i.jpg',
'web7-s.jpg',
'fris88large.jpg',
);

# sorts a list of strings by their embedded number

@li2 = sort { ($a =~ m/(\d+)/)[0] <=> ($b =~ m/(\d+)/)[0]} @li;

print join(' ', @li2);  # prints web7-s.jpg my23i.jpg fris88large.jpg my283.jpg

decorate-sort-dedecorate, Schwartzian transform

With a plain comparison block, the key is recomputed every time an element is compared, which usually means 2 or more times for each element.

Here's a more efficient way, called decorate-sort-dedecorate (aka Schwartzian transform).

# -*- coding: utf-8 -*-
# perl

# sort a array of string, by comparing the number part inside the string

@li = ('my283.jpg','my23i.jpg','web7-s.jpg','fris88large.jpg');

# this is “decorate-sort-dedecorate”, aka Schwartzian transform
@li2 = map { $_->[0] } sort { $a->[1] <=> $b->[1] } map { [ $_, ($_=~m/(\d+)/)[0] ] } @li;
#          ↑ take item               ↑ sort            ↑ create list of pairs [item,key]

use Data::Dumper;
print Dumper(\@li2); # ('web7-s.jpg', 'my23i.jpg', 'fris88large.jpg', 'my283.jpg')

In the above Perl code:

  • the map { [ $_, ($_=~m/(\d+)/)[0] ] } @li; generates a temp array. Each element is a pair, item & key.
  • Then, sort is applied to the temp array.
  • Then, another map, map { $_->[0] } …, extracts the items of the original list.

In this way, the cost to compute the same key multiple times is avoided. This method is good when computing the key is expensive.
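For comparison, here's the same decorate-sort-undecorate in Python, written out explicitly as a sketch (embedded_number is a hypothetical helper). The index i is a tie-breaker, so equal keys keep their original order and the items themselves are never compared.

```python
import re

li = ["my283.jpg", "my23i.jpg", "web7-s.jpg", "fris88large.jpg"]

def embedded_number(s):
    """Return the first run of digits in s as an int."""
    return int(re.search(r"\d+", s).group())

decorated = [(embedded_number(item), i, item) for i, item in enumerate(li)]  # decorate
decorated.sort()                                                             # sort
li2 = [item for _key, _i, item in decorated]                                 # undecorate

print(li2)  # ['web7-s.jpg', 'my23i.jpg', 'fris88large.jpg', 'my283.jpg']
```

In practice, Python's key= parameter does this for you. The key function is called once per element, so sorted(li, key=embedded_number) gives the same result.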

perldoc -f sort

General Function to Sort Matrix

See: Perl: General Function for Sorting Matrix

2013-05-10

Syntax = Most Important Aspect of a Programing Language

The quality of a programing language can be judged by how much of it can be explained by its syntax alone.

By this criterion, the order is roughly: Mathematica ≻ PHP ≻ Lisp ≻ JavaScript ≻ Ruby? ≻ Perl ≻ Python ≻ Java.

If you have coded one of {Haskell, ML/OCaml/F#, erlang, Pascal/Ada, Lua, tcl, PostScript}, i'd be interested in your opinion on their placement in the above. (you should have coded in the lang for a few years)

Python 3: Sort List, Matrix, Object

Perm URL with updates: http://xahlee.info/python/python3_sort.html

This page shows you how to sort in Python

sort Method

You can use the “sort” method. For example:

# -*- coding: utf-8 -*-
# python 3

li = [1,9,2,3]

li.sort() # the variable is modified

print(li) # [1, 2, 3, 9]

sorted Function

You can use the “sorted” function. This does not modify the variable. For example:

# -*- coding: utf-8 -*-
# python 3

li = [1,9,2,3]

li2 = sorted(li)

print(li)                       # [1, 9, 2, 3]
print(li2)                      # [1, 2, 3, 9]

sort by Column/Key

You can sort by specifying an optional parameter “key”. This is most useful for sorting a matrix.

# -*- coding: utf-8 -*-
# python 3

# sort a matrix

li = [[2,6],[1,3],[5,4]]

li.sort(key=lambda x:x[1])

print(li)                       # prints [[1, 3], [5, 4], [2, 6]]

Sort and Reverse

Another optional parameter is “reverse”. You can use it like this:

# -*- coding: utf-8 -*-
# python 3

# sort a matrix, by 2nd column, reverse order

li = [[2,6],[1,3],[5,4]]

li.sort(key=lambda x:x[1], reverse=True)

print(li)                       # prints [[2, 6], [5, 4], [1, 3]]

Sort Arbitrary Object

Here's a more complex example. Suppose you have a list of strings.

'my283.jpg'
'my23i.jpg'
'web7-s.jpg'
'fris88large.jpg'
…

You want to sort them by the number embedded in them.

You need to define a key function and pass it to sort. The function takes one string and returns the integer inside it; sort then orders the list by those keys. Here's the solution:

# -*- coding: utf-8 -*-
# python 3

# sort by custom order

import re

li = [
"my283.jpg",
"my23i.jpg",
"web7-s.jpg",
"fris88large.jpg",
]

# return the number inside the string, used as the sort key
def myKey(myString):
    return int(re.findall(r"\d+", myString)[0])  # first run of digits, as an integer

li.sort(key = myKey)

print(li) # prints ['web7-s.jpg', 'my23i.jpg', 'fris88large.jpg', 'my283.jpg']

Here, we defined a function “myKey” to tell sort about the key to use.

Programing Language Design: Syntax Sugar Problem: Irregularity vs Convenience

Perm URL with updates: http://xahlee.info/comp/syntax_irregularity_vs_convenience.html

One of the idiocies of the HTML spec is that the “pre” tag discards the first blank line.

for example, if you have:

<pre style="border:solid thin red">
x = 3
</pre>

Here's how your browser renders it:

x = 3

The first blank line is ignored. However, only the FIRST blank line is ignored. If you have 2 blank lines in the beginning, it'll be rendered with 1 blank line.

<pre style="border:solid thin red">

x = 3
</pre>

x = 3
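Here's the rule modeled as a tiny Python function, as a sketch: only one newline right after the opening pre tag is dropped. pre_rendered_text is a hypothetical name. It assumes Unix newlines and ignores everything else a browser does (tags, entities, whitespace styling).

```python
def pre_rendered_text(content):
    """Text a browser shows for <pre>content</pre>.

    Models only the leading-newline rule: a single newline right after
    the opening tag is dropped; any further newlines are kept.
    """
    if content.startswith("\n"):
        return content[1:]
    return content

print(repr(pre_rendered_text("\nx = 3\n")))    # 'x = 3\n'
print(repr(pre_rendered_text("\n\nx = 3\n")))  # '\nx = 3\n'
```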

They do this because it's convenient for coders: one likes to see the pre content aligned to the left in raw HTML.

For example, you'd rather write it this way:

<pre>
1
2
3
</pre>

than

<pre>1
2
3</pre>

This is an idiocy because it mixes convenience with syntax.

The problem comes when you have programs that deal with code. That's why, in programing and computing tech, there are hundreds of exceptions and irregularities, and thus bugs and headaches. The worst offender is unix shell syntax. 〔☛ Unix Shell Syntax Irregularities Galore〕

At first, syntax conveniences like these are nice. The rules are lax, and you use them without problems. But once the language grows, and you deal with many languages, you find exceptions and special rules everywhere. You can't remember which rule someone thought was convenient at the time, and there's no simple systematic rule behind them. Each one becomes an ad hoc syntax soup of hell.

For an example of the bad consequences of the “pre” tag, see: CSS “pre” Problem: No Linebreak After Tag. Also, tools that syntax-color computer program source code in HTML have to work around the problem by wrapping “span” tags with line breaks at unnatural places. 〔☛ Emacs Lisp: Syntax Color Source Code in HTML〕

Almost all languages have this problem, to various degrees. C language syntax is the worst. It basically has no design. Most of the syntax “design” is based on the typing convenience of its users at the time. 〔☛ Programing: Why I Hate C〕 Even lisp didn't escape this problem. 〔☛ Programing Language: Fundamental Problems of Lisp〕

Another major problem of HTML irregularity is letting users omit ending tags. A big offender is Google, which tells users to omit ending tags in its HTML style guide. The consequence is that people will omit ending tags that cannot be omitted, and we are back to syntax-soup quirks-mode hell. See:

How to Solve the Syntax Sugar Problem?

This problem should be solved by a clear separation of concerns. For example, XML takes the regularity approach, and you can have editors, including structural editors, that present the data to the user in the most easy-to-read format. Another approach is Mathematica's, which has a systematic syntax layer. The bottom layer is purely nested like XML and LISP, but without irregularities. On top of it is another layer that supports all the syntax warts we humans have gotten used to, as in traditional math notation and infix notation. Yet there are simple, regular, systematic transformation rules that convert between these two layers easily.

Instead of syntax sugar, you should have a 100% regular syntax, or a layer with systematic rules, and let the editor deal with it, presenting code to the user in a different layer.

See also:

Emacs: How Do You Insert Current Date?

here's how i do it.

(defun insert-date (&optional addTimeStamp-p)
  "Insert current date and or time.

• In this format yyyy-mm-dd.
• When called with `universal-argument', insert date and time, e.g. 2012-05-28T07:06:23-07:00
• Replaces text selection.

See also `current-date-time-string'."
  (interactive "P")
  (when (region-active-p) (delete-region (region-beginning) (region-end) ) )
  (cond
   ((equal addTimeStamp-p nil ) (insert (format-time-string "%Y-%m-%d")))
   (t (insert (current-date-time-string))) ) )

(defun current-date-time-string ()
  "Returns current date-time string in full ISO 8601 format.
Example: 「2012-04-05T21:08:24-07:00」.

Note, for the time zone offset, both the formats 「hhmm」 and 「hh:mm」 are valid ISO 8601. However, Atom Webfeed spec seems to require 「hh:mm」."
  (concat
   (format-time-string "%Y-%m-%dT%T")
   ;; insert a colon into the %z offset: "-0700" → "-07:00"
   (let ((ξx (format-time-string "%z")))
     (format "%s:%s" (substring ξx 0 3) (substring ξx 3 5)))))
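Here's the same hh:mm offset fix sketched in Python with the standard datetime module: format %z (which gives -0700), then insert the colon. date_time_string is a hypothetical name, and it expects a timezone-aware datetime.

```python
from datetime import datetime, timedelta, timezone

def date_time_string(dt):
    """Full ISO 8601 string with hh:mm offset, e.g. 2012-04-05T21:08:24-07:00."""
    if dt.utcoffset() is None:
        raise ValueError("dt must be timezone-aware")
    z = dt.strftime("%z")                       # e.g. "-0700"
    return dt.strftime("%Y-%m-%dT%H:%M:%S") + z[:3] + ":" + z[3:5]

example = datetime(2012, 4, 5, 21, 8, 24, tzinfo=timezone(timedelta(hours=-7)))
print(date_time_string(example))                       # 2012-04-05T21:08:24-07:00
print(date_time_string(datetime.now().astimezone()))   # current local date-time
```

Python's own dt.isoformat(timespec="seconds") already produces the hh:mm form. The manual splice is only there to mirror the elisp.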

emacs: xah-html-mode, improved xhm-make-citation

much improved “xhm-make-citation”. Now, the order of lines for {title, url, author, date} doesn't matter. Get it in Emacs: Xah HTML Mode

to learn how to write it, see Emacs Lisp: Writing a make-citation Command

2013-05-09

keyboard: one thousand function keys

A manufacturer of custom keyboards with lots of function keys. See: http://www.access-is.com/custom_keyboards.php

that's nice if you are the master of function keys. See also:

thx to David Rogoff

Logic Write Style: the Incongruousness of the Word “Actually”

Perm URL with updates: http://wordyenglish.com/lit/the_word_actually.html

Here's the inline doc of “assoc” from GNU Emacs 24.3.1:

assoc is a built-in function in `C source code'.

(assoc KEY LIST)

Return non-nil if KEY is `equal' to the car of an element of LIST.
The value is actually the first element of LIST whose car equals KEY.

note the word “actually”.

the word “actually” is often used for emphasis purposes. It means “in fact”. It came from “actual” + “ly”.

actually «early 15c., “in fact, in reality” (as opposed to in possibility), from actual + -ly (2). Meaning “actively, vigorously” is from mid-15c.; that of “at this time, at present” is from 1660s. As an intensive added to a statement and suggesting “as a matter of fact, really, in truth” it is attested from 1762.»

When writing in a logical style, the use of “actually” is odd, incongruous, or redundant.

Linux: Convert HTML to PDF

To convert HTML to PDF on Linux, you can use wkhtmltopdf. It's based on WebKit, the web browser engine used by Safari and Google Chrome.

# install
sudo apt-get install wkhtmltopdf
wkhtmltopdf my_resume.html my_resume.pdf

If you have just a single file, you can also use LibreOffice.

# install libreoffice
sudo apt-get install libreoffice

Type libreoffice to start it, then, open the HTML file, then use menu 〖File ▸ Export…〗.

2013-05-08

video: Emacs, Shell, Abbrev, and ELISP Power to Bear!

Here's a video version of today's tutorial.

Emacs, Shell, Abbrev, and ELISP Power to Bear!

code at Emacs, Shell, Abbrev, and ELISP Power to Bear!

Emacs, Shell, Abbrev, and ELISP Power to Bear!

Perm URL with updates: http://ergoemacs.org/misc/emacs_abbrev_shell_elisp.html

Using Abbrev for Shell Commands

You can define abbrev for frequently used shell commands. For example, i type “3rs” and it expands to

rsync -z -r -v -t --chmod=Dugo+x --chmod=ugo+r --delete --exclude='*~' --exclude='.bash_history' --exclude='logs/'  --rsh='ssh -l u89150' ~/web/ u89150@s72750.example.com:~/

Is emacs abbrev better than bash alias?

Emacs abbrev is better than bash alias, because you see the full command.

what about using shell 【Ctrl+r】 back search feature?

That's great, but for frequently used command, alias or abbrev is better, because you get EXACTLY the command you want.

With back search, you might have modified a command, and you have to eye-ball to be sure it's the command you want. Bash alias or emacs abbrev are muscle memory.

for how to set abbrev, see: Using Emacs Abbrev Mode for Abbreviation

Advantage of Using Shell Inside Emacs

Using shell inside emacs is much superior to using a terminal. All lines are logged by default, and can be saved anytime if you want. You can easily edit or copy/paste any past line. And you have the full power of emacs to edit any line, or jump to any file path (ffap).

for how to use shell inside emacs, see Emacs Shell Tutorial (Bash, cmd.exe, PowerShell)

Elisp Power Come to Bear

now, for the more advanced and esoteric emacs users.

What's even better is elisp.

I have many shell command abbrevs. For example:

;; shell commands
("3ditto" "ditto -ck --sequesterRsrc --keepParent src dest")
("3im" "convert -quality 85% ")
("3ims" "convert -size  -quality 85% ")
("3im256" "convert +dither -colors 256 ")
("3imf" "find . -name \"*png\" | xargs -l -i basename \"{}\" \".png\" | xargs -l -i  convert -quality 85% \"{}.png\" \"{}.jpg\"")

("3f0" "find . -type f -empty")
("3f00" "find . -type f -size 0 -exec rm {} ';'")
("3chmod" "find . -type f -exec chmod 644 {} ';'")
("3chmod2" "find . -type d -exec chmod 755 {} ';'")

("3unison" "unison -servercmd /usr/bin/unison c:/Users/xah/web ssh://xah@example.com//Users/xah/web")
("3sftp" "sftp xah@xahlee.org")
("3ssh" "ssh xah@xahlee.org")
("3rsync" "rsync -z -r -v -t --exclude=\"*~\" --exclude=\".DS_Store\" --exclude=\".bash_history\" --exclude=\"**/xx_xahlee_info/*\"  --exclude=\"*/_curves_robert_yates/*.png\" --exclude=\"logs/*\"  --exclude=\"xlogs/*\" --delete --rsh=\"ssh -l xah\" ~/web/ xah@example.com:~/")

("3rsync2" "rsync -r -v -t --delete --rsh=\"ssh -l xah\" ~/web/ xah@example.com:~/web/")
("3rsync3" "rsync -r -v -t --delete --exclude=\"**/My *\" --rsh=\"ssh -l xah\" ~/Documents/ xah@example.com:~/Documents/")

The problem is, some of them are not frequently used, and i forget the abbrev. For example, i have many commands that convert images in various ways. They all start with “3im”, but i forget which one i want, so i have to open my abbrev file to look. 〔☛ Using Emacs's Bookmark Feature〕

Wouldn't it be great if all your abbrevs had an ido-like interface, so that you can see them as you type? 〔☛ Emacs: iswitch vs ido mode〕 〔☛ Emacs: Name Completion Features & Packages〕 〔☛ Emacs: List/Switch Buffers〕

So, now i wrote a command, like this:

(defcustom xah-shell-abbrev-alist nil "alist of xah's shell abbrevs")
(setq xah-shell-abbrev-alist
          '(
            ("rsync1" . "rsync -z -r -v -t --chmod=Dugo+x --chmod=ugo+r --delete --exclude='*~' --exclude='.bash_history' --exclude='logs/'  --rsh='ssh -l u80781' ~/web/ u80781@s30097.example.com:~/")

            ("ssh" . "ssh -l u80781 xahlee.org ")
            ("img1" . "convert -quality 85% ")
            ("imgScale" . "convert -scale 50% -quality 85% ")
            ("img256" . "convert +dither -colors 256 ")
            ("imgBatch" . "find . -name \"*png\" | xargs -l -i basename \"{}\" \".png\" | xargs -l -i  convert -quality 85% \"{}.png\" \"{}.jpg\"")
            ("img-bmp2png" . "find . -name \"*bmp\" | xargs -l -i basename \"{}\" \".bmp\" | xargs -l -i  convert \"{}.bmp\" \"{}.png\"")

            ("grep" . "grep -r -F 'xxx' --include='*html' ~/web")

            ("rm_empty" . "find . -type f -empty")
            ("chmod_file" . "find . -type f -exec chmod 644 {} ';'")
            ("rm~" . "find . -name \"*~\" -exec rm {} ';'")
            ("findEmptyDir" . "find . -depth -empty -type d")
            ("rmEmptyDir" . "find . -depth -empty -type d -exec rmdir {} ';'")
            ("chmod2" . "find . -type d -exec chmod 755 {} ';'")
            ("lynx" . "lynx -dump -assume_local_charset=utf-8 -display_charset=utf-8 -width=100")
            ("vp" . "feh --randomize --recursive --auto-zoom --action \"gvfs-trash '%f'\" --geometry 1600x1000 ~/Pictures/cinse_pixra3/ &")
            )

          )

(defun xah-shell-commands (cmdAbbrev)
  "Insert a shell command chosen from a prompt of abbrevs."
  (interactive
   (list
    (ido-completing-read "shell abbrevs:" (mapcar 'car xah-shell-abbrev-alist) nil t)))
  (insert (cdr (assoc cmdAbbrev xah-shell-abbrev-alist))))

i give this command an easy key. When i need a shell command, i press the key, and ido comes up showing me all the abbrevs. Then i type 2 or more keys, press Enter ↵, and my command is inserted. Nice!
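The prompt-then-insert flow boils down to filtering the abbrev names by what you've typed, then looking up the chosen one. Here's a rough Python model of that, using a few entries from xah-shell-abbrev-alist. matching_abbrevs and expand are hypothetical names, and ido's real matching has more options than this plain substring test.

```python
# A few entries copied from xah-shell-abbrev-alist.
SHELL_ABBREVS = {
    "img1": "convert -quality 85% ",
    "imgScale": "convert -scale 50% -quality 85% ",
    "img256": "convert +dither -colors 256 ",
    "rm_empty": "find . -type f -empty",
    "chmod2": "find . -type d -exec chmod 755 {} ';'",
}

def matching_abbrevs(typed, abbrevs=SHELL_ABBREVS):
    """Abbrev names containing the typed text, roughly how ido narrows choices."""
    return [name for name in abbrevs if typed in name]

def expand(name, abbrevs=SHELL_ABBREVS):
    """Return the full command for a chosen abbrev (like assoc + cdr in elisp)."""
    return abbrevs[name]

print(matching_abbrevs("img"))  # ['img1', 'imgScale', 'img256']
print(expand("img256"))         # convert +dither -colors 256
```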

Be warned that this takes some getting used to. As with a new keybinding, you have to kick the old habit, and that may not be easy.

2013-05-07

unix linux shell uniq unicode bug

Perm URL with updates: http://xahlee.info/comp/unix_uniq_unicode_bug.html

Here's a bug in GNU coreutils uniq on unix/linux.

Create a file with the following text:

═
═
═
║
║
║
╒
╓
╔
╕
╖
╗
╘
╙
╚
╛
╜
╝
╞
╟
╠
╡
╢
╣
╤
╥
╦
╧
╨
╩
╪
╫
╬

Save it as 〔unicode.txt〕, then run cat unicode.txt | uniq -c. You get “33 ═”, as if all 33 lines were the same. Idiotic unix.
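For reference, here's what uniq -c should report when adjacent lines are compared as exact strings. It's sketched in Python with itertools.groupby; uniq_c is a hypothetical name, and the list below uses only the first few lines of the file.

```python
from itertools import groupby

def uniq_c(lines):
    """Like `uniq -c`: count runs of adjacent identical lines, comparing exact strings."""
    return [(len(list(group)), line) for line, group in groupby(lines)]

lines = ["═"] * 3 + ["║"] * 3 + ["╒", "╓", "╔"]
for count, line in uniq_c(lines):
    print(count, line)
# 3 ═
# 3 ║
# 1 ╒
# 1 ╓
# 1 ╔
```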

◆ uniq --version
uniq (GNU coreutils) 8.13
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later .
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Richard M. Stallman and David MacKenzie.

The man page doesn't mention anything about Unicode. Here's my locale setting anyhow.

◆ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

i think that since about 2005, unix utils have been in a frenzy of patching, trying to become Unicode compatible. It looks like the state is still shitty.

A related problem is grep. 〔☛ Problems of Calling Unix grep in Emacs〕 I thought the problem was due to the complexity of emacs + cygwin + layers of environment variables. But now i know, it's unix!

I'm not sure how many unix utils still have Unicode problems.

see also: Complexity & Tedium of Software Engineering

2013-05-06

Functional Programing Meta Language (ML) in Emacs Lisp!

discovered that emacs has a bundled library for functional programing pattern matching! The package file is 〔pcase.el〕. It's part of GNU Emacs 24.3.1.

It's written by Stefan Monnier, a professor who does functional programing research, and one of the two current leaders of emacs development.

you can get to the file by calling describe-function then pcase-let, then click on the file name.

Note: ML stands for Meta Language. It is a family of languages. Current popular descendants include OCaml and Microsoft's F#. 〔☛ Xah's OCaml Tutorial〕 Among functional programing languages, OCaml is one of those heavily used in industry (⁖ Mldonkey, Unison 〔☛ Unison Tutorial〕), especially in formal math proof systems ⁖ Coq. It's also famously used at Jane Street. 〔☛ OCaml Use in Industry: Janestreet Talk by Yaron Minsky 📺〕

Proof systems written in OCaml include: Coq, HOL Light. 〔☛ State of Theorem Proving Systems 2008〕

Also, the designer of ML is Robin Milner (1934 〜 2010), who died only a few years ago.

Writing grep/sed in Python, Perl, Emacs Lisp

also updated. Python: Find/Replace by Regex Text Pattern

for a Perl version: Perl: Find/Replace on Multiple Files

emacs lisp version: How to Write grep in Emacs Lisp

2013-05-05

emacs: xah-html-mode

Perm URL with updates: http://ergoemacs.org/emacs/xah-html-mode.html

This is the home page of xah-html-mode, an emacs major mode for HTML5.

How is it different from the default HTML mode or other HTML modes?

The basic idea of this mode is simple keyword-based coloring. It just colors keywords, that's all, with no fancy syntax parsing. (This also means keywords get colored even where they appear in plain text.)

The idea of simple keyword coloring is that, if a word is colored in a particular way, you know for sure it is a keyword in one of {HTML, CSS, JavaScript, PHP, …}, and you can tell if it is {type, class, var, function, property} by the coloring (most of the time). It lets you easily recognize typos too, because it won't be colored.

that's the basic idea.
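The keyword-only coloring idea can be modeled in a few lines of Python, as a sketch: scan the words and tag any word found in a keyword table, with no parsing at all. The tiny KEYWORDS table and the color_keywords function are hypothetical, not the mode's actual code.

```python
import re

# Tiny hypothetical keyword tables; the real mode has far larger lists.
KEYWORDS = {
    "html-tag": {"p", "span", "div"},
    "html-attr": {"class", "id", "style"},
    "css-property": {"color", "border", "width"},
}

WORD = re.compile(r"[A-Za-z-]+")

def color_keywords(text):
    """Return (word, category) for every word that is a known keyword.

    No parsing: a keyword is tagged wherever it appears, even in plain text.
    """
    hits = []
    for m in WORD.finditer(text):
        for category, words in KEYWORDS.items():
            if m.group() in words:
                hits.append((m.group(), category))
    return hits

print(color_keywords('<div class="x">a div with colour</div>'))
# [('div', 'html-tag'), ('class', 'html-attr'), ('div', 'html-tag'), ('div', 'html-tag')]
```

Note how div is tagged even where it's plain text. A word like colour (not the CSS name color) gets no tag, which is how typos stand out.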

This mode is currently alpha software. It is the result of my several years of coding HTML by hand. All commands have fairly complete and correct inline doc. But there are no keybindings as of now; you'll have to set them yourself.

Some command names and behaviors might also change in the future. ALPHA!

Current Features

  • HTML5 tag names are colored. (⁖ p, span, div, b, i, …)
  • HTML5 attribute names are colored. (⁖ class, id, style, title, width, height, …)
  • CSS property names are colored. (⁖ color, font-family, border, position, width, …)
  • CSS unit names are colored. (⁖ px, em, ex, %, …)
  • CSS color names are colored. (⁖ red, yellow, aqua, aquamarine, …)
  • Curly-quoted text is colored (as well as strings). (⁖ “curly”, "string")

• Tag insertion commands. They wrap a tag around the text selection. If there's no selection, they decide smartly whether to use the current word, line, or block. When the current selection or position is empty, they place your cursor between the inserted tags. The major commands are xhm-wrap-html-tag, xhm-wrap-url, xhm-wrap-p-tag.

• Convert text to a table, or back. xhm-make-html-table, xhm-make-html-table-undo.

• Convert lines to an HTML list. xhm-lines-to-html-list.

• Command to colorize computer language code. xhm-pre-source-code, xhm-htmlize-or-de-precode, xhm-get-precode-make-new-file

• Remove HTML tags: xhm-remove-html-tags, xhm-remove-span-tag-region

• Extract URL in a text selection. xhm-extract-url.

• Htmlize keyboard shortcuts notation xhm-htmlize-keyboard-shortcut-notation

• Convert region text to HTML entities or to their Unicode equivalents. xhm-replace-html-chars-to-unicode, xhm-replace-html-chars-to-entities

• Updating title and h1 tags of current file. xhm-update-title

• Change inline image tag and image file name. xhm-rename-html-inline-image

Todo

Here are the major features I'm working on:

lets you navigate/delete tags. Similar to sgml-skip-tag-forward and sgml-delete-tag, but hopefully better.

real-time syntax coloring of nesting tags. Similar to show-paren-mode.

some type of semantic unit editing (similar to paredit mode for elisp). The idea is that you always edit by tag units, so that your tags are never mismatched.

robust handling of comments. Right now i'm using https://code.google.com/p/ergoemacs/source/browse/packages/xah-comment.el, which is another alpha software.

Possibly adding JavaScript and PHP keywords, so the mode could become a general mode for web dev.

The usual problems of multi-mode setups are mostly avoided, because this package doesn't really try to do any syntax checking. I'm thinking this approach might be better in practice.

Download

download here: https://code.google.com/p/ergoemacs/source/browse/packages/xah-html-mode.el

Want this mode to grow? Voice your support. 〈Emacs: new major modes for HTML, CSS, PHP, ELISP, and Lean Emacs LISP Manual〉 @ http://pledgie.com/campaigns/19973

Also check out Emacs: Xah CSS Mode. Much simpler.