Using Unicode in HTML Attributes

Xah Lee, 2010-12-18, 2011-01-23

I discovered that you can use Unicode characters in HTML tag attribute values. Here's a sample HTML:

<title>Unicode in HTML Tag Attributes</title>
<style type="text/css">
p.α {color:red}
</style>

<p class="α">yay!</p>


In the above, notice the Greek alpha character α, used as the value of the class attribute and in the matching CSS selector.

You can see the above source code rendered here: Sample Page of Unicode in HTML Tag Attributes.

This works in the latest versions of Firefox, Google Chrome, Safari, and Opera, and in Internet Explorer 8, on Windows (as of 2010-12).

You can use any other Unicode character, including various bullet symbols and math symbols. For a sample list of Unicode characters, see: Sample Unicode Characters.

If you use Emacs, you can enter Unicode characters easily. See: Emacs Math Symbols Input Mode (xmsi-mode) and Emacs and Unicode Tips.

ID's Value Cannot Contain Unicode

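However, under HTML 4.01 an ID's value must not contain Unicode characters outside ASCII. It must start with a letter (A to Z or a to z), followed by any number of letters, digits 0 to 9, and the characters hyphen, underscore, colon, and period (- _ : .). It cannot contain a space and cannot start with a number. (HTML5 relaxes this rule: an ID can be any non-empty string that contains no spaces.)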
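
The HTML 4.01 rule for ID values is simple enough to check mechanically. Here's a minimal Python sketch that encodes it as a regular expression. The function name is_valid_html4_id is made up for this example, and the check covers only the HTML 4.01 rule, not the looser HTML5 one.

```python
import re

# HTML 4.01 ID token: one ASCII letter, followed by any number of
# ASCII letters, digits, hyphens, underscores, colons, and periods.
# The name is_valid_html4_id is hypothetical, chosen for this sketch.
_HTML4_ID = re.compile(r"[A-Za-z][A-Za-z0-9_:.\-]*")


def is_valid_html4_id(value: str) -> bool:
    """Return True if value is a legal HTML 4.01 id attribute value."""
    return _HTML4_ID.fullmatch(value) is not None


if __name__ == "__main__":
    # Print each candidate with the result of the check.
    for candidate in ["intro", "sec-2.1", "α", "2nd", "my id"]:
        print(repr(candidate), is_valid_html4_id(candidate))
```

A generator script could run such a check before emitting an id attribute, and fall back to a class attribute when the name contains a character like α.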

How is it Useful?

This could be useful for reducing file size and for keeping the attribute value space uncluttered, especially in code that generates HTML (e.g. content management system engines).

For example, here's some source code in the OCaml language.

(* array examples *)
let x = [| 2; 8; 3 |];;
print_int x.(1);;
x.(1) <- 9;;
let x = Array.make 9 4;;

The following is the syntax colored version:

(* array examples *)
let x = [| 2; 8; 3 |];;
print_int x.(1);;
x.(1) <- 9;;
let x = Array.make 9 4;;

The following is the html source code for it:

<span class="comment">(* array examples *)</span>
<span class="tuareg-font-lock-governing">let</span> <span class="variable-name">x </span><span class="tuareg-font-lock-operator">=</span> <span class="tuareg-font-lock-operator">[|</span> 2<span class="tuareg-font-lock-operator">;</span> 8<span class="tuareg-font-lock-operator">;</span> 3 <span class="tuareg-font-lock-operator">|];;</span>
print_int x.<span class="tuareg-font-lock-operator">(</span>1<span class="tuareg-font-lock-operator">);;</span>
x.<span class="tuareg-font-lock-operator">(</span>1<span class="tuareg-font-lock-operator">)</span> <span class="tuareg-font-lock-operator">&lt;-</span> 9<span class="tuareg-font-lock-operator">;;</span>
<span class="tuareg-font-lock-governing">let</span> <span class="variable-name">x </span><span class="tuareg-font-lock-operator">=</span> <span class="type">Array</span>.make 9 4<span class="tuareg-font-lock-operator">;;</span>

See how verbose it is? Each token of the OCaml code is wrapped in a span tag with a particular class name. Each of these class names can be replaced by a short name built on a Unicode character, yet remain unique and meaningful, without polluting your class value space for normal use. For example, the first three spans below can be shortened to the three that follow them:


<span class="tuareg-font-lock-operator">…</span>
<span class="variable-name">…</span>
<span class="string">…</span>


<span class="♠o">…</span> <!-- for operator -->
<span class="♠v">…</span> <!-- for variable -->
<span class="♠s">…</span> <!-- for string -->

Here, we used the spade symbol ♠ as a prefix for all class values that are used for syntax coloring, effectively creating our own namespace.
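
Here's a rough Python sketch of how an HTML generating script might do this renaming. The mapping table and the function name shorten_classes are assumptions for illustration, not part of any existing tool, and the regular expression only handles double-quoted class attributes rather than parsing HTML properly.

```python
import re

# Hypothetical mapping from long class names to short "♠"-prefixed ones.
SHORT_NAMES = {
    "tuareg-font-lock-operator": "♠o",
    "variable-name": "♠v",
    "string": "♠s",
}

# Matches a double-quoted class attribute and captures its value.
_CLASS_ATTR = re.compile(r'class="([^"]*)"')


def shorten_classes(html: str, mapping: dict = SHORT_NAMES) -> str:
    """Replace each known class name inside class="..." attributes."""
    def replace(match):
        # A class attribute may hold several space-separated names.
        names = match.group(1).split()
        return 'class="%s"' % " ".join(mapping.get(n, n) for n in names)
    return _CLASS_ATTR.sub(replace, html)


if __name__ == "__main__":
    before = '<span class="tuareg-font-lock-operator">;;</span>'
    after = shorten_classes(before)
    print(after)
    # Compare sizes in UTF-8 bytes, since ♠ takes 3 bytes, not 1.
    print(len(before.encode("utf-8")), len(after.encode("utf-8")))
```

Note that the saving should be measured in bytes, not characters: in UTF-8 the ♠ symbol takes 3 bytes, so "♠o" costs 4 bytes, which is still far fewer than a long ASCII class name. The stylesheet must of course use the same short names.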

For an example of how verbose it can become, see: Emacs nxml-mode Fontification Changes.

If you use Emacs, you might be interested in: Using Emacs To Syntax Color Source Code In HTML.
