2011-05-08

Use of Unicode Matching Brackets as Specialized Delimiters

Perm url with updates: http://xahlee.org/comp/unicode_brackets_use.html

Syntax Design: Use of Unicode Matching Brackets as Specialized Delimiters

Xah Lee, 2011-05-08

In my tech blogs, i often give instructions involving the graphical menu. For example, i'd say: it's at the menu “File▸Open”. Today i decided to use a special delimiter to indicate a menu. The delimiter is the unicode 〖WHITE LENTICULAR BRACKET〗. So, the menu would be written as 〖File▸Open〗. I just spent a couple of hours changing all mentions of menus on my site to use the new delimiter.

Here's a summary of my usage of special unicode brackets:

  • ANGLE BRACKET. Article title. e.g. 〈Xah's Emacs Lisp Tutorial〉.
  • DOUBLE ANGLE BRACKET. Book title. e.g. 《Basic Economics》.
  • BLACK LENTICULAR BRACKET. Key combinations. e.g. 【Ctrl+c】.
  • WHITE LENTICULAR BRACKET. Menu. e.g. 〖File▸Open〗.
  • TORTOISE SHELL BRACKET. File names, path, url. e.g. 〔~/Documents/notes.txt〕.
  • CORNER BRACKET. Computer code, or math expression. e.g. 「x = 3;」.
  • ANGLE QUOTATION MARK. A variable for computer language syntax description. e.g. function ‹parameter name› = ‹expression›.
  • DOUBLE QUOTATION MARK. Generic delimiter. e.g. “something”.

Why Are These Brackets Chosen?

There are many other brackets in unicode. (See: Matching Brackets in Unicode.) I chose these brackets, and my use of them, carefully. The following are the criteria, in no particular order:

  • ① It must be a fairly common character, so that most browsers, editors, fonts, or other tools can display it.
  • ② The meaning i assigned to them must be compatible with the semantics given to the char in unicode.

All the brackets i've used are common ones. The “curly quote” and ‹angle quote› are widely used in western languages. The 〈〉《》【】〖〗「」〔〕 are used daily in Chinese and Japanese. (See: Intro to Chinese Punctuation with Computer Language Syntax Perspectives.) These languages are widely used in computing in China and Japan, and they are widely supported by software even in non-Asian countries.

If a font or tool has any support for unicode, these brackets are probably among the top 100 or so symbols supported.
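The second criterion can be checked mechanically. Here's a minimal sketch in Python that looks up the name and general category unicode assigns to each bracket in the summary list. The helper name describe is made up for illustration; unicodedata is part of Python's standard library.

```python
import unicodedata

# Opening/closing pairs from the usage summary in this article.
PAIRS = ["〈〉", "《》", "【】", "〖〗", "〔〕", "「」", "‹›", "“”"]

def describe(pair):
    """Return (char, code point, unicode name, general category) for each char."""
    return [
        (ch, "U+{:04X}".format(ord(ch)), unicodedata.name(ch), unicodedata.category(ch))
        for ch in pair
    ]

# Print only the ASCII parts so the output is safe on any terminal.
for pair in PAIRS:
    for _ch, code, name, category in describe(pair):
        print(code, category, name)
```

The CJK brackets come out with general category Ps and Pe (open and close punctuation), and the quotation marks with Pi and Pf (initial and final quote), so unicode itself treats all of them as matched delimiters.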

Is the Use of These Delimiters Necessary?

Is the use of these delimiters necessary? Not at all, but they provide meaningful info, both as visual enhancement and, especially, for computer processing.

For example, once you realize that the lenticular bracket 【Ctrl+x】 is a marker for computer keyboard shortcut notation, you can easily recognize all keys on the page at a glance. For a sample article with these marks, see: How To Set Emacs's User Interface to Modern Conventions.

For another example, with these markers, i can easily write a program that extracts all book titles, keyboard shortcuts, program menus, or code snippets mentioned in my website articles (a few thousand files). Without these markers, the problem is non-trivial.
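Such an extraction program could look something like this in Python. It's only a sketch: the kind labels and the function name extract_marked are invented here, and it assumes brackets are never nested.

```python
import re
from collections import defaultdict

# Bracket pair -> kind of text it marks, following the usage summary in this article.
# The kind labels are made up for this sketch.
DELIMITERS = {
    "〈〉": "article_title",
    "《》": "book_title",
    "【】": "key",
    "〖〗": "menu",
    "〔〕": "path",
    "「」": "code",
}

def extract_marked(text):
    """Collect the text inside each kind of bracket, in order of appearance."""
    found = defaultdict(list)
    for (left, right), kind in DELIMITERS.items():
        # Match a run of characters that contains neither bracket (no nesting).
        inner = "[^" + re.escape(left) + re.escape(right) + "]+"
        pattern = re.escape(left) + "(" + inner + ")" + re.escape(right)
        found[kind].extend(re.findall(pattern, text))
    return dict(found)

if __name__ == "__main__":
    sample = "Press 【Ctrl+c】, then go to 〖File▸Open〗. See 《Basic Economics》."
    for kind, items in extract_marked(sample).items():
        print(kind, items)
```

Running it over a few thousand files is then a matter of reading each file and merging the results. A page that talks about the brackets themselves, like this one, would give some false positives.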

Here's an example of the benefit of computer recognition: suppose in my Emacs Tutorial, i want to add interactive annotation for all emacs key shortcuts mentioned in the tutorial. (emacs has a few hundred key shortcuts by default.) When the user hovers the mouse over an emacs key shortcut in the article, a pop-up box should show the name of the associated command. When keys are marked with a specific delimiter for that purpose, such as 【Ctrl+x】, a program can trivially identify all of them.
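A rough sketch of that annotation step, in Python. The lookup table here is a tiny illustrative stand-in (a real one would have to cover emacs's default keybindings), the function name annotate_keys is made up, and the pop-up is assumed to be the browser's ordinary tooltip for the html title attribute.

```python
import html
import re

# Illustrative lookup table only; a real one would list all default keybindings.
KEY_COMMANDS = {
    "Ctrl+x Ctrl+f": "find-file",
    "Ctrl+x Ctrl+s": "save-buffer",
}

# A key is whatever sits between the black lenticular brackets.
KEY_PATTERN = re.compile(r"【([^【】]+)】")

def annotate_keys(text, commands=KEY_COMMANDS):
    """Wrap each known 【key】 in a span whose title attribute names the command."""
    def wrap(match):
        key = match.group(1)
        command = commands.get(key)
        if command is None:
            return match.group(0)  # unknown key: leave the text untouched
        return '<span class="keyboard_shortcut" title="{}">【{}】</span>'.format(
            html.escape(command, quote=True), key
        )
    return KEY_PATTERN.sub(wrap, text)

if __name__ == "__main__":
    print(annotate_keys("To open a file, press 【Ctrl+x Ctrl+f】."))
```

Without the delimiter, the program would first have to guess which words on the page are key names, which is the hard part.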

What About Using HTML Markup Instead?

HTML markup is great. It serves the same purpose. I have dithered on whether to use HTML markup, or special brackets in unicode, or a mixture of both. I've experimented with that over the past 2 years. Right now, i use a mixture of both.

Here's a sample html markup snippet:

Computer code: <span class="computer_code">x = 3;</span>
Keys: <span class="keyboard_shortcut">Ctrl+c</span>
Book Title: <span class="book_title">Emacs Tutorial</span>

Here's a CSS definition that automatically makes a text colored, and also inserts the brackets for display, for any text marked up with the “code” tag:

code{color:red;font-family:"DejaVu Sans Mono",monospace}
code:before,code:after{color:black;background-color:white}
code:before{content:"「"}
code:after{content:"」"}

The advantage of HTML markup is that it's a more elaborate system. For example, you can color the text, specify the font and text size. You can add brackets if you want. The markup is also more precise. For example, <span class="book_title">…</span> unambiguously indicates that the enclosed text is a book title, while text enclosed by the bracket 《…》 could mean something else (just look at this page you are reading, where the text inside that bracket is not necessarily a book title.)

The disadvantage is that it's much more verbose, and makes the raw source code much harder to read.

Right now, all my book titles, article titles, and computer code snippets are marked up using HTML, with CSS adding the specialized brackets as a visual cue.

A Finer Point: Are Delimiter Brackets Semantically Meaningful or Just for Visual Enhancement?

Suppose you use CSS. For example, a book title is wrapped in an html tag like this:

<span class="book_title">The Story Of My Life</span>

and here's CSS code to add color:

span.book_title{color:red}

You can also add brackets:

span.book_title:before{content:"《"}
span.book_title:after{content:"》"}

If you want the text to be colored, you must use CSS. However, you can add the bracket in the text without relying on CSS, like this:

《<span class="book_title">The Story Of My Life</span>》

The question for me was, should the bracket be part of the text or added by CSS? Which format should i choose?

The answer depends on whether the bracket is considered just a visual enhancement, or semantically meaningful. If it's just visual enhancement, then it should be part of CSS (Cascading Style Sheets), as implied by the word “style” in its name. When CSS is off, readers won't see the bracket, and it doesn't matter. However, if the bracket is considered semantically meaningful, then it should not be in CSS. That way, it doesn't matter whether CSS is on or off, you still see the bracket.
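The choice also shows up when a program turns the page into plain text, where CSS never applies. Here's a minimal Python sketch that puts the brackets into the text itself. The class names and brackets follow the examples in this article; the function name spans_to_brackets is made up, and the regex assumes simple, un-nested spans (a real tool would use a proper HTML parser).

```python
import re

# Class name -> bracket pair, following the markup examples in this article.
CLASS_BRACKETS = {
    "book_title": "《》",
    "computer_code": "「」",
    "keyboard_shortcut": "【】",
}

# Assumes spans carry exactly one class attribute and are not nested.
SPAN_PATTERN = re.compile(r'<span class="([a-z_]+)">(.*?)</span>', re.DOTALL)

def spans_to_brackets(source):
    """Replace each known <span class="..."> with its text wrapped in brackets."""
    def convert(match):
        cls, inner = match.group(1), match.group(2)
        pair = CLASS_BRACKETS.get(cls)
        if pair is None:
            return match.group(0)  # unknown class: keep the markup as-is
        return pair[0] + inner + pair[1]
    return SPAN_PATTERN.sub(convert, source)

if __name__ == "__main__":
    print(spans_to_brackets('<span class="book_title">The Story Of My Life</span>'))
```

If the source already has the brackets in the text, as in 《<span class="book_title">…</span>》, this conversion would double them. So whichever format is chosen, every tool that processes the pages has to agree on it.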

There are opposing views on whether the bracket should be in text or added by CSS.

① The brackets are semantically meaningful, thus should be part of the text. For example, in Chinese, book titles are enclosed by angle brackets. They are semantically meaningful. They are not just decoration. In the same way, matched pairs in western text, such as “curly quotes”, «french quote», or the various brackets (paren), [square bracket], {braces}, are almost always semantically meaningful. If you remove them, it affects the text in major ways.

② A bracket in the text is redundant when the text is already marked up. Therefore, in this view, one should add the brackets by CSS and not in the text. Even though CSS is meant for appearance, the fact is that appearance, layout, and semantics are often intertwined in various degrees. Positioning (layout) and sizes often add subtle but non-trivial semantics to a page. In practice, probably a significant percentage of web pages would become unreadable, or have their meaning affected, if you turn off CSS, and probably less than 0.01% of pages are ever read without CSS. The bottom line of this reasoning is that, if you use the HTML/CSS tech bundle, then you shouldn't add the bracket in the text, because it's already precisely marked up. Add the bracket by CSS.

Right now i haven't decided which is “better”. More precisely, i think one way might be better than the other if a more precise goal or purpose is given. For now, for me, it doesn't matter much for the purpose of online articles.

An example where it might matter is when defining a document using XML, or when the article in HTML is the basis for a printed publication that goes thru further processing. (For example, the finely printed book A New Kind of Science is based on the Mathematica notebook format. (see also: Notes on A New Kind of Science.) Some books are based on HTML/CSS tech. For example, Håkon Wium Lie's book. Some books are based on unix's troff system (man pages). Then there are systems expressly designed for publishing, layout, typesetting: QuarkXPress, Adobe InDesign (PageMaker), DocBook, LaTeX, etc.)