Unicode in Mathematica

Perm url with updates: http://xahlee.org/math/mathematica_unicode.html

How Does Mathematica Handle Unicode?

Xah Lee, 2010-12-05

This page explains some technical details about how Mathematica uses Unicode. This article may not be 100% accurate. I'm putting it up for now after spending a few hours on it, and am still working on it. If you are a Mathematica expert, please comment or correct.

Mathematica supports Unicode, but does not use a Unicode encoding when saving to a file. (See: UNICODE Basics: What's Character Encoding, UTF-8, and All That?)

Mathematica (mma) files use ASCII only. (See: Mathematica Notebook Technology.)

How does it support Unicode if it uses only ASCII?

Mathematica's Named characters 「\[Name]」

It has a set of special characters with the syntax 「\[Name]」. For example:

Glyph   Syntax
é       \[EAcute]
É       \[CapitalEAcute]
α       \[Alpha]
Δ       \[CapitalDelta]
⊕       \[CirclePlus]
∵       \[Because]
∈       \[Element]
⧦       \[Equivalent]
ℝ       \[DoubleStruckCapitalR]

So, when you type 「\[Alpha]」 in mma, it is displayed as “α”. (All builtin symbols in mma start with a capital letter.)

You can think of them as HTML's “named character entities”. (See: Character Sets and Encoding in HTML.) There are about 900 named chars. For the complete list, see: Listing of Named Characters.
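To make the entity analogy concrete, here is a minimal Python sketch (not Mathematica code) of how a display layer could turn 「\[Name]」 into a glyph. The table holds only a handful of entries copied from the list above, and the function name render_named_chars is made up for this illustration. It says nothing about how Mathematica implements the lookup internally.

```python
import re

# Tiny illustrative subset of named characters, copied from the list above.
# Mathematica's real list has roughly 900 entries.
NAMED_CHARS = {
    "EAcute": "é",
    "CapitalEAcute": "É",
    "Alpha": "α",
    "CapitalDelta": "Δ",
    "CirclePlus": "⊕",
}

# Matches a backslash, "[", a run of letters, then "]".
NAMED_CHAR_PATTERN = re.compile(r"\\\[([A-Za-z]+)\]")


def render_named_chars(text):
    """Replace each known \\[Name] with its glyph; leave unknown names as typed."""
    def substitute(match):
        return NAMED_CHARS.get(match.group(1), match.group(0))
    return NAMED_CHAR_PATTERN.sub(substitute, text)


print(render_named_chars(r"\[Alpha] + \[CapitalDelta]"))
```

Names missing from the table pass through unchanged, which is the same fallback an HTML renderer uses for an unknown entity.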

Many of the named chars are also in Unicode, but not all. Conversely, many math symbols in Unicode are not in this list. Also, Unicode's Chinese characters, Arabic letters, etc., are not among mma's named chars.

Map Between Unicode and Named Chars

When you paste a Unicode char into mma, there is a map that automatically interprets it as one of the named chars, if it has one.

So, for example, if you paste “α” (GREEK SMALL LETTER ALPHA; U+03B1), it automatically becomes Mathematica's 「\[Alpha]」, and is displayed as “α”.

Syntax for Unicode Char 「\:nnnn」

For any Unicode char that's not one of mma's named chars (such as Chinese characters), the syntax is 「\:nnnn」, where nnnn is the char's Unicode code point written as 4 hexadecimal digits. For example, the Chinese char “水” (water) has the hex code point “6c34”, so in mma it is 「\:6c34」.
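The 「\:nnnn」 form is just the code point in hex, so it is easy to compute outside of mma. Here is a small Python sketch. The function names to_hex_escape and from_hex_escape are made up for this example, and it covers only the 4-digit form described here, so chars above U+FFFF are rejected.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"


def to_hex_escape(ch):
    """Return the \\:nnnn form for a single character in U+0000..U+FFFF."""
    code = ord(ch)
    if code > 0xFFFF:
        raise ValueError("the 4-digit form covers only U+0000 to U+FFFF")
    return "\\:{:04x}".format(code)


def from_hex_escape(escape):
    """Return the character written as \\:nnnn (exactly 4 hex digits)."""
    digits = escape[2:]
    if not escape.startswith("\\:") or len(digits) != 4:
        raise ValueError("expected \\: followed by 4 hex digits")
    if not all(d in HEX_DIGITS for d in digits):
        raise ValueError("non-hexadecimal digit in escape")
    return chr(int(digits, 16))


print(to_hex_escape("水"))
print(from_hex_escape("\\:6c34"))
```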

The above roughly summarizes how mma takes Unicode as input.
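Putting the two syntaxes together shows how an ASCII-only file can still carry any Unicode text. The Python sketch below is my own illustration, not Mathematica's actual file writer: it uses a named char when a (tiny, hand-picked) table has one, falls back to 「\:nnnn」 otherwise, and leaves ASCII alone. The name to_ascii_source is made up, and escaping of literal backslashes is ignored.

```python
# Hand-picked subset of the glyph -> name map; the real one is far larger.
NAMED_CHARS = {
    "α": "Alpha",
    "Δ": "CapitalDelta",
    "é": "EAcute",
    "π": "Pi",
}


def to_ascii_source(text):
    """Rewrite text so that every non-ASCII char becomes \\[Name] or \\:nnnn."""
    pieces = []
    for ch in text:
        code = ord(ch)
        if ch in NAMED_CHARS:
            pieces.append("\\[" + NAMED_CHARS[ch] + "]")
        elif code < 128:
            pieces.append(ch)  # plain ASCII is kept as is
        elif code <= 0xFFFF:
            pieces.append("\\:{:04x}".format(code))
        else:
            raise ValueError("char above U+FFFF is outside this sketch")
    return "".join(pieces)


print(to_ascii_source("N[π] 水"))
```

Whatever goes in, the result contains only ASCII characters, which is the point of the scheme.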

Some Named Chars Have Builtin Meaning

Many of the named chars have a special meaning in mma. For example, 「\[Pi]」 is automatically treated as identical to the builtin symbol 「Pi」, which in mma is the mathematical constant π. (So, if you type 「N[\[Pi]]」 or 「N[\:03c0]」, either is displayed as 「N[π]」 with the meaning of 「N[Pi]」, and evaluating it gives “3.14159”.) Here are some examples of named chars with special meanings.

Glyph  Mma named char    Unicode name               Unicode hexadecimal  Default interpretation
≥      \[GreaterEqual]   GREATER-THAN OR EQUAL TO   2265                 GreaterEqual 「>=」
π      \[Pi]             GREEK SMALL LETTER PI      03c0                 Pi
∞      \[Infinity]       INFINITY                   221e                 Infinity
∫      \[Integral]       INTEGRAL                   222b                 Integrate
⋂      \[Intersection]   N-ARY INTERSECTION         22c2                 Intersection
∑      \[Sum]            N-ARY SUMMATION            2211                 Sum
√      \[Sqrt]           SQUARE ROOT                221a                 Sqrt
⊕      \[CirclePlus]     CIRCLED PLUS               2295                 CirclePlus

Note: it appears to be possible to override the default interpretation of a named char as a builtin symbol (function, constant), for all or some of the named chars. (I haven't investigated how yet.) See: MakeExpression.

http://reference.wolfram.com/mathematica/tutorial/Operators.html

Alias Shortcut for Named Chars

Some of the named chars have one or more aliases for ease of input. For example, to enter α, you can type 【Esc a Esc】 or 【Esc alpha Esc】. Here are some examples:

Glyph   Common Alias
α       a
π       p
∞       inf
≤       <=
°       deg
Δ       D
∈       el
→       ->

See: http://reference.wolfram.com/mathematica/tutorial/Introduction-ListingOfNamedCharacters.html.

  • Characters that are alternatives to standard keyboard operators use these operators as their aliases (e.g. Esc -> Esc for →, Esc && Esc for ∧).
  • Most single-letter aliases stand for Greek letters.
  • Capital-letter characters have aliases beginning with capital letters.
  • When there is ambiguity in the assignment of aliases, a space is inserted at the beginning of the alias for the less common character (e.g. Esc -> Esc for \[Rule] and Esc ␣-> Esc for \[RightArrow], where ␣ stands for a space).
  • ! is inserted at the beginning of the alias for a Not character.
  • TeX aliases begin with a backslash \.
  • SGML aliases begin with an ampersand &.
  • User-defined aliases conventionally begin with a dot or comma.
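As a rough picture of the alias mechanism, here is a Python sketch of a lookup from the text typed between the two Esc presses to the named char. It contains only the aliases mentioned on this page, and the name expand_alias is made up. The real alias table is much larger and lives inside Mathematica's front end.

```python
# Aliases from the examples above. Note the leading space in " ->",
# which separates \[RightArrow] from the more common \[Rule].
ALIASES = {
    "a": "\\[Alpha]",
    "alpha": "\\[Alpha]",
    "p": "\\[Pi]",
    "inf": "\\[Infinity]",
    "<=": "\\[LessEqual]",
    "deg": "\\[Degree]",
    "D": "\\[CapitalDelta]",
    "el": "\\[Element]",
    "->": "\\[Rule]",
    " ->": "\\[RightArrow]",
}


def expand_alias(typed):
    """Return the named char for the text typed between Esc presses, or None."""
    return ALIASES.get(typed)


print(expand_alias("a"))
print(expand_alias(" ->"))
```

Aliases are case-sensitive in this sketch, so “D” gives \[CapitalDelta] while “d” is simply not in the table.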

See: Special Characters.

List of Named Char with Special Meanings

... work in progress

Inputting Special Chars

You can input a special character by:

  • Use one of the graphical palettes.
  • Copy the Unicode char from somewhere and paste it into mma.
  • Type it like this: 「\[Name]」.
  • Type the Unicode hexadecimal like this: 「\:nnnn」.
  • Type Esc, then the char's alias name, then Esc again.

... work in progress

See also: Wikipedia:LaTeX symbols
