Semantic Web: Emerging Practice of Including Language Name in Embedding Computer Language Source Code on Web Pages
Since HTML5, there's a new practice for embedding computer language source code on web pages.
Normally, you just use “pre”, like this:
<pre>
x = 5
print x
</pre>
However, nothing indicates what language it is. Indicating the language is desirable because search engines and other tools (such as a syntax coloring tool) can read that info and process the code accordingly. (This is the idea of the semantic web.)
So, it appears, a practice has emerged of adding the language info by nesting a “code” tag inside “pre”, with its “class” attribute set to “language-‹name›”. Like this:
<pre>
<code class="language-python">
x = 5
print x
</code>
</pre>
However, there's no standardized list of names to use for the ‹name› part.
- Reference: The code element @ Source dev.w3.org
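To make the "tools can get that info" point concrete, here is a minimal sketch of how a tool might read the language name back out of a page. It uses Python's standard html.parser module. The names CodeLanguageFinder and find_code_languages are made up for this illustration, and the sketch assumes “code” elements are not nested inside each other.

```python
from html.parser import HTMLParser


class CodeLanguageFinder(HTMLParser):
    """Collect (language, source) pairs from <code class="language-..."> elements.

    Hypothetical helper for illustration; not part of any standard or library.
    """

    PREFIX = "language-"

    def __init__(self):
        super().__init__()
        self.blocks = []       # finished (language, source) pairs
        self._language = None  # language of the <code> element being read, if any
        self._chunks = []      # text pieces seen inside that element

    def handle_starttag(self, tag, attrs):
        if tag != "code":
            return
        # "class" may hold several space-separated tokens; take the first
        # one that carries the "language-" prefix.
        classes = (dict(attrs).get("class") or "").split()
        for token in classes:
            if token.startswith(self.PREFIX):
                self._language = token[len(self.PREFIX):]
                self._chunks = []
                break

    def handle_data(self, data):
        # Only keep text while inside a language-tagged <code> element.
        if self._language is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "code" and self._language is not None:
            source = "".join(self._chunks).strip()
            self.blocks.append((self._language, source))
            self._language = None


def find_code_languages(html_text):
    """Return a list of (language, source) pairs found in html_text."""
    finder = CodeLanguageFinder()
    finder.feed(html_text)
    finder.close()
    return finder.blocks


sample = """<pre>
<code class="language-python">
x = 5
print x
</code>
</pre>"""

print(find_code_languages(sample))
```

Run on the sample above, this prints [('python', 'x = 5\nprint x')]. Since there's no standardized list of names, whatever string follows “language-” is returned as-is. Mapping it to a real language is left to the tool.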
Overall, this practice seems ad-hoc and questionable, but a mob-standard might be better than none.
What's the problem? For example, the semantics of the “class” attribute aren't designed to encode language names. Also, having “code” nested inside “pre” is a redundant hack. An improvement might be:
<code class="language-python" style="white-space:pre;display:block">
x = 5
print x
</code>
But this throws off intuition, because most people are much more familiar with the “pre” tag. So, perhaps we can do:
<pre class="language-python">
x = 5
print x
</pre>
All of the above are hacks that use HTML Microformat as a way to embed info for the semantic web. A proper solution would be a dedicated tag in XHTML, but XHTML is going the way of the dinosaur. Among the HTML microformat hacks above, I can't say which one is superior.
For the politics of HTML5 and XHTML, see: Are You Intelligent Enough to Understand HTML5? ◇ HTML6: Your JSON and SXML Simplified ◇ HTML5 Doctype, Validation, X-UA-Compatible, and Why Do I Hate Hackers.