2009-06-04

Why Must Software be Rewritten for Multi-Core Processors?

Perm url with updates: http://xahlee.org/UnixResource_dir/writ/multi-core_software.html

Why Must Software Be Rewritten For Multi-Core Processors?

Xah Lee, 2009-06-04, 2011-03-02

I had a revelation today, namely, that it is necessary to rewrite software to use multi-processor in order to benefit from it.

This may sound stupid, but is a revelation to me. For the past decade, the question has been on my mind, about why should software needs to be rewritten to take advantage of multi-processors. Because, in my mind, i thought that software are at some fundamental level just algorithms, and algorithms, have nothing to do with hardware implementation aspects such as number of processors. I always felt, that those talks about the need or difficulty of rewriting software for multi-processor (or multi-core these days) must be the product of idiocy of industrial imperative coding monkeys. In particular, some languages such as java, the way they deal with it, seems to me extremely stupid. e.g. the concept of threads. In my mind, there should be a layer between the software and the hardware, such as the operating system, or the compiler, that should be able to automatically handle or compile the software so that it FULLY use the multi-processors when present. In short, i thought that a algorithm really has nothing to do with hardware details.

I never really thought hard about this issue, but today, since i got a quad-core PC, so i looked into the issue, and thought about it, and i realized the answer. The gist is that, algorithm, fundamentally means manipulating some hardware, in fact, algorithm is a step by step instruction about some form of hardware, even the hardware may be abstract or virtual. For example, let's say the simplest case of 1+1. It is a algorithm, but where is the hardware? You may say it's abstract concept, or it being a mathematical model. What you call 1+1 depends on the context, but in our context, those numbers are the hardware. To see this, lets say more complex example of listing primes by sieve. Again, it comes down to “what is a number”? Here, numbers can be stones, or arrangement of beads on abacus, it's hardware! As another example, say sorting. To begin with, you have to have something to sort, that's hardware.

Another way to see this is that, for a given computing problem, there are infinite number of algorithms to achieve the same thing. Some, will be better ones, requiring less steps, or requiring less storage. All these are concrete manipulation issues, and the thing being manipulated, ultimately we have to call it hardware. So, when hardware changes, say from one cpu to multi-cpu, there's no way for the algorithm to magically change and adopt the changed hardware. If you need a algorithm that is catered to the new hardware, you need a new algorithm.

One might think that there might be algorithm Omega, such that it takes input of old hardware O, new hardware N, and a algorithm A, and output a algorithm B, such that B has the same behavior as A, but N+B performs better than O+A. This is like asking for Strong AI.

One thing we often forgot is that algorithms ultimately translates to manipulating hardware. In a modern digital computer, that means software algorithms ultimately becomes machine instructions in CPU, which manipulate the 1s and 0s in register, or electricity voltage in transisters.

In a more mundane point of view, a automatic system for software to work on multi-processors is a problem of breaking a given algorithm into independent units (so that they can be computed in parallel). The problem of finding a algorithm is entirely different from the problem of breaking a algorithm into independent units. The problem of dissecting a given algorithm into independent units is a entire new branch of mathematics. The problem of writing a unitized algorithm is also relatively new, with the need to analyze a problem by independent units. For example, let's say factoring. Factoring is a well defined mathematical problem. There are millions algorithms to do it, each class has different properties such as number of steps or storage units. However, if we look at these algorithms from the point of view of independent units, it's a new perspective on classification of algorithms. Software are in general too ill-defined and fuzzy and complex, the software we use on personal computers such as web browsers, games, video players, don't even have mathematical models. They don't even have mathematical models of their inputs and outputs. To talk about automatic system of unitizing a given software, would be more like a AI fantasy. Roughly, a term that describes this aspect of research is Parallel computing.

In the Wikipedia article, it talks about types of parallelism: Bit-level parallelism, Instruction-level parallelism, Data parallelism, Task parallelism. Then it also discusses types of hardware related issues: multicore, symmetric multiprocessing, distributed computing, cluster, grid. The subjects mentioned there more close to this essay are “data parallelism” and “task parallelism”. However, neither are high level enough as i discussed here. The issue discussed here would be like “algorithmic parallelism”. Indeed, Wikipedia mentioned “Automatic parallelization”, which is exactly what i'm talking about here. Quote:

Automatic parallelization of a sequential program by a compiler is the holy grail of parallel computing. Despite decades of work by compiler researchers, automatic parallelization has had only limited success.[40]

Mainstream parallel programming languages remain either explicitly parallel or (at best) partially implicit, in which a programmer gives the compiler directives for parallelization. A few fully implicit parallel programming languages exist — SISAL, Parallel Haskell, and (for FPGAs) Mitrion-C.

It says “A few fully implicit parallel programming languages exist”. If you are a starry-eyed comp lang tech geeker, don't get carried away by what those words might seem to mean.

(Note: Wikipedia has a dedicated article on the subject: Automatic parallelization)

Reading those Wikipedia articles, it solidifies the popular thought of why functional languages have a great time with the multi-processor future. Because in functional languages, you write software by synthesizing functions, and functions are by definition, independent units. This leaves the compiler much more able to analyzes and dissect to produce efficient byte code for multi-processor machines.

More Wikipedia readings: Multi-core (computing), 64-bit, X86 architecture, Instruction set architecture.

For a excellent discussion of this topic, with actual code examples, see:Guy Steele on Parallel Programing.

✻ ✻ ✻

I've been studying math and computing fanatically since about 1991. Sometimes, i suddenly had a insight on questions that has bugged me for years. I the past few years, i have written out some of these. Here they are: Multiplication and Multiplicative IdentityThe Geometric Significance of Complex ConjugateWhat is Technical Drawing, Descriptive Geometry, Projective Geometry, Linear Algebra.

2009-06-03

The Complexity And Tedium of Software Engineering

Perm url with updates: http://xahlee.org/UnixResource_dir/writ/programer_frustration.html.

The Complexity And Tedium of Software Engineering

Xah Lee, 2009-06-02

This page is a blog of a experience of few examples that illustrates some seemingly trivial task can become quite tedius and complicated in the software industry.

A Complexity with Emacs

Discovered a emacs problem.

Summary:

  • Seems that emacs 23 will have a problem loading a css-mode written by Stefan Monnier
  • The css-mode.el file does not contain any sort of version number, which makes the problem worse.

Detail: I have a Mac running emacs 22 with OS X, and i have PC running emacs 23 and Windows Vista. When i use the emacs 23 to load css mode, it gives this error: “if: Wrong type argument: integerp, (0 . 8)”.

The problem seems simple in retrospect, but wasn't simple at all when you trying to get things done and things don't work as expected. Here's the story.

Emacs 22 does not have a css mode, emacs 23 does. There's one css mode written by Stefan. I've been using it on the Mac for the past couple of years. The same package is now bundled into emacs 23, which i'm using on PC. However, the code in the 2 files are not identical. I have my emacs setup to load css mode. Since i am migrating to PC, alone with my whole emacs system, and am setting up a Mac and PC and network that'd let me work with either machine harmoniously. On the PC, when i start css mode, it gives error, but not always. You don't know what's wrong. It could be a fuckup in the emacs distro i'm using on PC (which is emacsW32), or it could be emacs 23 problem (emacs 23 is still in beta), or it could be something in my emacs customization that works perfectly well on the Mac but not on the PC with whole new environment. Eventually, i realized that's because sometimes i started plain emacs 23 without loading my own setup, it was using the bundled css mode, so no error, but when i loaded my own emacs setup, it gives error. This seems still simple in retrospect, but wasn't then. I added a version check to my emacs init file(s), so that if emacs is 23, don't load my css mode. The next day, same symptom occurs. Eventually I realized that's because the load path is set so that my own version of css mode comes before the bundled css mode, so that, when i load css, again my old version is loaded. Eventually, the solution is to check if css mode bundled with emacsW32 runs ok on emacs 22 on my Mac; if so, simply use that as the single source for my Mac and PC. When doing this work on checking what versions are the files, i realized that the 2 css mode's files don't contain version number at all. All this takes 5 minutes to read, but was one of the gazillion seemingly trivial issues and problems when setting my Mac/PC networking environment with cygwin and all. This took me one week, and haven't gotten to wholly converting my Mac files to PC. Added the time to google for for the answer, possibly write a report to the emacsers, etc, all in all, i'd say this problem caused me 4 hours.

Here's the emacs version i'm using. On the Mac: “GNU Emacs 22.2.1 (powerpc-apple-darwin8.11.0, Carbon Version 1.6.0) of 2008-04-05 on g5.tokyo.stp.isas.jaxa.jp”. On Windows “GNU Emacs 23.0.94.1 (i386-mingw-nt6.0.6001) of 2009-05-28 on LENNART-69DE564 (patched)”

URL Encoding

Discovered a subtle issue with automating url encoding. In url, if you have the ampersand “&” char, and if this url is to be inside a html doc as a link, then, there's no automated procedure that determines correctly whether the char should be encoded as “%26” or “&”. If the char is used as part of the file name or path, then it should be encoded as “&”, but if it is used as a separator for CGI parameters, then it should be encoded as “%26”.

The rule for Percent encoding the char is that the ampersand is a reserved char, therefore it must be percent encoded if it is used for normal file path names. So, when it is NOT used as part of path names, but used as CGI parameter separaters, with GET method of HTTP request, then it must be left as it is. Now, in html, the ampersand char must be encoded as html entities “&” when adjacent chars are not space (basically). So, it becomes “&”.

In summary, if the url in html link contains “&”, and the char is a CGI param separator, then encode it to “&”, else, encode it as “%26”, and my discovery is that the purpose of the char used in url cannot be syntactically determined with 100% accuracy.

Of course, in practice, all this matters shit. Just use “&” plainly and it all works in 99.999% of situations.

Unicode File Names

Unison and Unicode File Names

Summary: Unison does not support unicode chars.

When using Unison version 2.27, to sync files from Mac to PC, the file name on the mac is “赵薇_flag.jpg”, it is copied to PC using Unison 2.27, and the file name became “èµµè-╪_flag.jpg” when viewed under Windows, but the file name shows up correctly when viewed in EmacsW32.

Mac version: 10.4.11, Windows Vista SP1.

This may indicate that Unison encode file names in utf8, or just ascii. Indeed, it is said on Wikipedia that unicode have problems with non-ascii file names.

When the file is copied from Mac to Windows or Windows to Mac, operating either on Windows or Mac as the local machine, using either OS's file manager, the file name is copied correctly.

Setting up Unison itself is not so trivial. It is trivial in concept, but actually took hours. I have Unison on my Mac installed, and use it few times a year, since about 2006, so i'm familiar with it. On PC, first you have to install cygwin. I know there are Unison binaries for Windows but since i use cygwin, so my first choice is staying with cygwin, since it contains the whole unix environment. Installing cygwin is another story, but once you installed Unison in cygwin, and tried to test sync my Mac and PC, you run into the problem that sshd must be turned on in one of the machines. Namely, sshd should run on the “remote” machine. (setting up a local network among Win and Mac is another complex and tedious story) Then, there's the issue of deciding which machine you want sshd to run or both. On the Mac, i can turn on sshd in a minute. On Windows, i'm not sure. I'm not sure if Windows Vista home edition provide ssh server, and am not sure how to turn it on if so. As far as i know, Windows Vista Home does not come with a ssh client. In the process, also realize that firewall must be turned off for ssh port. So, you spend 30 min or possibly hours (here and there) reading or probing with Windows Firewall control panel and whatnot other admin tools. After a while, i decided it's easier just to turn on sshd on the Mac then Unison from the Windows side to the Mac. At least, have this direction work first, and when that works, i can play with the other direction. After all this done, i tried to Unison, but Unison reports that the Unison version on my Mac and PC is not the same, so it doesn't work. Geeze. The one on my Mac turns out to be Unison 2.13.x, and the one i have in Cygwin is 2.31.x. Now, i figured that with each release of Unison, it probably obsolete some older versions. So, back to digging Unison docs and the web. The simplest solution comes to mind is to update my Unison on my Mac to latest, of which, the unison on fink haven't been updated for a couple of years. I use Fink, and am fairly familiar with using Fink. However, after “fink selfupdate” etc, then “fink desc Unison”, the version there reported seems to be 2.13. Then, searching web on fink home page indicates they have unisone-2.27.57-1007, for my OS 10.4 PowerPC. So, why doesn't it show up in my fink? I remember i had a case last year where fink had obsolete mirror databases, a fault entirely on their end.

After spending maybe some more 30 min, i decided to install Unison from a website, binary download. After that done, i got Unison 2.27.x on the Mac. I tried to sync again, still no go. So, it seems like that the Unison version number must be the same or very close. Checking on Unison website, it looks like the current stable release is 2.27.x, so, my cygwin's 2.31.x is actually a beta. Damn. So, now back to cygwin. Luckily, it appears there are several version of Unison there to be installed, and i easily installed 2.27. Then, finally, test sync is successful. Now, i go back to get my files ready in a state to be synced. (long story there. See: Perl Script for Removing Mac Resource Fork and Switching from Mac/Unix To PC/Windows ) When finally i'm ready to Unison, then there's the Chinese character problem!

Emacs on Windows and Unicode File Names

When using emacsW32, dired has problem dealing with files with chinese char on remote machine. Consequently, when trying to copy, delete, etc, the behavior may be dangerous.

e.g. i have this file 林志玲.jpg on a Mac. I use emacsW32 on Windows to view it thru network. (e.g. the path is //169.254.145.104/xah/web/ ... ) the file name shows up as “_viu0y~a.jpg” in dired.

“GNU Emacs 23.0.94.1 (i386-mingw-nt6.0.6001) of 2009-05-28 on LENNART-69DE564 (patched)”, Mac version: 10.4.11, Windows Vista SP1.

2009-06-01

Perl Script for Removing Mac Resource Fork

Perl Script for Removing Mac Resource Fork

perm url with updates: Perl Script for Removing Mac Resource Fork

Xah Lee, 2009-05-31

This page shows some perl script and tips for preparing Mac files to be used on Windows or Linux.

Resource Forks and File Type Code

Before Mac OS X, Mac files heavily relies on Resource fork. With OS X, it is decided in the early 2000s that resource fork is going the ways of dinosaur.

Another confusing thing is that Mac files often has file Type code. Its purpose is similar to Filename extension used by Windows and Internet media type for indicating which app can be used to open the file. The Type Code is not stored in Resource Fork, but is part of the HFS+ file system.

Resource fork for data files is discourage by Apple since early 2000s, and i think vast majority of modern apps does not create files with resource fork. However, Mac applications (those in “/Applications/” folder, may still rely on resource fork to function.

File type code are still used in OS X. However, it can be deleted without creating much problem, since all it does is associating files with applications, and this mechanism is largely replaced by filename extension.

For converting your old Mac files, you cannot simply delete resource fork, because for some files, such as “unflattened” QuickTime movie files, the main data is in the resource fork.

Here's a perl script that removes resource fork of image files. It can be easily modified to report files that contains resource fork. delete_image_file_resource_fork.pl. (resource forks in image files are almost always just thumbnails, created by application such as GraphicConverter. The thumbnail of image files created by Finder are not stored as resource forks.)

For some more details about how to use OS X's command line to check resource fork or file type, see: Mac OS X Command Line Tools Tips.

“.DS_Store” and “Thumbs.db” files

Mac also creates a .DS_Store file in each folder. ( Windows creates Thumbs.db) You'd want to remove them if you are copying them to Windows. You can run the following command in bash to remove them:

find . -name ".DS_Store"
find . -name ".DS_Store" -exec rm {} \; 

Here's a perl script that does the job in more flexible ways: delete_DS_Store.pl

“Icon^M” files

Another Mac specific file are those files named as “Icon^M”, where the “^M” is the Return character (ASCII 13). These are folder icon files, but am not sure they are necessarily in Apple Icon Image format. I'm not sure what's their official status with OS X. However, you can still find them in OS X. For example, you'll find it in “/Applications/Adobe Reader 8/”, as well in StuffIt 10, Mac Pov-Ray 3.6, Adium, and if you use Jamie Zawinski's XScreenSaver for OSX, you'll find a lot “Icon^M” files in your “~/Library/” dir.

Here's a perl script that delete these files: delete_macos9_icon_file.pl.