Learning Notes On Goto, Continuation, Coroutine

perm url with updates: http://xahlee.org/comp/goto_continuation_coroutine.html

Learning Notes On Goto, Continuation, Coroutine

Xah Lee, 2010-02-05, 2010-02-05

Some learning notes on Goto, Continuation, Coroutine, in programing languages.

I realized today that these concepts are not about implementation, hardware, efficiency, nor algorithm, but rather, they are a particular construct, choice of design, in the semantics of computer languages. For example, to express a repeatition, some languages has “for” loop, some has “while”, some has “do ... until”, some relies on “Nest”, “Fold”, “FixedPoint”, most languages have a combination of these, but any single one of them is sufficient with respect to expressing a algorithm. Same thing can be said for constructs like “case”, “which”, “switch”, “if then elseif ... elseif ...”, all can be expressed by just having a simple “if else” construct. Continuations and coroutine, are a bit more complex, but they are basically a form of GOTO statement.

More specifically, i realized, that any language without any of goto, continuation, coroutine, does not necessarily mean the language is less powerful, or less expressive, or require more lines of code in practice. In fact, i think a language with them is probably not desirable. I think these are mostly useful in concurrency applications, and languages designed for concurrency have better models or constructs than any of Goto, Continuation, Coroutine.

These language design issues are quite interesting. As far as i know, there is not any practical, rigorous mathematical foundation about this.

When you code in some lang, suppose it doesn't have “do ... until”, but has “while”, it is trivial to code around it. However, in many situations, it is not easy to code one construct using another. If you have coded in several languages, you'll know this. Often the difficulty is because you are not familiar with the lang, and you are coding in lang X with lang Y idioms familiar to you. However, it does happen sometimes that you know what you are doing and certain construct is optimal for your task but lang X doesn't have it.

For example, when you code in a lang using nested for loop construct and need conditional exit in several places in inner loops, and if the lang does not have any “break” or “continue”, usually it is a pain to code around it. Also, in some functional langs that doesn't have any sort of iterations but only map and nest constructs, sometimes a iteration type of construct is the most clear and optimal. Likewise, sometimes all you need is a Map or Fold, or passing functions as a expression as a argument, but most imperative langs don't have it.

It's important to note here few issues:

  • In online forums, most programer's complaint about some lang lacking some construct is due to their inexperience in the lang or the particular style of programing of that lang's domain.
  • There are situations that some construct is optimal and not in the lang, even for a well designed lang.
  • There is not a PRACTICAL, rigorous, mathematical model (algebra), that can tell us what construct should be in a lang, or even how to measure expressibility of a lang.

I've been doing functional programing for 15 years. I consider myself to be among the world's top 500 expert in Mathematica. Over the years i learned a little bit of other functional langs: Scheme, Haskell, OCaml. But never obtained any practical coding expertise in these (except Emacs Lisp). In argument in rowdy discussion forums such as newsgroup comp.lang.scheme, often some tech geekers will throw out terms like continuation and coroutine and taunt you for ignorance in a heated debate. Sometimes i challenge them to explain what these really means. I want them to, tell me, in mathematical terms, what these are. I was never able to get any answer. Of course, today i know, that a mathematical answer is impossible, because there is no math model to speak of it, and continuation and coroutine are rather complex to explain because they involve passing program state, which, inevitably involves implementation issues such as stacks that is not exactly a language semantics issue. However, it is possible to give a clear context of what these construct means, or their significance, in the domain of language semantics and design. However, to do that is beyond their ability, either in actually understanding it, or expressing it in words.

So, what's continuation in one sentence? We can say that a continuation is a controlled form of goto statement. Instead of simply moving execution point from one place to another, it also passes the current program state (e.g. local vars) alone.

So, what's coroutine in one sentence? We can say that a coroutine is a subroutine/function in which you can call other subroutine in the block and passing all local state info to the called routine. In a sense, in is a local form of continuation. So, suppose you have subroutine f and g. f can call g in its definition, and when it calls g, it passes all state info to g, and f is simply done/exit, and all control is now in g. In particular, f does not wait for g to finish and return control to f. Likewise, g can call f the same way. So, 2 subroutines can mutually call each other and form a recursion, but without incurring the non-ending resource growth problem.

The following explains in more detail.

Understanding Goto

For modern programers, to understand goto, continuation, coroutine, a easy way to understand them is by first understanding goto as used in 1960 or 1970's languages. First, read Wikipedia Goto.

Unstructured vs Structured Programing

Essentially, older languages, like BASIC, at that time, does not have much of a concept like today's subroutines, code blocks such as “while”, “for” etc, and of course no libraries or module concept. Here's a code example from Wikipedia that illustrate this:

10   INPUT "What is your name: ", U$
20   PRINT "Hello "; U$
30   INPUT "How many stars do you want: ", N
40   S$ = ""
50   FOR I = 1 TO N
60   S$ = S$ + "*"
70   NEXT I
80   PRINT S$
90   INPUT "Do you want more stars? ", A$
100  IF LEN(A$) = 0 THEN GOTO 90
110  A$ = LEFT$(A$, 1)
120  IF A$ = "Y" OR A$ = "y" THEN GOTO 30
130  PRINT "Goodbye "; U$
140  END

Note that basically a source code is just one single file of hundreds of lines. There's not much of a code block or subroutine, not much of local variables, and the program basically just execute lines sequentially, and GOTO is used to jump to different lines, as primary means of aptly-called “control flow” for program logic.

Now look at the Wikipedia article on GOTO again, you'll understand it. Now, you can imagine this unstructured way is complex and is a problem. The use of GOTO as flow control turns your code logic into spaghetti. There is no clarity of logical units. So, in the 1970s or 1980s, languages evolved into what's called structure programing. This is familiar to programers today. Regardless what language you use, we are all doing structured programing. So, what's structured programing? You can see from this example of improved BASIC language:

INPUT "What is your name: ", UserName$
PRINT "Hello "; UserName$
  INPUT "How many stars do you want: ", NumStars
  Stars$ = STRING$(NumStars, "*") 
  PRINT Stars$
    INPUT "Do you want more stars? ", Answer$
  LOOP UNTIL Answer$ <> ""
  Answer$ = LEFT$(Answer$, 1)
LOOP WHILE UCASE$(Answer$) = "Y"
PRINT "Goodbye "; UserName$

You can see that now there's more clear concept of code blocks or code units, and GOTO is not used here. This is more of a Structured programming, which is familiar to all of us. But more interesting is that this obvious switch from unstructured to structured languages isn't so obvious in the 1970s. You can read Structured program theorem, and see back then it was in debate, and in fact the theoretical background was just come into being.

At this point, it is worth to remember that whether the language is unstructured or allows structured code, both are equivalent with respect to the possible algorithms they can express. More precisely, here's how Wikipedia puts it:

[Structured program theorem] states that every computable function can be implemented in a programming language that combines subprograms in only three specific ways. These three control structures are

  • 1. Executing one subprogram, and then another subprogram (sequence)
  • 2. Executing one of two subprograms according to the value of a boolean variable (selection)
  • 3. Executing a subprogram until a boolean variable is true (iteration)

Understanding Continuation

Continuations is in a sense a controlled form of goto. Scan these articles:

The last one is most easy to understand. By understanding continuation passing style with actual code, you might get some idea about continuation.

Suppose we define a function named “times”, that takes 2 arguments and multiply them together. However, we make it to have 3 parameters. The 3rd parameter will be a function with 1 argument. (the purpose of the 3rd parameter will be explained soon) The “return value” of times is the return value of the f applied to “x * y”.

def f (x y g) {
g(x * y);
(define (times x y f)
  (f (* x y)))

So, to compute “x * y” using our “times” function, we give it a third argument of a “identity” function, which takes one argument and returns it. (it doesn't do anything)

; define the “identity” function
(define (identity x) x)

; compute “3 * 4” using our function “times”
(times 3 4 identity)

Following is a code in Scheme Lisp that defines a function “pyth” that computes Sqrt[x^2+y^2].

(define (pyth x y)
  (sqrt (+ (* x x) (* y y))))

Following is the same function written in Continuation Passing Style.

(define (pyth x y k)
  (* x x (lambda (x2)
           (* y y (lambda (y2)
                    (+ x2 y2 (lambda (x2py2)
                               (sqrt x2py2 k))))))))

You can see that, basically it is like a style of piping, or called filtering or function sequencing.

Am not clear what is the significance of this continuation passing. Or, what's the level or context of this style. Is it implementation of a language (compiler)? Syntax design? Language semantic design? or just different style of coding of a given language for a given problem? If the last, then it doesn't make much sense, since it is better or easier to write it simply as nested function or function sequence. Then, perhaps with continuation passing style, the compiler creates efficient code... but then that is not necessarily so due to the style, since there's no reason a function sequencing or nesting style wouldn't produce the same background for creating efficient machine code.

Don't fully understand continuation yet, will have to dig later.

Understanding Coroutine

Here's Wikipedia quote:

In computer science, coroutines are program components that generalize subroutines to allow multiple entry points for suspending and resuming execution at certain locations. Coroutines are well-suited for implementing more familiar program components such as cooperative tasks, iterators, infinite lists and pipes.

Here's a sample pseudo-code:

var q := new queue

coroutine produce
        while q is not full
            create some new items
            add the items to q
        yield to consume

coroutine consume
        while q is not empty
            remove some items from q
            use the items
        yield to produce

As with Continuation, will need to have hands on coding experience of a lang with that feature to really understand fully.

Coroutine As A Model Of Instruction Set For Abacus

Was reading about Eve online, a massive multiplayer online spaceship game. Here's a interesting quote:

Both the server and the client software for Eve Online are developed in Stackless Python, a variant of the Python programming language. Stackless Python allows a relatively large number of players to perform tasks without the overhead of using the call stack used in the standard Python distribution. This frees the game developers from performing some routine work and allows them to apply changes to the game universe without resetting the server.[52]

Humm... Stackless Python. And then on this article, it mentioned that Second Life is also using it. The source of this claim is here http://wiki.secondlife.com/w/index.php?title=Eventlet&oldid=51543.

Central to the issue here is Coroutine, which i never understood. From reading the Wikipedia article, it appears to be a language feature that allows mutual loop. For example, 2 functions f and g, each calling the other at the end. Normally, that would quickly run out of memory or such. hum... still not sure i fully understand. How is the mathematical nature of this feature?

When thinking about computers, often i model it to manipulating abacus. (you could use Turing Machine to model but more complicated to think about) So, a subroutine, is then a reusable part of algorithm, or, reusable part of instruction on how to manipulate the abacus. So, normally, subroutine is expected to terminate, meaning, given a unit of abacus manipulation instruction set, the way you use this is to expect it to reach the end of the instruction set, then follow another instruction set. But coroutine, is then such a instruction set that is not expected to reach a end. Rather, you expect that in the middle of the instruction to follow some other instruction set (kinda like Goto). This all makes much sense now.

Some Wikipedia Reading On Compiler Optimization

Some easy to understand compiler optimization techniques: Constant folding, Dead code elimination.

Also, to facilitate the above optimizations , modern compilers uses a Static single assignment form for their intermediate form.

On reading these, gave me some thoughts about how to write programs today. For example, in the past, especially functional programing, you avoid defining variables, so that your code runs faster. For example, say you want a constant “delayTime” to be 2 hours, and you may write “delayTime = 60*60*2” which is easier to understand than “delayTime = 7200”. However, ten years ago you may want to do that instead because your program avoids computing some multiplication. But today, compiler technology has advanced a lot. The bottom line is basically the same: Always write your code in the MOST human readable way, and never optimize unless in the final stage and if you really have a NEED. AND, when u optimize, the most important step is to analyze at the algorithmic level. That is, your approach to the problem and the overall algorithm chosen. Then, bottle necks. The aspect of optimization you should have the least concern, is about code diddling, such as trying to use more “idioms”. The reasons for this principle are few:

  • (1) Hardware speed increases faster than typical software have to perform, in general.
  • (2) Compiler technology optimizes away mathematical equivalences in a way more able than human ever possibly can.
  • (3) Premature optimization is harmful and wasteful. It reduces your code's flexibility, and human time is some thousand times more expensive than cpu time. Tech geeking programers often have a unhealthy infatuation with optimization by diddling their code (e.g. perlers, and otherwise “idiom” lovers, or “pythonic” fuck.).

More items to study:

Popular posts from this blog

11 Years of Writing About Emacs

does md5 creates more randomness?

Google Code shutting down, future of ErgoEmacs