The Schemepy weekly status report: the new fallback

July 20, 2008 – 12:16 am

I committed almost nothing to the Thousand Parsec repository this week. That’s because I’m mainly working on skime — a pure-Python VM for Scheme. After one-week hard working, the basic shape of the VM is already there.

Although there are still many work (e.g. the macro system) to do before it can be a really useful VM, I decide to write a simple layer to fit the Schemepy API and run the benchmarks to see whether the time spent on a new fallback is worth.

The result looks promising. Firstly, there’s the time spent to load the VM (mzscheme is not included in the benchmarks shown below. The mzscheme package in Debian sid upgraded to version 4.x, but even the mzscheme executable failed to start in my local box. I still haven’t got time to inspect the reason.):

Time to load the VM ------------------------------------------------------------
                   user      system        real       total
guile          0.040000    0.070000    0.170000    0.280000
skime          0.000000    0.010000    0.010000    0.020000
pyscheme       0.020000    0.040000    0.050000    0.110000

Unlike other benchmarks, pure-Python implementation always wins when it comes to loading the VM because loading a shared library is generally more expensive than importing several Python modules. skime is even faster than pyscheme here. But that might be because skime is still not very complete currently.

However, besides the load-vm benchmarks, skime also beats pyscheme in other benchmarks. One example is the compiling benchmark to measure the performance gained by compiling the code before running it (and the compiled code can be run many times). Both guile and pyscheme do not support compiling. The compiling process of pyscheme backend is simply parsing, while nothing is done in the compiling process of guile backend. However, skime compiles the Scheme code to bytecode and the skime VM is in fact a bytecode interpreter. Consider the following simple insertion-sort example:

(begin
  (define (insertion-sort lst)
    (define (insert-in-list new lst)
      (if (null? lst)
        (list new)
        (if (< new (car lst))
          (cons new lst)
          (cons (car lst) (insert-in-list new (cdr lst))))))
    (define (helper unsorted sorted)
      (if (null? unsorted)
        sorted
        (helper (cdr unsorted)
                (insert-in-list (car unsorted) sorted))))
    (helper lst '()))
 
  (define (sorted? lst)
    (if (null? lst)
      #t
      (if (null? (cdr lst))
        #t
        (if (< (car lst)
               (car (cdr lst)))
            (sorted? (cdr lst))
            #f))))
 
  (define lst '(5 9 2 4 7 6 3 1 8 10 -5 83))
  (define sorted (insertion-sort lst))
 
  (sorted? sorted))

We compile the code, execute the compiled form 100 times and measure the time:

performance improvements by compiling (sort) -----------------------------------
                   user      system        real       total
guile          0.000300    0.000000    0.000400    0.000700
skime          0.015900    0.000100    0.016000    0.032000
pyscheme       0.107300    0.000200    0.107600    0.215100

It seems that skime can be almost an order of magnitude faster than pyscheme. However, that in fact depends on how many times we repeat the execution. When we repeat 1000 times at another example, the result is even more fantastic:

performance improvements by compiling (calc) -----------------------------------
                   user      system        real       total
guile          0.000250    0.000000    0.000250    0.000500
skime          0.000360    0.000000    0.000360    0.000720
pyscheme       0.010900    0.000000    0.010930    0.021830

The performance of skime is within the same order of magnitude of guile, and about 30 times faster than pyscheme. The test case involves mostly simple calculations:

(begin
  (define a 5)
  (define b 10)
  (define c (- a b))
 
  (+ a
     (* b 10 11)
     (apply + (list (* b c) 1 2 (- 3 a) b 5))
     9
     b))

Not all benchmarks are supported. Some primitives are needed for skime to run the strcat benchmark (so does pyscheme). However, the simple but famous tak benchmark can also show us something:

tak benchmark ------------------------------------------------------------------
                   user      system        real       total
guile          1.030000    0.010000    1.050000    2.090000
skime        254.350000    0.060000  254.920000  509.330000
pyscheme    1310.230000    0.360000 1312.500000 2623.090000

The benchmark code is shown below:

(define (tak x y z)
  (if (not (< y x))
      z
      (tak (tak (- x 1) y z)
           (tak (- y 1) z x)
           (tak (- z 1) x y))))

I didn’t run all the test cases because it takes too long. Mainly recursive-call is measured in this benchmark. pyscheme uses Python stack to execute Scheme procedures, so it is relatively expensive to make context switching. And since tail-call is not supported in Python, pyscheme uses trampolined-style Programming to avoid stack overflow, which may cause significant performance issue. On the other hand, skime use light-weighted context switch along with spaghetti stack, which does not suffer from deep recursion and may be trivial to implement continuation.

More over, tail-call is supported by skime. The result of the tail-call benchmark can confirm this:

Tail call performance ----------------------------------------------------------
                   user      system        real       total
guile          0.007000    0.000000    0.007000    0.014000
skime          1.578000    0.002000    1.602000    3.182000
pyscheme       8.498000    0.002000    8.553000   17.053000

In general, skime looks promising. Besides its performance advantage over pyscheme, the feature is (already) more complete. So I’m sure skime will take the place of pyscheme as the default fallback of Schemepy and further development is needed.

Although the task is a little difficult, I enjoy very much developing the skime VM. I’ve been looking forward to have a chance to write a language VM for a long time and this is my first try. I found something completely wrong and rewrote large piece of code several times, but I really learned a lot during the process, and the result is not too bad.

I use a bottom up way to write the bytecode compiler. But after totally confused with context, environment, operand stack and related things, I finally managed to develop the VM with a top down method. Although things can be more fast with a lower-level language, writing the whole thing in Python really helps a lot by letting me concentrate on the system other than tricky language issues. :)

Post a Comment