This week in Schemepy: Test suites and benchmarks

May 17, 2008 – 12:40 am

I’m really borrowing the title of the popular series This week in Ruby from Zen and the Art of Programming. But in fact, there are really some interesting progress related to Schemepy this week.

Test suites

I just switched from py.test to nose for doing unit test. I arranged the tests directory so that now it contains two sub-directories. One of them focuses on testing the Schemepy interface. The other focuses on testing the Scheme backend implementation, we’ll cover some basic aspect of Scheme here. Because we want to make sure a unique behavior no matter which backend is used.

Now we have fairly complete test cases for the already defined VM interface. I also borrowed some of the test cases from PyPy Scheme and converted them to fit the Schemepy interface. However, not all test cases are converted. Specifically, I stalled on converting the call/cc test cases. I’ll describe this later.

Guile backend

I have finished the Guile backend, roughly. It passed all the existing test cases. But I say roughly because there’s still many things to do:

  • Support guile 1.6
  • Support Windows
  • Support more advanced types, like array, vector and hash, etc.
  • Support for call/cc, which is still broken currently.
  • etc.

However, it is still a milestone. :)

call/cc of Guile

However, I do have problem to get call/cc work properly. In a word, Guile call/cc is manipulating the call stack. It works fine with C, because all stack data in C are plain old ones. But with ctypes, where (I guess) there might be Python Objects (pointers) on the stack, it will cause problems.

Here’s a piece of code that can cause Python to be segment faulted:

from ctypes.util import find_library
from ctypes import *
lib = find_library("guile")
guile = cdll.LoadLibrary(lib)
guile.scm_c_eval_string.argtypes = [c_char_p]
guile.scm_c_eval_string("(define cont #f)")
guile.scm_c_eval_string("(call-with-current-continuation (lambda (k) (set! cont k) 3))")
guile.scm_c_eval_string("(cont 3)")

I’m still trying to solve this problem. More information can be found in this thread. BTW: Mithro has created a mailing list for schemepy. if you are interested in Schemepy, just come and join this list.

pyscheme backend

I’ve also ported pyscheme to Schemepy’s interface. Since pyscheme is a Scheme implementation written in Python, it is relatively easy to do the wrapping. Though something can still be tricky. For example, Schemepy should be able to distinguish between user defined Python callables (converted to Scheme values) and builtin primitives (which are also originally Python callable). When converting those values to Python, a primitive should become a schemepy.types.Lambda while a user defined callable should become exactly the original callable.

And finally, I finished the porting. But not all test cases are passing. This is due to the limitation of pyscheme itself. It is not a complete Scheme implementation, or only a very minimal one. Some very basic primitives like complex number and rational numbers are not supported. And mapping the exceptions to Schemepy exceptions can be tricky (pyscheme has only two types of exceptions as far as I know).

While I can hack the pyscheme code directly to fix those problems, this task is definitely at a very low priority. And if we find some better Scheme implementation in Python (PyPy Scheme is a good candidate), this task is even unnecessary.

Benchmark utility

And finally, we have the benchmarks. We already have two backends, so we can compare the performance. But there seems to be missing a handy benchmark tool in Python. I really miss some useful libraries in Ruby, like Rake, RSpec and the builtin benchmark module. I’m currently using GNU Make and nose, both very decent tools, but not as good as Rake and RSpec in my view :p .

Mithro mentioned benchrun to me. At the first glance of its homepage, I was wondering why I hadn’t found such a great tool in Google. But when I downloaded the code, I found it fairly short. After double checking the code, I finally noticed the feature list in its homepage is in fact planned features :D .

So I decide to write my own. I wrote it at the classroom when waiting for the final review (defence) of my SRTP project. Now (I think) it is as handy as Ruby’s Benchmark module, here’s a quick and dirty example:

from benchmark import Benchmark
N = 100000
s = "abc"
def concat():
    rlt = ""
    for i in xrange(N):
        rlt += s
def join():
    rlt = []
    for i in xrange(N):
    rlt = ''.join(rlt)
def exception():
    raise Exception("This should be caught")
bm = Benchmark(title="building a big string", repeat=100, rehearsal=True)
bm.measure("concat", concat)
bm.measure("join", join)
bm.measure("exception", exception)

which produce the following results:

building a big string (Rehearsal) ----------------------------------------------
                    user      system        real       total
concat          0.021300    0.000200    0.021600    0.043100
join            0.016700    0.018300    0.035400    0.070400
exception   Exception: This should be caught
building a big string ----------------------------------------------------------
                    user      system        real       total
concat          0.022100    0.000100    0.022300    0.044500
join            0.016800    0.018800    0.035600    0.071200
exception   Exception: This should be caught

I’m also planning to add a formatter to produce graphics plotting results. Maybe I could release this utility when it is proved to be useful. :)


I’m writing some basic benchmarks with my benchmark utility. However, the results are not very interesting. pyscheme wins on the loading time, because there’s nearly nothing to load for pyscheme while guile need to load a shared library.

Time to load the VM ------------------------------------------------------------
                   user      system        real       total
guile          0.040000    0.050000    0.120000    0.210000
pyscheme       0.010000    0.030000    0.040000    0.080000

But guile make a big win on the tail call benchmark:

Tail call performance ----------------------------------------------------------
                   user      system        real       total
guile          0.006000    0.001000    0.006000    0.013000
pyscheme       7.498000    0.002000    7.514000   15.014000

This is just expected results. As I mentioned in my last post, since Python has no tail recursion optimization, pyscheme use the trampolined-style programming to avoid stack overflow. But it doesn’t improve the performance. As we can see from the benchmark result, it is nearly 1000 times slower. And I guess it will be even slower when I modify the benchmark code to run a more deep tail call.

Other results are even worse, since pyscheme doesn’t implemented the do primitive, the benchmark for testing string-append, substring and string-length just failed here:

string-append and substring performance ----------------------------------------
                   user      system        real       total
guile          0.013000    0.034000    0.047000    0.094000
pyscheme   ScmUnboundVariable: Unbound variable do

I think I need to get another non-trivial backend implemented before adding more benchmarks. :(

Plan for next week

So for the next week. I’ll either try to overcome the call/cc problem or try to start to implement another backend.

  1. 2 Responses to “This week in Schemepy: Test suites and benchmarks”

  2. Good report! You however forgot to tag is as Thousand Parsec so it is not appearing in that feed. It’s probably a good idea to send an email to the schemepy list when you post a status report too – maybe with the first paragraph or something?

    I recommend looking at the timeit module as I pointed out on IRC. It stops you from falling into common pitfalls when benchmarking python code.

    You should probably do a schemepy website sometime soon so other people can start finding out about it. Keep it simple and too the point. Something like the libmng-py website would work fine. Link to the code repository and mailing list.

    I also found another python based scheme implementation called Psyche. It doesn’t look very maintained and the PyPy scheme is probably still a better solution. Might be worth porting anyway to find out what the speed differences are between the 3 python implementations (it would help with improving speed of our chosen fall back method).

    Keep up the hard work!

    By Mithro on May 17, 2008

  3. @Mithro,

    Hmm? I remember I had tagged as Thousand Parsec. Maybe something wrong with the feed?

    I tried the timeit module, but I’m very strange why it is using a string as the code to run. The binding of eval/exec is very different from normal code I think. For example,

    def run(code):
    def foo():
        a = 10
        run("print a")

    The code above will fail. That’s the problem I had with timeit. I don’t know why it is accepting a string instead of a callable.

    Good idea. A simple homepage would be nice.

    I have a brief look at Psyche, there seems to be some basic things missing:
    – Correct numeric support
    – Continuations
    – Hygienic Macros

    I also take a look at some other Scheme implementations. Unlike Guile, which is designed to be embedded in other system, it will be a litter harder to write backend for those Scheme implementations than Guile.

    By pluskid on May 17, 2008

Post a Comment