Archive for May, 2008

5th week of Schemepy: the mzscheme backend

Saturday, May 31st, 2008

I spent all of last week on the mzscheme backend. I had complained a lot about its interface before. Since then, however, I have (at least I think) figured out how to do the various things needed to finish the backend, by reading the documentation, reading the source code, and guessing.

For example, the namespace mechanism is mentioned only vaguely in Inside PLT MzScheme. At first I guessed I could use modules as namespaces, but that didn't work. Then I discovered a make_namespace procedure which, though undocumented, is I guess what I was looking for. But I still get some strange errors (related to some #%-prefixed modules), and I'm not sure how the namespace is supposed to be used. I'm also getting some segmentation faults.

So that's the current situation. I have an mzscheme backend, but it doesn't work as well as I expected. I couldn't find an IRC channel for mzscheme, so I think I should start asking questions on the mailing list. Since emails might not be answered immediately, I'll also contact Jakub about the possibility of using PyPy Scheme as a backend for Schemepy, and look at porting the TPCL-related code.

The Blog's First Anniversary

Wednesday, May 28th, 2008

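I just checked: the first post on this blog was published on May 21, 2007, so a whole year has gone by in the blink of an eye. I decided to browse through the posts I wrote over the past year and put together a summary. For one thing, I'm a rather nostalgic person; for another, it is a way to see, from one angle, what I have been doing this past year.

According to WordPress's statistics, I have published 151 posts so far, with 556 comments. I'll go through them in chronological order.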


4th week of Schemepy: the not-so-friendly API of mzscheme

Friday, May 23rd, 2008

Not much got done for Schemepy this week. At first I tried to port libtpclient-py from pyscheme to Schemepy. However, the libtpclient-py code base is not very big, and I found only a little code that uses pyscheme. I thought I should learn the basic behavior of the library and the client before I start hacking on it. Unfortunately, I have no idea what a 4X game is. I tried several times but still haven't figured out how to play a TP game.


Play with GC: mark your treasure

Wednesday, May 21st, 2008

Garbage collection is amazingly useful. It is a must-have for any modern language. You never need to worry about when to free allocated memory again. Just allocate; objects that are no longer used will be collected automatically at some point.

Yes, it's true. But, wait, it's not true! I still remember that Bjarne Stroustrup said this:

Complexity will go somewhere: if not the language then the application code.

Someone will end up frustrated dealing with all that garbage. Sometimes that person is you. So knowing how the garbage collector works is still necessary. Though I have always been interested in garbage collection algorithms, I only realized this after I spent a whole afternoon debugging a frustrating heisenbug.


rmmseg-cpp: rmmseg in C++

Wednesday, May 21st, 2008

RMMSeg is an implementation of the MMSEG Chinese word segmentation algorithm. It features full integration with Ferret. The original version is written in pure Ruby and includes two algorithms:

  • Complex Algorithm: maximum matching with three-word chunk filtering. The accuracy is good, but the performance is very bad: it is very slow, consumes lots of memory, and even seems to leak memory.
  • Simple Algorithm: simple maximum matching. The performance is relatively acceptable and the accuracy is not too bad, but it is definitely not as good as the Complex Algorithm's.
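To make the Simple Algorithm concrete, here is a minimal Python sketch of greedy forward maximum matching. It is an illustration only, not code from rmmseg or rmmseg-cpp: the function name, the `max_word_len` parameter and the four-word toy dictionary are all assumptions made up for the example.

```python
def simple_max_match(text, dictionary, max_word_len=4):
    """Greedy forward maximum matching (hypothetical helper, toy scale)."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink one character at a time.
        for length in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            # A single character is always accepted as a fallback.
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                i += length
                break
    return words


# Toy dictionary, invented for this example.
toy_dictionary = {"研究", "研究生", "生命", "起源"}
tokens = simple_max_match("研究生命起源", toy_dictionary)
# tokens is ["研究生", "命", "起源"]
```

With this toy dictionary the greedy choice grabs 研究生 first and leaves 命 stranded, although 研究 / 生命 / 起源 would be the better split. Mistakes of this kind are what looking ahead over several words, as the Complex Algorithm does, is meant to reduce.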

I tried various ways to improve the performance and achieved some improvements, but the result is still not good enough for real production use. There is also a strange leak. I tried various tools like ruby-prof and BleakHouse. They all showed that I'm not leaking, yet the memory usage is definitely growing (linearly). I have to admit Ruby (MRI, currently) is very slow.

Then yesterday, while reading Beautiful Code and finding some beautiful C code, I got enthusiastic. I came back after supper and started to implement RMMSeg in C++: rmmseg-cpp.

Now I have something to show off:

[Image: rmmseg-cpp.png]

With a simple Ruby wrapper, the interface of rmmseg-cpp is almost identical to the original rmmseg. According to my simple test, it now runs roughly 40 times faster (though I had expected more) while consuming only 10% of the memory it used before.

However, I also have to admit that coding in C++ is more dangerous than in Ruby (or similar languages). I encountered several segmentation faults while writing rmmseg-cpp, and I'm still not sure whether it is really bug-free (I used many tricks to make it faster and more compact; but in fact, no software is really bug-free :p ). Another drawback is that rmmseg-cpp is less extensible and customizable than rmmseg, because such things are not very convenient to do in C++.

Multiton again, in Python

Sunday, May 18th, 2008

I introduced Multiton in one of my previous blog posts. A Multiton is just like a Singleton, except that there are multiple instances when the init parameters differ. One example is the Lisp symbol:

  • A different symbol object for each different symbol name.
  • The identical symbol object for identical symbol names.

The last time I implemented a Multiton was when I wrote my Scheme interpreter in Ruby. It is fairly easy in Ruby: I wrote a Multiton module, and once you include that module, your class becomes a Multiton.

But now (while working on Schemepy) I have to do the same thing again in Python. In Ruby, the relationship between an instance and its class is the same as the one between a class and its class (the class's class). Python, however, has metaclasses, and I'm somewhat confused about them at the moment. I know that __init__ and __new__ are not enough for such a task, but I don't know how to deal with metaclasses.
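For illustration, one possible shape of a metaclass-based Multiton is sketched below. It is a minimal example under stated assumptions, not necessarily what Schemepy ended up using: the names `Multiton` and `Symbol` are chosen for the example, it uses Python 3 metaclass syntax, and it only handles hashable positional arguments. The reason `__new__` alone falls short is that even if it returns a cached object, Python still runs `__init__` on that object for every call; overriding `__call__` on the metaclass intercepts the construction before either of them runs.

```python
class Multiton(type):
    """Metaclass keeping one instance per distinct tuple of constructor args."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        cls._instances = {}  # separate cache for every class using this metaclass

    def __call__(cls, *args):
        # type.__call__ is what normally invokes __new__ and then __init__,
        # so skipping it for known args skips both of them.
        if args not in cls._instances:
            cls._instances[args] = super().__call__(*args)
        return cls._instances[args]


class Symbol(metaclass=Multiton):
    def __init__(self, name):
        self.name = name


a = Symbol("foo")
b = Symbol("foo")
c = Symbol("bar")
# a and b are the same object; c is a different one.
```

Keeping the cache on the class rather than on the metaclass means that two unrelated Multiton classes never share instances, even when they are built with equal arguments.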


Inside the {C++, Java, Lisp, Python, Ruby} Object Model

Sunday, May 18th, 2008

We just held a technical salon today named "template<language L> Inside the L Object Model". When I was looking at some Ruby code, I found that its object model is very different from that of a static language like C++. So I suggested the idea of discussing the object models of different languages. In the end, our amazing MSTC staff made it happen in the form of a salon.

We finally selected 5 languages and invited 5 people to talk about their object models:

  • C++: shifan, the board manager of the C++ board of freecity BBS. An interesting person who knows C++ very well, commonly known as "模教教主" (roughly, "leader of the template cult"). :)
  • Java: gbb, the board manager of the Programming Technique board of the cc98 forum. A very enthusiastic person. WARNING: he might become extremely excited when discussing any technical topic. :p
  • Lisp: binghe, the board manager of the Computer Language board of freecity BBS. THE Lisp hacker of our school. He always thinks Lisp (especially Common Lisp) is the ultimate, coolest and most powerful language that has ever existed. :)
  • Python: Mike, the board manager of the Linux board of the cc98 forum. Still a freshman, but he already has a very good knowledge of Linux, Open Source and programming. Bright future! :)
  • Ruby: me. I'm also a board manager on the cc98 forum and freecity BBS. :p I'm interested in various languages.

We gave each speaker about 20 minutes to introduce the object model. The remaining time was for discussion. The salon was held in the MSTC office, and many people came.

[Image: DSC_0730]


Big surprises of the day: gift from Google, Rubinius, etc.

Saturday, May 17th, 2008

We have had a massive earthquake in China over the last several days. Though Hangzhou was almost unaffected, many people died in Sichuan province and the death toll is still growing. I'm very sorry about that. But I can't do more than donate some money and offer my blessings. My hometown, Guizhou, is right next to Sichuan. My mother wanted to volunteer to help the suffering people. I talked with her a lot to keep her from going. Maybe I'm selfish, but it's still really very dangerous there. And I hold the view that, while many people have died, those who are living should treasure their lives; every person has his own value in the world. I'm trying to work hard, for myself and for those people who should have lived longer. I really hate death! :(

So this is still a good day. First of all, I just received the surprise book from Google as the start-of-program gift. But we were asked not to reveal the book's name until every GSoCer has received it. I'll follow this rule. If a GSoCer learned the book's name before getting his copy, it wouldn't be such a big surprise for him. However, what I do want to say now is that I love this book very much! Many thanks to Google!


This week in Schemepy: Test suites and benchmarks

Saturday, May 17th, 2008

I'm really borrowing the title of the popular series This Week in Ruby from Zen and the Art of Programming. But in fact, there really has been some interesting progress on Schemepy this week.

Test suites

I just switched from py.test to nose for unit testing. I rearranged the tests directory so that it now contains two sub-directories. One of them focuses on testing the Schemepy interface. The other focuses on testing the Scheme backend implementations; it will cover some basic aspects of Scheme, because we want to ensure consistent behavior no matter which backend is used.


Trampolined-style Programming

Thursday, May 15th, 2008

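Today I saw many calls such as pogo.pogo, pogo.land and pogo.bounce in the pyscheme code, which looked very strange to me. Its comments are very detailed, though: all of this exists to work around Python's lack of tail-recursion optimization.

In languages with tail-recursion optimization (Scheme is the most typical example, since it even writes tail recursion into the language specification as an important feature), if the last action of a function (apart from return) is a call to another function, that function's stack frame simply replaces the current one. This saves the trouble of call and return and also avoids stack overflow, so it wins in both time and space.

The simplest example is the factorial function, which can be written both in the "normal" recursive way and in a tail-recursive way.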
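As a rough sketch in Python, the two factorial versions and a tiny trampoline might look like this. The helper names `land`, `bounce` and `pogo` are borrowed from the calls seen in pyscheme, but the bodies here are hypothetical, written only for illustration; they are not pyscheme's actual implementation.

```python
def fact(n):
    # "Normal" recursion: the multiplication happens after the inner call
    # returns, so every level needs its own stack frame.
    return 1 if n == 0 else n * fact(n - 1)


def fact_tail(n, acc=1):
    # Tail-recursive: the recursive call is the last action. Python does not
    # optimize it, so the stack still grows with n.
    return acc if n == 0 else fact_tail(n - 1, n * acc)


# Hypothetical trampoline helpers (names follow pyscheme's pogo calls).
def land(value):
    # Final result: stop bouncing.
    return ("land", value)


def bounce(func, *args):
    # Describe the next call instead of performing it.
    return ("bounce", func, args)


def pogo(bouncer):
    # Run the pending calls in a loop, so the stack depth stays constant.
    while bouncer[0] == "bounce":
        _, func, args = bouncer
        bouncer = func(*args)
    return bouncer[1]


def fact_tramp(n, acc=1):
    if n == 0:
        return land(acc)
    return bounce(fact_tramp, n - 1, n * acc)


result = pogo(fact_tramp(5000))
```

Under Python's default recursion limit (typically 1000), `fact_tail(5000)` would fail with a RecursionError, while the trampolined version finishes, because each step returns to the loop in `pogo` before the next call is made.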
