Archive for the ‘Ruby’ Category

Sleep Trends of the MSTC Staff

Thursday, June 26th, 2008

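Starting around last winter break, I got into the habit of posting "good night posts" on the MSTC board of cc98: one post every night at bedtime, just to say good night. Before long the board was regularly flooded with good night posts, and people complained. :p So the posts had to be gathered into a single thread, and once the habit formed, everyone else started dropping by to say good night as well.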

Whether it was premeditated or just a whim, I ran some analysis on the contents of that good night thread and got results like the following:


[Figure: plot_simple.jpg]

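Here the x-axis is the date and the y-axis is the bedtime. Since everyone goes to bed rather late, times after midnight are counted as part of the previous evening (for example, 25:00 means 1:00 a.m. the next day). Each line shows one person's sleep trend. Not everyone replies to the good night thread every day, so the points on each line are not evenly spaced along the x-axis. The plot above covers only the first few days of data; below is the result of analyzing the complete data, from February 24 this year up to now (click the image to view it at its original size).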
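As a side note on the time axis, here is a minimal sketch of how a post timestamp could be mapped onto that "night hour" scale. The helper name `night_point` and the noon cutoff are assumptions made for illustration; they are not taken from the actual analysis script.

```ruby
require "date"
require "time"

# Assumption: nobody posts a good night message before noon, so any post
# made earlier than CUTOFF_HOUR is attributed to the previous evening.
CUTOFF_HOUR = 12

# Hypothetical helper: returns [date, hour] where hour may exceed 24,
# e.g. 01:30 on Feb 25 becomes 25.5 on Feb 24.
def night_point(timestamp)
  t = Time.parse(timestamp)
  hour = t.hour + t.min / 60.0
  date = Date.new(t.year, t.month, t.day)
  if t.hour < CUTOFF_HOUR
    hour += 24
    date -= 1
  end
  [date, hour]
end
```

Each reply in the thread would then contribute one (date, hour) point to its author's line.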

Read the rest of this page »

On the Rubinius FFI

Tuesday, June 17th, 2008

Contents:

The need for glue code

Ruby is a powerful language, but sometimes you'll still want to interact with native functions written in C/C++. C/C++ and Ruby cannot call each other directly, so you'll need to add a glue layer. There are generally two ways to write the glue layer.

Read the rest of this page »

Visualizing Graphs with Graphviz

Wednesday, June 11th, 2008

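One assignment in the "Compiler System Design" course is to write a parser for some language, build a syntax tree, and display that tree in a suitable way. I used Graphviz for the visualization part. It is a very handy tool for visualizing graphs, and since a syntax tree, being a tree, is really just a directed acyclic graph, it is a convenient fit here too.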

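The assignment really has two parts: parsing and visualization. The "some language" can be one you define yourself. At first I wanted to parse Scheme, but on second thought I dropped the idea: it is just too simple, and I was afraid the teaching assistant would not let it pass. The assignment asks for parsing with YACC or recursive descent. I do not know how to use YACC yet (and have no plans to learn it for now), so I used Treetop for the parser. Treetop parses with PEGs (parsing expression grammars), which is actually very close to traditional recursive descent.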
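To give a rough idea of the visualization half, here is a minimal sketch that turns a tree into Graphviz DOT text. The `Node` struct and the `to_dot` method are hypothetical stand-ins; the real parser produces Treetop's own node objects, which are not shown here.

```ruby
# Hypothetical tree node, used only for this illustration.
Node = Struct.new(:label, :children)

# Walks the tree depth-first and emits one DOT node per tree node
# plus one edge per parent/child pair.
def to_dot(root)
  lines = ["digraph syntax_tree {"]
  counter = 0
  walk = lambda do |node|
    id = "n#{counter}"
    counter += 1
    # Escape double quotes so the label stays a valid DOT string.
    label = node.label.to_s.gsub('"') { '\"' }
    lines << %(  #{id} [label="#{label}"];)
    node.children.each do |child|
      child_id = walk.call(child)
      lines << "  #{id} -> #{child_id};"
    end
    id
  end
  walk.call(root)
  lines << "}"
  lines.join("\n")
end

# The expression 1 + 2 * 3 as a small tree.
tree = Node.new("+", [
  Node.new("1", []),
  Node.new("*", [Node.new("2", []), Node.new("3", [])])
])
puts to_dot(tree)
```

Saving the output to a file such as tree.dot and running `dot -Tpng tree.dot -o tree.png` renders it as an image.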

Read the rest of this page »

[ANN] rmmseg-cpp 0.2.5 released

Sunday, June 8th, 2008

I developed rmmseg-cpp about half a month ago. After it was put to work at JavaEye, I was told that the performance is good and the memory usage is stable. I'm very glad to see rmmseg-cpp being used in production.

Read the rest of this page »

Rubyforge supports git now

Wednesday, June 4th, 2008

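Maybe I'm just late to the news, but when I went to Rubyforge today to register the rmmseg-cpp project, I found that git is now an option under SCM. I don't know when the support was added, but it should make things much more convenient. The newly registered project has not been approved yet; I'll try it out as soon as it is. With a repo on both github and Rubyforge, keeping the two in sync should be easy too! :)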

Play with GC: mark your treasure

Wednesday, May 21st, 2008

Garbage collection is amazingly useful. It is a must-have for any modern language. You never need to worry about when to free allocated memory again. Just allocate; objects that are no longer used will be collected automatically at some point.

Yes, it’s true. But wait, it’s not true! I still remember Bjarne Stroustrup saying this:

Complexity will go somewhere: if not the language then the application code.

Someone will end up frustrated dealing with all that garbage. Sometimes that person is you. So knowing how the garbage collector works is still necessary. Although I had always been interested in garbage collection algorithms, I only realized this after spending a whole afternoon debugging a frustrating heisenbug.

Read the rest of this page »

rmmseg-cpp: rmmseg in C++

Wednesday, May 21st, 2008

RMMSeg is an implementation of the MMSEG Chinese word segmentation algorithm. It features full integration with Ferret. The original version is written in pure Ruby and includes two algorithms:

  • Complex Algorithm: Maximum matching with three-word chunk filtering. The accuracy is good, but the performance is very bad: it is very slow, consumes lots of memory, and there even seems to be a memory leak.
  • Simple Algorithm: Simple maximum matching. The performance is relatively acceptable and the accuracy is not too bad, but definitely not as good as the Complex Algorithm.
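To make the Simple Algorithm concrete, here is a toy sketch of forward maximum matching. It is not the rmmseg code: the five-word dictionary and the `simple_mm` name are made up for illustration, and a real segmenter loads a large word list.

```ruby
# Toy dictionary (assumption); real dictionaries hold many thousands of words.
DICT = %w[研究 研究生 生命 起源 命].freeze
MAX_WORD_LEN = DICT.map(&:length).max

# At each position, take the longest dictionary word that matches;
# fall back to a single character when nothing matches.
def simple_mm(text)
  chars = text.chars
  words = []
  i = 0
  while i < chars.length
    len = [MAX_WORD_LEN, chars.length - i].min
    len -= 1 while len > 1 && !DICT.include?(chars[i, len].join)
    words << chars[i, len].join
    i += len
  end
  words
end

p simple_mm("研究生命起源") # prints ["研究生", "命", "起源"]
```

The greedy choice grabs 研究生 first and leaves 命 stranded, although 研究 / 生命 / 起源 is the intended reading. Looking at chunks of several words before committing, as the Complex Algorithm does, is meant to resolve this kind of ambiguity.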

I tried various ways to improve the performance and made some progress, but the result is still not good enough for real production use. There are also strange leaks: I tried tools like ruby-prof and BleakHouse, and they all showed that I’m not leaking, yet the memory usage definitely keeps growing (linearly). I have to admit that Ruby (MRI, currently) is very slow.

Then yesterday, while reading Beautiful Code and coming across some beautiful C code, I got enthusiastic. I came back after supper and started to implement RMMSeg in C++: rmmseg-cpp.

Now I have something to show off:

[Screenshot: rmmseg-cpp.png]

With a simple Ruby wrapper, the interface of rmmseg-cpp is almost identical to that of the original rmmseg. According to my simple test, it now runs roughly 40 times faster (though I had expected more) while using only about 10% of the memory it needed before.

However, I also have to admit that coding in C++ is more dangerous than in Ruby (or similar languages). I ran into several segmentation faults while writing rmmseg-cpp, and I’m still not sure it is really bug-free (I used many tricks to make it faster and more compact. But in fact, no software is really bug-free :p ). Another drawback is that rmmseg-cpp is harder to extend and customize than rmmseg, because such things are not very convenient to do in C++.

Inside the {C++, Java, Lisp, Python, Ruby} Object Model

Sunday, May 18th, 2008

Today we held a technical salon named "template<language L> Inside the L Object Model". When I was reading some of Ruby's code, I found that its object model is very different from that of a static language like C++. So I suggested discussing the object models of different languages. In the end, our amazing MSTC staff made it happen in the form of a salon.

We settled on 5 languages and invited 5 people to talk about their object models:

  • C++: shifan, the board manager of the C++ board on freecity BBS. An interesting person who knows C++ very well, commonly known as "模教教主" (roughly, "leader of the template cult"). :)
  • Java: gbb, the board manager of the Programming Technique board on the cc98 forum. A very enthusiastic person. WARNING: he might become extremely excited when discussing any technical topic. :p
  • Lisp: binghe, the board manager of the Computer Language board on freecity BBS. THE Lisp hacker of our school. He always thinks Lisp (especially Common Lisp) is the ultimate, coolest and most powerful language that has ever existed. :)
  • Python: Mike, the board manager of the Linux board on the cc98 forum. Still a freshman, but he already has a very good knowledge of Linux, Open Source and programming. Bright future! :)
  • Ruby: me. I’m also a board manager on both the cc98 forum and freecity BBS. :p I’m interested in various languages.

We gave each speaker about 20 minutes to introduce the object model. The remaining time was for discussion. The salon was held in the MSTC office, and many people came.

[Photo: DSC_0730]

Read the rest of this page »

YARV (The Official VM for Ruby 1.9) Instruction Set

Friday, April 18th, 2008

YARV stands for Yet Another Ruby VM, but it has been the official Ruby VM since Ruby 1.9. It is a stack-based VM that runs YARV bytecode, or "intcode", since each element of the instruction sequence is in fact stored as an int.
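For readers unfamiliar with the term, here is a toy sketch of what "stack-based" means. The instruction names `:push`, `:add` and `:mul` are invented for this example and are not YARV's actual instructions; real YARV instructions are also encoded quite differently.

```ruby
# A toy stack machine: every instruction pops its operands from the stack
# and pushes its result back.
def run(program)
  stack = []
  program.each do |op, arg|
    case op
    when :push
      stack.push(arg)
    when :add
      b = stack.pop
      a = stack.pop
      stack.push(a + b)
    when :mul
      b = stack.pop
      a = stack.pop
      stack.push(a * b)
    else
      raise ArgumentError, "unknown instruction: #{op}"
    end
  end
  stack.pop
end

# (1 + 2) * 3
program = [[:push, 1], [:push, 2], [:add], [:push, 3], [:mul]]
p run(program) # prints 9
```

On MRI 1.9 or later, `RubyVM::InstructionSequence.compile("1 + 2").disasm` prints the real instruction sequence for a snippet; the exact output differs between Ruby versions.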

There is an instruction table for YARV on the original YARV homepage, but it is outdated now. So I built a new one and made it publicly available in case someone is interested.

Koichi said that the instructions are not stable yet and that some things may change later. Instructions that are not used frequently might be modified or even removed. But I guess at least these two instructions will be kept: ;)

  • bitblt: returns "a bit of bacon, lettuce and tomato".
  • answer: the answer to life, the universe, and everything. It returns 42.

Yes, there they are, both in the joke category of the instruction set. :D

Read the rest of this page »

BleakHouse 4! Find memory leaks in your Ruby program!

Sunday, April 6th, 2008

Today I saw the announcement that BleakHouse version 4 has been released, with a completely new implementation:

…there is no framing necessary, and the analysis task runs in seconds instead of hours.

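I introduced BleakHouse in my earlier post "The Awkwardness of Memory Leak Analysis Tools". Back then, though, BleakHouse produced enormous dump files, and the analysis took so much memory and time that it was sometimes practically a mission impossible. So I was delighted to see the v4 announcement mention a new implementation, because this is a really good tool in itself. If you find that your Ruby program performs poorly, be sure to give this tool a try: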

Read the rest of this page »