Archive for the ‘Bug Archive’ Category

‘\n’ at the end of each line

Thursday, August 14th, 2008

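I remember running into this problem more than once or twice already, and after it cost me several more hours today, I decided it has to be recorded in the Bug Archive. It is the trailing newline problem: when you read text from a file line by line and process it, each line still ends with its newline.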

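You rarely hit this problem in C, because C has two functions, gets and fgets: the former strips the '\n' while the latter keeps it. Since most people cannot remember which one does which, they use both with great care.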

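In languages like Python/Ruby/Perl, however, reading a file line by line is almost too easy. Writing a program that processes lines of text is so simple that it is easy to forget about the trailing newline. I wanted the string "foobar", got "foobar\n" without noticing, kept processing, and ended up with a result that was completely unexpected.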

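Next time I must remember: a line read from a file includes its newline, so call strip to remove it if necessary! Actually, I feel that in most cases the trailing newline is not needed, so maybe it would be more convenient if line-reading functions stripped it by default! :)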
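As a minimal sketch of the pitfall in Python: the in-memory file and its contents below are made up for illustration, and rstrip("\n") is used rather than strip() so that only the trailing newline is removed.

```python
import io

# Stand-in for a real file; the contents are invented for this example.
fake_file = io.StringIO("foobar\nbaz\n")

raw_lines = []
clean_lines = []
for line in fake_file:
    raw_lines.append(line)                 # each line still ends with "\n"
    clean_lines.append(line.rstrip("\n"))  # drop only the trailing newline

print(raw_lines[0] == "foobar")    # False: the string is "foobar\n"
print(clean_lines[0] == "foobar")  # True
```

Note that strip() with no arguments also removes leading and trailing spaces and tabs, which may or may not be what you want.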

Another pitfall in Python: == and !=

Thursday, July 24th, 2008

Besides the False == 0 problem, I encountered another problem that surprised me. Consider the following Python code:

class Foo(object):
    def __eq__(self, other):
        if isinstance(other, Foo):
            return True
        return False
 
f1 = Foo()
f2 = Foo()
 
f1 == f2  # => True or False ?
f1 != f2  # => True or False ?

What do you expect the result to be in the last two lines?

Read the rest of this page »

False == 0 in Python!?

Saturday, July 19th, 2008

I came across a very strange error when tweaking the skime compiler. I decided to use the more specific push_0 and push_1 instructions instead of the general push_literal when the literal is 0 or 1 respectively. However, after adding this code, several test cases broke immediately.

After examining the execution of the failing test cases, I found that I was getting a 0 where a False was expected. The problem turned out to be that I used if literal == 0 to test whether I had a literal 0. Unfortunately, False == 0 (as well as True == 1) evaluates to True in Python. So there’s the bug.
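A minimal sketch of the problem and one possible way around it. The two helper functions are hypothetical and are not taken from skime; they only rely on the fact that bool is a subclass of int in Python.

```python
def is_literal_zero_naive(literal):
    # The check described above: it also matches False, because False == 0.
    return literal == 0


def is_literal_zero(literal):
    # Require a real integer and explicitly exclude booleans.
    return (isinstance(literal, int)
            and not isinstance(literal, bool)
            and literal == 0)


print(is_literal_zero_naive(False))  # True  (the bug)
print(is_literal_zero(False))        # False
print(is_literal_zero(0))            # True
print(isinstance(False, int))        # True: bool is a subclass of int
```

Comparing with type(literal) is int would work as well; which check fits best depends on how the compiler represents its literals.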

I was rather surprised to find that False == 0. Yes, I know we use 0 to represent a false value in C, but I really didn’t expect it in Python, such a high-level, duck-typed language. However, since it is already there, I have to remember it.

Play with GC: mark your treasure

Wednesday, May 21st, 2008

Garbage collection is amazingly useful. It is a must-have for any modern language. You never need to worry about when to free allocated memory again. Just allocate; objects that are no longer used will be collected automatically at some point.

Yes, it’s true. But wait, it’s not true! I still remember what Bjarne Stroustrup said:

Complexity will go somewhere: if not the language then the application code.

Someone will end up frustrated dealing with all that garbage. Sometimes that person is you. So knowing how the garbage collector works is still necessary. Though I was always interested in garbage collection algorithms, I only realized this after I spent a whole afternoon debugging a frustrating heisenbug.

Read the rest of this page »

Fixnum Overflow in Ruby’s Hash Implementation

Sunday, March 2nd, 2008

Ruby’s built-in Hash is the first choice if you want to do lookups. Using your own custom object as a hash key is simple: define the following two methods for your object:

  • hash: to get the hash code of the object.
  • eql?: to compare whether two objects are equal.

While working to improve the performance of RMMSeg, I tried to implement a Substring class that holds a reference to a big chunk of text instead of doing an expensive copy. Then I implemented the hash and eql? methods. The hash value calculated is identical to that of the corresponding String, and eql? is properly implemented. But the whole thing did not seem to work quite right.

Read the rest of this page »

Ruby: Caution with sub/gsub

Friday, February 1st, 2008

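If you don't like listening to my stories, please skip straight to the end. The story is simple; like everything else these past few days, it is about RMMSeg. This time I was building the RMMSeg home page. Last night (or rather, early this morning) I finished the integration with Ferret and released version 0.0.1. As you can see, the home page is done too.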

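Actually the home page was mostly done long ago; it only lacked an example of using RMMSeg together with Ferret. There is one there now, using Ferret's highlight to produce HTML output: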

highlights = $index.highlight("content:#{key}", id,
                              :field => :content,
                              :pre_tag => "<font color=red>",
                              :post_tag => "</font>")

Originally it was an example that did the highlighting with terminal escape sequences:

highlights = $index.highlight("content:#{key}", id,
                              :field => :content,
                              :pre_tag => "\033[36m",
                              :post_tag => "\033[m")

I had no choice but to change it. The home page is generated with Gerbil, and for some reason it turned \033 into a pile of garbage, something like this:

5d2dedb7d78d6d1f0629ea781cb92b6822c8648e33

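I was quite frustrated at the time and figured it was probably a bug in how \ is handled. Since it was already late and I didn't feel like digging into it, I switched to the HTML version and published. When I got up this morning, I wanted to get to the bottom of it. I ran a few experiments first and found that things like \t were fine, while digits were not. The key question: what is that long string?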

Read the rest of this page »

Problems caused by modifying a parameter

Sunday, December 2nd, 2007

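This problem also came up while doing Numerical Analysis homework. As usual, the student provides a function and the OJ's main function calls it to get the result. This assignment was to find the largest eigenvalue of a matrix by iteration, and the function being called is passed a two-dimensional array a that represents the matrix.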

Read the rest of this page »

KDB2 development summary

Monday, November 12th, 2007

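I have been missing for quite a while, mainly because of exams. There are not many courses in the third year, but they are all the exhausting kind. Then there are the course projects; the latest one is the famous MiniSQL. I often heard from senior students that you really learn a lot from building a MiniSQL. I meant to do it carefully, but I misjudged the schedule: I only learned on the 6th that the deadline was the 11th, so the end was a bit rushed. Still, I finished it, fixed all the known bugs, and passed the stress tests, so I am quite happy. Here is a short summary, partly to share with everyone and partly for my future self. My blog has a category called Bug Archive just for this: I want to collect the mistakes I make and the bugs I run into in real development and reread them now and then, hoping not to hit the same problems again. Time was tight on this project, but development and debugging probably took about equal shares of it, so this summary has to go into this category too. :P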

Read the rest of this page »

Give your son a name that is cool enough

Thursday, October 11th, 2007

Is the name you picked as cool as this one? :P

exploits_of_a_mom.png

From http://xkcd.com/327/

The crazy-submission bug-finding method

Wednesday, October 10th, 2007

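Anyone who does ACM contests probably knows the legendary "crazy-submission bug-finding method". The idea is that if you never get a problem accepted, the penalty time from your submissions on it is not counted against your final score. Of course you want to pass in as few attempts as possible, but in a pinch, submitting like crazy is also an option, and however you count it, it pays off:

  • If the problem finally gets AC (Accept), the penalty time will lower your ranking, but no matter how large the penalty is, solving one more problem always ranks you ahead of solving one fewer.
  • If the problem never gets AC, nothing is lost.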
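The reasoning in the two points above can be sketched in a few lines of Python. This assumes ICPC-style ranking (more solved problems first, then lower total penalty); the 20-minute penalty per rejected submission and all the sample numbers are assumptions for illustration, not values from this post.

```python
PENALTY_MINUTES = 20  # assumed penalty per rejected submission


def score(problems):
    """problems: list of (accepted_minute or None, rejected_submissions)."""
    solved = 0
    penalty = 0
    for accepted_minute, rejected in problems:
        if accepted_minute is None:
            continue  # never accepted: rejected submissions cost nothing
        solved += 1
        penalty += accepted_minute + PENALTY_MINUTES * rejected
    return solved, penalty


def rank_key(result):
    solved, penalty = result
    return (-solved, penalty)  # smaller key means a better rank


cautious = score([(30, 0), (None, 0)])      # gave up on the second problem
frantic_ac = score([(30, 0), (200, 9)])     # nine rejections, then accepted
frantic_fail = score([(30, 0), (None, 9)])  # nine rejections, never accepted

print(cautious, frantic_ac, frantic_fail)
print(rank_key(frantic_ac) < rank_key(cautious))  # the extra solve wins
print(frantic_fail == cautious)                   # failing costs nothing
```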

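But crazy submission only makes sense if it actually helps you "find the bug"; otherwise it is pointless. Today I had a thoroughly crazy go at it myself, and in the end I found the problem and got the solution AC'd.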

Read the rest of this page »