Fixnum Overflow in Ruby’s Hash Implementation

March 2, 2008 – 10:20 am

Ruby’s build-in Hash is the first-choice if you want to do searching. Using your own customized object as hash key is simple: define the following two method for your object:

  • hash: to get the hash code of the object.
  • eql?: to compare whether two object are equal.

When working to improve the performance of RMMSeg, I tried to implement a Substring class which can hold a reference to a big chunk of text instead of doing an expensive copy. Then I implemented the hash and eql? method. The hash value calculated is identical to the related String, and eql? is properly implemented. But the whole thing seemed not working quite well.

I though it’s my code’s fault because it’s the first time for me to write a C extension of Ruby. I use gdb to trace the program — it’s very hard to do, because Hash is a very commonly used data structure in Ruby. Many core libraries use it. :(

However, finally I figured it out (after a sleep) and created a small piece of code to reproduce it:

class MyStr
  def initialize(str)
    @str = str
  end
  def hash
    @str.hash
  end
  def eql?(o)
    @str.eql?(o)
  end
end
 
s1 = "foo"
s2 = "This"
my_s1 = MyStr.new(s1)
my_s2 = MyStr.new(s2)
h = { s1 => "value of foo", s2 => "value of This" }
 
puts "h[my_s1] = #{h[my_s1].inspect}"
puts "h[my_s2] = #{h[my_s2].inspect}"

The expected output should be

h[my_s1] = "value of foo"
h[my_s2] = "value of This"

but the actual output is

h[my_s1] = "value of foo"
h[my_s2] = nil

So what’s wrong with is? Why “foo” is right but “This” is wrong? Looking at the code of Hash in Ruby answers the question. Ruby’s treating String specially. When the key is a String, it use rb_str_hash directly to calculate the hash value.

rb_str_hash returns an int. But user customized objects don’t have this special treatment. The hash method is called in the Ruby environment returning a Fixnum which finally converted to an int.

The problem is that the value range of a Fixnum is small than int. The calculated hash value of “This”, 2073740424, when converted to Fixnum and then converting back, finally becomes -73743224.

That’s the problem. When key is a String, its hash is the original 2073740424. But when not, its hash becomes the weird -73743224. It’s a bug, not only with String, but also Symbol and Fixnum. I’ve post the bug report and a suggested patch to ruby-core ML and Rubyforge. Hope it get fixed soon. :)

  1. One Response to “Fixnum Overflow in Ruby’s Hash Implementation”

  2. This bug if fixed in the repository. Maybe it will be in the upcoming 1.8.7 release. :)

    By pluskid on Mar 4, 2008

Post a Comment