On the Rubinius FFI

June 17, 2008 – 6:01 pm


The need for glue code

Ruby is a powerful language, but sometimes you’ll still want to interact with native functions written in C/C++. C/C++ and Ruby cannot call each other directly, so you need to add a glue layer. There are generally two ways to write the glue layer.

The traditional way in Ruby is to write the wrappers in C/C++. Those wrappers follow Ruby’s method calling conventions (the first argument is self, the return value is always a VALUE, etc.) and need to be registered with the Ruby interpreter so that they can be called from Ruby.

For example, suppose you want to call the C function printf from Ruby. You’d have to write a wrapper like this:

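#include <ruby.h>
#include <stdio.h>

/* Wrapper following Ruby's calling convention: self comes first,
 * every argument and the return value is a VALUE. */
static VALUE
dummy_printf(VALUE self, VALUE format, VALUE num)
{
    /* Convert the Ruby values to C values (StringValueCStr raises a
     * TypeError for non-strings instead of crashing), call printf,
     * then convert the C result back to a Ruby Integer. */
    int rlt = printf(StringValueCStr(format), NUM2INT(num));
    return INT2FIX(rlt);
}

/* Called by Ruby when the 'printf' extension is required. */
void Init_printf(void)
{
    /* register the method to the Kernel module */
    rb_define_method(rb_mKernel, "dummy_printf",
                     RUBY_METHOD_FUNC(dummy_printf), 2);
}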

Then you’ll need to write a simple extconf.rb:

require 'mkmf'
 
create_makefile('printf')

Run ruby extconf.rb to generate the Makefile, then run make to compile the glue code into a shared library. You can then require the library in Ruby and call the wrapper function dummy_printf, which converts the Ruby values to proper C values, calls the real printf function and converts the result back to Ruby:

irb(main):001:0> require 'printf'
=> true
irb(main):002:0> dummy_printf "hi! This is the number: %d\n", 6
hi! This is the number: 6
=> 26

This is supported in Rubinius through the Subtend layer. Subtend is the MRI-compatible C interface that allows external code to call Ruby code running on Rubinius. It is provided so that external libraries written against the MRI C interface can work with Rubinius with only a recompile.

The Rubinius FFI

Rubinius also supports another way of writing the glue code: writing the wrappers in Ruby! This is provided by the FFI (Foreign Function Interface) of Rubinius. Eric Hodel has posted an article introducing the Rubinius FFI. In short, if we rewrite the wrapper from the previous example using the Rubinius FFI, it looks like this:

module Kernel
  attach_function 'printf', :dummy_printf, [:string, :int], :int
 
  def dummy_printf(*args)
    Kernel.dummy_printf(*args)
  end
end

The call to attach_function sets up a stub (wrapper) that converts the arguments and the return value automatically. The function is attached as a singleton method of the module, so we define an instance method with the same name that forwards the call to the singleton method. This lets us call the method like a global function in IRB:

irb(main):001:0> dummy_printf "hi! This is the number: %d\n", 6
hi! This is the number: 6
=> 26

This is simple and clean! :)

The FFI documentation

The most common use of the FFI is calling Module#attach_function to create a wrapper for a native function:

Module#attach_function(name, a3, a4, a5=nil)

call sequence:

  • attach_function(func_name, method_name, arg_types, ret_type) => method
  • attach_function(func_name, arg_types, ret_type) => method

params:

  • func_name: The name of the native function. e.g. "printf".
  • method_name: The name of the wrapper method to be attached. e.g. :dummy_printf. Will be func_name.to_sym if not given.
  • arg_types: An array of the type of arguments accepted by the native function in that order. e.g. [:string, :int].
  • ret_type: The return type of the native function. e.g. :int.

The native function is looked up in the libraries specified by the @ffi_lib instance variable of the module. Call Module#set_ffi_lib to set @ffi_lib for a module:

module Foo
  set_ffi_lib("libz", "libreadline")
end

Set @ffi_lib to FFI::USE_THIS_PROCESS_AS_LIBRARY (the default value) to let Rubinius search for the native function in the current process. All functions defined in the Rubinius code base, as well as those in other libraries like libc, are available in the image of the current process.
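As a rough C analogy for "searching the current process" (an assumption about the general mechanism, not a description of Rubinius's actual lookup code), POSIX lets a program pass NULL to dlopen to get a handle for itself and then resolve symbols through dlsym. The helper names lookup_strlen and call_strlen are hypothetical:

```c
#include <dlfcn.h>
#include <stddef.h>

/* Function-pointer type matching strlen's signature. */
typedef size_t (*strlen_fn)(const char *);

/* dlopen(NULL, ...) returns a handle for the running program itself; dlsym()
 * on it searches the global symbols of the program and the shared libraries
 * loaded with it (libc included). The handle is intentionally kept open. */
static strlen_fn lookup_strlen(void)
{
    void *self = dlopen(NULL, RTLD_LAZY);
    if (self == NULL)
        return NULL;
    /* POSIX allows converting dlsym's void * result to a function pointer. */
    return (strlen_fn)dlsym(self, "strlen");
}

/* Call strlen through the looked-up pointer; returns (size_t)-1 on failure. */
static size_t call_strlen(const char *s)
{
    strlen_fn fn = lookup_strlen();
    return fn != NULL ? fn(s) : (size_t)-1;
}
```

On older glibc versions the program may need to be linked with -ldl for dlopen and dlsym.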

If the native function is found, a stub will be created and attached as a singleton method of the module. The following basic types are supported:

name                                 Ruby type                 C type
:int / :uint                         Integer                   int / unsigned int
:char / :uchar                       Integer                   char / unsigned char
:short / :ushort                     Integer                   short / unsigned short
:long / :ulong                       Integer                   long / unsigned long
:long_long / :ulong_long             Integer                   long long / unsigned long long
:float / :double                     Float                     float / double
:string                              String                    char *
:pointer                             MemoryPointer             void *
:object                              Any object                OBJECT
:state [1]                           -                         rubinius_state *
:strptr or :string_and_pointer [2]   [String, MemoryPointer]   char *
:void [3]                            nil                       void
:char_array [4]                      char []                   void
  1. This is only used as a function argument type, and the current state of Rubinius is passed as the parameter. It’s invisible, i.e. it isn’t seen as an inbound argument by Ruby, and it can only be the first argument.
  2. The Ruby value is an Array of a String and a MemoryPointer. The String is a copy of the original C string, while the MemoryPointer points to the original memory. This is generally used when you have to free the memory manually (through the MemoryPointer object).
  3. This is only used as a function return type; the return value will be nil in Ruby.
  4. This is only used in the layout spec of a C struct, to represent a char array embedded in the struct.
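The ownership problem behind :strptr is easiest to see in C. The sketch below uses a hypothetical native function, make_greeting, that returns a malloc'd string the caller must free; struct strptr_result is only a model of the [String, MemoryPointer] pair from footnote 2, not how Rubinius actually stores it:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical native function: returns a heap-allocated string that the
 * caller owns and must free. */
static char *make_greeting(const char *name)
{
    const char *prefix = "hello, ";
    char *buf = malloc(strlen(prefix) + strlen(name) + 1);
    if (buf == NULL)
        return NULL;
    strcpy(buf, prefix);
    strcat(buf, name);
    return buf;
}

/* Model of a :strptr return value: an independent copy of the characters
 * (the Ruby String) plus the original pointer (the MemoryPointer), which is
 * still needed to free the original allocation. */
struct strptr_result {
    char *copy;      /* safe to keep after the original is freed */
    char *original;  /* exactly what the native function returned */
};

static struct strptr_result call_with_strptr(const char *name)
{
    struct strptr_result r = { NULL, NULL };
    r.original = make_greeting(name);
    if (r.original != NULL) {
        r.copy = malloc(strlen(r.original) + 1);
        if (r.copy != NULL)
            strcpy(r.copy, r.original);
    }
    return r;
}
```

With a plain :string return type only the copy would survive, and the original allocation would leak.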

When you are attaching functions, however, you should always use the specific type if possible. For example, for the chmod function:

attach_function 'chmod', [:string, :mode_t], :int

:mode_t should be used instead of :uint here. Such specific types are available to the FFI as typedefs. You can add your own typedef by calling FFI.add_typedef:

FFI.add_typedef(:ulong, :my_long)

However, this is generally discouraged: because such type definitions are usually platform dependent, they should not be hard-coded in Ruby. Fortunately, almost all well-known typedefs are automatically available in Rubinius (see runtime/platform.conf). They are generated at compile time, so they are guaranteed to be correct for your platform.
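To see why these typedefs have to come from the compiler, here is a minimal sketch of the kind of check a build step could run: it maps a typedef's size and signedness onto one of the FFI base type names from the table above. The helper names are hypothetical, and I have not checked that this matches the logic or format used to generate runtime/platform.conf:

```c
#include <stddef.h>
#include <sys/types.h>

/* Nonzero if integer type T is signed: (T)-1 is negative only for signed types. */
#define IS_SIGNED(T) ((T)-1 < (T)0)

/* Hypothetical helper: name the FFI base type with the given size and
 * signedness. `long` is left out because on many 64-bit platforms it has the
 * same size as long long. */
static const char *ffi_name_for(size_t size, int is_signed)
{
    if (size == sizeof(char))
        return is_signed ? "char" : "uchar";
    if (size == sizeof(short))
        return is_signed ? "short" : "ushort";
    if (size == sizeof(int))
        return is_signed ? "int" : "uint";
    if (size == sizeof(long long))
        return is_signed ? "long_long" : "ulong_long";
    return NULL;   /* no matching base type */
}

#define FFI_NAME_FOR(T) ffi_name_for(sizeof(T), IS_SIGNED(T))

/* Resolved by the compiler for whatever platform this is built on. */
static const char *mode_t_ffi_name(void)
{
    return FFI_NAME_FOR(mode_t);
}
```

On Linux with glibc, for example, mode_t is an unsigned int, so mode_t_ffi_name() yields "uint"; other platforms can legitimately give a different answer, which is exactly why hard-coding it in Ruby is fragile.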

MemoryPointer

MemoryPointer is Rubinius’s fat pointer class. It represents an actual pointer, in C terms, to an address in memory. They’re called fat pointers because each MemoryPointer object is a wrapper around the actual pointer; the Rubinius runtime has no direct access to the raw address.

MemoryPointer.new(type, count=nil, clear=true)

call sequence:

  • MemoryPointer.new(num) => MemoryPointer instance of num bytes.
  • MemoryPointer.new(sym) => MemoryPointer instance with the number of bytes needed by FFI type sym.
  • MemoryPointer.new(obj) => MemoryPointer instance of obj.size bytes.
  • MemoryPointer.new(sym, count) => MemoryPointer instance with the number of bytes needed by an array of count elements of FFI type sym. As above, sym can also be a num or an obj.
  • MemoryPointer.new(arg) { |p| ... }

The form without a block returns the MemoryPointer instance. The form with a block yields the MemoryPointer instance and frees the memory when the block returns. The value returned is the value of the block.
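In C terms, the block form behaves roughly like the scoped helper below: allocate zeroed memory (as with clear=true), hand it to a callback standing in for the block, free it when the callback returns, and pass the callback's result through. This is an analogy with hypothetical names (with_memory, fill_and_sum), not Rubinius code:

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the Ruby block: receives the memory and its size. */
typedef long (*memory_block_fn)(void *mem, size_t size);

static long with_memory(size_t size, memory_block_fn body)
{
    void *mem = calloc(1, size);   /* zero-filled, like clear=true */
    if (mem == NULL)
        return -1;                 /* allocation failed (sketch-level error value) */
    long result = body(mem, size); /* the "block" runs with the memory */
    free(mem);                     /* released as soon as the "block" returns */
    return result;                 /* the block's value is passed through */
}

/* Example "block": store 1..n as ints and return their sum. */
static long fill_and_sum(void *mem, size_t size)
{
    int *ints = mem;
    size_t n = size / sizeof(int);
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        ints[i] = (int)(i + 1);
        total += ints[i];
    }
    return total;
}
```

For example, with_memory(4 * sizeof(int), fill_and_sum) returns 10, and the buffer is already freed by the time the result comes back.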

When clear is set, the memory is zeroed automatically. By default, a MemoryPointer object is in autorelease mode: when the Ruby GC collects the MemoryPointer object, the memory is freed automatically. This can be turned off with:

ptr.autorelease = false

You then need to call ptr.free explicitly to free the memory when it is no longer needed.

MemoryPointer#[](which)

Accesses the MemoryPointer like a C array, selecting element number which. However, this is not equivalent to subscripting a pointer in C: the return value is another MemoryPointer pointing at that element, not the element’s value. For example:

ptr = MemoryPointer.new(:int, 20)
new_ptr = ptr[9]

is equivalent to the following C code:

int *ptr = (int*)malloc(sizeof(int) * 20);
int *new_ptr;
new_ptr = &ptr[9];   /* the address of element 9, not its value */

A + method is also provided for explicit pointer arithmetic. Unlike pointer arithmetic in C, the offset is always in bytes, so ptr[1] is equivalent to ptr + ptr.type_size rather than ptr + 1.
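The difference between [] and + can be written out in C. In this sketch (hypothetical helper names), element_address steps by whole elements like ptr[i], while byte_offset steps by bytes like the FFI's +:

```c
#include <stddef.h>

/* Like ptr[i] on a MemoryPointer of ints: the address of element i. */
static int *element_address(int *ptr, size_t i)
{
    return &ptr[i];
}

/* Like ptr + bytes on a MemoryPointer: the offset is counted in bytes,
 * so the arithmetic is done on a char pointer. */
static void *byte_offset(void *ptr, size_t bytes)
{
    return (char *)ptr + bytes;
}
```

So element_address(p, 1) and byte_offset(p, sizeof(int)) name the same address, while byte_offset(p, 1) lands in the middle of the first int.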

read_xxx

To read the content of the pointer, call the functions named read_int, read_long, etc. Note that read_string differs from the other read functions:

  • Ruby ptr.read_int is equivalent to C *(int *)ptr, but Ruby ptr.read_string is equivalent to C (char *)ptr: the pointer itself is treated as a char array (char *) rather than as a pointer to a char array (char **).
  • You can pass an optional length parameter to read_string. This is useful when the string contains NUL characters ('\0').
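Assuming ptr is the raw address a MemoryPointer wraps, the read functions correspond roughly to the C helpers below (hypothetical names that mirror the Ruby methods). Note that read_string involves no dereference, and that an explicit length lets the copy run past an embedded NUL:

```c
#include <stddef.h>
#include <string.h>

/* ptr.read_int: dereference the address as an int. */
static int read_int(const void *ptr)
{
    return *(const int *)ptr;
}

/* ptr.read_string: the bytes live at ptr itself (char *, not char **). */
static const char *read_string(const void *ptr)
{
    return (const char *)ptr;
}

/* ptr.read_string(len): copy exactly len bytes, NULs included, into out. */
static void read_string_len(const void *ptr, char *out, size_t len)
{
    memcpy(out, ptr, len);
}
```

Without the length, a C string view of "ab\0cd" stops at "ab"; with a length of 5, all five bytes come back.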
read_array_of_xxx(n)

There are also functions like read_array_of_int, read_array_of_long, etc., which return an Array of n values. There’s no read_array_of_string because it would make no sense here.

write_xxx(value)
write_array_of_xxx(array)

Similar writer functions are provided to update the content of the pointer.

Note: write_string is implemented with memcpy. If the original char * pointer points to a constant string literal, or if the previously allocated memory is not big enough to hold the new string, it will cause serious problems without warning.
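Since write_string does no bounds checking, a caller-side check along the lines of this hypothetical C helper is the kind of guard you have to supply yourself. Whether the FFI's write_string also copies the terminating NUL is an assumption I have not verified; this sketch copies it:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical guarded write: refuses to copy when the destination buffer
 * (of `capacity` bytes) cannot hold src plus its terminating NUL, which a
 * bare memcpy would silently overflow. Returns 0 on success, -1 otherwise. */
static int write_string_checked(char *dst, size_t capacity, const char *src)
{
    size_t needed = strlen(src) + 1;   /* include the terminating NUL */
    if (needed > capacity)
        return -1;                     /* would overflow: refuse */
    memcpy(dst, src, needed);
    return 0;
}
```

The capacity has to be tracked by the caller; neither C nor a raw MemoryPointer remembers it for you.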

Note: MemoryPointer exposes direct, unmanaged operations on arbitrary memory, so it must be used carefully. Reading from or writing to an invalid address will cause bus errors and segmentation faults.

FFI::Struct

FFI::Struct lets you manipulate a C struct in Ruby easily. However, you need to tell Ruby the layout of the struct before you can use it. Because the layout of a struct is highly platform dependent (it depends on the sizes of basic types like int, on alignment and padding, etc.), it is often a good idea to generate the layout spec from C code at compile time, which, in turn, is not so convenient. But if the struct is well known, Rubinius may already support it; see the documentation for FFI::Struct.config.

The layout spec consists of three-element tuples of name, type and offset. Here’s an example:

class MyStruct < FFI::Struct
  layout(:a, :int, 0, :b, :int, 4)
end

Then you can create an instance of the struct with struct = MyStruct.new. Alternatively, you can provide the spec on the fly instead of creating a subclass of FFI::Struct:

struct = FFI::Struct.new(nil, :a, :int, 0, :b, :int, 4)

FFI::Struct wraps a MemoryPointer object. Passing nil as the first argument, as in the snippet above, makes it allocate a new struct in memory.
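One way to get the offsets for a layout spec like layout(:a, :int, 0, :b, :int, 4) from the compiler instead of writing them by hand is offsetof. The sketch below (hypothetical struct and function names) formats such a spec; the padded struct shows why offsets cannot simply be computed by adding up field sizes:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical C struct described by layout(:a, :int, 0, :b, :int, 4). */
struct my_struct {
    int a;
    int b;
};

/* Hypothetical struct where padding matters: on common ABIs `i` is placed at
 * an offset aligned for int (typically 4), not directly after `c` at 1. */
struct padded {
    char c;
    int i;
};

/* Format a layout spec for struct my_struct with offsets taken from the
 * compiler. Returns snprintf's result (the length the spec needs). */
static int my_struct_layout(char *buf, size_t size)
{
    return snprintf(buf, size, "layout(:a, :int, %zu, :b, :int, %zu)",
                    offsetof(struct my_struct, a),
                    offsetof(struct my_struct, b));
}
```

On a typical platform with 4-byte ints this produces layout(:a, :int, 0, :b, :int, 4), matching the hand-written example.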

FFI::Struct.config(base, *fields)

Because the layout of a struct is highly platform dependent, Rubinius collects the layout information for some well-known structures at compile time and makes it available to the Ruby side. So

class AddrInfo < FFI::Struct
  config("rbx.platform.addrinfo", :ai_flags, :ai_family, :ai_socktype,
         :ai_protocol, :ai_addrlen, :ai_addr, :ai_canonname, :ai_next)
end

can set up the layout automatically. The generated structure layout information (and other information like typedefs, C macro constants, etc.) can be found in runtime/platform.conf, which is generated by rakelib/platform.rake when building the VM.

FFI::Struct#[](field)
FFI::Struct#[]=(field, value)

Reading and writing struct fields is fairly simple. The conversion between C and Ruby is automatic once the layout spec is set up properly:

a = struct[:a]
struct[:b] = 5

There are other helper methods like size to get the size of the struct, dup to do a memcpy of the struct and free to free the memory explicitly, etc.
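Field access, assignment and dup can be modeled in C as offset arithmetic plus memcpy. This is a sketch with hypothetical names, assuming the offsets come from the layout spec; memcpy is used so the code does not depend on base + offset being suitably aligned:

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical struct matching the MyStruct layout. */
struct pair {
    int a;
    int b;
};

/* struct[:name] for an :int field: read an int at base + offset. */
static int get_int_field(const void *base, size_t offset)
{
    int value;
    memcpy(&value, (const char *)base + offset, sizeof value);
    return value;
}

/* struct[:name] = value for an :int field. */
static void set_int_field(void *base, size_t offset, int value)
{
    memcpy((char *)base + offset, &value, sizeof value);
}

/* dup: a byte-for-byte copy of the struct into newly allocated memory. */
static void *dup_struct(const void *base, size_t size)
{
    void *copy = malloc(size);
    if (copy != NULL)
        memcpy(copy, base, size);
    return copy;
}
```

Because dup copies bytes, the copy is independent of the original; any pointers stored inside the struct, however, are copied as-is.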

NativeFunction
FFI.create_function(library, name, args, ret)

When we call Module#attach_function, it creates a NativeFunction by calling FFI.create_function and attaches it to the module.

You can also create unattached NativeFunction object like this:

func = FFI.create_function(FFI::USE_THIS_PROCESS_AS_LIBRARY,
                           "printf",
                           [:string, :int],
                           :int)
func.call "hi! This is the number: %d\n", 6

There’s a helper class, NativeFunction::Variable, that lets you call native functions taking a variable number and type of arguments (like printf):

func = NativeFunction::Variable.new(FFI::USE_THIS_PROCESS_AS_LIBRARY,
                                    "printf",
                                    [:string],
                                    :int)
 
func[:int].call "hi! This is the number: %d\n", 6
func[:string, :double].call "Wow, %s is %f\n", "here", 5.5

First create the function by specifying the fixed parameters (the char *format for printf), then index it with the types of the variable arguments to get the correct function to call.
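The variable part needs explicit types because a variadic C function reads each extra argument with va_arg, using a type it infers from the fixed arguments, and the caller must pass exactly those types. The toy function below (hypothetical, with a tiny format where 'i' means int and 'd' means double) shows the callee side of that contract:

```c
#include <stdarg.h>

/* Sum the variable arguments, reading each one with the type named by the
 * corresponding format character. Passing a double where 'i' is expected
 * (or vice versa) would be undefined behavior, which is why the caller must
 * know the exact types before making the call. */
static double sum_formatted(const char *fmt, ...)
{
    va_list ap;
    double total = 0.0;
    va_start(ap, fmt);
    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p == 'i')
            total += va_arg(ap, int);
        else if (*p == 'd')
            total += va_arg(ap, double);
    }
    va_end(ap);
    return total;
}
```

For instance, sum_formatted("id", 2, 0.5) returns 2.5; printf's format string plays the same role as fmt here.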

Rubinius FFI vs. Python ctypes

Python has a very similar module named ctypes, which has been part of the standard library since Python 2.5.

Python ctypes is generally more powerful than the current Rubinius FFI:

  • typedef: Rubinius FFI only supports simple typedefs through FFI.add_typedef, but Python ctypes supports arbitrary user-defined types as both function arguments and return values, as long as they follow some duck-typing conventions.

    For example, you can specify a callable as the return type of a function. The callable can then check the return value for errors and raise an exception automatically.

  • pointer: MemoryPointer objects in Rubinius FFI are simple raw pointers, so you need to use methods like read_int and read_long to access the content. ctypes pointers, by contrast, carry type information with them.
  • platform dependent stuff: Neither Rubinius FFI nor Python ctypes can handle platform-dependent details perfectly. C generally resolves these at compile time, and that information is not available from the compiled shared library. Examples include:
    • C macro constants (#define).
    • C typedefs.
    • C struct layouts.
    • Byte ordering.
    • Alignment and padding.
    • etc.

    However, both provide mechanisms to ease the task. Python ctypes tries to use the same byte order, alignment, etc. as the C compiler on the native platform.

    Rubinius FFI takes another approach: it collects all the well-known C macro constants, typedefs and struct layouts at compile time and makes them available to the Ruby world, so using well-known definitions is very convenient. When you are dealing with something very specific, the same approach is recommended: collect the platform-dependent information at compile time and make it available to the Ruby world. This is not very convenient, but it will definitely yield the most correct result. :)

  • value: You can read the value of a global variable in a shared library in Python ctypes. This is not supported in Rubinius FFI currently.
  • callback: Python callables (even closures) can be used as callbacks in ctypes. However, callbacks are not currently supported in the Rubinius FFI. That’s because Rubinius uses a spaghetti stack, so for C code to be able to call back into Ruby land it would need a whole new C stack (woo, getcontext), switching back to the original context afterwards. This would complicate the FFI a bit.
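For readers new to the term, a C callback is simply a function pointer that a library invokes; qsort is the classic example. An FFI with callback support has to hand C a pointer like compare_ints below whose body re-enters the Ruby VM, and that re-entry is the part the stack design described above makes hard. A plain-C sketch (sort_ints and compare_ints are illustrative names):

```c
#include <stdlib.h>

/* The callback: qsort calls this each time it needs to compare two elements. */
static int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);   /* -1, 0 or 1 without risking overflow in x - y */
}

/* Pass the callback's address to the library function. */
static void sort_ints(int *values, size_t count)
{
    qsort(values, count, sizeof values[0], compare_ints);
}
```

Libraries like sqlite3 (mentioned in the comments below) use the same pattern, which is why missing callback support blocks wrapping them.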

This is not a complete comparison of Rubinius FFI and Python ctypes. In my view, while Rubinius FFI lacks some features of Python ctypes, that is usually not a problem: both should be used only to write simple glue code.

If you want to rewrite some C library code in Rubinius FFI or Python ctypes, you’ll find the result more verbose than the original C version and generally hard to understand. Moreover, things like macros and inline functions in C/C++ cannot be handled by FFI or ctypes. Finally, one can easily crash the whole VM with either Rubinius FFI or Python ctypes, so use them carefully.

The future of FFI

The current implementation is clean and simple. However, though I have emphasized that FFI should be used only to write simple glue code, there is still some room for improvement.

For example, arity is not currently checked when a native function is called. (Updated at 2008-06-17 23:29) Maybe this can be added easily. After some experimenting, I’ve added the arity check to trunk.

(Updated at 2008-06-17 23:07) Another example is improving the performance of calls to wrapped native functions. The Rubinius documentation mentions that GNU lightning is used to implement the Rubinius FFI, but the current implementation doesn’t use GNU lightning at all. The type conversion of the parameters and return values of a wrapped function is done through a big switch every time the function is called.

If the FFI is used extensively, performance would be better if GNU lightning were used to generate native wrapper code instead of parsing the types every time. I guessed that this was scheduled to be added (or was the original idea), which is why the documentation mentions GNU lightning. In fact, the original version of the FFI was implemented with GNU lightning, which generated native code on the fly to avoid the big switch and thus ran very fast. But for various reasons (e.g. GNU lightning is really hacky and not very cross-platform, long long is not supported, etc.), libffi is used now. There is already some initial work on JIT support in Rubinius, and the FFI may use the JIT engine to generate more efficient code in the future.

Another thing that would affect the FFI is the new VM: the whole VM (shotgun) is being rewritten in C++. The FFI is tied to the VM in many places, so parts of the FFI might also be rewritten in C++, but I guess the interface will be kept unchanged.

16 Responses to “On the Rubinius FFI”

  1. Callbacks can be implemented with a spaghetti stack; GNU Smalltalk does it. It uses recursive invocations of the interpreter (resembling the Smalltalk->C->Smalltalk->C->Smalltalk->… call stack). Each interpreter is responsible for one callback, which is run in a separate Smalltalk process; as each process is started, the calling process is suspended, giving the illusion of a C-like stack.

    By Paolo Bonzini on Jun 17, 2008

  2. How does FFI handle varargs? printf(), for example, can have more than two arguments….

    By Shalon Wood on Jun 17, 2008

  3. @Paolo Bonzini,
    Sounds interesting. But in this form, Smalltalk1 -> C -> Smalltalk2 -> C …, I guess Smalltalk2 may need to share many things with Smalltalk1. Though Rubinius supports multiple VMs, the VMs are generally designed to share nothing with each other.

    By pluskid on Jun 17, 2008

  4. @Shalon Wood,
    NativeFunction::Variable is used to support this. There’s already an example in the article that uses NativeFunction::Variable to demonstrate printf with variable arguments. :)

    By pluskid on Jun 17, 2008

  5. Great article!
    Sorry that our FFI docs are wrong; we have gone through a number of implementations, including switching from GNU Lightning to libffi.
    Those definitely need to be updated.

    By Wilson Bilkovich on Jun 17, 2008

  6. @Paolo,

    I’ve considered allowing recursive interpreter loop calls. The biggest thing it complicates is the GC. We’ve got an accurate GC, and I really do not want to get into writing code to sweep the C stack, since that automatically makes it not accurate. Recursive interpreter calls mean there are C frames that likely contain refs that have to be seen; thus, I haven’t done it.

    By Evan Phoenix on Jun 18, 2008

  7. May I ask how to add a shadow and border to an image?

    By james on Aug 13, 2008

  8. @james,
    There are many ways. The shadow and border on the logo in this post were probably added with ImageMagick; see this article. Windows Live Writer, some screenshot programs and most reasonably powerful image editors should also be able to do it.

    By pluskid on Aug 14, 2008

  9. Is there a way to implement callback functions within FFI?

    I am checking how it could be possible to implement the sqlite3-ruby gem for jruby. The problem is that the C extensions need to register C callback functions that are called by the sqlite3 runtime at the appropriate moment.

    Right now I have no clue how this could be accomplished. Even if all sqlite3 functions have been wrapped via FFI, I would need to attach a Ruby function in C to be used as a callback. Any ideas?

    Regards Klaas

    By Klaas on Sep 2, 2008

  10. @Klaas,
    Unfortunately, I’m afraid callbacks are not supported in the current version of FFI (search for “callback” on this page to find the reason).

    By pluskid on Sep 2, 2008

  11. Is there a workaround for callbacks? Maybe I just have to think outside the box to get the C API working with FFI?

    Will support be added in the future, or is FFI “finished”?

    By Klaas on Sep 2, 2008

  12. @Klaas,
    I’m not sure. But version 1 of the FFI API is being discussed right now; you may ask on the mailing list.

    By pluskid on Sep 2, 2008

  13. I asked on the mailing list: support for callbacks has already been added to the trunk version :) Here is the discussion, with an example by Wayne, who implemented it: http://groups.google.com/group/rubinius-dev/browse_thread/thread/acf9944eae0e6fca?hl=en

    By Klaas on Sep 3, 2008

  14. @Klaas,
    Oh! That’s really cool! Hope it can also be added to Rubinius soon. :)

    By pluskid on Sep 3, 2008

  15. Very interesting post! Thank you for such an interesting resource! PS: Sorry for my bad English, I’ve just started to learn this language

    By Zashkaser on Aug 5, 2009

1 Trackback(s)

  1. Jun 26, 2008: This Week in Ruby (June 26, 2008) | Zen and the Art of Programming
