Accessing Recursive Haskell Data Structures from C/Python

As promised we are going to explain how to access recursive Haskell data structures in C (and in combination with the previous post one should be able to also access these structures in Python). As a bonus we do it the other way around too.

However, before one can begin sending recursive data structures from Haskell to C and vice versa one first needs to be able to do it for non-recursive data structures as explained here. Go and read it!

In short, the Haskell FFI Addendum specifies an interface in the form of a type-class Storable. It states that when you call poke someptr haskell_value it will translate haskell_value to something the C Runtime System will understand. In the same manner, when you call peek someptr, it will construct a Haskell value by interpreting someptr in combination with the desired Haskell type and since this all is fairly dangerous, it happens in the IO monad.

There are some basic types for which the Storable instance in defined (e.g. Int), but for all user-defined types you need to do it yourself. There is currently no compiler to take a Haskell data type and turn it in readable C structs.

To make writing Storable instances somewhat easier there is a tool called hsc2hs which creates from a .hsc file containing macros a C source file, which when run creates a Haskell (.hs) source file. It is basically a program that takes into account the alignment of your data structures, so that you do not need to worry about it, since it uses offsetof (man 3 offsetof), which is a standard C macro defined in stddef.h.

The offsetof macro is also used in a macro to make the implementation of the alignment method a mechanical procedure:

#let alignment t = "%lu", (unsigned long)offsetof(struct {char x__; t (y__); }, y__)

To explain how it works we need a simple definition, quoting Wikipedia: "A memory address a, is said to be n-byte aligned when n is a power of two and a is a multiple of n bytes."

Some basic machine types have certain alignment restrictions, or work faster within such restrictions. So, suppose a 32 bit int on a 32 bit architecture needs to be 4 byte aligned, then if we apply the alignment macro, the equation $$x \text{ mod } 4 = 0, x \geq 1, x$$ minimal is solved. This is a stronger property than what is required according to the GHC documentation for the alignment method.

There are other tools available, but all of them are written with the assumption that one has a C library that needs a Haskell binding. An ideal solution would be to have a compiler which takes a Haskell source file and generates for every data declaration a corresponding C struct definition (or whatever foreign language you are using) including Storable instances. All it needs is a naming convention and some time :)

Ok, so much for the background information. On to the code.

Meta-note: If you know a better WordPress plugin (this one (EasyLaTeX) does not interpret the EasyLaTeX version of the LaTeX $ x > 1 $ correctly), please comment.

First I will dump the C code clib.c:

#include 
#include  /* to get malloc */
#include "clib.h"

/* O(Constant sweetness) space */
void print_list(TListElt * list)
{
  while (list->tag == LIST_CONS)  {
    printf("%s\n", __func__);
    printf("list->tag: %i\n", list->tag);
    printf("tag == LIST_CONS\n");
    printf("address of list: %p\n", list);
    struct Cons * f = list->elt.cons;
    printf("f: %p\n", f);
    printf("data: %d\n", f->data);
    list = f->next;
    printf("address of next list: %p\n", list);
    list = f->next;
  }
  if (list->tag == LIST_EMPTY) 
    {
      printf("list->tag: %i\n", list->tag);
    }
}

/* Generates a linear recursive process */
void recursive_print_list(TListElt * list)
{
 if (list != NULL)  {
  printf("%s\n", __func__);
  printf("list->tag: %i\n", list->tag);
  if (list->tag == LIST_CONS){
    printf("tag == LIST_CONS\n");
    printf("address of list: %p\n", list);
    struct Cons * f = list->elt.cons;
    printf("f: %p\n", f);

    TListElt * n = f->next;
    printf("data: %d\n", f->data);

    printf("address of n: %p\n", n);
    print_list(n);
    }
  }
}

void make_test_list()
{
  printf("Entered %s\n", __func__);
  TListElt second =
    {.tag = LIST_EMPTY
    };

  struct Cons thecons = {.data = 666, .next = &second};

  /* This syntax initalizes the first defined member of the union,
     if you want to initialize any other, you need to use a separate statement 
     (http://www.experts-exchange.com/Programming/Languages/CPP/Q_10120178.html)
  */
  TListElt first = {.tag = LIST_CONS, .elt = {&thecons}}; 

  print_list(&first);
  printf("Exiting %s\n", __func__);
}

TListElt * make_test_list_returning_list()
{
  printf("Entered %s\n", __func__);
  TListElt * second = malloc(sizeof(*second));
  TListElt * first = malloc (sizeof(*first));
  second->tag = LIST_EMPTY;

  struct Cons * thecons = malloc(sizeof(*thecons));
  thecons->data = 666;
  thecons->next = second;

  first->tag = LIST_CONS;
  first->elt.cons = thecons; 

  print_list(first);
  printf("Exiting %s\n", __func__);
  return first;
}

And the header:

/* Define a struct that is compatible with the Haskell representation  */
typedef struct {
    int a;
    int b;
} Foo;

void print_foo(Foo *);

void add_a(Foo *f);

enum ListTag {LIST_EMPTY, LIST_CONS};

struct Cons {
      int data;
      struct ListElt *next;
};

union UnionOfOneElement {
    // void empty; implicit
    struct Cons * cons;
  };

struct ListElt {
  enum ListTag tag;
  // use of union is redundant here, it's just to illustrate the general translation
  union UnionOfOneElement elt;
};

// just to illustrate we can also use TListElt instead of struct ListElt. 
typedef struct ListElt TListElt;

And the Haskell Main.hs (which is using all the stuff we define in the rest of this post):

module Main where

{- this is a Haskell program using C library functions
  and Haskell functions constructing values suitable for C. -}
import Data.List
import Foreign.Marshal.Utils
import Foreign.C
import Foreign.Ptr
import Foreign.Storable
import Foreign.Marshal
import Numeric
import ExportHaskellToCStruct

main = do
    ptr_to_list  From here this is a literate Haskell document (the hashes in the comments are being incorrectly processed by hsc2hs so that it will not work without some substitution) describing how to use Haskell values in C and vice versa. The former is not a popular thing to do, because it is a task for which many people expect shiny tools exist, but there are none and moreover bridging this gap requires knowledge of various languages; this is why we hope you will learn from this post.  It is intended to be read by anyone who wants to be able to share data structures between Haskell and some foreign language. Also, the level of acquaintance with these topics from the reader is assumed to be low; I am trying to bring the use of the Foreign Function Interface (FFI) to a wider audience. 
  We will go through the code almost line by line. The very first line is to make the compiler accept foreign declarations. Then we define a Haskell module named ExportHaskellToCStruct and include "HsFFI.h", because hsc2hs does not. HsFFI.h contains the interface needed to initialize the Run-Time System of The Glasgow Haskell Compiler, in short the GHC RTS. Furthermore, we include the user-defined clib.h header to get access to the various C symbols we use.  \begin{code}
{-# LANGUAGE ForeignFunctionInterface #-}
module ExportHaskellToCStruct where
import Data.List
import Foreign.Marshal.Utils
import Foreign.C
import Foreign.Ptr
import Foreign.Storable
import Foreign.Marshal
import Numeric
#include "HsFFI.h"
#include "clib.h"
\end{code}
      
  Then we also include stddef.h, which is a standard C header containing the offsetof macro, and define alignment (as discussed before).  \begin{code}
#include <stddef.h>
#let alignment t = "%lu", (unsigned long)offsetof(struct {char x__; t (y__); }, y__)
\end{code}

  Create corresponding Haskell values for the enumeration from C, first argument is Haskell type, second argument is possible wrapper constructor (e.g. newtype Wrapper = Wrapper Int), the rest of the arguments are the C constructors. In this case we have not defined the second argument (the ,, in the code). The exact symbols being generated are used later in the code and I will refer back to here. Every identifier preceded by a # in this source code, except for CPP directives, denotes something that will be processed by hsc2hs. hsc2hs either generates new identifers, like in the #enum case or it constructs an anonymous function that will for example access certain fields in a C struct (e.g. #peek).   \begin{code} 
#enum Int,,LIST_EMPTY, LIST_CONS
\end{code}

  This defines a simple recursive data structure, which happens to be a list. The type-synomym merely saves some typing.  \begin{code}
data MyList = MyNil | MyCons Int MyList deriving Show
type MyListPtr = Ptr MyList
\end{code}

 Take the function print_list as defined in clib.h and make it available as f_print_list in Haskell. \begin{code}
foreign import ccall "static clib.h print_list"
    f_print_list :: MyListPtr -> IO ()
\end{code}
Now we have landed at the start of the most important part of the code: the translation between Haskell and C objects. The pattern for sizeOf and alignment should be fairly boring. One thing that might be confusing is that these functions ignore their first argument. That is simply because the type signature of a class method should mention the type variable of the type class at least once. The system could have been designed in another way, but it is not. Sorry. 
 \begin{code}
instance Storable MyList where
 sizeOf _ = (#size struct ListElt)
 alignment _ = #{alignment TListElt}
\end{code}

 The method peek takes as input a C value and constructs from it a Haskell value of the desired type in the IO monad. We need to define this mapping and in this particular peek implementation we are going to translate a pointer to a struct ListElt (== TListElt) containing CInts to a Haskell list containing Ints. We either return MyNil, in case the tag indicates LIST_EMPTY, which is written like listEmpty in Haskell terms, or a MyCons   in case of a LIST_CONS (which is listCons in Haskell). Just stating the obvious, this is why we needed the #enum earlier. The code starts with some debugging information, which is nice so you can see what is happening. \begin{code} 
 peek ptr = 
  do
    putStrLn "Trying to construct a Haskell value from the C one"
    putStrLn ("Before the peek at " ++ show ptr)
    tagInHaskell <- (#peek TListElt, tag) ptr
    putStrLn ("tagInHaskell: " ++ show tagInHaskell)
\end{code}

 The first case checks whether the value we read in equals LIST_EMPTY in C, and if it is true we return the base case MyNil. The non-trivial case on which we explicitly check, is the listCons case. \begin{code} 
    if tagInHaskell == listEmpty 
     then return MyNil
     else
      if tagInHaskell == listCons
      then do 
       putStrLn ("We are going to cons...")
\end{code}
We find a pointer to the Cons structure and in this structure we inspect (peek) the data field and bind it to c_data. We also find where the next list is located and then call peek recursively so that we get the rest of the list. Then we just combine the two and we have a valid Haskell MyList value. 
\begin{code} 
       ptr_to_cons <- (#peek TListElt, elt) ptr
       putStrLn ("ptr_to_cons: " ++ show ptr_to_cons)
       c_data <- (#peek struct Cons, data) ptr_to_cons
       putStrLn $ "cdata: " ++ show c_data
       pointer_to_next_element <- (#peek struct Cons, next) ptr_to_cons
       putStrLn ("pointer_to_next_element: " ++ show pointer_to_next_element)
       haskell_list <- peek pointer_to_next_element
       return (MyCons c_data haskell_list)
       else
         -- this is being overly cautious  
           error "Impossible state"
\end{code}

 Here we obtain an address (the ptr) and we need to somehow fill it with a C value; we are marshalling a Haskell value to C. The implementation is defined in poke_implementation_for_MyList. \begin{code}
 poke ptr haskell_list = do 
      poke_implementation_for_MyList ptr haskell_list
      haskell_value_that_should_be_equal_to_haskell_list <- peek ptr
      putStrLn (show haskell_value_that_should_be_equal_to_haskell_list)
      return ()
\end{code}
The poke implementation is going to be using O(1) stack space to run. The C value we require is a ptr to a TListElt. The first case is again MyNil. 
\begin{code}
poke_implementation_for_MyList ptr haskell_list = 
 case haskell_list of 
  MyNil -> do
    putStrLn $ "Building a C value corresponding to MyNil at " ++ show ptr
\end{code}

 We construct a poke function with #peek and set the tag field of the structure pointed to by ptr to listEmpty (which is LIST_EMPTY/0 in the C world). \begin{code}
    (#poke TListElt, tag) ptr listEmpty 
\end{code}
The other case is that we want to translate the MyCons case to C. We need to translate the MyCons constructor to a LIST_CONS with the haskell_data value and the rest of the list. 
  We set set the tag to be the LIST_CONS, verify it has been written correctly and then we somehow need to set the next field to some address. Since we do not have such an address, we need to obtain one with malloc. We will not free this memory in this code, but you can do it yourself if you want with free (either the C free or the Haskell Foreign.Marshal.Alloc.free). In C you need to specify how much memory you want, but the Haskell malloc does not take an argument. How does the RTS know what to do? In Haskell this is done by using the Storable type-class and a call to sizeOf in the implementation of malloc. In this case one could point (haha) out, that one is specifying the type, but look somewhat further and you see a call to malloc where the type is being deduced by the type-inferencer. We also set the data field to the haskell_data value.\begin{code}
  MyCons haskell_data more_haskell_list -> do
    putStrLn $ "Building a C value corresponding to MyCons at " ++ show ptr
    (#poke TListElt, tag) ptr listCons
    tagJustWritten <- (#peek TListElt, tag) ptr 
    putStrLn $ "Tag set: " ++ show (tagJustWritten `asTypeOf` listCons)

    -- first construct a struct Cons
    ptr_to_fcons_cell <- malloc::IO MyListPtr

    putStrLn $ "ptr_to_fcons_cell: " ++ show ptr_to_fcons_cell

    (#poke struct Cons, data) ptr_to_fcons_cell haskell_data
    should_contain_haskell_data <- (#peek struct Cons, data) ptr_to_fcons_cell
    putStrLn ("Value read: " ++ show (should_contain_haskell_data::CInt))
\end{code}
We let the elt union field point at the newly constructed Cons structure and we have the tail-recursive call to construct the rest of the list. A use of the #offset macro is also demonstrated, but is not used. (_offset_next_field is the offset of the next field in the Cons structure.) Finally we set the next field of Cons to the address of the allocated memory (this time the type is deduced by the compiler). 
\begin{code}
    (#poke TListElt, elt) ptr ptr_to_fcons_cell
    let 
        _offset_next_field = (#offset struct Cons, next)
    ptr_to_memory_that_will_contain_a_TListElt <- malloc 
    (#poke struct Cons, next) 
      ptr_to_fcons_cell
      ptr_to_memory_that_will_contain_a_TListElt
    poke_implementation_for_MyList
      ptr_to_memory_that_will_contain_a_TListElt more_haskell_list
\end{code}
These functions were used to see whether we could call C code, and indeed we can. 
\begin{code}             
foreign import ccall "static clib.h make_test_list"
        f_make_test_list :: IO()

foreign import ccall "static clib.h make_test_list_returning_list"
        f_returning_list :: IO MyListPtr
\end{code}
The function Foreign.Marshal.Utils.with takes a Haskell value of type T, and gives it to a Haskell function that expects a Ptr T (i.e., T * name in C notation) and then returns whatever value this Haskell function returned. 
\begin{code}
printList my_list = with my_list f_print_list
\end{code}
This final piece of code just exports Haskell functionality to the rest of the world (e.g. C/Python) like we did before. 
\begin{code}
foreign export ccall foo :: Int -> IO CInt

foo = return . genericLength .  f

f :: Int -> [Int]
f 0 = []
f n = n:(f (n-1))
\end{code}

  Congratulations, you made it to the end. Your perseverance level has reached a new height. I hope you liked it.