esoteric R | R Calling C

Jeffrey A. Ryan
January 1, 2011

As an interactive exploratory language, R excels. The interpretted nature of the R engine rarely interferes with thoughtful analysis. Computationally expensive algorithms though, while easy to prototype in R, can often realize enourmous performance gains if moved to C. This will be the first in a series of articles looking at exploring the well worn path of incorporating C speed within an R workflow.

R Objects. C Power.


R is fast – when done correctly. Using vectorized code, pre-allocating memory, and being careful about making copies — R is often the only language you'll need. At times though, it may prove beneficial to drop down to a lower level language to speed up an otherwise expensive and often used computation. Luckily with R, it is easy to do just that.

This article will focus on calling C code using .Call – though Fortran, C++ and Objective-C source should be just as easy to work with. We use C since that is the preferred language of R. Future articles will examine the other language choices in detail.

We'll start the series on interacting with the API with a basic example, something simple to understand the process. We'll create a new version of base::rev1 — one that keeps arbitrary attributes of the original series instead of dropping them.

On The C Side

To implement our new rev, we need to create a new object with the values of the original – only in reverse order. We also need to copy the attributes from the input to the result.

      /* 
         esotericR_rev.c

         Copyright 2010 lemnica, corp.
         Distributed under GPL >=2
      */

      #include <R.h>
      #include <Rinternals.h>

      SEXP esoteric_rev (SEXP x) {
        SEXP res;
        int i, r, P=0;
        PROTECT(res = allocVector(REALSXP, length(x))); P++;

        for(i=length(x), r=0; i>0; i--, r++) {
           REAL(res)[r] = REAL(x)[i-1];
        }

        copyMostAttrib(x, res);
        UNPROTECT(P);
        return res;
      }
    

Taking the above line-by-line, we first include the R header files. These include defininitions that allow us to access and create objects for use at the R level.

      #include <R.h>
      #include <Rinternals.h>
    

Next we define our function and variables. For our function to work in R, we need to have it return a type SEXP, which is the C version of an R object. Note that the function takes one R object (SEXP) as an argument. All functions using the .Call interface use type SEXP for arguments.

      SEXP esoteric_rev (SEXP x) {
        SEXP res;
        int i, r, P=0;
        PROTECT(res = allocVector(REALSXP, length(x))); P++;
    

We define res to be our return object of type SEXP. The P variable is used here as a counter for R's garbage collection mechanism.

The final line of this chunk deserves some attention. Here a call is made to allocVector which creates a new SEXP – a numeric vector of the same length as our input object. At the C level REALSXP corresponds to 'numeric' data types in R2.

The assignment is then wrapped with PROTECT to prevent R's garbage collection from removing the new object. Protection is rather opaque to the developer, though simply following along with the general idea is all that is required to stay out of trouble. We increment P to keep track of how many objects we will need to UNPROTECT before we exit the function3.

The main work of our function is done in the next three lines. We loop over the values of the result and original object in a way that reverses the observations. The two API functions that matter here are length – which returns the length of the objects data elements, and REAL – which extracts the data component from the large R object4.

        for(i=length(x), r=0; i>0; i--, r++) {
           REAL(res)[r] = REAL(x)[i-1];
        }
    

The final three lines involve copying original attributes to the results object, unprotecting the object, and finally returning this object back to the caller.

        copyMostAttrib(x, res);
        UNPROTECT(P);
        return res;
    

As you can see, writing C for use with R isn't very difficult. The simplicity of the API along with basic C code makes it easy to get started. One caveat on the above is that there are at least a few optimizations intentionally left out, but for now the idea is to get a feel for the process5.

The final step before moving back to R is to compile the code. For the sake of simplicity, we compile this on the command line using R to handle all the required linking.

      bash$ R CMD SHLIB esoteric_rev.c
    

This will result in either a file named esoteric_rev.so on UNIX (including Linux and OS X) or esoteric_rev.dll on Windows. With that out of the way, we can now move back to R.

Calling C from R

Calling our new function is relatively easy. We need to load the newly created library object (the .so or .dll), and call it with the built-in .Call.

      dyn.load("esoteric_rev.so")
      
      obj <- structure(seq(0,1,0.1), obligatory="hello world!")

      .Call("esoteric_rev", obj)
      [1] 1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0
       attr(,"obligatory")
       [1] "hello world!"
 
      rev(obj)
      [1] 1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0
    

As desired, our C implementation of rev now keeps arbitrary attributes. While this could be done in pure R, the advantage of doing this at the C level instead of in R is that we won't incur the copying cost as the object is modified in place before returning to the the R session. For small objects this isn't important, but for larger objects it may make a significant difference in time and memory.

Conclusion

Implementing code in C is easy. While it is often the case that most things are best left in R, there are situations where writing a quick function in C to get around some of R's problems with large objects can make a lot of sense. Future articles will explore more of the R API in detail, including further optimizations that can make C code even faster.

esotericRTM is edited and published by lemnica. It covers common parts of R in-depth, and examines the lesser known aspects of programming with R – from beginner to advanced. Submissions from authors, developers, and users are encouraged.

Source code and articles can be found at

www.lemnica.com/esotericR

keywords:
C, .Call, optimization, compiled

1 base::rev is already fast of course, as it is simply a subset in reverse order. One reason that you may find yourself writing something so trivial in C is that you are looking to extract performance gains when dealing with large objects. At the R level, attaching attributes results in a copy, if we move this to C we get object manipulation without automatic copies. This makes a difference when dealing with big objects.

2 R's native integer, numeric, complex, characters and raw bytes are implemented as INTSXP, REALSXP, CPLXSXP, STRSXP and RAWSXP in C.

3 We use a counter to illustrate a good practice in R, as it makes it easy to keep track of how many objects need to be unprotected before returning.

4 An SEXP is just a pointer to an opaque structure defined by the API. This is an R object where one element is the data. REAL( ) and related functions extract the data component.

5 Optimizations such as using pointers in place of calling REAL( ) in the loop, and assigning length(x) to a variable to avoid the repeated function overhead.

About the author

Jeffrey Ryan is the founder of lemnica corp., a Chicago firm specializing in statistical software, training, and on-demand support. He helps organize the R/Finance conference series [www.RinFinance.com], and is a frequent speaker on software related topics. He is the author or co-author of a variety of R packages involving finance, large data, and visualizations including quantmod, xts, Defaults, IBrokers, RBerkeley, mmap, and indexing. He currently lives in Chicago, Illinois with his wife and three children.




Copyright 2011 lemnica, corp. All rights reserved.