Romain Francois, Professional R Enthusiast

Monday, November 5 2012

OOP with Rcpp modules



The purpose of Rcpp modules has always been to make it easy to expose C++ functions and classes to R. Up to now, Rcpp modules did not have a way to declare inheritance between C++ classes. This is now fixed in the development version, and the next version of Rcpp will have a simple mechanism to declare inheritance.

Consider this simple example: we have a base class Shape with two virtual methods (area and contains) and two classes (Circle and Rectangle), each deriving from Shape and representing a specific shape.

The classes might look like this:
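As a minimal sketch in plain C++ (no Rcpp headers), and assuming member names and constructor arguments that are purely illustrative, such a hierarchy could be written as:

```cpp
// Abstract base class: every shape reports its area and can test
// whether it contains a given point.
class Shape {
public:
    virtual ~Shape() {}
    virtual double area() const = 0;
    virtual bool contains(double x, double y) const = 0;
};

// Circle centred at (x, y) with radius r (illustrative constructor).
class Circle : public Shape {
public:
    Circle(double x, double y, double r) : x_(x), y_(y), r_(r) {}
    double area() const { return 3.141592653589793 * r_ * r_; }
    bool contains(double x, double y) const {
        const double dx = x - x_;
        const double dy = y - y_;
        return dx * dx + dy * dy <= r_ * r_;
    }
private:
    double x_, y_, r_;
};

// Axis-aligned rectangle; the sketch assumes x1 <= x2 and y1 <= y2.
class Rectangle : public Shape {
public:
    Rectangle(double x1, double y1, double x2, double y2)
        : x1_(x1), y1_(y1), x2_(x2), y2_(y2) {}
    double area() const { return (x2_ - x1_) * (y2_ - y1_); }
    bool contains(double x, double y) const {
        return x >= x1_ && x <= x2_ && y >= y1_ && y <= y2_;
    }
private:
    double x1_, y1_, x2_, y2_;
};
```

The point of the design is that area and contains are declared once, on Shape, and each derived class only supplies its own implementation.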

And we can expose these classes to R using the following module declarative code:

It is worth noticing that:

  • The area and contains methods are exposed as part of the base Shape class
  • Classes Rectangle and Circle simply declare that they derive from Shape using the derives notation.

R code that uses these classes looks like this:

shapes.jpg

Thursday, October 25 2012

Rcpp modules more flexible


Rcpp modules just got more flexible (as of revision 3838 of Rcpp, to become 0.9.16 in the future).

Modules have allowed exposing C++ classes for some time now, but developers had to declare custom wrap and as specializations if they wanted their classes to be used as the return type or argument type of a C++ function or method. This led to writing boilerplate code. The newest devel version allows for syntax like this:

The only thing the developer has to do is declare the class using the macro RCPP_EXPOSED_CLASS. This declares the appropriate class traits that Rcpp uses for its internal implementations of as and wrap.

In the example we can see three uses of the new functionality:

  • make_foo : this returns a Foo
  • cloner: this returns a Foo*
  • bla: uses a const Foo& as argument


Wednesday, December 14 2011

... And now for solution 17, still using Rcpp

Here comes yet another sequel to the code optimization problem from the R wiki, still using Rcpp, but with a different strategy this time.

Essentially, my previous version (15) was using stringstream, although we don't really need its functionality and it was slowing us down.

Also, the characters "i" and "." are always in the same position, so we can assign them once and for all.
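To illustrate the idea only (this is not attempt 17 itself), here is a small plain C++ sketch. It assumes, purely for the example, labels of the form "iAA.BB" with two zero-padded digits on each side; the function name make_labels and that format are hypothetical. The fixed characters are written into a buffer once, and only the digits are overwritten inside the loops, with no stringstream involved.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Builds the labels "iAA.BB" for A in [0, na) and B in [0, nb).
// Assumes 0 <= na, nb <= 100 so that two digits are always enough.
std::vector<std::string> make_labels(int na, int nb) {
    std::vector<std::string> out;
    out.reserve(static_cast<std::size_t>(na) * static_cast<std::size_t>(nb));

    char buf[] = "i00.00";  // 'i' and '.' are written once, here
    for (int a = 0; a < na; ++a) {
        // the first field only changes in the outer loop
        buf[1] = static_cast<char>('0' + a / 10);
        buf[2] = static_cast<char>('0' + a % 10);
        for (int b = 0; b < nb; ++b) {
            buf[4] = static_cast<char>('0' + b / 10);
            buf[5] = static_cast<char>('0' + b % 10);
            out.push_back(std::string(buf, 6));
        }
    }
    return out;
}
```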

So without further ado, here is attempt 17:

With quite a speedup from attempt 15:

                test replications elapsed relative
2 generateIndex17(n)           20   9.363 1.000000
1 generateIndex15(n)           20  17.795 1.900566

Sunday, October 30 2011

Rcpp reverse dependency graph

I played around with reverse dependencies of Rcpp. At the moment, 44 packages depend on Rcpp and the number goes up to 53 when counting recursive reverse dependencies.

I've used graphviz for the representation of the directed graph

dep.png

Here is the code I've used to generate the dot file:

Sunday, April 17 2011

Rcpp article in JSS

The Journal of Statistical Software published our Rcpp article

Friday, December 3 2010

Evolution of Rcpp code size



I've been contributing to Rcpp for about a year now, initially to add missing bits that were needed for the development of RProtoBuf. This led to a complete redesign of the API, which now goes way beyond the initial code (that we now call classic Rcpp API). This has been quite a journey in terms of development with more than 1500 commits to the svn repository of the project on R-forge, and promotion with presentations at RMetrics 2010, useR 2010, LondonR and at Google, as well as many blog posts about Rcpp and the packages that derive from it.

I wanted to take this opportunity to express visually how vibrant the development of Rcpp has been since it was first relaunched in 2008, and since I started to contribute.

The graph below shows the evolution of the number of lines (counting the .h, .cpp, .R, .Rd, .Rnw files) across released versions of the Rcpp package on CRAN.

The first thing I need for this is to download the 32 versions of Rcpp that have been released since 0.6.0.

Then, all it takes is some processing with R to extract the relevant information (number of lines in files of interest), and present the data in a graph. I'm also taking this opportunity to have some fun with raster images and the png package

nlines_rcpp.png

The code explosion that started around version 0.7.8 marks the beginning of development of two of the most exciting and addictive projects I ever worked on: modules and sugar

The acceleration between 0.8.8 and the current version 0.8.9 represents many of the improvements that were made in modules. That alone, with more than 8000 new lines of code and documentation represents about 4 times as many lines as the total number of lines in 0.6.0

We still have plenty of ideas, and Rcpp will continue to evolve to deliver a quality interface between R and C++, to the best of the current team's abilities.

The full code is available below:

Wednesday, December 1 2010

RcppGSL 0.1.0

We released the first version of our RcppGSL package. RcppGSL extends Rcpp to help programmers code with the GNU Scientific Library (GSL).

The package contains template classes in the RcppGSL namespace that act as smart pointers to the associated GSL data structure. For example, a RcppGSL::vector<double> object acts as a smart pointer to a gsl_vector*. Having the pointer shadowed by a smart pointer allows us to take advantage of C++ features such as operator overloading, which for example allows us to extract an element from the GSL vector simply using [] instead of the GSL functions gsl_vector_get and gsl_vector_set.
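The general pattern can be sketched in plain C++ without GSL or RcppGSL. Everything below is hypothetical: toy_vector and its toy_vector_* functions stand in for a C-style API, and VectorHandle is an illustrative wrapper, not the RcppGSL class. In particular, freeing the vector in the destructor is a choice made for this sketch, not a statement about how RcppGSL manages memory.

```cpp
#include <cstddef>

// Hypothetical C-style vector API (a stand-in for a C library type).
struct toy_vector {
    std::size_t size;
    double* data;
};

toy_vector* toy_vector_alloc(std::size_t n) {
    toy_vector* v = new toy_vector;
    v->size = n;
    v->data = new double[n]();  // elements start at 0.0
    return v;
}
void toy_vector_free(toy_vector* v) { delete[] v->data; delete v; }
double toy_vector_get(const toy_vector* v, std::size_t i) { return v->data[i]; }
void toy_vector_set(toy_vector* v, std::size_t i, double x) { v->data[i] = x; }

// Wrapper that shadows the raw pointer and adds operator[] on top of it.
class VectorHandle {
public:
    explicit VectorHandle(std::size_t n) : ptr_(toy_vector_alloc(n)) {}
    ~VectorHandle() { toy_vector_free(ptr_); }

    double& operator[](std::size_t i) { return ptr_->data[i]; }
    double operator[](std::size_t i) const { return toy_vector_get(ptr_, i); }
    std::size_t size() const { return ptr_->size; }

    // The raw pointer stays reachable for calls into the C-style API.
    toy_vector* get() { return ptr_; }

private:
    VectorHandle(const VectorHandle&);             // non-copyable
    VectorHandle& operator=(const VectorHandle&);  // non-assignable
    toy_vector* ptr_;
};
```

With such a wrapper, v[1] = 2.5 reads like ordinary C++ while toy_vector_get(v.get(), 1) still works for code written against the C-style functions.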

The package contains an 11-page vignette that explains the features in detail, with examples. The vignette also discusses how to actually use RcppGSL, either in another package (preferred) or directly from the R prompt through the inline package.

Sunday, November 28 2010

Rcpp 0.8.9

Rcpp 0.8.9 was pushed to CRAN recently. Apart from minor bug fixes, this release concentrates on modules, with lots of new features to expose C++ functions and classes through R reference classes.

The Rcpp-modules vignette has all the details

The major points are highlighted in the NEWS entry below:

0.8.9   2010-11-28 (or even -27)

    o   Many improvements were made to 'Rcpp modules':

        - exposing multiple constructors

        - overloaded methods

        - self-documentation of classes, methods, constructors, fields and 
          functions.

        - new R function "populate" to facilitate working with modules in 
          packages. 

        - formal argument specification of functions.

        - updated support for Rcpp.package.skeleton.

        - constructors can now take many more arguments.
        
    o   The 'Rcpp-modules' vignette was updated as well and describes many
        of the new features

    o   New template class Rcpp::SubMatrix and support syntax in Matrix
        to extract a submatrix: 
        
           NumericMatrix x = ... ;
        
           // extract the first three columns
           SubMatrix<REALSXP> y = x( _ , Range(0,2) ) ;

           // extract the first three rows
           SubMatrix<REALSXP> y = x( Range(0,2), _ ) ;

           // extract the top 3x3 sub matrix
           SubMatrix<REALSXP> y = x( Range(0,2), Range(0,2) ) ;

    o   Reference Classes no longer require a default constructor for
        subclasses of C++ classes    

    o   Consistently revert to using backticks rather than shell expansion
        to compute library file location when building packages against Rcpp
        on the default platforms; this has been applied to internal test
        packages as well as CRAN/BioC packages using Rcpp

Thursday, October 28 2010

Google tech talk / Rcpp, ... presentation on youtube

Following this post, the 90-minute presentation is now available to watch on youtube:

Saturday, October 23 2010

Google slides

The last stop on my world tour was Google headquarters in Mountain View, California, where Dirk and I presented Rcpp, RInside, RProtoBuf, etc. for 90 minutes today. The talk was recorded and will be broadcast on youtube at some point. In the meantime, the slides are available here:

Thursday, October 7 2010

LondonR Rcpp slides

I'm just back from London, where I presented Rcpp at Mango's LondonR event.

This was the third time (after Rmetrics and useR!) I presented these slides, so I allowed myself some new metaphors about my long-term relationship with R and my indiscretions with other languages such as C++. I've uploaded my slides to my slideshare account:

I had some time to browse around South Bank and Covent Garden before the event. I took some pictures with my iPhone.

Friday, September 10 2010

Rcpp 0.8.6

Dirk released Rcpp 0.8.6 to CRAN

Most of the development of this release was triggered by a question on the Rcpp-devel mailing list. After Richard's question, we added d-p-q-r functions for most of the distributions available in R.

The file runit.stats.R contains several examples of using them.

We have also started developing Rcpp 0.8.7, which will depend on the next version of R (R 2.12.0) since it will use some of the features it will introduce. More on this later...

Dirk also blogged about the release, including the relevant NEWS extract.

Friday, August 13 2010

Rcpp svn revision 2000

I committed the 2000th revision to the Rcpp svn repository today, so I wanted to look back, as I did previously for the 50 000th R commit.

Here is the number of commits per day and per month

commits_per_day.png commits_per_month.png

... the same thing, but focused on the period since I joined the project

commits_per_day__zoom.png commits_per_month__zoom.png

... and now split by contributor

commits_per_day_per_author__zoom.png commits_per_month_author__zoom.png

Here are the months in which each of us has been the most active

> do.call( rbind, 
   lapply( 
    split( month_author_data, month_author_data$author ) , 
    function(x) x[ which.max( x[["commits"]] ), ] ) 
  )
               date  author commits month year
dmbates 2010-08-01 dmbates      19    08 2010
edd     2010-06-01     edd     118    06 2010
romain  2010-06-01  romain     256    06 2010

and the most active day

> do.call( rbind, 
   lapply( 
    split( day_author_data, day_author_data$author ) , 
    function(x) x[ which.max( x[["commits"]] ), ] ) 
  )
              date  author commits month year
dmbates 2010-08-06 dmbates      13     8 2010
edd     2010-02-16     edd      20     2 2010
romain  2010-06-17  romain      30     6 2010

The code to reproduce the graphs is here

Rcpp at LondonR, oct 5th

I'll be presenting Rcpp at the next LondonR, which is currently scheduled for October 5th.

Here is one picture I found on flickr, searching for london speed bus; there are many others.

Saturday, July 10 2010

Rcpp 0.8.4

Dirk uploaded Rcpp 0.8.4 to CRAN yesterday. This release quickly follows the release of Rcpp 0.8.3, because there were some build problems (particularly on the ppc arch on OSX).

Rcpp sugar

Already available in Rcpp 0.8.3, the new sugar feature was extended in 0.8.4 to cover more functions, and we have now started to adapt sugar for matrices with functions such as outer, row, diag, etc ...

Here is an example of using the sugar version of outer

NumericVector xx(x) ;
NumericVector yy(y);
NumericMatrix m = outer( xx, yy, std::plus<double>() ) ;
return m ;

This mimics the R code

> outer( x, y, "+" )
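For readers who want to see what such an outer computes without Rcpp, here is a plain C++ sketch. DenseMatrix and outer_sketch are hypothetical names introduced for this illustration, and the row-major storage is an arbitrary choice of the sketch; the actual sugar implementation is not shown here.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Minimal dense matrix, just enough for the illustration.
struct DenseMatrix {
    std::size_t nrow, ncol;
    std::vector<double> data;  // row-major storage
    DenseMatrix(std::size_t r, std::size_t c) : nrow(r), ncol(c), data(r * c) {}
    double& operator()(std::size_t i, std::size_t j) { return data[i * ncol + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data[i * ncol + j]; }
};

// m(i, j) = op(x[i], y[j]) for every pair of elements,
// which is the contract of R's outer(x, y, op).
template <typename BinaryOp>
DenseMatrix outer_sketch(const std::vector<double>& x,
                         const std::vector<double>& y,
                         BinaryOp op) {
    DenseMatrix m(x.size(), y.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        for (std::size_t j = 0; j < y.size(); ++j) {
            m(i, j) = op(x[i], y[j]);
        }
    }
    return m;
}
```

Passing std::plus<double>() as op gives the same values as outer( x, y, "+" ) in R; any other binary functor can be swapped in the same way.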

Here is the relevant extract of the NEWS file:

0.8.4   2010-07-09

o   new sugar vector functions: rep, rep_len, rep_each, rev, head, tail, diag

o   sugar has been extended to matrices: The Matrix class now extends the Matrix_Base template that implements CRTP. Currently sugar functions for matrices are: outer, col, row, lower_tri, upper_tri, diag

o   The unit tests have been reorganised into fewer files with one call each to cxxfunction() (covering multiple tests) resulting in a significant speedup

o   The Date class now uses the same mktime() replacement that R uses (based on original code from the timezone library by Arthur Olson) permitting wide dates ranges on all operating systems

o   The FastLM/example has been updated, a new benchmark based on the historical Longley data set has been added

o   RcppStringVector now uses std::vector<std::string> internally

o   setting the .Data slot of S4 objects did not work properly

Wednesday, June 30 2010

Rmetrics slides

I presented Rcpp at the Rmetrics conference earlier today. This was a really good opportunity to look back at all the work Dirk and I have been putting into Rcpp.

I've uploaded my slides here (pdf) and on slideshare :

and some pictures on flickr:

Tuesday, June 8 2010

Rcpp 0.8.1

We released Rcpp 0.8.0 almost a month ago. It finalized our efforts in designing a better, faster and more natural API than any version of Rcpp ever before. The journey from Rcpp 0.7.0 to Rcpp 0.8.0 has mainly been a coding and testing effort for designing the API.

And now for something completely different

We have now started (with release 0.8.1 of Rcpp) a new development cycle towards the 0.9.0 version with two major goals in mind

  • We want to improve documentation. To that end Rcpp 0.8.1 includes 4 new vignettes. More on that later.
  • We want to cross the boundaries between R and C++. Rcpp 0.8.1 introduces Rcpp modules. Modules allow the programmer to expose C++ classes and functions at the R level, with great ease.

new vignettes

Rcpp-FAQ: Frequently Asked Questions about Rcpp collects some of the frequently asked questions from the mailing list and from private exchanges with many people.

Rcpp-extending: Extending Rcpp shows how to extend the Rcpp converters Rcpp::wrap and Rcpp::as to user-defined types (C++ classes defined in someone else's package) and third-party types (C++ classes defined in some third-party library used by a package). The document is based on our experience developing the RcppArmadillo package.

Rcpp-package: Writing a package that uses Rcpp highlights the steps involved in making a package that uses Rcpp. The document is based on the Rcpp.package.skeleton function.

Finally, Rcpp-modules: Exposing C++ functions and classes with Rcpp modules documents the current feature set of Rcpp modules.

Rcpp modules

Rcpp modules are inspired by the Boost.Python C++ library. Rcpp modules let you expose C++ classes and functions to the R level with minimal involvement from the programmer.

The feature is best described by an example (more examples in the vignette). Say we want to expose this simple class:

This would typically involve external pointers. With Rcpp modules, we can simply declare what we want to expose about this class, and Rcpp takes care of how to expose it:

The R side consists of grabbing a reference to the module and then simply using the World class.

The Rcpp-modules vignette gives more details about modules, including how to use them in packages

More details about 0.8.1 release

Here is the complete extract from our NEWS file about this release

Friday, May 28 2010

Rmetrics 2010

The 4th User/Developer Meeting on computational Finance and Financial Engineering (Rmetrics 2010) will take place once again in Meielisalp.

This is the first time I'll attend the conference, but I'm not coming empty handed. I'll present the work Dirk and I have done on Rcpp since version 0.7.0. See the abstract for my talk.

Wednesday, May 12 2010

Rcpp 0.8.0

Summary

Version 0.8.0 of the Rcpp package was released to CRAN today. This release marks another milestone in the ongoing redesign of the package, and underlying C++ library.

Overview

Rcpp is an R package and C++ library that facilitates integration of C++ code in R packages.

The package features a set of C++ classes (Rcpp::IntegerVector, Rcpp::Function, Rcpp::Environment, ...) that make it easier to manipulate R objects of matching types (integer vectors, functions, environments, etc.).

Rcpp takes advantage of C++ language features such as the explicit constructor/destructor lifecycle of objects to manage garbage collection automatically and transparently. We believe this is a major improvement over PROTECT/UNPROTECT. When an Rcpp object is created, it protects the underlying SEXP so that the garbage collector does not attempt to reclaim the memory. This protection is withdrawn when the object goes out of scope. Moreover, users generally do not need to manage memory directly (via calls to new / delete or malloc / free) as this is done by the Rcpp classes or the corresponding STL containers.
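The constructor/destructor idea can be illustrated with a plain C++ sketch that does not touch the R API at all. Here a simple counter stands in for whatever bookkeeping the real library performs; toy_protect, toy_unprotect, protect_depth and ProtectedHandle are hypothetical names made up for this example.

```cpp
// Hypothetical stand-in for the protection bookkeeping: just a counter.
static int protect_depth = 0;
inline void toy_protect() { ++protect_depth; }
inline void toy_unprotect() { --protect_depth; }

// The handle protects on construction and releases on destruction,
// so the two calls can never get out of balance.
class ProtectedHandle {
public:
    ProtectedHandle() { toy_protect(); }
    ~ProtectedHandle() { toy_unprotect(); }
private:
    ProtectedHandle(const ProtectedHandle&);             // non-copyable
    ProtectedHandle& operator=(const ProtectedHandle&);  // non-assignable
};

// Returns the protection depth observed while a handle is alive.
// The handle is released automatically when the function returns.
int depth_inside_scope() {
    ProtectedHandle h;
    return protect_depth;
}
```

The release happens on every exit path from the scope, including early returns and exceptions, which is what makes manual pairing of protect and unprotect calls unnecessary.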

API

Rcpp provides two APIs: an older set of classes we refer to as the classic API (see the section 'Backwards Compatibility' below) as well as a second and newer set of classes.

Classes of the new Rcpp API belong to the Rcpp namespace. Each class is associated with a given SEXP type and exposes an interface that allows manipulation of the object that may feel more natural than the usual use of macros and functions provided by the R API.

SEXP type                         Rcpp class
INTSXP                            Rcpp::IntegerVector
REALSXP                           Rcpp::NumericVector
RAWSXP                            Rcpp::RawVector
LGLSXP                            Rcpp::LogicalVector
CPLXSXP                           Rcpp::ComplexVector
STRSXP                            Rcpp::CharacterVector
VECSXP                            Rcpp::List
EXPRSXP                           Rcpp::ExpressionVector
ENVSXP                            Rcpp::Environment
SYMSXP                            Rcpp::Symbol
CLOSXP, BUILTINSXP, SPECIALSXP    Rcpp::Function
LANGSXP                           Rcpp::Language
LISTSXP                           Rcpp::Pairlist
S4SXP                             Rcpp::S4
PROMSXP                           Rcpp::Promise
WEAKREFSXP                        Rcpp::WeakReference
EXTPTRSXP                         template <typename T> Rcpp::XPtr<T>

Some SEXP types do not have dedicated Rcpp classes : NILSXP, DOTSXP, ANYSXP, BCODESXP and CHARSXP.

Still missing are a few convenience classes such as Rcpp::Date or Rcpp::Datetime which would map useful and frequently used R data types, but which do not have an underlying SEXP type.

Data Interchange

Data interchange between R and C++ is managed by extensible and powerful yet simple mechanisms.

Conversion of a C++ object is managed by the template function Rcpp::wrap. This function currently manages :

  • primitive types : int, double, bool, float, Rbyte, ...
  • std::string, const char*
  • STL containers such as std::vector<T> and STL maps such as std::map< std::string, T > provided that the template type T is wrappable
  • any class that can be implicitly converted to SEXP, through operator SEXP()

Conversion of an R object to a C++ object is managed by the Rcpp::as<T> template which can handle:

  • primitive types
  • std::string, const char*
  • STL containers such as std::vector<T>

Rcpp::wrap and Rcpp::as are often used implicitly. For example, when assigning objects to an environment:

  // grab the global environment
  Rcpp::Environment global = Rcpp::Environment::global_env() ;
  std::deque<bool> z( 3 ); z[0] = false; z[1] = true; z[2] = false ;

  global["x"] = 2 ;                    // implicit call of wrap
  global["y"] = "foo";                 // implicit call of wrap
  global["z"] = z ;                    // implicit call of wrap

  int x = global["x"] ;                // implicit call of as
  std::string y = global["y"] ;        // implicit call of as
  std::vector<bool> z1 = global["z"] ; // implicit call of as

Rcpp contains several examples that illustrate wrap and as. The mechanism was designed to be extensible. We have developed separate packages to illustrate how to extend Rcpp conversion mechanisms to third party types.

  • RcppArmadillo : conversion of types from the Armadillo C++ library.
  • RcppGSL : conversion of types from the GNU Scientific Library.

Rcpp is also used for data interchange by the RInside package, which provides an easy way of embedding an R instance inside of C++ programs.

inline use

Rcpp depends on the inline package by Oleg Sklyar et al. Rcpp then uses the 'cfunction' provided by inline (with argument Rcpp=TRUE) to compile, link and load C++ functions from the R session.

As of version 0.8.0 of Rcpp, we also define an R function cppfunction that acts as a facade to inline::cfunction, with specialization for C++ use.

This allows quick prototyping of compiled code. All our unit tests are based on cppfunction and can serve as examples of how to use the mechanism. For example this function (from the runit.GenericVector.R unit test file) defines from R a C++ (simplified) version of lapply:

  ## create a compiled function cpp_lapply using cppfunction
  cpp_lapply <- cppfunction(signature(x = "list", g = "function" ),
      'Function fun(g) ;
       List input(x) ;
       List output( input.size() ) ;
       std::transform( input.begin(), input.end(), output.begin(), fun ) ;
       output.names() = input.names() ;
       return output ;
      ')
  ## call cpp_lapply on the iris data with the R function summary
  cpp_lapply( iris, summary )

Using Rcpp in other packages

Rcpp is designed so that its classes are used from other packages. Using Rcpp requires :

  • using the header files provided by Rcpp. This is typically done by adding this line to the package DESCRIPTION file:
    	LinkingTo: Rcpp
    
    and add the following line in the package code:
    	#include <Rcpp.h>
    
  • linking against the Rcpp dynamic or static library, which is achieved by adding this line to the src/Makevars of the package:
    	PKG_LIBS = $(shell $(R_HOME)/bin/Rscript -e "Rcpp:::LdFlags()" )
    
    and this line to the src/Makevars.win file:
    	PKG_LIBS = $(shell Rscript.exe -e "Rcpp:::LdFlags()")
    

Rcpp contains a function Rcpp.package.skeleton, modelled after package.skeleton from the utils package in base R, that creates a skeleton of a package using Rcpp, including example code.

C++ exceptions

C++ exceptions and R contexts are both based on non-local jumps (at least in the implementation of exceptions in gcc), so care must be taken to ensure that one system does not violate the assumptions of the other. It is therefore very strongly recommended that each function using C++ catches C++ exceptions. Rcpp offers the function forward_exception_to_r to facilitate forwarding the exception to the "R side" as an R condition. For example:

  SEXP foo( ) {
    try {
      // user code here
    } catch( std::exception& __ex__){
      forward_exception_to_r( __ex__ ) ;
    }
    // return something here
  }

Alternatively, functions can enclose the user code with the macros BEGIN_RCPP and END_RCPP, which provides for a more compact way of programming. The function above could be written as follows using the macros:

  SEXP foo( ) {
    BEGIN_RCPP
    // user code here
    END_RCPP
    // return something here
  }

The use of BEGIN_RCPP and END_RCPP is recommended to anticipate future changes of Rcpp. We might for example decide to install dedicated handlers for specific exceptions later.
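The underlying pattern, catching everything at the boundary so that no C++ exception unwinds into foreign code, can be sketched in plain C++ without Rcpp. The names CallResult, checked_divide and boundary_divide are hypothetical, and returning an error record is only a stand-in for forwarding a condition to R.

```cpp
#include <stdexcept>
#include <string>

// Result record handed back across the boundary instead of an exception.
struct CallResult {
    bool ok;
    int value;
    std::string error;
};

// Ordinary C++ code that may throw.
int checked_divide(int a, int b) {
    if (b == 0) throw std::domain_error("division by zero");
    return a / b;
}

// Boundary function: nothing thrown inside escapes to the caller.
CallResult boundary_divide(int a, int b) {
    CallResult res = { false, 0, "" };
    try {
        res.value = checked_divide(a, b);
        res.ok = true;
    } catch (const std::exception& ex) {
        res.error = ex.what();               // known exception: keep its message
    } catch (...) {
        res.error = "unknown C++ exception";  // anything else is still contained
    }
    return res;
}
```

Wrapping the try/catch pair in a pair of macros, as BEGIN_RCPP and END_RCPP do, keeps this boilerplate out of every function body and leaves one place to change if the handling evolves.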

Experimental code generation macros

Rcpp contains several macros that can generate repetitive 'boiler plate' code:

  RCPP_FUNCTION_0, ..., RCPP_FUNCTION_65
  RCPP_FUNCTION_VOID_0, ..., RCPP_FUNCTION_VOID_65
  RCPP_XP_METHOD_0, ..., RCPP_XP_METHOD_65
  RCPP_XP_METHOD_CAST_0, ..., RCPP_XP_METHOD_CAST_65
  RCPP_XP_METHOD_VOID_0, ..., RCPP_XP_METHOD_VOID_65

For example:

  RCPP_FUNCTION_2( int, foobar, int x, int y){
     return x + y ;
  }

This will create a .Call compatible function "foobar" that calls a C++ function for which we provide the argument list (int x, int y) and the return type (int). The macro also encloses the call in BEGIN_RCPP/END_RCPP so that exceptions are properly forwarded to R.

Examples of the other macros are given in the NEWS file.

This feature is still experimental, but is being used in the packages highlight and RProtoBuf.

Quality Assurance

Rcpp uses the RUnit package by Matthias Burger et al and the aforementioned inline package by Oleg Sklyar et al to provide unit testing. Rcpp currently has over 500 unit tests (called from more than 230 unit test functions) with very good coverage of the critical parts of the package and library.

Source code for unit test functions is stored in the unitTests directory of the installed package and the results are collected in the "Rcpp-unitTests" vignette.

The unit tests can be run both during the standard R package build and testing process, and also when the package is installed. The latter use is helpful to ensure that no system components have changed in a way that affects the Rcpp package since it was installed. To run the tests, execute

   Rcpp:::test()

where an output directory can be provided as an optional first argument.

Backwards Compatibility

We believe the new API is now more complete and useful than the previous set of classes, which we refer to as the "classic Rcpp API". We therefore recommend that package authors using 'classic' Rcpp move to the new API. However, the classic API is still maintained and will continue to be maintained to ensure backwards compatibility for code that uses it.

Packages using the 'Classic API' can use features of the new API selectively and in incremental steps. This provides for a non-disruptive upgrade path.

Documentation

The package contains a vignette which provides a short and succinct introduction to the Rcpp package along with several motivating examples. Also provided is a vignette containing the regression test summary from the time the package was built.

Links

Support

Questions about Rcpp should be directed to the Rcpp-devel mailing list

 -- Dirk Eddelbuettel and Romain Francois
    Chicago, IL, USA, and Montpellier, France
	May 2010

Sunday, February 14 2010

Rcpp 0.7.7

A good 2 days after 0.7.6 was released, here comes Rcpp 0.7.7. The reason for this release is that a subtle bug installed itself and we did not catch it in time.

The new version also includes two new class templates, unary_call and binary_call, that help integration of calls (e.g. Rcpp::Language objects) with STL algorithms. For example, here is how we might use unary_call:

This emulates the code

> lapply( 1:10, function(n) seq(from=n, to = 0 ) )
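The same shape of computation can be sketched in plain C++ with std::transform and an ordinary functor. This is only an analogy: unary_call wraps a call into R, whereas the hypothetical CountDown functor and count_down_all helper below do the work in C++ and are not part of Rcpp.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Functor playing the role of function(n) seq(from = n, to = 0),
// for non-negative n.
struct CountDown {
    std::vector<int> operator()(int n) const {
        std::vector<int> s;
        for (int i = n; i >= 0; --i) s.push_back(i);
        return s;
    }
};

// Applies CountDown to every integer in [from, to], like lapply would.
std::vector< std::vector<int> > count_down_all(int from, int to) {
    std::vector<int> input;
    for (int i = from; i <= to; ++i) input.push_back(i);

    std::vector< std::vector<int> > output(input.size());
    std::transform(input.begin(), input.end(), output.begin(), CountDown());
    return output;
}
```

Calling count_down_all(1, 10) yields ten vectors, the first holding 1, 0 and the last holding 10 down to 0, which mirrors the list returned by the R expression above.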

As usual, more examples in the unit tests
