Romain Francois, Professional R Enthusiast

Saturday, February 13 2010

Rcpp 0.7.6

Rcpp 0.7.6 was released yesterday. This is mostly a maintenance update, since version 0.7.5 had some very minor issues on Windows, but we still managed to include a few new things.

Vectors can now use name-based indexing. This is typically useful for things like data frames, which really are named lists. Here is an example from our unit tests where we grab a column from a data frame and then compute the sum of its values:

The classes CharacterVector, GenericVector (aka List) and ExpressionVector now have iterators. Below is another example from our unit tests, where we use iterators to implement a C++ version of lapply using the std::transform algorithm from the STL.
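As a rough illustration of the iterator plus std::transform pattern in plain STL (this is not the Rcpp unit test; NumericList, sum_values and lapply_sum are hypothetical names, and a vector of vectors stands in for an R list):

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical stand-in for an R list of numeric vectors.
typedef std::vector< std::vector<double> > NumericList;

// The function applied to each element: the sum of its values.
double sum_values(const std::vector<double>& x) {
    return std::accumulate(x.begin(), x.end(), 0.0);
}

// lapply-like helper: std::transform walks the input through its
// iterators and writes one result per element into the output.
std::vector<double> lapply_sum(const NumericList& input) {
    std::vector<double> out(input.size());
    std::transform(input.begin(), input.end(), out.begin(), sum_values);
    return out;
}
```

The only thing std::transform needs from the container is a pair of iterators, which is why giving iterators to the Rcpp vector classes makes this kind of code possible.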

Generic vectors (lists) gain some methods that make them look more like std::vector from the STL : push_back, push_front, insert and erase. Examples of using these methods are available in our unit tests:

> system.file( "unitTests", "runit.GenericVector.R", 
+ package = "Rcpp" )

Tuesday, February 9 2010

Rcpp 0.7.5

Dirk released Rcpp 0.7.5 yesterday.

The main thing is the smarter wrap function, which now uses type traits and template meta-programming to guess at compile time whether an object is wrappable, and how to wrap it. Currently wrappable types are:

  • primitive types : int, double, Rbyte, Rcomplex
  • std::string
  • STL containers such as std::vector<T>, as long as T is wrappable. This is not strictly tied to the STL: any type that has a nested type called iterator and member functions begin() and end() will do
  • STL maps keyed by strings such as std::map<std::string,T> as long as T is wrappable
  • any class that can be implicitly converted to SEXP
  • any class for which the wrap template is partly or fully specialized. (The next version of RInside has an example of that)

Here comes an example (from our unit tests) :

        funx <- cfunction(signature(), 
        '
        std::map< std::string,std::vector<int> > m ;
        std::vector<int> b ; b.push_back(1) ; b.push_back(2) ; m["b"] = b ;
        std::vector<int> a ; a.push_back(1) ; a.push_back(2) ; a.push_back(2) ; m["a"] = a ;
        std::vector<int> c ; c.push_back(1) ; c.push_back(2) ; c.push_back(2) ; c.push_back(2) ; m["c"] = c ;
        return wrap(m) ;
        ', 
        Rcpp=TRUE, verbose=FALSE, includes = "using namespace Rcpp;" )
R> funx()
$a
[1] 1 2 2

$b
[1] 1 2

$c
[1] 1 2 2 2
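To give a feel for the compile-time guess described above, here is a minimal standalone sketch of one classic technique: detecting a nested iterator type and dispatching on the result. It is not Rcpp's actual implementation; has_iterator, tag and kind_of are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <vector>

// Compile-time test: does T have a nested type called `iterator`?
template <typename T>
struct has_iterator {
    typedef char yes;
    struct no { char pad[2]; };
    // Chosen when U::iterator exists (substitution succeeds).
    template <typename U> static yes test(typename U::iterator*);
    // Fallback for every other type.
    template <typename U> static no test(...);
    static const bool value = sizeof(test<T>(0)) == sizeof(yes);
};

// Tag type used to select an overload at compile time.
template <bool> struct tag {};

template <typename T>
std::string kind_of(const T&, tag<true>) { return "container"; }

template <typename T>
std::string kind_of(const T&, tag<false>) { return "scalar"; }

// Entry point: the trait decides which overload is compiled in.
template <typename T>
std::string kind_of(const T& x) {
    return kind_of(x, tag<has_iterator<T>::value>());
}
```

With this sketch, kind_of(std::vector<int>()) selects the container overload and kind_of(42) the scalar one, with no runtime test. Note that std::string also has a nested iterator type, so a real implementation has to treat it separately, as the list above does.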

Apart from that, other things have changed. Here is the relevant section of the NEWS for this release:

    o 	wrap has been much improved. wrappable types now are :
    	- primitive types : int, double, Rbyte, Rcomplex, float, bool
    	- std::string
    	- STL containers which have iterators over wrappable types:
    	  (e.g. std::vector, std::deque, std::list, etc ...). 
    	- STL maps keyed by std::string, e.g. std::map<std::string,T>
    	- classes that have implicit conversion to SEXP
    	- classes for which the wrap template is fully or partly specialized
    	This allows composition, so for example this class is wrappable: 
    	std::vector< std::map<std::string,T> > (if T is wrappable)
    	
    o 	The range based version of wrap is now exposed at the Rcpp::
    	level with the following interface : 
    	Rcpp::wrap( InputIterator first, InputIterator last )
    	This is dispatched internally to the most appropriate implementation
    	using traits

    o	a new namespace Rcpp::traits has been added to host the various
    	type traits used by wrap

    o 	The doxygen documentation now shows the examples

    o 	A new file inst/THANKS acknowledges the kind help we got from others

    o	The RcppSexp has been removed from the library.
    
    o 	The methods RObject::asFoo are deprecated and will be removed
    	in the next version. The alternative is to use as<Foo>.

    o	The method RObject::slot can now be used to get or set the 
    	associated slot. This is one more example of the proxy pattern
    	
    o	Rcpp::VectorBase gains a names() method that allows getting/setting
    	the names of a vector. This is yet another example of the 
    	proxy pattern.
    	
    o	Rcpp::DottedPair gains templated operator<< and operator>> that 
    	allow wrap and push_back or wrap and push_front of an object
    	
    o	Rcpp::DottedPair, Rcpp::Language, Rcpp::Pairlist are less
    	dependent on C++0x features. They gain constructors with up
    	to 5 templated arguments. 5 was chosen arbitrarily and might 
    	be updated upon request.
    	
    o	function calls by the Rcpp::Function class is less dependent
    	on C++0x. It is now possible to call a function with up to 
    	5 templated arguments (candidate for implicit wrap)
    	
    o	added support for 64-bit Windows (thanks to Brian Ripley and Uwe Ligges)

Thursday, February 4 2010

RProtoBuf: protocol buffers for R

We (Dirk and I) released the initial version of our package RProtoBuf to CRAN this week. This package brings Google's protocol buffers to R.

I invite you to check out the main page for protobuf to find the language definition for protocol buffers and tutorials for the languages officially supported by Google (Python, C++ and Java), as well as the third-party support page that lists language bindings offered by others (including our RProtoBuf package).

Protocol buffers are a language-agnostic data interchange format, based on a simple and well-defined language. Here comes the classic example that Google uses for its C++, Java and Python tutorials.

First, the proto file defines the format of the message.

Then you need to teach this particular message to R, which is simply done by the readProtoFiles function.

> readProtoFiles( "addressbook.proto" )

Now we can start creating messages :

> person <- new( tutorial.Person, 
+     name = "John Doe", 
+     id = 1234,
+     email = "jdoe@example.com" )

We can then access and modify fields of the message using a syntax extremely close to that of R lists:

> person$email <- "francoisromain@free.fr"
> person$name <- "Romain Francois"

In R, protobuf messages are stored as simple S4 objects of class "Message" that contain an external pointer to the underlying C++ object. The Message class also defines methods that can be accessed using the dollar operator:

> # write a debug version of message
> # this is not how it is serialized
> writeLines( person$toString() )
name: "Romain Francois"
id: 1234
email: "francoisromain@free.fr"

> # serialize the message to a file
> person$serialize( "somefile" )

The package already has tons of features, detailed in the vignette:

> vignette( "RProtoBuf" )

... and there is more to come.

Wednesday, January 13 2010

Rcpp 0.7.2

Rcpp 0.7.2 is out; check out Dirk's blog for details.

Selected highlights from this new version:

character vectors

if one wants to mimic this R code in C

> x <- c( "foo", "bar" )
one ends up with this :
SEXP x = PROTECT( allocVector( STRSXP, 2) ) ;
SET_STRING_ELT( x, 0, mkChar( "foo" ) ) ;
SET_STRING_ELT( x, 1, mkChar( "bar" ) ) ;
UNPROTECT(1) ;
return x ;

Rcpp lets you express the same like this :

CharacterVector x(2) ;
x[0] = "foo" ; 
x[1] = "bar" ;

or like this if you have GCC 4.4 :

CharacterVector x = { "foo", "bar" } ;

environments, functions, ...

Now, we try to mimic this R code in C :
rnorm( 10L, sd = 100 )
You can do it in one of these two ways in Rcpp :
Environment stats("package:stats") ;
Function rnorm = stats.get( "rnorm" ) ;
return rnorm( 10, Named("sd", 100 ) ) ;

or :

Language call( "rnorm", 10, Named("sd", 100 ) ) ;
return eval( call, R_GlobalEnv ) ;

and it will get better with the next release, where you will be able to just call call.eval() and stats["rnorm"].

Using the regular R API, you'd write these like this :

SEXP stats = PROTECT( R_FindNamespace( mkString("stats") ) ) ;
SEXP rnorm = PROTECT( findVarInFrame( stats, install("rnorm") ) ) ;
SEXP call  = PROTECT( LCONS( rnorm, CONS(ScalarInteger(10), CONS(ScalarReal(100.0), R_NilValue)))) ;
SET_TAG( CDDR(call), install("sd") ) ;
SEXP res = PROTECT( eval( call, R_GlobalEnv ) );
UNPROTECT(4) ;
return res ;

or :

SEXP call  = PROTECT( LCONS( install("rnorm"), CONS(ScalarInteger(10), CONS(ScalarReal(100.0), R_NilValue)))) ;
SET_TAG( CDDR(call), install("sd") ) ;
SEXP res = PROTECT( eval( call, R_GlobalEnv ) );
UNPROTECT(2) ;
return res ;

Friday, January 8 2010

External pointers with Rcpp

One of the new features of Rcpp is the XPtr class template, which lets you treat an R external pointer as a regular pointer. For more information on external pointers, see Writing R extensions.

To use them, we first need a pointer to some C++ data structure. We'll use a pointer to a vector<int> :

/* creating a pointer to a vector<int> */
std::vector<int>* v = new std::vector<int> ;
v->push_back( 1 ) ;
v->push_back( 2 ) ;

Then, using the XPtr class template, we wrap the pointer in an R external pointer:

/* wrap the pointer as an external pointer */
/* this automatically protects the external pointer from R garbage 
   collection until p goes out of scope. */
Rcpp::XPtr< std::vector<int> > p(v, true) ;

The first parameter of the constructor is the actual (sometimes called dumb) pointer, and the second parameter is a flag indicating that we need to register a delete finalizer with the external pointer. When the external pointer goes out of scope, it becomes subject to garbage collection, and when it is garbage collected, the finalizer is called, which then calls delete on the dumb pointer.
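The finalizer idea can be sketched without R at all. In the analogy below, std::shared_ptr with a custom deleter plays the role of the external pointer, and reference counting stands in for R's garbage collector; this is only an analogy, not how XPtr is implemented, and delete_finalizer, finalizer_calls and front_then_release are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Counts how many times the finalizer has run (for illustration only).
static int finalizer_calls = 0;

// The "delete finalizer": called once the last owner is gone.
void delete_finalizer(std::vector<int>* p) {
    ++finalizer_calls;
    delete p;
}

int front_then_release() {
    // The raw ("dumb") pointer to some C++ data structure.
    std::vector<int>* v = new std::vector<int>;
    v->push_back(1);
    v->push_back(2);

    int front = 0;
    {
        // Wrap the raw pointer and register the finalizer with it.
        std::shared_ptr< std::vector<int> > p(v, delete_finalizer);
        front = p->front();   // used like a regular pointer
    }   // p goes out of scope here: the finalizer runs and deletes v
    return front;
}
```

The difference with R is timing: here the finalizer runs immediately when the last owner disappears, whereas an R external pointer is finalized whenever the garbage collector gets to it.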

Wrapping it all together thanks to the inline package, here's a function that creates an external pointer to a vector<int> and returns it to R:

        funx <- cfunction(signature(), '
                /* creating a pointer to a vector<int> */
                std::vector<int>* v = new std::vector<int> ;
                v->push_back( 1 ) ;
                v->push_back( 2 ) ;
                
                /* wrap the pointer as an external pointer */
                /* this automatically protects the external pointer from R garbage 
                   collection until p goes out of scope. */
                Rcpp::XPtr< std::vector<int> > p(v, true) ;
                
                /* return it back to R, since p goes out of scope after the return 
                   the external pointer is no more protected by p, but it gets 
                   protected by being on the R side */
                return( p ) ;
        ', Rcpp=TRUE, verbose=FALSE)
        xp <- funx()

At that point, xp is an external pointer object:

> xp
<pointer: 0x9c850c8>
> typeof( xp )
[1] "externalptr"

Then, we can pass it back to the C(++) layer, and continue to work with the wrapped STL vector of ints. For this we use the other constructor for the XPtr class template, which takes an R object (SEXP) of sexp type EXTPTRSXP.


/* wrap the SEXP as a smart external pointer */
Rcpp::XPtr< std::vector<int> > p(x) ;

/* use p as a 'dumb' pointer */
p->front() ;

Again, we can wrap this up for quick prototyping using the inline package :

        # passing the pointer back to C++
        funx <- cfunction(signature(x = "externalptr" ), '
                /* wrapping x as smart external pointer */
                /* The SEXP based constructor does not protect the SEXP from 
                   garbage collection automatically, it is already protected 
                   because it comes from the R side, however if you want to keep 
                   the Rcpp::XPtr object on the C(++) side
                   and return something else to R, you need to protect the external
                   pointer, by using the protect member function */
                Rcpp::XPtr< std::vector<int> > p(x) ;
                
                /* just return the front of the vector as a SEXP */
                return( Rcpp::wrap( p->front() ) ) ;
        ', Rcpp=TRUE, verbose=FALSE)
        front <- funx(xp)
> front
[1] 1

The example is extracted from one of the unit tests that we use in Rcpp. See the full example :

> system.file( "unitTests", "runit.XPTr.R", package = "Rcpp" )
[1] "/usr/local/lib/R/library/Rcpp/unitTests/runit.XPTr.R"

See also the announcement for the release of Rcpp 0.7.1 here to get a list of new features, or wait a few days to see version 0.7.2.

Using the XPtr class template is the bread and butter of the CPP package I blogged about here

Thursday, January 7 2010

R Journal, Volume 1/2, December 2009

Issue 1/2 of the R Journal has been published. It features an article that I co-authored with Spencer Graves and Sundar Dorai-Raj about the sos package.

Tuesday, December 29 2009

C++ exceptions at the R level

The feature described in this post is no longer valid with recent versions of Rcpp. Setting a terminate handler does not work reliably on windows, so we don't do it at all anymore. Exceptions need to be caught and relayed to R. Bracketing the code with BEGIN_RCPP / END_RCPP does it simply. See the Rcpp-introduction vignette for details.

I've recently offered an extra set of hands to Dirk to work on the Rcpp package, which serves as a good excuse to learn more about C++.

Exception management was quite high on my list. C++ has nice exception handling (well not as nice as java, but nicer than C).

With previous versions of Rcpp, the idiom was to wrap everything in a try/catch block and, within the catch block, call the Rf_error function to send an R error, the equivalent of calling stop. Now things have changed and, believe it or not, you can catch a C++ exception at the R level, using the standard tryCatch mechanism.

For example, when you throw a C++ exception (inheriting from the class std::exception) at the C++ level, and the exception is not picked up by the C++ code, it automatically sends an R condition that contains the message of the exception (what the what member function of std::exception gives) as well as the class of the exception (including namespace).

This, combined with the new inline support for Rcpp, makes it possible to run this code (also available in the inst/examples/RcppInline directory of Rcpp):

require(Rcpp)
require(inline)
funx <- cfunction(signature(), '
throw std::range_error("boom") ;
return R_NilValue ;
', Rcpp=TRUE, verbose=FALSE)

Here, we create the funx "function" that compiles itself into a C++ function and gets dynamically linked into R (thanks to the inline package). The relevant thing (at least for this post) is the throw statement. We throw a C++ exception of class "std::range_error" with the message "boom", and what follows shows how to catch it at the R level:

tryCatch(  funx(), "C++Error" = function(e){
    cat( sprintf( "C++ exception of class '%s' : %s\n", class(e)[1L], e$message  ) )
} )
# or using a direct handler 
tryCatch(  funx(), "std::range_error" = function(e){
        cat( sprintf( "C++ exception of class '%s' : %s\n", class(e)[1L], e$message  ) )
} )

... et voila

Under the hood, the abi namespace is at work to demangle class names, and the function that grabs the uncaught exceptions is much inspired by the verbose terminate handler that comes with GCC.
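For the curious, here is a minimal standalone sketch of the demangling step, assuming a GCC-compatible compiler that ships cxxabi.h. It is not the Rcpp code itself; exception_class, describe_uncaught and boom are hypothetical names, and a plain catch block stands in for the terminate handler.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <stdexcept>
#include <string>
#include <typeinfo>

// Returns the demangled dynamic class name of an exception,
// falling back to the mangled name if demangling fails.
std::string exception_class(const std::exception& e) {
    const char* mangled = typeid(e).name();
    int status = 0;
    char* demangled = abi::__cxa_demangle(mangled, 0, 0, &status);
    std::string result = (status == 0 && demangled) ? demangled : mangled;
    std::free(demangled);   // the buffer was allocated with malloc
    return result;
}

// Runs fun and reports what escaped from it as "class: message".
std::string describe_uncaught(void (*fun)()) {
    try {
        fun();
    } catch (const std::exception& e) {
        return exception_class(e) + ": " + e.what();
    }
    return "no exception";
}

// Same throw statement as in the inline example.
void boom() { throw std::range_error("boom"); }
```

Because std::exception is polymorphic, typeid on the caught reference gives the class that was actually thrown, which is what makes a handler such as "std::range_error" possible on the R side.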

Part of this was inspired from the new java exception handling that came with the version 0.8-0 of rJava, but cooked with C++ ingredients

Tuesday, December 22 2009

CPP package: exposing C++ objects

I've just started working on the new package CPP. As usual, the project is maintained on r-forge. The package aims at exposing C++ classes at the R level, starting with classes from the C++ standard template library.

Key to the package is the CPP function (much inspired by the J function of rJava). The CPP function builds an S4 object of class "C++Class". The "C++Class" class is currently a placeholder wrapping the C++ class name, and it defines the new method (again, this trick of making new an S4 generic comes from rJava). For example, to create an R object that wraps a std::vector<int>, one would go like this:

x <- new( CPP( "vector<int>" ) )

There is no magic here, so don't expect to be able to send anything to CPP (C++ does not have reflection capabilities). Currently only these classes are defined : std::vector<int>, vector<double>, vector<raw> and set<int>

Because C++ does not offer reflection capabilities, we have to do something else to be able to invoke methods on the wrapped objects. Currently the approach that the package follows is a naming convention. The $ method creates the name of the C routine it wants to call based on the C++ class the object is wrapping, the name of the method, and the types of the input parameters. For example, calling the size method for a vector<int> object yields this routine name: "vector_int____size", and calling the push_back method of the vector<double> class, passing an integer vector as the first parameter, yields this signature : "vector_double____push_back___integer" (the CPP:::getRoutineSignature function implements the convention).
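One rule that reproduces the two routine names quoted above is: replace every non-alphanumeric character of the class name with an underscore, then join class, method and argument types with three underscores. The sketch below implements that guess; the real CPP:::getRoutineSignature may well differ, and sanitize and routine_name are hypothetical names.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Replace every non-alphanumeric character with '_',
// so "vector<int>" becomes "vector_int_".
std::string sanitize(const std::string& s) {
    std::string out(s);
    for (std::size_t i = 0; i < out.size(); ++i) {
        if (!std::isalnum(static_cast<unsigned char>(out[i]))) out[i] = '_';
    }
    return out;
}

// Build the routine name from the class, the method and the
// types of the input parameters, joined by "___".
std::string routine_name(const std::string& cls,
                         const std::string& method,
                         const std::vector<std::string>& arg_types) {
    const std::string sep = "___";
    std::string name = sanitize(cls) + sep + method;
    for (std::size_t i = 0; i < arg_types.size(); ++i) {
        name += sep + arg_types[i];
    }
    return name;
}
```

Encoding the argument types in the name is what gives a poor man's overloading: push_back with an integer vector and push_back with a numeric vector resolve to two different C routines.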

Here is a full example using the set<int> class. Sets are a good example of a data structure that is not available in R. Basically, a set keeps its objects sorted and without duplicates.

> # create the object
> x <- new( CPP("set<int>") )
> # insert data using the insert method
> # see : insert
> x$insert( sample( 1:20 ) )
> # ask for the size of the set
> x$size()
[1] 20
> # bring it back as an R classic integer vector
> as.vector( x )
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
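What happens on the C++ side of such a call can be sketched in a few lines of plain STL. This is an illustration of std::set behaviour, not the package's code, and sorted_unique is a hypothetical helper name.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Insert values in arbitrary order into a std::set, then copy the
// set back out: the result is sorted and has no duplicates.
std::vector<int> sorted_unique(const std::vector<int>& values) {
    std::set<int> s(values.begin(), values.end());
    return std::vector<int>(s.begin(), s.end());
}
```

Feeding it a shuffled 1:20, as in the R session above, gives back 1 to 20 in order, and inserting a value that is already present leaves the size unchanged.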

Currently the package is my excuse to learn about the standard template library, and it is quite possible that the functionality will be merged into the Rcpp it currently depends on. Because of this volatility, I'll use the Rcpp-devel mailing list instead of creating a new one.

Friday, December 11 2009

new R package : bibtex

I've pushed the bibtex package to CRAN.

The package defines the read.bib function that reads a file in the bibtex format. The code is based on bibparse

The read.bib function generates an object of class citationList, just like utils::citation

Wednesday, December 2 2009

ohloh

I've been invited to create an account on ohloh.

Sunday, November 22 2009

new R package : highlight

I finally pushed highlight to CRAN, which should be available in a few days. The package uses the information gathered by the parser package to perform syntax highlighting of R code

The main function of the package is highlight, which takes a number of arguments including :

  • file : the file in which the R code is
  • output : some output connection or file name where to write the result (The default is standard output)
  • renderer : a collection of functions controlling how to render code into a given markup language

The package ships three functions that create such renderers

  • renderer_html : renders in html/css
  • renderer_latex: renders in latex
  • renderer_verbatim: does nothing

And additionally, the xterm256 package defines a renderer that allows syntax highlighting directly in the console (if the console knows xterm 256 colors)

Let's assume we have this code file (/tmp/code.R)

f <- function( x){
        x + rnorm(1)
}

g <- function(x){}
h <- function(x){}

Then we can syntax highlight it like this :

> highlight( "/tmp/code.R", renderer = renderer_html(), output = "/tmp/code.R.html" )
> highlight( "/tmp/code.R", renderer = renderer_latex(), output = "/tmp/code.R.latex" )

which makes these files : code.R.html and code.R.latex

The package also ships a sweave driver that can highlight code chunks in a sweave document, but I'll talk about this in another post

Monday, November 9 2009

LondonR slides

I was in London last week to present RemoteREngine at the LondonR user group, sponsored by Mango Solutions.

Apart from minor technical details and upsetting someone because I did not mention that he once presented a much simpler solution to a quite different problem, it went pretty well and people were interested in what the package can do.

Essentially, RemoteREngine is an implementation of REngine using java rmi (remote method invocation) for the data transport.

This allows a (or several) client java application to embed an R engine that lives in a different java virtual machine, perhaps on a different physical machine. In a way it is quite similar to the Rserve implementation of REngine, but rmi gives better control over the data transport and we get things Rserve does not currently do such as support for environments or references.

The slides are available here and will probably also make their way to the conference site at some point

Friday, October 9 2009

celebrating R commit #50000

Today, Brian Ripley committed revision 50 000 to the R svn repository.

------------------------------------------------------------------------
r50000 | ripley | 2009-10-09 10:34:17 +0200 (Fri, 09 Oct 2009) | 1 line
Changed paths:
   M /branches/R-2-10-branch/src/library/stats/R/plot.lm.R

port r49999 from trunk
------------------------------------------------------------------------
r49999 | ripley | 2009-10-09 10:33:28 +0200 (Fri, 09 Oct 2009) | 2 lines
Changed paths:
   M /trunk/src/library/stats/R/plot.lm.R

workaround for PR#13899 (that in the report is broken and fails make check!)

so it is time to celebrate and have some fun with the svn log to analyze the 50 000 commits ... with R of course.

data extraction

First we need to grab the full svn log, using command line svn, something like this:

$ svn log -v https://svn.r-project.org/R > rsvn.log

... or you can download it from my website if you don't have svn on your machine

now we need to read the data into R :

we might also be interested in release date, version number and size of the distribution of each R release that is archived on CRAN, which we can get like this :

graphics

Now we can do some graphics. I'm using lattice here because I am familiar with it, but I'm sure interesting plots could be done using ggplot2. In fact, check out this post from Yihui Xie using ggplot2.

First I need to define some helper panel functions I'll use in the plots below

Number of commits per day

commits_day.png

... split by author

commits_author_day.png

The number of commits per month

commits_month.png

... split by author

commits_author_month.png

Wednesday, September 23 2009

RGG #158:161: examples of package IDPmisc

three new graphs have made their way to the graph gallery, submitted by Reto Burgin

Image lag plot matrix

graph_158.png

Image scatter plot matrix

graph_159.png

Regular time series

graph_160.png

Saturday, September 12 2009

New R package: sos

Searching help pages of contributed packages just got easier with the release of the new sos package. This is a replacement for and substantial enhancement of the existing "RSiteSearch" package. To learn more about it, try vignette("sos")

We hope you find this as useful as we have.

Spencer Graves, Sundar Dorai-Raj, Romain Francois

Tuesday, September 8 2009

search the graph gallery from R

This is a short code snippet that is motivated by this thread on r-help yesterday. The gallery contains a search engine textbox (top-right) that can be used to search for content in the website using either its internal crude search engine or perform a google search restricted to the gallery.

Here we write a small R function that can be used to take advantage of the search engine, from R

rgg.search <- function( topic, engine = c("Google", "RGG") ){

    engine <- match.arg( engine )
    url <- URLencode( sprintf( "http://addictedtor.free.fr/graphiques/search.php?q=%s&engine=%s", topic, engine ) )
    browseURL( url )
}
rgg.search( "Andrews plot" ) 

new R package : ant

The ant package was released to CRAN yesterday. As discussed in previous posts in this blog (here and here), the ant R package provides an R-aware version of the ant build tool from the Apache project.

The package contains an R script that can be used to invoke ant with enough plumbing so that it can use R code during the build process. Calling the script is further simplified with the ant function included in the package.

$ Rscript -e "ant::ant()"

The simplest way to take advantage of this package is to add it to the Depends list of your package, include a java source tree somewhere in your package tree (most likely somewhere in the inst tree) with a build.xml file, and include configure and configure.win scripts at the root of the package that contain something like this:

#!/bin/sh

cd inst/java_src
"${R_HOME}/bin/Rscript" -e "ant::ant()"
cd ../..

This will be further illustrated with the demo package helloJavaWorld in future posts.

Thursday, September 3 2009

update on the ant package

I have updated the ant package I described yesterday in this blog to add several things

  • Now the R code related to <r-set> and <r-run> tasks can either be given as the code attribute or as the text inside the task
  • The R code has access to special variables to manipulate the current project (project) and the current task (self) which can be used to set properties, get properties, ...
  • The package contains an ant function so that ant can be invoked using a simple Rscript call, see below

The package now includes a demonstrative build.xml file in the examples directory

Here is the result

Wednesday, September 2 2009

R capable version of ant

ant is an amazing build tool. I have been using ant for some time to build the java code that lives inside the src directories of my R packages, see this post for example.

The drawbacks of this approach are :

  • it assumes ant is available on the system that builds the package
  • you cannot use R code within the ant build script

The ant package for R is developed to solve these two issues. The package is source-controlled in r-forge as part of the orchestra project

Once the package is installed, you find an ant.R script in its exec directory. This script is pretty similar to the usual shell script that starts ant, but it sets things up so that ant can use R through the following additional tasks:

  • <r-run> : to run arbitrary R code
  • <r-set> : to set a property of the current project with the result of an R expression

Here is an example build file that demonstrates how to use these tasks

Here is what happens when we call the R special version of ant with this build script

$ `Rscript -e "cat( system.file( 'exec', 'ant.R', package = 'ant') )"`
Buildfile: build.xml

test:
     [echo] 
     [echo]   	R home        : /usr/local/lib/R
     [echo]   	R version     : R version 2.10.0 Under development (unstable) (2009-08-05 r49067)
     [echo]   	rJava home    : /usr/local/lib/R/library/rJava
     [echo]   	rJava version : 0.7-1
     [echo]  

BUILD SUCCESSFUL
Total time: 1 second

Tip: get java home from R with rJava

Assuming rJava is installed and works, it is possible to take advantage of its magic to get the path where java is installed:

$ Rscript --default-packages="methods,rJava" -e ".jinit(); .jcall( 'java/lang/System', 'S', 'getProperty', 'java.home' ) "
[1] "/opt/jdk/jre"

This is useful when you develop scripts that need to call a java program without assuming that java is on the path, or the JAVA_HOME environment variable is set, etc ...
