Romain Francois, Professional R Enthusiast


Monday, November 5 2012

OOP with Rcpp modules



The purpose of Rcpp modules has always been to make it easy to expose C++ functions and classes to R. Up to now, Rcpp modules did not have a way to declare inheritance between C++ classes. This is now fixed in the development version, and the next version of Rcpp will have a simple mechanism to declare inheritance.

Consider this simple example: we have a base class Shape with two virtual methods (area and contains) and two classes (Circle and Rectangle), each deriving from Shape and representing a specific shape.

The classes might look like this:
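As a minimal sketch in plain C++ (no Rcpp), assuming a circle described by its center and radius and an axis-aligned rectangle described by its center, width and height, such a hierarchy could look like this. The constructor arguments and member names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Abstract base class: every shape reports its area and whether it
// contains a given point. (x_, y_) is the center of the shape.
class Shape {
public:
    Shape(double x, double y) : x_(x), y_(y) {}
    virtual ~Shape() {}
    virtual double area() const = 0;
    virtual bool contains(double x, double y) const = 0;

protected:
    double x_, y_;
};

class Circle : public Shape {
public:
    Circle(double x, double y, double r) : Shape(x, y), r_(r) { assert(r >= 0.0); }
    double area() const override { return std::acos(-1.0) * r_ * r_; }
    bool contains(double x, double y) const override {
        const double dx = x - x_, dy = y - y_;
        return dx * dx + dy * dy <= r_ * r_;
    }

private:
    double r_;
};

class Rectangle : public Shape {
public:
    // w and h are the full width and height of an axis-aligned rectangle
    Rectangle(double x, double y, double w, double h) : Shape(x, y), w_(w), h_(h) {
        assert(w >= 0.0 && h >= 0.0);
    }
    double area() const override { return w_ * h_; }
    bool contains(double x, double y) const override {
        return std::fabs(x - x_) <= w_ / 2 && std::fabs(y - y_) <= h_ / 2;
    }

private:
    double w_, h_;
};
```

Because area and contains are virtual in Shape, code that only holds a Shape reference still dispatches to the Circle or Rectangle implementation.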

We can then expose these classes to R using the following declarative module code:

It is worth noting that:

  • The area and contains methods are exposed as part of the base Shape class.
  • Classes Rectangle and Circle simply declare that they derive from Shape using the derives notation.

R code that uses these classes looks like this:

shapes.jpg

Thursday, October 25 2012

Rcpp modules more flexible


Rcpp modules just got more flexible (as of revision 3838 of Rcpp, to become 0.9.16 in the future).

Modules have allowed exposing C++ classes for some time now, but developers had to declare custom wrap and as specializations if they wanted their classes to be used as the return type or an argument type of a C++ function or method. This meant writing boilerplate code. The newest development version allows syntax like this:

The only thing the developer has to do is declare the class using the macro RCPP_EXPOSED_CLASS. This declares the class traits that Rcpp uses internally to implement as and wrap.

The example shows three uses of the new functionality:

  • make_foo: returns a Foo
  • cloner: returns a Foo*
  • bla: takes a const Foo& as argument
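The general technique, a macro that specializes a traits class which generic conversion code then consults, can be sketched in plain C++. Everything here (MY_EXPOSED_CLASS, is_exposed, wrap_kind, this Foo) is hypothetical and is not Rcpp's actual implementation of RCPP_EXPOSED_CLASS, as or wrap.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Primary trait: by default a type is not "exposed".
template <typename T>
struct is_exposed : std::false_type {};

// Hypothetical macro: one line per class specializes the trait,
// instead of hand-written conversion specializations.
#define MY_EXPOSED_CLASS(CLASS) \
    template <> struct is_exposed<CLASS> : std::true_type {};

class Foo {
public:
    explicit Foo(int x) : x(x) {}
    int x;
};
MY_EXPOSED_CLASS(Foo)

// Generic code dispatches on the trait at compile time. Pointer and
// const are stripped so Foo, Foo* and const Foo& all map to Foo.
template <typename T>
std::string wrap_kind(const T&) {
    typedef typename std::remove_cv<typename std::remove_pointer<T>::type>::type base_type;
    return is_exposed<base_type>::value ? "exposed class" : "other";
}
```

The design point is that the per-class cost is a single macro line, while all the generic code lives in templates written once.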


Wednesday, December 14 2011

... And now for solution 17, still using Rcpp

Here comes yet another sequel to the code optimization problem from the R wiki, still using Rcpp, but with a different strategy this time.

My previous version (15) used a stringstream, although we don't really need its functionality, and it was slowing us down.

Also, the characters "i" and "." are always in the same position, so we can assign them once and for all.

So without further ado, here is attempt 17:

This gives quite a speedup over attempt 15:

                test replications elapsed relative
2 generateIndex17(n)           20   9.363 1.000000
1 generateIndex15(n)           20  17.795 1.900566
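A standalone C++ sketch of the two ideas (no stringstream, and "i" and "." written only once into a reused buffer) might look like this. It is not the author's generateIndex17: the index pattern (one string per pair i < j, each number zero-padded to a fixed width) and the names generate_index and write_padded are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Writes `value` into buf[pos .. pos+width-1] as zero-padded decimal,
// filling digits from the right; no stringstream or sprintf involved.
inline void write_padded(std::string& buf, std::size_t pos, int value, int width) {
    for (int k = width - 1; k >= 0; --k) {
        buf[pos + k] = static_cast<char>('0' + value % 10);
        value /= 10;
    }
}

// Hypothetical generator: one string "i<i>.<j>" for every pair i < j,
// each number zero-padded to `width` digits.
std::vector<std::string> generate_index(int n, int width) {
    int limit = 1;
    for (int k = 0; k < width; ++k) limit *= 10;
    assert(n >= 1 && n < limit);  // every number must fit in `width` digits

    // Template buffer: 'i' and '.' are written once; only digits change.
    std::string buf(2 + 2 * width, '0');
    buf[0] = 'i';
    buf[1 + width] = '.';

    std::vector<std::string> out;
    out.reserve(static_cast<std::size_t>(n) * (n - 1) / 2);
    for (int i = 1; i < n; ++i) {
        write_padded(buf, 1, i, width);           // left number, once per i
        for (int j = i + 1; j <= n; ++j) {
            write_padded(buf, 2 + width, j, width);
            out.push_back(buf);                   // copy the current buffer
        }
    }
    return out;
}
```

For example, generate_index(4, 3) starts with "i001.002" and ends with "i003.004".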

Saturday, November 26 2011

int64: 64 bit integer vectors for R


The Google Open Source Programs Office sponsored me to create the new int64 package, which was released to CRAN a few days ago. The package has been mentioned in an article on Google's open source blog.

The package defines classes int64 and uint64 that represent signed and unsigned 64 bit integer vectors. The package also allows conversion of several types (integer, numeric, character, logical) to 64 bit integer vectors, arithmetic operations as well as other standard group generic functions, and reading 64 bit integer vectors as a data.frame column using int64 or uint64 as the colClasses argument.

The package has a vignette that details its features, and several examples are given in the usual help files. Once again, I've used RUnit for quality assurance of the package code.

int64 has been developed so that 64 bit integer vectors are represented using only R data structures, i.e. data is not represented as external pointers to some C++ object. Instead, each 64 bit integer is represented as a pair of regular 32 bit integers, each of them carrying half the bits of the underlying 64 bit integer. This was a deliberate design choice, so that 64 bit integer vectors can be serialized and used as data frame columns.
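The idea of carrying a 64 bit value in two 32 bit halves can be sketched in plain C++. The function names are hypothetical, and the exact layout used by int64 (which half is stored first, how NA is encoded) is not reproduced here.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Split a signed 64 bit integer into (high, low) 32 bit halves.
// Bits go through uint64_t so negative values round-trip exactly; the
// unsigned-to-signed casts rely on two's complement wrapping, which
// C++20 guarantees and mainstream compilers have always provided.
std::pair<std::int32_t, std::int32_t> split64(std::int64_t value) {
    const std::uint64_t bits = static_cast<std::uint64_t>(value);
    const std::uint32_t high = static_cast<std::uint32_t>(bits >> 32);
    const std::uint32_t low = static_cast<std::uint32_t>(bits & 0xFFFFFFFFu);
    return std::make_pair(static_cast<std::int32_t>(high), static_cast<std::int32_t>(low));
}

// Reassemble the original value from its two halves.
std::int64_t combine64(std::int32_t high, std::int32_t low) {
    const std::uint64_t bits =
        (static_cast<std::uint64_t>(static_cast<std::uint32_t>(high)) << 32) |
        static_cast<std::uint32_t>(low);
    return static_cast<std::int64_t>(bits);
}
```

Since both halves are ordinary 32 bit integers, a vector of them can live in a plain R integer vector, which is what makes serialization and data frame columns work.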

The package contains C++ headers that third party packages can use (via LinkingTo: int64) to access the C++ internals. This allows creation and manipulation of the objects in C++. The internals will be documented in another vignette for package developers who wish to use them. For the moment, the main entry point is the C++ template class LongVector.

I'm particularly proud that Google trusted me and sponsored the development of int64. The next versions of packages Rcpp and RProtoBuf take advantage of the facilities of int64, e.g. Rcpp gains wrapping of C++ containers of 64 bit integers as R objects of classes int64 and uint64, and RProtoBuf improves handling of 64 bit integers in protobuf messages. More on this later.

Thursday, November 10 2011

Code optimization, an Rcpp solution

Tony Breyal revived an old code optimization problem in this blog post, so I figured it was time for an Rcpp-based solution.

This solution moves Henrik Bengtsson's idea (which was the basis of attempt 10) down to C++. The idea is to call sprintf fewer times than the other solutions do to generate the strings "001", "002", "003", ...

We can benchmark this version using the rbenchmark package:

> library(rbenchmark)
> n <- 2000
> benchmark(
+     generateIndex10(n), 
+     generateIndex11(n),
+     generateIndex12(n), 
+     generateIndex13(n),
+     generateIndex14(n),
+     columns = 
+        c("test", "replications", "elapsed", "relative"),
+     order = "relative",
+     replications = 20
+ )
                test replications elapsed relative
5 generateIndex14(n)           20  21.015 1.000000
3 generateIndex12(n)           20  22.034 1.048489
4 generateIndex13(n)           20  23.436 1.115203
2 generateIndex11(n)           20  23.829 1.133904
1 generateIndex10(n)           20  30.580 1.455151
>    
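Henrik Bengtsson's idea, formatting each number only once and reusing the result, can be sketched in standalone C++ as a lookup table. This is not the author's generateIndex14; the pair pattern (one string per i < j), the three-digit width and the function names are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Format each number 1..n once ("001", "002", ...): n calls to
// snprintf instead of one call per generated string.
std::vector<std::string> padded_table(int n) {
    assert(n >= 1 && n <= 999);  // three digits assumed
    std::vector<std::string> table(n + 1);
    char tmp[8];
    for (int k = 1; k <= n; ++k) {
        std::snprintf(tmp, sizeof tmp, "%03d", k);
        table[k] = tmp;
    }
    return table;
}

// Build "i<a>.<b>" for every pair a < b by concatenating cached pieces.
std::vector<std::string> generate_index_lookup(int n) {
    const std::vector<std::string> table = padded_table(n);
    std::vector<std::string> out;
    out.reserve(static_cast<std::size_t>(n) * (n - 1) / 2);
    for (int a = 1; a < n; ++a) {
        const std::string prefix = "i" + table[a] + ".";
        for (int b = a + 1; b <= n; ++b) {
            out.push_back(prefix + table[b]);
        }
    }
    return out;
}
```

The formatting cost drops from one sprintf per output string to one per distinct number, leaving only cheap string concatenation in the inner loop.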

Sunday, October 30 2011

Rcpp reverse dependency graph

I played around with the reverse dependencies of Rcpp. At the moment, 44 packages depend on Rcpp, and the number goes up to 53 when counting recursive reverse dependencies.

I used Graphviz to draw the directed graph.

dep.png

Here is the code I've used to generate the dot file:
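Counting recursive reverse dependencies and writing the dot file can be sketched in C++ with a breadth-first search. This is not the code used for the post: the input format (a map from each package to the packages it depends on) and the function names are assumptions, and the package names one would feed it are placeholders.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// package -> list of packages (hypothetical input format)
typedef std::map<std::string, std::vector<std::string> > DepMap;

// Invert "package -> its dependencies" into
// "package -> packages that depend on it directly".
DepMap reverse_deps(const DepMap& deps) {
    DepMap rev;
    for (const auto& entry : deps)
        for (const std::string& dep : entry.second)
            rev[dep].push_back(entry.first);
    return rev;
}

// Breadth-first search from `root` over the reversed edges: every package
// that depends on root directly or through other packages.
std::set<std::string> recursive_rev_deps(const DepMap& rev, const std::string& root) {
    std::set<std::string> seen;
    std::queue<std::string> todo;
    todo.push(root);
    while (!todo.empty()) {
        const std::string pkg = todo.front();
        todo.pop();
        DepMap::const_iterator it = rev.find(pkg);
        if (it == rev.end()) continue;
        for (const std::string& child : it->second)
            if (child != root && seen.insert(child).second) todo.push(child);
    }
    return seen;
}

// Graphviz dot text with one "dependency -> dependent" edge for every
// reverse dependency reachable from root.
std::string to_dot(const DepMap& rev, const std::string& root) {
    std::set<std::string> nodes = recursive_rev_deps(rev, root);
    nodes.insert(root);
    std::ostringstream dot;
    dot << "digraph revdeps {\n";
    for (const std::string& pkg : nodes) {
        DepMap::const_iterator it = rev.find(pkg);
        if (it == rev.end()) continue;
        for (const std::string& child : it->second)
            dot << "  \"" << pkg << "\" -> \"" << child << "\";\n";
    }
    dot << "}\n";
    return dot.str();
}
```

The direct count is the size of rev[root], while the recursive count is the size of the set returned by recursive_rev_deps; the dot text can then be rendered with Graphviz.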

Friday, April 29 2011

Rcpp Workshop slides

Dirk and I gave a full-day Rcpp workshop yesterday in Chicago, before the R/Finance conference.

The PDFs of the slides are available here: part 1 (intro), part 2 (details), part 3 (modules) and part 4 (applications).

Sunday, April 17 2011

Rcpp article in JSS

The Journal of Statistical Software published our Rcpp article.

Wednesday, March 30 2011

Rcpp workshop in Chicago on April 28th

Overview

This year's R/Finance conference will be preceded by a full-day masterclass on Rcpp and related topics, which will be held on Thursday, April 28, 2011, on the University of Illinois at Chicago campus.

Join Dirk Eddelbuettel and Romain Francois for six hours of detailed, hands-on instruction and discussion around Rcpp, inline, RInside, RcppArmadillo and other packages, in an intimate small-group setting.

The full-day format allows us to combine a morning introductory session with a more advanced afternoon session while leaving room for sufficient breaks. There will be about six hours of instruction, a one-hour lunch break and two half-hour coffee breaks.

Morning session: "A hands-on introduction to R and C++"

The morning session will provide a practical introduction to the Rcpp package (and other related packages). The focus will be on simple and straightforward applications of Rcpp in order to extend R and/or to significantly accelerate the execution of simple functions.

The tutorial will cover the inline package which permits embedding of self-contained C, C++ or Fortran code in R scripts. We will also discuss RInside to embed R code in C++ applications, as well as standard Rcpp extension packages such as RcppArmadillo for linear algebra and RcppGSL.

Afternoon session: "Advanced R and C++ topics"

This afternoon tutorial will provide a hands-on introduction to more advanced Rcpp features. It will cover topics such as writing packages that use Rcpp, how 'Rcpp modules' and the new R ReferenceClasses interact, and how 'Rcpp sugar' lets us write C++ code that is often as expressive as R code. Another possible topic, time permitting, may be writing glue code to extend Rcpp to other C++ projects.

We also hope to leave some time to discuss problems brought up by participants.

Prerequisites

Knowledge of R as well as general programming knowledge; C or C++ knowledge is helpful but not required.

Users should bring a laptop set up so that R packages can be built. That means on Windows, Rtools needs to be present and working, and on OS X the Xcode package should be installed.

Registration

Registration is available via the R/Finance conference at

http://www.RinFinance.com/register/

or directly at RegOnline

http://www.regonline.com/930153

The cost is USD 500 for the whole day, and space will be limited.

Questions

Please contact us directly at RomainAndDirk@r-enthusiasts.com

Wednesday, March 2 2011

Rcpp at Geneva-R

I'll present Rcpp at the inaugural Geneva-R meeting.

Geneva-R is an informal gathering of R enthusiasts sponsored by Mango Solutions. It builds on the success of London-R, where I presented twice, and Basel-R.

Tuesday, December 7 2010

highlight 0.2-5

I pushed highlight 0.2-5 to CRAN. This release improves the LaTeX renderer and the Sweave driver so that multi-line character strings are rendered properly.

This example vignette shows it:

\documentclass[a4paper]{report}
\begin{document}

<<echo=FALSE,results=hide>>=
old.op <- options( prompt = " ", continue = " " )
@

<<>>=   
require( inline )
require( Rcpp )
convolve <- cxxfunction( 
    signature( a = "numeric", b = "numeric" ), '
    NumericVector xa(a); int n_xa = xa.size() ;
    NumericVector xb(b); int n_xb = xb.size() ;
    NumericVector xab(n_xa + n_xb - 1,0.0);
    
    Range r( 0, n_xb-1 );
    for(int i=0; i<n_xa; i++, r++){
        xab[ r ] += noNA(xa[i]) * noNA(xb) ;
    }
    return xab ;
', plugin = "Rcpp" )
convolve( 1:4, 1:5 )
@

<<echo=FALSE,results=hide>>=
options( old.op )
@

\end{document}

Once processed with Sweave, e.g.:

require( highlight )
driver <- HighlightWeaveLatex(boxes = TRUE)
Sweave( 'test.Rnw', driver = driver )
# texi2dvi comes from the tools package, which is not attached by default
tools::texi2dvi( 'test.tex', pdf = TRUE )

we get this result, embedded below with the Google Docs viewer:

See this question on Stack Overflow for the tip of using Google Docs to display PDF files.
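For readers without Rcpp at hand, the computation done by the inline convolve function can be sketched with std::vector: each element of a is multiplied by the whole of b and accumulated into a window of the output that shifts by one at each step, which is what Range r and r++ do in the Rcpp version. The sketch omits the noNA() wrappers, which in the original only tell Rcpp sugar to skip NA checks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Full discrete convolution: the result has size(a) + size(b) - 1 elements.
std::vector<double> convolve(const std::vector<double>& a, const std::vector<double>& b) {
    assert(!a.empty() && !b.empty());
    std::vector<double> ab(a.size() + b.size() - 1, 0.0);
    for (std::size_t i = 0; i < a.size(); ++i) {
        // window ab[i .. i + size(b) - 1], shifted by one for each i
        for (std::size_t j = 0; j < b.size(); ++j) {
            ab[i + j] += a[i] * b[j];
        }
    }
    return ab;
}
```

With a = {1, 2, 3, 4} and b = {1, 2, 3, 4, 5}, matching convolve( 1:4, 1:5 ) in the vignette, the result is {1, 4, 10, 20, 30, 34, 31, 20}.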

Friday, December 3 2010

Evolution of Rcpp code size



I've been contributing to Rcpp for about a year now, initially to add missing bits that were needed for the development of RProtoBuf. This led to a complete redesign of the API, which now goes way beyond the initial code (which we now call the classic Rcpp API). This has been quite a journey, both in terms of development, with more than 1500 commits to the svn repository of the project on R-forge, and in terms of promotion, with presentations at RMetrics 2010, useR 2010, LondonR and at Google, as well as many blog posts about Rcpp and the packages that derive from it.

I wanted to take this opportunity to express visually how vibrant the development of Rcpp has been since it was first relaunched in 2008, and since I started to contribute.

The graph below shows the evolution of the number of lines (counting the .h, .cpp, .R, .Rd and .Rnw files) across released versions of the Rcpp package on CRAN.

The first step is to download the 32 versions of Rcpp that have been released since 0.6.0.

Then, all it takes is some processing with R to extract the relevant information (the number of lines in the files of interest) and present the data in a graph. I'm also taking this opportunity to have some fun with raster images and the png package.
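The counting step can be sketched in C++17 with std::filesystem (the post itself used R): walk an unpacked source tree and sum the lines of files whose extension is one of .h, .cpp, .R, .Rd or .Rnw. The function names are hypothetical, and the extension match is case sensitive, so a file ending in .r would be skipped.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <set>
#include <string>

namespace fs = std::filesystem;

// Number of lines in one file; std::getline also counts a final line
// that lacks a trailing newline.
long count_file_lines(const fs::path& file) {
    std::ifstream in(file);
    std::string line;
    long n = 0;
    while (std::getline(in, line)) ++n;
    return n;
}

// Total lines over all files below `root` with an extension of interest.
long count_tree_lines(const fs::path& root) {
    static const std::set<std::string> wanted = {".h", ".cpp", ".R", ".Rd", ".Rnw"};
    long total = 0;
    for (const auto& entry : fs::recursive_directory_iterator(root)) {
        if (entry.is_regular_file() && wanted.count(entry.path().extension().string()) > 0) {
            total += count_file_lines(entry.path());
        }
    }
    return total;
}
```

Running count_tree_lines on each unpacked release directory gives one number per version, which is the series plotted in the graph.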

nlines_rcpp.png

The code explosion that started around version 0.7.8 marks the beginning of the development of two of the most exciting and addictive projects I have ever worked on: modules and sugar.

The jump between 0.8.8 and the current version 0.8.9 reflects many of the improvements made to modules. That release alone, with more than 8000 new lines of code and documentation, adds about 4 times as many lines as the total number of lines in 0.6.0.

We still have plenty of ideas, and Rcpp will continue to evolve to deliver a quality interface between R and C++, to the best of the current team's abilities.

The full code is available below:

Wednesday, December 1 2010

RcppGSL 0.1.0


We released the first version of our RcppGSL package. RcppGSL extends Rcpp to help programmers code with the GNU Scientific Library (GSL).

The package contains template classes in the RcppGSL namespace that act as smart pointers to the associated GSL data structures. For example, an RcppGSL::vector<double> object acts as a smart pointer to a gsl_vector*. Having the pointer shadowed by a smart pointer allows us to take advantage of C++ features such as operator overloading, which for example lets us extract an element from the GSL vector simply using [] instead of the GSL functions gsl_vector_get and gsl_vector_set.
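The operator[] idea can be sketched without GSL itself. The struct below is a hypothetical stand-in that mimics only the size, stride and data fields of GSL's gsl_vector, and vector_view is not the RcppGSL class: it is non-owning and leaves out memory management entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for gsl_vector: strided access into a buffer.
struct fake_gsl_vector {
    std::size_t size;
    std::size_t stride;
    double* data;
};

// Thin wrapper around the raw pointer: element access goes through
// operator[] instead of free get/set functions.
class vector_view {
public:
    explicit vector_view(fake_gsl_vector* v) : v_(v) {}
    double& operator[](std::size_t i) {
        assert(i < v_->size);
        return v_->data[i * v_->stride];
    }
    double operator[](std::size_t i) const {
        assert(i < v_->size);
        return v_->data[i * v_->stride];
    }
    std::size_t size() const { return v_->size; }
    fake_gsl_vector* get() const { return v_; }  // raw pointer for C calls

private:
    fake_gsl_vector* v_;
};
```

Keeping a get() accessor means the same object can still be handed to C functions that expect the raw pointer.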

The package contains an 11-page vignette that explains the features in detail, with examples. The vignette also discusses how to actually use RcppGSL, either in another package (preferred) or directly from the R prompt through the inline package.

Sunday, November 28 2010

parser 0.0-13

I've pushed a new version of the parser package to CRAN.

This is the first release that depends on Rcpp, which allowed me to reduce the code size and increase its maintainability.

This also features a faster version of nlines, a function that retrieves the number of lines of a text file.
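A common way to count lines quickly is to read the file in large binary chunks and count newline characters, adding one when the file does not end with a newline. The sketch below illustrates that general approach under those assumptions; it is not the parser package's actual nlines code, and the name nlines_sketch is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Count lines by scanning fixed-size chunks for '\n'.
// Returns -1 if the file cannot be opened.
long nlines_sketch(const char* path) {
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return -1;
    std::vector<char> buf(1 << 16);
    long count = 0;
    char last = '\n';  // so an empty file has zero lines
    std::size_t got;
    while ((got = std::fread(buf.data(), 1, buf.size(), f)) > 0) {
        count += std::count(buf.begin(), buf.begin() + static_cast<std::ptrdiff_t>(got), '\n');
        last = buf[got - 1];
    }
    std::fclose(f);
    if (last != '\n') ++count;  // final line without a trailing newline
    return count;
}
```

Reading in binary chunks avoids building a string per line, which is usually where line-by-line readers spend their time.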

Rcpp 0.8.9

Rcpp 0.8.9 was pushed to CRAN recently. Apart from minor bug fixes, this release concentrates on modules, with lots of new features to expose C++ functions and classes through R reference classes.


The Rcpp-modules vignette has all the details

The major points are highlighted in the NEWS entry below:

0.8.9   2010-11-28 (or even -27)

    o   Many improvements were made to 'Rcpp modules':

        - exposing multiple constructors

        - overloaded methods

        - self-documentation of classes, methods, constructors, fields and 
          functions.

        - new R function "populate" to facilitate working with modules in 
          packages. 

        - formal argument specification of functions.

        - updated support for Rcpp.package.skeleton.

        - constructors can now take many more arguments.
        
    o   The 'Rcpp-modules' vignette was updated as well and describes many
        of the new features

    o   New template class Rcpp::SubMatrix and support syntax in Matrix
        to extract a submatrix: 
        
           NumericMatrix x = ... ;
        
           // extract the first three columns
           SubMatrix<REALSXP> y1 = x( _ , Range(0,2) ) ;
        
           // extract the first three rows
           SubMatrix<REALSXP> y2 = x( Range(0,2), _ ) ;
        
           // extract the top 3x3 sub matrix
           SubMatrix<REALSXP> y3 = x( Range(0,2), Range(0,2) ) ;

    o   Reference Classes no longer require a default constructor for
        subclasses of C++ classes    

    o   Consistently revert to using backticks rather than shell expansion
        to compute library file location when building packages against Rcpp
        on the default platforms; this has been applied to internal test
        packages as well as CRAN/BioC packages using Rcpp
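The extraction behind x( Range(0,2), Range(0,2) ) can be sketched on a plain column-major buffer, the storage order R uses for matrices. The names below are hypothetical, the ranges are inclusive as in Range(0,2) covering indices 0, 1 and 2, and this sketch copies the selected elements into a new matrix; it makes no claim about how Rcpp's SubMatrix stores them.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Column-major matrix: element (i, j) lives at data[i + j * nrow].
struct ColMajorMatrix {
    std::size_t nrow, ncol;
    std::vector<double> data;
    double at(std::size_t i, std::size_t j) const { return data[i + j * nrow]; }
};

// Copy rows r0..r1 and columns c0..c1 (both inclusive).
ColMajorMatrix submatrix(const ColMajorMatrix& x, std::size_t r0, std::size_t r1,
                         std::size_t c0, std::size_t c1) {
    assert(r0 <= r1 && r1 < x.nrow && c0 <= c1 && c1 < x.ncol);
    ColMajorMatrix out;
    out.nrow = r1 - r0 + 1;
    out.ncol = c1 - c0 + 1;
    out.data.reserve(out.nrow * out.ncol);
    for (std::size_t j = c0; j <= c1; ++j)      // column by column,
        for (std::size_t i = r0; i <= r1; ++i)  // keeping column-major order
            out.data.push_back(x.at(i, j));
    return out;
}
```

The `_` placeholder in the NEWS example corresponds to the full range, i.e. rows 0 to nrow - 1 or columns 0 to ncol - 1.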

Saturday, October 23 2010

Google slides

The last stop on my world tour was Google headquarters in Mountain View, California, where Dirk and I presented Rcpp, RInside, RProtoBuf, etc. for 90 minutes today. The talk was recorded and will be broadcast on YouTube at some point. In the meantime, the slides are available here:

Thursday, October 7 2010

LondonR Rcpp slides

I'm just back from London, where I presented Rcpp at Mango's LondonR event.

This was the third time (after RMetrics and useR!) I presented these slides, so I allowed myself some new metaphors about my long-term relationship with R and my indiscretions with other languages such as C++. I've uploaded my slides to my SlideShare account:

I had some time to browse around South Bank and Covent Garden before the event. I took some pictures with my iPhone.

Friday, September 10 2010

Rcpp 0.8.6

Dirk released Rcpp 0.8.6 to CRAN.

Most of the development for this release was triggered by a question on the Rcpp-devel mailing list. Following Richard's question, we added d-p-q-r functions for most of the distributions available in R.

The file runit.stats.R contains several examples of using them.
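As an illustration of what a density (d) and a distribution (p) function compute, here is a plain C++ sketch for the normal distribution using std::exp and std::erfc from the standard library. It is not Rcpp's implementation; the names only mirror R's dnorm and pnorm, and the q and r variants are left out.

```cpp
#include <cassert>
#include <cmath>

// Normal density, in the spirit of R's dnorm(x, mean, sd).
double dnorm_sketch(double x, double mean = 0.0, double sd = 1.0) {
    assert(sd > 0.0);
    const double pi = std::acos(-1.0);
    const double z = (x - mean) / sd;
    return std::exp(-0.5 * z * z) / (sd * std::sqrt(2.0 * pi));
}

// Lower-tail normal CDF, in the spirit of R's pnorm(q, mean, sd).
// Using erfc rather than 1 + erf keeps precision in the far lower tail.
double pnorm_sketch(double q, double mean = 0.0, double sd = 1.0) {
    assert(sd > 0.0);
    const double z = (q - mean) / sd;
    return 0.5 * std::erfc(-z / std::sqrt(2.0));
}
```

For instance, dnorm_sketch(0.0) is 1 / sqrt(2 pi), about 0.3989, and pnorm_sketch(0.0) is exactly 0.5.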

We have also started developing Rcpp 0.8.7, which will depend on the next version of R (R 2.12.0) since it will use some of the features it will introduce. More on this later...

Dirk also blogged about the release, including the relevant NEWS extract.

Friday, August 13 2010

Rcpp svn revision 2000

I committed the 2000th revision to the Rcpp svn repository today, so I wanted to repeat what I previously did for the 50 000th R commit.

Here is the number of commits per day and per month:

commits_per_day.png commits_per_month.png

... the same thing, but focused on the period since I joined the project

commits_per_day__zoom.png commits_per_month__zoom.png

... and now split by contributor

commits_per_day_per_author__zoom.png commits_per_month_author__zoom.png

Here are the months in which each of us has been the most active:

> do.call( rbind, 
   lapply( 
    split( month_author_data, month_author_data$author ) , 
    function(x) x[ which.max( x[["commits"]] ), ] ) 
  )
               date  author commits month year
dmbates 2010-08-01 dmbates      19    08 2010
edd     2010-06-01     edd     118    06 2010
romain  2010-06-01  romain     256    06 2010

and the most active days:

> do.call( rbind, 
   lapply( 
    split( day_author_data, day_author_data$author ) , 
    function(x) x[ which.max( x[["commits"]] ), ] ) 
  )
              date  author commits month year
dmbates 2010-08-06 dmbates      13     8 2010
edd     2010-02-16     edd      20     2 2010
romain  2010-06-17  romain      30     6 2010
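The split / lapply / which.max idiom above keeps, for each author, the row with the largest commit count. The same group-wise argmax can be sketched in C++ with a map keyed by author; the Row struct and function name are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Row {
    std::string date;
    std::string author;
    int commits;
};

// For each author keep the row with the most commits. Ties keep the
// first row seen, like R's which.max.
std::map<std::string, Row> busiest_by_author(const std::vector<Row>& rows) {
    std::map<std::string, Row> best;
    for (const Row& r : rows) {
        std::map<std::string, Row>::iterator it = best.find(r.author);
        if (it == best.end()) {
            best.insert(std::make_pair(r.author, r));
        } else if (r.commits > it->second.commits) {
            it->second = r;
        }
    }
    return best;
}
```

A single pass over the rows is enough, since each author's current best is updated in place rather than splitting the data first.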

The code to reproduce the graphs is here

Rcpp at LondonR, Oct 5th

I'll be presenting Rcpp at the next LondonR, which is currently scheduled for October 5th.

Here is one picture I found on Flickr when searching for "london speed bus"; there are many others.
