Romain Francois, Professional R Enthusiast

Tag - package

Saturday, November 26 2011

int64: 64 bit integer vectors for R

The Google Open Source Programs Office sponsored me to create the new int64 package, which was released to CRAN a few days ago. The package has been mentioned in an article on Google's open source blog.

The package defines classes int64 and uint64 that represent signed and unsigned 64 bit integer vectors. The package also allows conversion of several types (integer, numeric, character, logical) to 64 bit integer vectors, arithmetic operations as well as other standard group generic functions, and reading 64 bit integer vectors as a data.frame column using int64 or uint64 as the colClasses argument.

The package has a vignette that details its features, and several examples are given in the usual help files. Once again, I've used RUnit for quality assurance of the package code.

int64 has been developed so that 64 bit integer vectors are represented using only R data structures, i.e. data is not represented as external pointers to some C++ object. Instead, each 64 bit integer is represented as a pair of regular 32 bit integers, each of them carrying half the bits of the underlying 64 bit integer. This was a deliberate design choice, so that 64 bit integer vectors can be serialized and used as data frame columns.
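To make the "two halves" idea concrete, here is a minimal standalone C++ sketch of splitting a 64 bit integer into two 32 bit integers and joining them back. It is only an illustration: the function names split_int64 and join_int64 are hypothetical, and the ordering (high word first) is an assumption, not necessarily what the package does internally.

```cpp
#include <cstdint>
#include <utility>

// Split a signed 64 bit integer into two signed 32 bit halves.
// Returned as (high, low); this ordering is an assumption made for
// the sketch, not a statement about the int64 package internals.
std::pair<std::int32_t, std::int32_t> split_int64(std::int64_t x) {
    const std::uint64_t bits = static_cast<std::uint64_t>(x);
    const std::uint32_t high = static_cast<std::uint32_t>(bits >> 32);
    const std::uint32_t low  = static_cast<std::uint32_t>(bits & 0xFFFFFFFFu);
    // Unsigned to signed conversion keeps the bit pattern on two's
    // complement platforms (guaranteed by the standard since C++20).
    return { static_cast<std::int32_t>(high), static_cast<std::int32_t>(low) };
}

// Rebuild the 64 bit integer from its two 32 bit halves.
std::int64_t join_int64(std::int32_t high, std::int32_t low) {
    const std::uint64_t bits =
        (static_cast<std::uint64_t>(static_cast<std::uint32_t>(high)) << 32) |
        static_cast<std::uint64_t>(static_cast<std::uint32_t>(low));
    return static_cast<std::int64_t>(bits);
}
```

Because both halves are ordinary 32 bit integers, a vector of such pairs can live in regular R integer storage, which is what makes serialization and data frame columns work without external pointers.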

The package contains C++ headers that third party packages can use (via LinkingTo: int64) to access the C++ internals. This allows creation and manipulation of the objects in C++. The internals will be documented in another vignette for package developers who wish to use them. For the moment, the main entry point is the C++ template class LongVector.

I'm particularly proud that Google trusted me and sponsored the development of int64. The next versions of the Rcpp and RProtoBuf packages take advantage of the facilities of int64, e.g. Rcpp gains wrapping of C++ containers of 64 bit integers as R objects of classes int64 and uint64, and RProtoBuf improves handling of 64 bit integers in protobuf messages. More on this later.

Friday, May 21 2010

highlight 0.1-8

I've pushed version 0.1-8 of highlight to CRAN. highlight is a syntax highlighter for R that renders R source code into some markup language. The package ships html and latex renderers but is flexible enough to handle other formats. Syntax highlighting is based on information about the code gathered by a slightly modified version of the R parser, available in the separate parser package.

Internal code has been modified to take advantage of new features of Rcpp such as the DataFrame C++ class.

Since R 2.11.0, it is possible to install custom handlers to respond to HTTP requests (GET, POST, ...). highlight takes advantage of this and responds to URLs with HTML syntax highlighted functions. So if the httpd port used by the dynamic help system is 9000 (hint: tools:::httpdPort) :

Sunday, February 14 2010

Rcpp 0.7.7

A good 2 days after 0.7.6 was released, here comes Rcpp 0.7.7. The reason for this release is that a subtle bug installed itself and we did not catch it in time.

The new version also includes two new class templates, unary_call and binary_call, that help integrate calls (e.g. Rcpp::Language objects) with STL algorithms. For example, here is how we might use unary_call

This emulates the R code:

> lapply( 1:10, function(n) seq(from=n, to = 0 ) )
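The general pattern, applying a callable to each element of a range through an STL algorithm, can be sketched with the standard library alone. The code below does not use Rcpp or unary_call: a plain function object stands in for the R call, and the names SeqDownToZero and lapply_seq are hypothetical, chosen only for this illustration.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Function object playing the role of function(n) seq(from = n, to = 0).
struct SeqDownToZero {
    std::vector<int> operator()(int n) const {
        std::vector<int> out;
        for (int i = n; i >= 0; --i) {
            out.push_back(i);
        }
        return out;
    }
};

// Plain C++ analogue of lapply(from:to, function(n) seq(from = n, to = 0)).
std::vector<std::vector<int>> lapply_seq(int from, int to) {
    std::vector<int> input(to - from + 1);
    std::iota(input.begin(), input.end(), from);   // from, from + 1, ..., to

    std::vector<std::vector<int>> result(input.size());
    // std::transform calls the function object once per element,
    // which is the slot a unary call adapter would fill.
    std::transform(input.begin(), input.end(), result.begin(), SeqDownToZero());
    return result;
}
```

Calling lapply_seq(1, 10) gives ten vectors, the first holding 1, 0 and the last counting down from 10 to 0.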

As usual, there are more examples in the unit tests.

Saturday, February 13 2010

highlight 0.1-5

I've pushed version 0.1-5 of highlight to CRAN. It should be available in a couple of days.

This version fixes highlighting of code when one wants to display the prompt and the continuation prompt. For example, this code:

rnorm(10, 
	mean = 5)


runif(5)

gets highlighted like this:

using this code:

> highlight( "/tmp/test.R", renderer=renderer_html(document=TRUE), showPrompts = TRUE, output = "test.html" )

Under the hood, highlight now depends on Rcpp and uses some of the C++ classes of the new Rcpp API. See the get_highlighted_text function in the code.

Thursday, February 4 2010

RProtoBuf: protocol buffers for R

We (Dirk and I) released the initial version of our package RProtoBuf to CRAN this week. This package brings Google's protocol buffers to R.

I invite you to check out the main page for protobuf to find the language definition for protocol buffers as well as tutorials for the officially (i.e. by Google) supported languages (Python, C++ and Java). The third party support page lists language bindings offered by others (including our RProtoBuf package).

Protocol buffers are a language agnostic data interchange format, based on a simple and well defined language. Here comes the classic example that Google uses for the C++, Java and Python tutorials.

First, the proto file defines the format of the message.

Then you need to teach this particular message to R, which is simply done by the readProtoFiles function.

> readProtoFiles( "addressbook.proto" )

Now we can start creating messages :

> person <- new( tutorial.Person, 
+     name = "John Doe", 
+     id = 1234,
+     email = "jdoe@example.com" )

And then access and modify fields of the message using a syntax extremely close to that of R lists:

> person$email <- "francoisromain@free.fr"
> person$name <- "Romain Francois"

In R, protobuf messages are stored as simple S4 objects of class "Message" that contain an external pointer to the underlying C++ object. The Message class also defines methods that can be accessed using the dollar operator:

> # write a debug version of message
> # this is not how it is serialized
> writeLines( person$toString() )
name: "Romain Francois"
id: 1234
email: "francoisromain@free.fr"

> # serialize the message to a file
> person$serialize( "somefile" )

The package already has tons of features, detailed in the vignette:

> vignette( "RProtoBuf" )

... and there is more to come.

Wednesday, January 13 2010

Rcpp 0.7.2

Rcpp 0.7.2 is out; check out Dirk's blog for details.

Selected highlights from this new version:

character vectors

If one wants to mimic this R code in C:

> x <- c( "foo", "bar" )

one ends up with this:

SEXP x = PROTECT( allocVector( STRSXP, 2) ) ;
SET_STRING_ELT( x, 0, mkChar( "foo" ) ) ;
SET_STRING_ELT( x, 1, mkChar( "bar" ) ) ;
UNPROTECT(1) ;
return x ;

Rcpp lets you express the same like this :

CharacterVector x(2) ;
x[0] = "foo" ; 
x[1] = "bar" ;

or like this if you have GCC 4.4 (which supports C++0x initializer lists):

CharacterVector x = { "foo", "bar" } ;

environments, functions, ...

Now, we try to mimic this R code in C:

rnorm( 10L, sd = 100 )

You can do it in one of these two ways in Rcpp:

Environment stats("package:stats") ;
Function rnorm = stats.get( "rnorm" ) ;
return rnorm( 10, Named("sd", 100 ) ) ;

or :

Language call( "rnorm", 10, Named("sd", 100 ) ) ;
return eval( call, R_GlobalEnv ) ;

And it will get better with the next release, where you will be able to just call call.eval() and stats["rnorm"].

Using the regular R API, you'd write these like this:

SEXP stats = PROTECT( R_FindNamespace( mkString("stats") ) ) ;
SEXP rnorm = PROTECT( findVarInFrame( stats, install("rnorm") ) ) ;
SEXP call  = PROTECT( LCONS( rnorm, CONS(ScalarInteger(10), CONS(ScalarReal(100.0), R_NilValue)))) ;
SET_TAG( CDDR(call), install("sd") ) ;
SEXP res = PROTECT( eval( call, R_GlobalEnv ) );
UNPROTECT(4) ;
return res ;

or :

SEXP call  = PROTECT( LCONS( install("rnorm"), CONS(ScalarInteger(10), CONS(ScalarReal(100.0), R_NilValue)))) ;
SET_TAG( CDDR(call), install("sd") ) ;
SEXP res = PROTECT( eval( call, R_GlobalEnv ) );
UNPROTECT(2) ;
return res ;

Tuesday, December 29 2009

C++ exceptions at the R level

The feature described in this post is no longer valid with recent versions of Rcpp. Setting a terminate handler does not work reliably on Windows, so we don't do it at all anymore. Exceptions need to be caught and relayed to R. Bracketing the code with BEGIN_RCPP / END_RCPP does this simply. See the Rcpp-introduction vignette for details.

I've recently offered an extra set of hands to Dirk to work on the Rcpp package, which serves as a good excuse to learn more about C++.

Exception management was quite high on my list. C++ has nice exception handling (well, not as nice as Java, but nicer than C).

With previous versions of Rcpp, the idiom was to wrap everything in a try/catch block and, within the catch block, call the Rf_error function to send an R error, the equivalent of calling stop. Now things have changed and, believe it or not, you can catch a C++ exception at the R level, using the standard tryCatch mechanism.

So, for example, when you throw a C++ exception (inheriting from the class std::exception) at the C++ level and the exception is not picked up by the C++ code, it automatically sends an R condition that contains the message of the exception (what the what member function of std::exception gives) as well as the class of the exception (including its namespace).

This, combined with the new inline support for Rcpp, allows us to run this code (also available in the inst/examples/RcppInline directory of Rcpp):

require(Rcpp)
require(inline)
funx <- cfunction(signature(), '
throw std::range_error("boom") ;
return R_NilValue ;
', Rcpp=TRUE, verbose=FALSE)

Here, we create the funx "function" that compiles itself into a C++ function and gets dynamically linked into R (thanks to the inline package). The relevant thing (at least for this post) is the throw statement. We throw a C++ exception of class "std::range_error" with the message "boom", and what follows shows how to catch it at the R level:

tryCatch(  funx(), "C++Error" = function(e){
    cat( sprintf( "C++ exception of class '%s' : %s\n", class(e)[1L], e$message  ) )
} )
# or using a direct handler 
tryCatch(  funx(), "std::range_error" = function(e){
        cat( sprintf( "C++ exception of class '%s' : %s\n", class(e)[1L], e$message  ) )
} )

... et voila

Under the hood, the abi namespace (name demangling) is at work, and the function that grabs the uncaught exceptions is much inspired by the verbose terminate handler that comes with GCC.
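As a rough idea of what the demangling step involves, here is a small standalone sketch that turns the compiler's mangled type name of an exception into a readable class name plus its message. It relies on abi::__cxa_demangle from <cxxabi.h>, which is specific to GCC and compatible compilers. The helper names demangle and describe_exception are hypothetical and this is not the actual Rcpp code.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC and compatible compilers)
#include <cstdlib>
#include <exception>
#include <stdexcept>
#include <string>
#include <typeinfo>

// Turn a mangled type name into a readable one.
// Falls back to the mangled name if demangling fails.
std::string demangle(const char* mangled) {
    int status = 0;
    char* buffer = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || buffer == nullptr) {
        return mangled;
    }
    std::string readable(buffer);
    std::free(buffer);   // the demangler allocates with malloc
    return readable;
}

// Build "class: message" from the dynamic type of the exception
// and the text returned by what().
std::string describe_exception(const std::exception& e) {
    return demangle(typeid(e).name()) + ": " + e.what();
}
```

The two pieces of information recovered here, the class name and the message, are the same two pieces that the R condition described above carries.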

Part of this was inspired by the new Java exception handling that came with version 0.8-0 of rJava, but cooked with C++ ingredients.

Tuesday, December 22 2009

CPP package: exposing C++ objects

I've just started working on the new package CPP. As usual, the project is maintained on R-Forge. The package aims at exposing C++ classes at the R level, starting with classes from the C++ standard template library.

Key to the package is the CPP function (much inspired by the J function of rJava). The CPP function builds an S4 object of class "C++Class". The "C++Class" currently is a placeholder wrapping the C++ class name, and defines the new method (again, this trick of making new an S4 generic comes from rJava). For example, to create an R object that wraps a std::vector<int>, one would go like this:

x <- new( CPP( "vector<int>" ) )

There is no magic here, so don't expect to be able to send anything to CPP (C++ does not have reflection capabilities). Currently only these classes are defined: std::vector<int>, vector<double>, vector<raw> and set<int>.

Because C++ does not offer reflection capabilities, we have to do something else to be able to invoke methods on the wrapped objects. Currently the approach that the package follows is a naming convention. The $ method creates the name of the C routine it wants to call based on the C++ class the object is wrapping, the name of the method, and the types of the input parameters. So, for example, calling the size method for a vector<int> object yields this routine name: "vector_int____size", and calling the push_back method of the vector<double> class, passing an integer vector as the first parameter, yields this signature: "vector_double____push_back___integer" (the CPP:::getRoutineSignature function implements the convention).
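The convention can be sketched in a few lines of standalone C++. This is a guess reconstructed from the two routine names quoted above (non-alphanumeric characters of the class name become underscores, and the pieces are joined with three underscores). The function routine_name is hypothetical and is not the code of CPP:::getRoutineSignature.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Build a C routine name from a C++ class name, a method name and the
// R types of the arguments. The rules are inferred from the two
// examples in the text and are an assumption, not the package source.
std::string routine_name(const std::string& cpp_class,
                         const std::string& method,
                         const std::vector<std::string>& arg_types = {}) {
    std::string name;
    // "vector<int>" becomes "vector_int_"
    for (char c : cpp_class) {
        name += std::isalnum(static_cast<unsigned char>(c)) ? c : '_';
    }
    // class, method and argument types are separated by "___"
    name += "___" + method;
    for (const std::string& type : arg_types) {
        name += "___" + type;
    }
    return name;
}
```

With these rules, routine_name("vector<int>", "size") gives "vector_int____size" and routine_name("vector<double>", "push_back", {"integer"}) gives "vector_double____push_back___integer", matching the two names quoted above.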

Here is a full example using the set<int> class. Sets are a good example of a data structure that is not available in R. Basically, a set keeps its elements sorted and unique.

> # create the object
> x <- new( CPP("set<int>") )
> # insert data using the insert method
> # see : insert
> x$insert( sample( 1:20 ) )
> # ask for the size of the set
> x$size()
[1] 20
> # bring it back as an R classic integer vector
> as.vector( x )
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
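For reference, the same behaviour written directly against the standard library looks like the sketch below. The function name sorted_unique is hypothetical, and the comments only map each step to the R calls above by analogy.

```cpp
#include <set>
#include <vector>

// Insert values in arbitrary order into a std::set, then read them back.
// The set keeps its elements sorted and drops duplicates.
std::vector<int> sorted_unique(const std::vector<int>& values) {
    std::set<int> s;
    s.insert(values.begin(), values.end());        // analogous to x$insert( ... )
    return std::vector<int>(s.begin(), s.end());   // analogous to as.vector( x )
}
```

Whatever the order of the input, iterating over the set yields the elements in increasing order, which is why the shuffled 1:20 above comes back sorted.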

Currently the package is my excuse to learn about the standard template library, and it is quite possible that the functionality will be merged into Rcpp, which it currently depends on. Because of this volatility, I'll use the Rcpp-devel mailing list instead of creating a new one.

Friday, December 11 2009

new R package : bibtex

I've pushed the bibtex package to CRAN.

The package defines the read.bib function that reads a file in the bibtex format. The code is based on bibparse.

The read.bib function generates an object of class citationList, just like utils::citation.

Sunday, November 22 2009

new R package : highlight

I finally pushed highlight to CRAN. It should be available in a few days. The package uses the information gathered by the parser package to perform syntax highlighting of R code.

The main function of the package is highlight, which takes a number of arguments, including:

  • file : the file in which the R code is
  • output : some output connection or file name where to write the result (The default is standard output)
  • renderer : a collection of functions controlling how to render code into a given markup language

The package ships three functions that create such renderers

  • renderer_html : renders in html/css
  • renderer_latex: renders in latex
  • renderer_verbatim: does nothing

And additionally, the xterm256 package defines a renderer that allows syntax highlighting directly in the console (if the console knows xterm 256 colors)

Let's assume we have this code file (/tmp/code.R):

f <- function( x){
        x + rnorm(1)
}

g <- function(x){}
h <- function(x){}

Then we can syntax highlight it like this :

> highlight( "/tmp/code.R", renderer = renderer_html(), output = "/tmp/code.R.html" )
> highlight( "/tmp/code.R", renderer = renderer_latex(), output = "/tmp/code.R.latex" )

which makes these files : code.R.html and code.R.latex

The package also ships a Sweave driver that can highlight code chunks in a Sweave document, but I'll talk about this in another post.

Tuesday, August 4 2009

R parser package on CRAN

The parser package has been released to CRAN. The package mainly defines a function parser that is similar to the usual R function parse, with the following differences:

  • The information about the location of each token is structured differently, in a data frame
  • Location is gathered for all symbols from the source code, including terminal symbols (tokens) and comments
  • An equal sign is identified to be either an assignment, the declaration of a formal argument or the use of an argument

Here is an example file containing R source code that we are going to parse with parser:

#' a roxygen comment
f <- function( x = 3 ){
	
	# a regular comment
	rnorm(10 ) + runif( 10 )
	
}

It is a very simple file, for illustration purposes. Let's look at what the parser package does with it.

The parser generates a list of expressions, just like the regular parse function, but the gain is the data attribute. This is a data frame where each token of the parse tree is a row. The id column identifies each row, and the parent column identifies the parent of the current row.
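To show how an id column and a parent column are enough to encode a tree, here is a small standalone sketch. Everything in it is illustrative: the TokenRow struct, the token labels and the sample rows are hypothetical (they describe a made-up expression x + 1, not the actual output of parser), and treating parent 0 as "top level" is an assumption.

```cpp
#include <string>
#include <vector>

// One row of a hypothetical parse table: each node knows its own id
// and the id of its parent (0 is assumed to mean "no parent").
struct TokenRow {
    int id;
    int parent;
    std::string token;   // illustrative token label
    std::string text;    // source text of the node, if any
};

// Hypothetical rows for the expression `x + 1`.
std::vector<TokenRow> example_rows() {
    return {
        {1, 0, "expr",      ""},
        {2, 1, "SYMBOL",    "x"},
        {3, 1, "'+'",       "+"},
        {4, 1, "NUM_CONST", "1"},
    };
}

// Collect the ids of all rows whose parent column equals parent_id.
std::vector<int> children_of(const std::vector<TokenRow>& rows, int parent_id) {
    std::vector<int> ids;
    for (const TokenRow& row : rows) {
        if (row.parent == parent_id) {
            ids.push_back(row.id);
        }
    }
    return ids;
}
```

Walking the table this way, from a row to its children or from a row up through its parents, is the kind of traversal that tools such as a syntax highlighter or a code coverage reporter would perform on the data attribute.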

At the moment, only the forthcoming highlight package uses the parser package (actually the parser package has been factored out of highlight), but some anticipated uses of the package include:

  • rework the codetools package so that it reports the source location of potential problems
  • code coverage in RUnit or svUnit
  • rework the roxygen parser