Romain Francois, Professional R Enthusiast

Saturday, November 26 2011

int64: 64 bit integer vectors for R

The Google Open Source Programs Office sponsored me to create the new int64 package, which was released to CRAN a few days ago. The package has been mentioned in an article on Google's open source blog.

The package defines classes int64 and uint64 that represent signed and unsigned 64 bit integer vectors. The package also allows conversion of several types (integer, numeric, character, logical) to 64 bit integer vectors, arithmetic operations as well as other standard group generic functions, and reading 64 bit integer vectors as a data.frame column using int64 or uint64 as the colClasses argument.

The package has a vignette that details its features, and several examples are given in the usual help files. Once again, I've used RUnit for quality assurance of the package code.

int64 has been developed so that 64 bit integer vectors are represented using only R data structures, i.e. data is not represented as external pointers to some C++ object. Instead, each 64 bit integer is represented as a pair of regular 32 bit integers, each of them carrying half the bits of the underlying 64 bit integer. This was a deliberate design choice so that 64 bit integer vectors can be serialized and used as data frame columns.
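
To make the representation concrete, here is a minimal C++ sketch of splitting a 64 bit integer into two 32 bit halves and putting it back together. The helper names split_int64 and join_int64 are hypothetical and are not taken from the int64 headers. The order of the halves (high bits first) is also an assumption made for the illustration, not a statement about how the package lays out its data.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical helpers for illustration only; not the int64 package's API.

// Split a signed 64 bit integer into its high and low 32 bit halves.
// The work is done on unsigned values so that the shifts are well defined.
// Converting the unsigned halves back to int32_t wraps modulo 2^32 on all
// mainstream compilers (and is guaranteed to do so since C++20).
std::pair<std::int32_t, std::int32_t> split_int64(std::int64_t x) {
    const std::uint64_t bits = static_cast<std::uint64_t>(x);
    const std::uint32_t high = static_cast<std::uint32_t>(bits >> 32);
    const std::uint32_t low  = static_cast<std::uint32_t>(bits & 0xFFFFFFFFu);
    return { static_cast<std::int32_t>(high), static_cast<std::int32_t>(low) };
}

// Rebuild the 64 bit integer from the two halves.
// Each half goes through uint32_t first to avoid sign extension.
std::int64_t join_int64(std::int32_t high, std::int32_t low) {
    const std::uint64_t h = static_cast<std::uint32_t>(high);
    const std::uint64_t l = static_cast<std::uint32_t>(low);
    return static_cast<std::int64_t>((h << 32) | l);
}
```

Because both halves are plain 32 bit integers, they fit in ordinary R integer storage, which is the property the design relies on.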

The package contains C++ headers that third party packages can use (via LinkingTo: int64) to access the C++ internals. This allows creation and manipulation of the objects in C++. The internals will be documented in another vignette for package developers who wish to use them. For the moment, the main entry point is the C++ template class LongVector.

I'm particularly proud that Google trusted me and sponsored the development of int64. The next versions of the Rcpp and RProtoBuf packages take advantage of the facilities of int64: Rcpp gains wrapping of C++ containers of 64 bit integers as R objects of classes int64 and uint64, and RProtoBuf improves its handling of 64 bit integers in protobuf messages. More on this later.

Tuesday, September 8 2009

new R package : ant

The ant package was released to CRAN yesterday. As discussed in previous posts in this blog (here and here), the ant R package provides an R-aware version of the ant build tool from the Apache project.

The package contains an R script that can be used to invoke ant with enough plumbing so that it can use R code during the build process. Calling the script is further simplified with the ant function included in the package.

$ Rscript -e "ant::ant()"

The simplest way to take advantage of this package is to add it to the Depends list of your package, include a Java source tree somewhere in your package tree (most likely somewhere in the inst tree) with a build.xml file, and include configure and configure.win scripts at the root of the package that contain something like this:

#!/bin/sh

cd inst/java_src
"${R_HOME}/bin/Rscript" -e "ant::ant()"
cd ../..

This will be further illustrated with the demo package helloJavaWorld in future posts.

Thursday, September 3 2009

update on the ant package

I have updated the ant package I described yesterday in this blog to add several things:

  • The R code related to the <r-set> and <r-run> tasks can now be given either as the code attribute or as the text inside the task
  • The R code has access to special variables to manipulate the current project (project) and the current task (self), which can be used to set properties, get properties, ...
  • The package contains an ant function so that ant can be invoked using a simple Rscript call, see below

The package now includes a demonstration build.xml file in the examples directory.

Here is the result

Wednesday, September 2 2009

R capable version of ant

ant is an amazing build tool. I have been using ant for some time to build the Java code that lives inside the src directories of my R packages, see this post for example.

The drawbacks of this approach are:

  • It assumes ant is available on the system that builds the package
  • You cannot use R code within the ant build script

The ant package for R was developed to solve these two issues. The package is hosted on R-Forge as part of the orchestra project.

Once it is installed, you will find an ant.R script in the exec directory of the package. This script is pretty similar to the usual shell script that starts ant, but it sets things up so that ant can use R through the following additional tasks:

  • <r-run> : to run arbitrary R code
  • <r-set> : to set a property of the current project with the result of an R expression

Here is an example build file that demonstrates how to use these tasks:

Here is what happens when we call the R special version of ant with this build script:

$ `Rscript -e "cat( system.file( 'exec', 'ant.R', package = 'ant') )"`
Buildfile: build.xml

test:
     [echo] 
     [echo]   	R home        : /usr/local/lib/R
     [echo]   	R version     : R version 2.10.0 Under development (unstable) (2009-08-05 r49067)
     [echo]   	rJava home    : /usr/local/lib/R/library/rJava
     [echo]   	rJava version : 0.7-1
     [echo]  

BUILD SUCCESSFUL
Total time: 1 second

Tuesday, August 4 2009

R parser package on CRAN

The parser package has been released to CRAN. The package mainly defines a function parser that is similar to the usual R function parse, with the following differences:

  • The information about the location of each token is structured differently, in a data frame
  • Location is gathered for all symbols from the source code, including terminal symbols (tokens) and comments
  • An equal sign is identified as either an assignment, the declaration of a formal argument, or the use of an argument

Here is an example file containing R source code that we are going to parse with parser:

#' a roxygen comment
f <- function( x = 3 ){
	
	# a regular comment
	rnorm(10 ) + runif( 10 )
	
}

It is a very simple file, for illustration purposes. Let's see what we can do with it using the parser package.

The parser function generates a list of expressions, just like the regular parse function, but the gain is the data attribute: a data frame in which each token of the parse tree is a row. The id column identifies each row, and the parent column identifies the parent of the current row.

At the moment, only the forthcoming highlight package uses the parser package (actually the parser package has been factored out of highlight), but some anticipated uses of the package include:

  • reworking the codetools package so that it reports the source location of potential problems
  • code coverage in RUnit or svUnit
  • reworking the roxygen parser