Romain Francois, Professional R Enthusiast


Tag - RProtoBuf


Saturday, November 26 2011

int64: 64 bit integer vectors for R


The Google Open Source Programs Office sponsored me to create the new int64 package, which was released to CRAN a few days ago. The package has been mentioned in an article on Google's open source blog.

The package defines classes int64 and uint64 that represent signed and unsigned 64 bit integer vectors. The package also allows conversion of several types (integer, numeric, character, logical) to 64 bit integer vectors, arithmetic operations as well as other standard group generic functions, and reading 64 bit integer vectors as a data.frame column using int64 or uint64 as the colClasses argument.

The package has a vignette that details its features, and several examples are given in the usual help files. Once again, I've used RUnit for quality assurance of the package code.

int64 has been developed so that 64 bit integer vectors are represented using only R data structures, i.e. data is not represented as external pointers to some C++ object. Instead, each 64 bit integer is represented as a pair of regular 32 bit integers, each of them carrying half the bits of the underlying 64 bit integer. This was a deliberate design choice so that 64 bit integer vectors can be serialized and used as data frame columns.
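To make the "two halves" idea concrete, here is a minimal sketch in base R. It is not the package's implementation: the helper names split64 and join64 are made up for illustration, the halves are held as doubles rather than R integers, and it only handles non-negative whole numbers below 2^53 (the range a double stores exactly).

```r
# Illustration only, not int64's internals: split a non-negative whole number
# into the high and low 32 bit halves, then put the halves back together.
# split64() and join64() are hypothetical helper names.
split64 <- function(x) {
  stopifnot(x >= 0, x < 2^53, x == floor(x))
  c(hi = x %/% 2^32, lo = x %% 2^32)
}

join64 <- function(parts) {
  parts[["hi"]] * 2^32 + parts[["lo"]]
}

x <- 2^40 + 5
parts <- split64(x)
print(parts)                    # hi = 256, lo = 5
stopifnot(join64(parts) == x)   # the round trip gives back the original value
```

Because both halves are ordinary R values, a vector of such pairs can be saved, loaded and stored in a data frame like any other R object, which is the point of the design described above.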

The package contains C++ headers that third party packages can use (via LinkingTo: int64) to access the C++ internals. This allows creation and manipulation of the objects in C++. The internals will be documented in another vignette for package developers who wish to use them. For the moment, the main entry point is the C++ template class LongVector.

I'm particularly proud that Google trusted me and sponsored the development of int64. The next versions of the Rcpp and RProtoBuf packages take advantage of the facilities of int64, e.g. Rcpp gains wrapping of C++ containers of 64 bit integers as R objects of classes int64 and uint64, and RProtoBuf improves handling of 64 bit integers in protobuf messages. More on this later.

Saturday, October 23 2010

Google slides

The last stop on my world tour was Google headquarters in Mountain View, California, where Dirk and I presented Rcpp, RInside, RProtoBuf, etc. for 90 minutes today. The talk was recorded and will be posted on YouTube at some point. In the meantime, the slides are available.

Tuesday, July 27 2010

useR! 2010

I was at useR! last week. It was great to catch up with friends, see what people are doing with R, tell people what I am doing with R, etc. The conference was great.

This year I presented with Dirk in Laurel and Hardy mode, and I've uploaded our slides to my SlideShare account.

I also took some time to visit Washington and take a few pictures (tagged with user2010 on Flickr).

Thursday, February 4 2010

RProtoBuf: protocol buffers for R

We (Dirk and I) released the initial version of our package RProtoBuf to CRAN this week. This package brings Google's protocol buffers to R.

I invite you to check out the main page for protobuf to find the language definition for protocol buffers and tutorials for the officially (i.e. by Google) supported languages (Python, C++ and Java), as well as the third party support page that lists language bindings offered by others (including our RProtoBuf package).

Protocol buffers are a language agnostic data interchange format, based on a simple and well defined language. Here comes the classic example that Google uses for its C++, Java and Python tutorials.

First, the proto file defines the format of the message.
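The proto file itself is not reproduced here. As a stand-in, the sketch below writes a trimmed-down definition to addressbook.proto from R. The content is an assumption modelled on the address book example from the protocol buffers tutorial (proto2 syntax), cut down to the three fields used in this post; the tutorial's version also describes phone numbers and an AddressBook message.

```r
# Assumed, trimmed-down proto2 definition of the address book example.
# Only the three fields used in this post (name, id, email) are included.
proto <- c(
  "package tutorial;",
  "",
  "message Person {",
  "  required string name = 1;",
  "  required int32 id = 2;",
  "  optional string email = 3;",
  "}"
)

# Write the definition to the current working directory
writeLines(proto, "addressbook.proto")
```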

Then you need to teach this particular message to R, which is simply done by the readProtoFiles function.

> readProtoFiles( "addressbook.proto" )

Now we can start creating messages:

> person <- new( tutorial.Person, 
+     name = "John Doe", 
+     id = 1234,
+     email = "jdoe@example.com" )

And then access and modify the fields of the message, using a syntax extremely close to that of R lists:

> person$email <- "francoisromain@free.fr"
> person$name <- "Romain Francois"

In R, protobuf messages are stored as simple S4 objects of class "Message" that contain an external pointer to the underlying C++ object. The Message class also defines methods that can be accessed using the dollar operator:

> # write a debug version of message
> # this is not how it is serialized
> writeLines( person$toString() )
name: "Romain Francois"
id: 1234
email: "francoisromain@free.fr"

> # serialize the message to a file
> person$serialize( "somefile" )
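The dollar-operator behaviour shown above can be imitated in plain R, which may help to see how one S4 object can expose both fields and methods through `$`. This is a toy sketch and not RProtoBuf's code: the class name ToyMessage is made up, and an environment stands in for the external pointer to the C++ object.

```r
library(methods)

# Toy stand-in for the Message class: the environment plays the role of the
# underlying C++ object (ToyMessage is a hypothetical name).
setClass("ToyMessage", representation(state = "environment"))

# Reading with $: "toString" gives back a function, so that x$toString()
# reads like a method call; any other name is looked up as a field.
setMethod("$", "ToyMessage", function(x, name) {
  if (identical(name, "toString")) {
    function() {
      fields <- ls(x@state)
      values <- vapply(fields, function(f) format(get(f, envir = x@state)),
                       character(1))
      paste0(fields, ": ", values, collapse = "\n")
    }
  } else {
    get(name, envir = x@state)
  }
})

# Writing with $<-: store the value in the shared state and return the object
setMethod("$<-", "ToyMessage", function(x, name, value) {
  assign(name, value, envir = x@state)
  x
})

msg <- new("ToyMessage", state = new.env())
msg$name <- "John Doe"
msg$id <- 1234
writeLines(msg$toString())
```

Since the state is shared by reference, as it would be behind an external pointer, copies of msg see the same fields.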

The package already has tons of features, detailed in the vignette:

> vignette( "RProtoBuf" )

... and there is more to come.