Romain Francois, Professional R Enthusiast


Sunday, January 15 2012

Crawling facebook with R

So, let's crawl some data out of facebook using R. Don't get too excited though, this is just a weekend what-if project. As an example, I want to download some photos where I'm tagged.

First, we need an access token from facebook. I don't know how to get this programmatically, so let's get one manually: log on to facebook and then go to the Graph API Explorer.

graph_api_explorer.png

Grab the access token and save it into a variable in R

access_token <- "************..."

Now we need to study the graph api to figure out the url we need to build to do what we want to do, e.g. here we want "me/photos". I've wrapped this into an R function:

And then we can use it:

That's it, I told you it was not that exciting, but it was still worth playing with ...


Wednesday, December 14 2011

... And now for solution 17, still using Rcpp

Here comes yet another sequel of the code optimization problem from the R wiki, still using Rcpp, but with a different strategy this time

Essentially, my previous version (15) was using stringstream although we don't really need its functionality and it was slowing us down

Also, the characters "i" and "." are always on the same position so we can assign them once and for all
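The two ideas above (no stringstream, and fixed characters written only once) can be sketched as follows. This is not the original attempt 17 code. The label layout "iAAA.BBB" and the names `write3` and `make_labels` are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Write v (0..999) as three zero-padded digits into s[pos..pos+2].
inline void write3(std::string& s, std::size_t pos, int v) {
    s[pos]     = static_cast<char>('0' + v / 100);
    s[pos + 1] = static_cast<char>('0' + (v / 10) % 10);
    s[pos + 2] = static_cast<char>('0' + v % 10);
}

// Hypothetical layout "iAAA.BBB": the exact format of the original
// problem is an assumption here.
std::vector<std::string> make_labels(int n) {
    assert(n >= 0 && n <= 999);  // three digits per field
    std::vector<std::string> out;
    out.reserve(static_cast<std::size_t>(n) * static_cast<std::size_t>(n));
    std::string buf = "i000.000";  // 'i' and '.' are assigned once, here
    for (int a = 1; a <= n; ++a) {
        write3(buf, 1, a);         // first field changes only in the outer loop
        for (int b = 1; b <= n; ++b) {
            write3(buf, 5, b);     // only the digits are overwritten
            out.push_back(buf);
        }
    }
    return out;
}
```

Under these assumptions, `make_labels(2)` yields "i001.001", "i001.002", "i002.001" and "i002.002". Reusing one buffer avoids both stream formatting and re-writing the constant characters.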

So without further ado, here is attempt 17:

With quite a speedup from attempt 15:

                test replications elapsed relative
2 generateIndex17(n)           20   9.363 1.000000
1 generateIndex15(n)           20  17.795 1.900566

Saturday, November 26 2011

int64: 64 bit integer vectors for R

google-64.png

The Google Open Source Programs Office sponsored me to create the new int64 package, which was released to CRAN a few days ago. The package has been mentioned in an article on the Google open source blog.

The package defines classes int64 and uint64 that represent signed and unsigned 64 bit integer vectors. The package also allows conversion of several types (integer, numeric, character, logical) to 64 bit integer vectors, arithmetic operations as well as other standard group generic functions, and reading 64 bit integer vectors as a data.frame column using int64 or uint64 as the colClasses argument.

The package has a vignette that details its features, and several examples are given in the usual help files. Once again, I've used RUnit for quality assurance of the package code.

int64 has been developed so that 64 bit integer vectors are represented using only R data structures, i.e. data is not represented as external pointers to some C++ object. Instead, each 64 bit integer is represented as a pair of regular 32 bit integers, each of them carrying half the bits of the underlying 64 bit integer. This was a choice by design so that 64 bit integer vectors can be serialized and used as data frame columns.
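To make the "two halves" representation concrete, here is a minimal sketch in plain C++. It does not use the package's headers and is not its actual implementation. The names `split_int64` and `join_int64` and the choice of storing the high half first are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <utility>

// Reinterpret the bits of an unsigned 32 bit value as a signed one.
inline std::int32_t as_signed(std::uint32_t u) {
    std::int32_t s;
    std::memcpy(&s, &u, sizeof s);
    return s;
}

// Split a 64 bit integer into (high, low) 32 bit halves.
// The order of the halves is an assumption, not taken from the package.
inline std::pair<std::int32_t, std::int32_t> split_int64(std::int64_t x) {
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    const std::uint32_t hi = static_cast<std::uint32_t>(bits >> 32);
    const std::uint32_t lo = static_cast<std::uint32_t>(bits & 0xFFFFFFFFu);
    return std::make_pair(as_signed(hi), as_signed(lo));
}

// Rebuild the 64 bit integer from its two halves.
inline std::int64_t join_int64(std::int32_t hi, std::int32_t lo) {
    const std::uint64_t bits =
        (static_cast<std::uint64_t>(static_cast<std::uint32_t>(hi)) << 32) |
        static_cast<std::uint64_t>(static_cast<std::uint32_t>(lo));
    std::int64_t x;
    std::memcpy(&x, &bits, sizeof x);
    return x;
}

// Check that splitting and joining gives back the original value.
inline void check_round_trip(std::int64_t x) {
    const std::pair<std::int32_t, std::int32_t> halves = split_int64(x);
    assert(join_int64(halves.first, halves.second) == x);
}
```

Because each half is an ordinary 32 bit integer, a pair like this can live in a regular R integer vector, which is what makes serialization and data frame columns work without external pointers.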

The package contains C++ headers that third party packages can use (via LinkingTo: int64) to access the C++ internals. This allows creation and manipulation of the objects in C++. The internals will be documented in another vignette for package developers who wish to use them. For the moment, the main entry point is the C++ template class LongVector.

I'm particularly proud that Google trusted me and sponsored the development of int64. The next versions of packages Rcpp and RProtoBuf take advantage of the facilities of int64, e.g. Rcpp gains wrapping of C++ containers of 64 bit integers as R objects of classes int64 and uint64 and RProtoBuf improves handling of 64 bit integers in protobuf messages. More on this later.

Thursday, November 10 2011

Code optimization, an Rcpp solution

Tony Breyal woke up an old code optimization problem in this blog post, so I figured it was time for an Rcpp based solution

This solution moves Henrik Bengtsson's idea (which was the basis of attempt 10) down to C++. The idea was to call sprintf less often than the other solutions do to generate the strings "001", "002", "003", ...
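A rough sketch of the "format less often" idea, in plain C++ rather than Rcpp: each number is formatted once into a lookup table, and the longer strings are then assembled from the cached pieces. This is not the code from the post. The function names and the "AAA.BBB" way of combining pieces are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Format each of 1..n exactly once as a zero-padded, three-character string.
std::vector<std::string> padded_numbers(int n) {
    assert(n >= 0 && n <= 999);  // "%03d" yields three characters in this range
    std::vector<std::string> table;
    table.reserve(static_cast<std::size_t>(n));
    char buf[16];
    for (int i = 1; i <= n; ++i) {
        std::snprintf(buf, sizeof buf, "%03d", i);
        table.emplace_back(buf);
    }
    return table;
}

// Hypothetical combination step: build all "AAA.BBB" strings from the table.
// Only n formatting calls are made, although n * n strings are produced.
std::vector<std::string> pair_labels(int n) {
    const std::vector<std::string> table = padded_numbers(n);
    std::vector<std::string> out;
    out.reserve(table.size() * table.size());
    for (const std::string& a : table) {
        for (const std::string& b : table) {
            out.push_back(a + "." + b);
        }
    }
    return out;
}
```

With n = 2000 as in the benchmark setting, a table like this would need a wider field than three digits. The sketch keeps to the "001" style strings mentioned above.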

We can benchmark this version using the rbenchmark package:

> library(rbenchmark)
> n <- 2000
> benchmark(
+     generateIndex10(n), 
+     generateIndex11(n),
+     generateIndex12(n), 
+     generateIndex13(n),
+     generateIndex14(n),
+     columns = 
+        c("test", "replications", "elapsed", "relative"),
+     order = "relative",
+     replications = 20
+ )
                test replications elapsed relative
5 generateIndex14(n)           20  21.015 1.000000
3 generateIndex12(n)           20  22.034 1.048489
4 generateIndex13(n)           20  23.436 1.115203
2 generateIndex11(n)           20  23.829 1.133904
1 generateIndex10(n)           20  30.580 1.455151
>    

Sunday, October 30 2011

Rcpp reverse dependency graph

I played around with reverse dependencies of Rcpp. At the moment, 44 packages depend on Rcpp and the number goes up to 53 when counting recursive reverse dependencies.

I've used graphviz for the representation of the directed graph
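The difference between direct and recursive reverse dependencies, and the shape of a graphviz dot file, can be illustrated with a small standalone sketch. It is not the R code used for the post. The type `DepMap`, the functions `reverse_deps` and `to_dot`, and any package names fed to them are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// package name -> names of the packages it depends on
typedef std::map<std::string, std::vector<std::string> > DepMap;

// Packages that depend on `target`, directly or (if recursive) through others.
std::set<std::string> reverse_deps(const DepMap& deps,
                                   const std::string& target,
                                   bool recursive) {
    assert(!target.empty());
    // Invert the map: dependency -> packages that use it.
    DepMap users;
    for (DepMap::const_iterator it = deps.begin(); it != deps.end(); ++it) {
        for (std::size_t i = 0; i < it->second.size(); ++i) {
            users[it->second[i]].push_back(it->first);
        }
    }
    // Breadth-first walk over the inverted graph, starting at the target.
    std::set<std::string> found;
    std::queue<std::string> todo;
    todo.push(target);
    while (!todo.empty()) {
        const std::string current = todo.front();
        todo.pop();
        DepMap::const_iterator it = users.find(current);
        if (it == users.end()) continue;
        for (std::size_t i = 0; i < it->second.size(); ++i) {
            const std::string& pkg = it->second[i];
            if (pkg != target && found.insert(pkg).second && recursive) {
                todo.push(pkg);  // follow indirect users only when asked to
            }
        }
    }
    return found;
}

// Emit the dependency edges as a graphviz directed graph.
std::string to_dot(const DepMap& deps) {
    std::ostringstream out;
    out << "digraph deps {\n";
    for (DepMap::const_iterator it = deps.begin(); it != deps.end(); ++it) {
        for (std::size_t i = 0; i < it->second.size(); ++i) {
            out << "  \"" << it->first << "\" -> \"" << it->second[i] << "\";\n";
        }
    }
    out << "}\n";
    return out.str();
}
```

For a toy map where pkgA depends on Rcpp and pkgB depends on pkgA, the direct count for Rcpp is 1 and the recursive count is 2. The string returned by `to_dot` can be saved to a file and rendered with the graphviz `dot` command.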

dep.png

Here is the code I've used to generate the dot file:

Tuesday, October 11 2011

R Bloggers widget in R Graph Gallery

Following the last post about the partnership with R Bloggers, Tal and I have added a small widget to the gallery main page to present links to recent posts on R Bloggers

rbloggers-widget.png

It uses the wordpress api to grab information about the rss feed generated by R Bloggers and displays links one at a time using the same jquery magic as we've used in the widget that was integrated in R Bloggers a few days ago

Saturday, October 8 2011

R Graph Gallery widget in R Bloggers

The R Bloggers website, maintained by Tal Galili, aggregates blogs (including mine) from many people of the R community.

Tal and I have been wondering how to tie R Bloggers and the gallery together, with each website supporting the other. To that end, I've made a quick and dirty widget using the jquery cycle plugin. It is now in the right sidebar of R Bloggers, inside the related sites box.

rbloggers.png

The widget first chooses 20 items from the gallery at random, and then cycles through them.

This is an initial design made specifically for R Bloggers, but it is quite likely that I will improve on this and make the widget more generic so that other websites can use it to advertise the gallery.

Monday, October 3 2011

Twitter updates on R Graph Gallery

I've added a twitter search widget that searches for the #rgraphgallery hashtag or the url of the gallery on the front page.

twitter.png

Friday, September 30 2011

R Graph Gallery - Donations Welcome

I've added a PayPal button into the graph, just in case people want to help the development of the website

paypal_button.png

Thursday, September 22 2011

Facebook page about the Graph Gallery

I've just created a facebook page about the R Graph Gallery

I hope this will improve the experience of the website by making it more social. For example, I anticipate that people will share their own graphs by posting a picture on the facebook page wall

As part of this, I've added the usual "find us on facebook" widget on the home page of the gallery

facebook_page.png

Wednesday, September 21 2011

More facebook and google plus on the Graph Gallery

Following up on yesterday's post about facebook like box, I've added some more social things into the gallery. The main page gains a google plus "plus one" button, and each graph page now has a +1 button, a facebook like button, and a facebook comment box

Capture_d_ecran_2011-09-21_a_21.40.04.png

Tuesday, September 20 2011

Facebook like button in Graph Gallery

I've added a facebook like button to the home page of the R Graph Gallery and to each image page, e.g. this one, which I "like".

Capture_d_ecran_2011-09-20_a_22.46.23.png

Monday, July 18 2011

now in google+

You can now find me in google+, and still on facebook

Friday, April 29 2011

Rcpp Workshop slides

Dirk and I gave a full day Rcpp workshop yesterday in Chicago before the R in Finance conference.

The pdfs of the slides are available here: part 1 (intro), part 2 (details), part 3 (modules) and part 4 (applications)

Sunday, April 17 2011

Rcpp article in JSS

The Journal of Statistical Software published our Rcpp article

Wednesday, March 30 2011

Rcpp workshop in Chicago on April 28th

Overview

This year's R/Finance conference will be preceded by a full-day masterclass on Rcpp and related topics, which will be held on Thursday, April 28, 2011, at the Univ. of Illinois at Chicago campus.

Join Dirk Eddelbuettel and Romain Francois for six hours of detailed and hands-on instruction and discussion around Rcpp, inline, RInside, RcppArmadillo and other packages, in an intimate small-group setting.

The full-day format allows us to combine a morning introductory session with a more advanced afternoon session while leaving room for sufficient breaks. There will be about six hours of instruction, a one-hour lunch break and two half-hour coffee breaks.

Morning session: "A hands-on introduction to R and C++"

The morning session will provide a practical introduction to the Rcpp package (and other related packages). The focus will be on simple and straightforward applications of Rcpp in order to extend R and/or to significantly accelerate the execution of simple functions.

The tutorial will cover the inline package which permits embedding of self-contained C, C++ or Fortran code in R scripts. We will also discuss RInside to embed R code in C++ applications, as well as standard Rcpp extension packages such as RcppArmadillo for linear algebra and RcppGSL.

Afternoon session: "Advanced R and C++ topics"

This afternoon tutorial will provide a hands-on introduction to more advanced Rcpp features. It will cover topics such as writing packages that use Rcpp, how 'Rcpp modules' and the new R ReferenceClasses interact, and how 'Rcpp sugar' lets us write C++ code that is often as expressive as R code. Another possible topic, time permitting, may be writing glue code to extend Rcpp to other C++ projects.

We also hope to leave some time to discuss problems brought by the class participants.

Prerequisites

Knowledge of R as well as general programming knowledge; C or C++ knowledge is helpful but not required.

Users should bring a laptop set up so that R packages can be built. That means on Windows, Rtools needs to be present and working, and on OS X the Xcode package should be installed.

Registration

Registration is available via the R/Finance conference at

http://www.RinFinance.com/register/

or directly at RegOnline

http://www.regonline.com/930153

The cost is USD 500 for the whole day, and space will be limited.

Questions

Please contact us directly at RomainAndDirk@r-enthusiasts.com

Thursday, March 3 2011

Eponyme : 40 minutes stand up

I'll perform my 40-minute one-man show in Montpellier on March 15th, and a short extract at the next Montpellier Comédie Club next Wednesday

eponyme-affiche.jpg

Wednesday, March 2 2011

Rcpp at Geneva-R

I'll present Rcpp at the inaugural Geneva-R meeting.

Geneva-R is an informal gathering of R enthusiasts sponsored by Mango Solutions, that builds on the success of London-R, where I presented twice, and Basel-R

Friday, January 28 2011

Facebook me, I'm famous

After years of resistance, here I am on facebook

Tuesday, January 18 2011

Back to Stand-Up

This is, I guess, off topic for this blog, but after a few years off, I'm back on stage doing some stand up comedy, as part of the Montpellier Comédie Club

See me on youtube:

Some press coverage and pictures: soonlight le-macadam-303380_0.jpg le-macadam-303380_1.jpg le-macadam-303380_12.jpg le-macadam-303380_13.jpg

Local news report (France 3 Montpellier)
