Romain Francois, Professional R Enthusiast


Thursday, March 26 2009

Build java software with "R CMD build" and ant

helloJavaWorld is a simple R package that shows how to distribute java software with an R package and communicate with it by means of the rJava package. Its vignette shows the different steps involved in making such a package.

Basically, helloJavaWorld uses the inst directory of an R package structure to ship the jar file in which the java software is packaged.

This post goes a bit further and shows how we can distribute the source of the java software and make R compile it when we run R CMD build. For that we are naturally going to use the src part of the R package, leading to this structure:

.
|-- DESCRIPTION
|-- NAMESPACE
|-- R
|   |-- helloJavaWorld.R
|   `-- onLoad.R
|-- inst
|   |-- doc
|   |   |-- helloJavaWorld.Rnw
|   |   |-- helloJavaWorld.pdf
|   |   `-- helloJavaWorld.tex
|   `-- java
|       `-- hellojavaworld.jar
|-- man
|   `-- helloJavaWorld.Rd
`-- src
    |-- Makevars
    |-- build.xml
    `-- src
        `-- HelloJavaWorld.java

7 directories, 12 files

Only the src directory differs from the version of helloJavaWorld that is on CRAN. Let's have a look at the files in src.

HelloJavaWorld.java is the same as the code we can read in helloJavaWorld's vignette:

public class HelloJavaWorld {

  public String sayHello() {
    String result = new String("Hello Java World!");
    return result;
  }

  public static void main(String[] args) {
  }

}

build.xml is a simple ant script. Ant is typically used to build java software. This build script is very simple. It defines the following targets:

  • clean: removes the bin directory we use to store compiled class files
  • compile: compiles all java classes found in src into bin
  • build: packages the java classes into the hellojavaworld.jar file, which we store in the inst/java directory to comply with the initial package structure

<project name="Hello Java World" basedir="." default="build" >

  <property name="target.dir" value="../inst/java" />

  <target name="clean">
    <delete dir="bin" />
  </target>

  <target name="compile">
    <mkdir dir="bin"/>
    <javac srcdir="src" destdir="bin" />
  </target>

  <target name="build" depends="compile">
    <jar jarfile="${target.dir}/hellojavaworld.jar">
      <fileset dir="bin" />
    </jar>
  </target>

</project>

Next is the Makevars file. When an R package is built, R looks in the src directory for a Makevars file, which is typically used to indicate how to compile the source code in the package. Here we simply use it to launch the building and cleaning with ant, so the file is short:

.PHONY: all clean

clean:
	ant clean

all: clean
	ant build

See Writing R Extensions for details on the Makevars file.

And now we can R CMD build the package:

$ R CMD build helloJavaWorld
* checking for file 'helloJavaWorld/DESCRIPTION' ... OK
* preparing 'helloJavaWorld':                          
* checking DESCRIPTION meta-information ... OK         
* cleaning src                                         
ant clean                                              
Buildfile: build.xml                                   

clean:

BUILD SUCCESSFUL
Total time: 0 seconds
* installing the package to re-build vignettes
* Installing *source* package ‘helloJavaWorld’ ...
** libs                                           
ant clean                                         
Buildfile: build.xml

clean:

BUILD SUCCESSFUL
Total time: 0 seconds
ant build
Buildfile: build.xml

compile:
    [mkdir] Created dir: /home/romain/svn/helloJavaWorld/src/bin
    [javac] Compiling 1 source file to /home/romain/svn/helloJavaWorld/src/bin

build:
      [jar] Building jar: /home/romain/svn/helloJavaWorld/inst/java/hellojavaworld.jar

BUILD SUCCESSFUL
Total time: 1 second
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
 >>> Building/Updating help pages for package 'helloJavaWorld'
     Formats: text html latex example
  helloJavaWorld                    text    html    latex   example
** building package indices ...
* DONE (helloJavaWorld)
* creating vignettes ... OK
* cleaning src
ant clean
Buildfile: build.xml

clean:
   [delete] Deleting directory /home/romain/svn/helloJavaWorld/src/bin

BUILD SUCCESSFUL
Total time: 0 seconds
* removing junk files
* checking for LF line-endings in source and make files
* checking for empty or unneeded directories
* building 'helloJavaWorld_0.0-7.tar.gz'

Download this version of helloJavaWorld: helloJavaWorld_0.0-7.tar.gz

This approach relies on ant being available, which we can declare in the SystemRequirements field of the DESCRIPTION file:

SystemRequirements: Java (>= 5.0), ant
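
SystemRequirements is only a declaration, so it can help to check for ant before building. The following is a minimal sketch in base R; the helper name tool_available is hypothetical and not part of helloJavaWorld.

```r
# Hypothetical helper (not part of helloJavaWorld): TRUE when an executable
# with the given name can be found on the PATH.
tool_available <- function(name) {
  # Sys.which() returns "" for a command that cannot be found
  unname(nzchar(Sys.which(name)))
}

if (!tool_available("ant")) {
  message("ant was not found on the PATH; the ant calls in src/Makevars would fail")
}
```

Such a check could be run by hand before R CMD build; it does not replace the SystemRequirements entry.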

Next time, we will see how to trick the documentation system so that it builds javadoc files

Monday, March 16 2009

A better jedit edit mode for R

I spent a bit of time over the weekend working on the jedit edit mode file for R code, that is, the file that guides jedit on how to syntax highlight R code. The previous one I was using was based on the idea "let's put the names of all the functions in standard R packages as keywords". Although it works, it is not a very good idea, since it makes the edit mode huge and consequently must have some effect on jedit's performance when painting R code.

The new mode file can be found on the biocep-editor project on r-forge

Here are some of the choices I have made

  • function calls are highlighted with type FUNCTION. A function call is a name followed by a round bracket
  • Core language constructs are highlighted with type KEYWORD1: (for, function, if, else, ifelse, in, repeat, return, switch, while, break, next). I have also added these to that list: (expression, quote, parse, deparse, substitute, get, getAnywhere, assign, exists, bquote, typeof, mode, eval, evalq, with).
  • Debugging related functions are highlighted with type KEYWORD2: browser, debug, trace, traceback, recover, undebug, isdebugged, bp, mtrace
  • Error handling functions are highlighted using KEYWORD3: try, tryCatch, withRestarts, withCallingHandlers, stop, stopifnot, geterrmessage, warning, signalCondition, simpleCondition, simpleError, simpleWarning, simpleMessage, conditionCall, conditionMessage, computeRestarts, findRestart, invokeRestart, invokeRestartInteractively, isRestart, restartDescription, restartFormals, .signalSimpleWarning, .handleSimpleError
  • Object Oriented related functions (S3 and S4) are using type KEYWORD4: class, inherits, setClass, representation, structure, methods, setIs, slot, new, setMethod, validObject, setValidity, getValidity, initialize, setOldClass, callNextMethod, NextMethod, UseMethod, getS3method
  • Constants are using type LITERAL2: NULL, Inf, NA, NaN, T, TRUE, F, FALSE, pi, NA_character_, NA_complex_, NA_integer_, NA_real_
  • Apply functions are using LITERAL4: lapply, sapply, by, mapply, tapply, apply, replicate, aggregate. I have also added some functions from the packages reshape and plyr to that list
  • Support for R4X by delegating to the R4X mode (mainly XML) between strings "'##((xml" and "'##xml))"
  • Support for Roxygen comment, inspired from the way javadoc comments are treated in the java mode

Sunday, March 15 2009

RGG#152: Correlation circles

For those out there looking for yet another way to represent a correlation matrix, Taiyun Wei has submitted the correlation circles graph (figure: graph_152.png).

Wednesday, March 11 2009

biocep editor plugin API

For those interested, I have started the process of writing API documentation for the biocep editor plugin. Note that it is far from being finished.

Tuesday, March 10 2009

Google Summer of Code idea

GSoC09.png

I sent today (right at the deadline) an idea for a Google Summer of Code project to create an integrated debugger for R. The plan is to replace the tcltk front-end of the "debug" package with something giving a better user experience. See the R Google Summer of Code page for details.

Sunday, March 8 2009

Goldbach's Comet - take 2

Following this post, there is still room for improvement. Recall the last implementation (goldbach5)

goldbach5 <- function(n) {
    xx <- 1 : n
    xx <- xx[isprime(xx) > 0][-1]
    
    # generates row indices
    x <- function(N){ 
        rep.int( 1:N, N:1) 
    }
    # generates column indices
    y <- function(N){ 
        unlist( lapply( 1:N, seq.int, to = N ) ) 
    }
    z <- xx[ x(length(xx)) ] + xx[ y(length(xx)) ]
    z <- z[z <= n]
    tz <- tabulate(z, n )
    tz[ tz != 0 ]
}

The first thing to notice is that when we build xx, we generate all the integers from 1 to n and only then check which of them are prime. Since 2 is dropped anyway, we only need the odd numbers in xx, so we can build them directly as:

    xx <- seq.int( 3, n, by = 2)
    xx <- xx[isprime(xx) > 0]

The next thing is that, even though the goldbach5 version only builds the upper triangle of the matrix, which saves some memory, we don't really need all the numbers, since eventually we just want to count how many times each of them appears. For that, there is no need to store them all in memory.
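
The counting idea can be sketched in plain R before reaching for compiled code. This is only an illustration under assumptions: odd_primes_upto and goldbach_counts are hypothetical names, and a small base R sieve stands in for the isprime call used in the post. Each pass handles one row of the upper triangle and updates the counts, so the full set of sums is never stored.

```r
# Odd primes up to n with a basic sieve of Eratosthenes (base R only,
# standing in for the isprime() call used in the post)
odd_primes_upto <- function(n) {
  if (n < 3) return(integer(0))
  is_p <- rep(TRUE, n)
  is_p[1] <- FALSE
  for (i in seq_len(floor(sqrt(n)))[-1]) {
    if (is_p[i]) is_p[seq.int(i * i, n, by = i)] <- FALSE
  }
  which(is_p)[-1]   # drop 2
}

goldbach_counts <- function(n) {
  xx <- odd_primes_upto(n)
  counts <- integer(n)
  for (i in seq_along(xx)) {
    # sums of xx[i] with itself and every larger prime (one row of the triangle)
    s <- xx[i] + xx[i:length(xx)]
    s <- s[s <= n]
    # xx is increasing, so once a row is empty all later rows are empty too
    if (length(s) == 0L) break
    # the sums within one row are distinct, so a plain increment is safe
    counts[s] <- counts[s] + 1L
  }
  counts[counts != 0L]
}

goldbach_counts(20)
# 1 1 2 1 2 2 2 2   (counts for 6, 8, 10, ..., 20)
```

The loop still runs in R, so this mainly saves memory; it is not meant as a speed claim.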

The version goldbach6 below takes this idea further by implementing the counting in C through the .C interface. See the corresponding section of Writing R Extensions for details of the .C interface.

goldbach6 <- function( n ){
    xx <- seq.int( 3, n, by = 2)
    xx <- xx[isprime(xx) > 0]
    out <- integer( n )
    tz <- .C( "goldbach", xx =as.integer(xx), nx = length(xx), 
      out = out, n = as.integer(n), DUP=FALSE )$out
    tz[ tz != 0 ]
}

and the C function that goes with it

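#include <R.h>

/* For each k <= *n, count the pairs (i <= j) with xx[i] + xx[j] == k.
   xx must be sorted in increasing order; out must hold *n zeros. */
void goldbach(int * xx, int* nx, int* out, int* n){
    int i,j,k;

    for( i=0; i<*nx; i++){
        for( j=i; j<*nx; j++){
            k = xx[i] + xx[j] ;
            if( k > *n){
                /* xx is sorted, so larger j only gives larger sums */
                break;
            }
            out[k-1]++ ;
        }
    }
}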

We need to build the shared object

$ R CMD SHLIB goldbach.c
gcc -std=gnu99 -I/usr/local/lib/R/include  -I/usr/local/include    -fpic  -g -O2 -c goldbach.c -o goldbach.o
gcc -std=gnu99 -shared -L/usr/local/lib -o goldbach.so goldbach.o   -L/usr/local/lib/R/lib -lR

and load it in R:

> dyn.load( "goldbach.so" )

And now, let's see if it was worth the effort

> system.time( out <- goldbach6(100000) )
   user  system elapsed
  0.204   0.005   0.228
> system.time( out <- goldbach5(100000) )
   user  system elapsed
  4.839   2.630   7.981
> system.time( out <- goldbach4(100000) )
   user  system elapsed
 28.425   5.932  38.380

We could also have a look at the memory footprint of each of the three functions using the memory profiler.

> gc(); Rprof( "goldbach4.out", memory.profiling=T ); out <- goldbach4(100000); Rprof(NULL) 
         used (Mb) gc trigger  (Mb)  max used   (Mb)                                        
Ncells 150096  4.1     350000   9.4    350000    9.4                                        
Vcells 213046  1.7  104145663 794.6 199361285 1521.1                                        
> gc(); Rprof( "goldbach5.out", memory.profiling=T ); out <- goldbach5(100000); Rprof(NULL) 
         used (Mb) gc trigger   (Mb)  max used   (Mb)                                       
Ncells 150093  4.1     350000    9.4    350000    9.4                                       
Vcells 213043  1.7  162727615 1241.6 199361285 1521.1                                       
> gc(); Rprof( "goldbach6.out", memory.profiling=T ); out <- goldbach6(100000); Rprof(NULL) 
         used (Mb) gc trigger  (Mb)  max used   (Mb)                                        
Ncells 150093  4.1     350000   9.4    350000    9.4                                        
Vcells 213043  1.7  130182091 993.3 199361285 1521.1  
> rbind( summaryRprof( filename="goldbach4.out", memory="both" )$by.total[1,] ,
+        summaryRprof( filename="goldbach5.out", memory="both" )$by.total[1,],
+        summaryRprof( filename="goldbach6.out", memory="both" )$by.total[1,] )
            total.time total.pct mem.total self.time self.pct
"goldbach4"      32.08       100     712.6      1.26      3.9
"goldbach5"       6.66       100     306.7      2.80     42.0
"goldbach6"       0.22       100       0.2      0.00      0.0

Saturday, March 7 2009

Goldbach's Comet

Murali Menon has posted on his blog code to calculate Goldbach partitions. Murali describes his approach to writing the function, starting from a brute force approach with loops, through the use of the Vectorize function, to some further optimized code using outer.

This post follows up on Murali's attempt and tries to go the extra mile.

This is the last implementation on Murali's blog:

# isprime() is assumed to come from the gmp package and
# upperTriangle() from the gdata package
goldbach4 <- function(n) {
    xx <- 1 : n
    xx <- xx[isprime(xx) > 0]
    xx <- xx[-1]
    z <- as.numeric(upperTriangle(outer(xx, xx, "+"), 
                    diag = TRUE))
    z <- z[z <= n]
    hist(z, plot = FALSE, 
         breaks = seq(4, max(z), by = 2))$counts
}

As pointed out on the blog, although this is fast thanks to the clever vectorization of outer, there is some frustration of having to allocate a matrix of size N * N when you only need the upper triangle ( N*(N+1)/2 ). Furthermore, if we look in outer, we see that not only an N*N sized vector is created for the result (robj), but also for the vectors X and Y:

        FUN <- match.fun(FUN)
        Y <- rep(Y, rep.int(length(X), length(Y)))
        if (length(X)) 
            X <- rep(X, times = ceiling(length(Y)/length(X)))
        robj <- FUN(X, Y, ...)
        dim(robj) <- c(dX, dY)
    }
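
To give a sense of scale for n = 100000, here is the arithmetic as a small R sketch. It relies on the known count of 9592 primes below 100000, which leaves 9591 once 2 is dropped; the variable names are only for illustration.

```r
# Number of odd primes up to 100000: 9592 primes in total, minus the prime 2
N <- 9591

full_matrix    <- N * N            # length of robj, and also of X and Y, in outer
upper_triangle <- N * (N + 1) / 2  # number of sums actually needed

full_matrix                    # 91987281
upper_triangle                 # 45998436
upper_triangle / full_matrix   # just over one half
```

So outer works with three vectors of about 92 million elements each, while only about 46 million sums are of interest.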

This reminded me of the fun we had a few years ago with a similar problem. See the R wiki for a detailed optimization, and I am borrowing Tony Plate's idea for the goldbach5 approach here. The idea is basically to figure out the indices of the upper triangle part of the matrix before calculating it:

goldbach5 <- function(n) {
    xx <- 1 : n
    xx <- xx[isprime(xx) > 0]
    xx <- xx[-1]
    
    # generates row indices
    x <- function(N){ 
        rep.int( 1:N, N:1) 
    }
    # generates column indices
    y <- function(N){ 
        unlist( lapply( 1:N, seq.int, to = N ) ) 
    }
    z <- xx[ x(length(xx)) ] + xx[ y(length(xx)) ]
    z <- z[z <= n]
    tz <- tabulate(z, n )
    tz[ tz != 0 ]
}
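
The two index helpers are easier to follow on a tiny input. The sketch below copies them under the hypothetical names row_idx and col_idx and checks, for three primes, that the indexed sums match the upper triangle of the outer matrix.

```r
row_idx <- function(N) rep.int(1:N, N:1)
col_idx <- function(N) unlist(lapply(1:N, seq.int, to = N))

row_idx(3)   # 1 1 1 2 2 3
col_idx(3)   # 1 2 3 2 3 3

xx <- c(3L, 5L, 7L)
sums <- xx[row_idx(3)] + xx[col_idx(3)]
sums         # 6 8 10 10 12 14

# same values as the upper triangle (diagonal included) of the full matrix
m <- outer(xx, xx, "+")
identical(sort(sums), sort(m[upper.tri(m, diag = TRUE)]))   # TRUE
```

Only N*(N+1)/2 sums are built (6 here), against the 9 cells of the full matrix.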

This gives a further boost to the execution time (only really visible with large n)

> system.time( out <- goldbach4(100000) )
   user  system elapsed
 28.268   5.389  36.347
>
> system.time( out <- goldbach5(100000) )
   user  system elapsed
  4.927   1.873   7.734

Let's take a look at the profiler output for both functions:

> Rprof( "goldbach5.out" ); out <- goldbach5(100000); Rprof( NULL)
> summaryRprof( "goldbach5.out" )$by.total
            total.time total.pct self.time self.pct
"goldbach5"       6.60     100.0      2.88     43.6
"<="              1.60      24.2      1.60     24.2
"+"               0.72      10.9      0.72     10.9
".C"              0.48       7.3      0.48      7.3
"unlist"          0.48       7.3      0.30      4.5
"tabulate"        0.48       7.3      0.00      0.0
"y"               0.48       7.3      0.00      0.0
"rep.int"         0.36       5.5      0.36      5.5
"x"               0.36       5.5      0.00      0.0
"lapply"          0.18       2.7      0.18      2.7
".Call"           0.08       1.2      0.08      1.2
"isprime"         0.08       1.2      0.00      0.0

> Rprof( "goldbach4.out" ); out <- goldbach4(100000); Rprof( NULL)
> summaryRprof( "goldbach4.out" )$by.total
                 total.time total.pct self.time self.pct
"goldbach4"           31.60     100.0      1.28      4.1
"upperTriangle"       23.56      74.6      2.10      6.6
"upper.tri"           12.48      39.5      0.00      0.0
"outer"                8.98      28.4      7.38     23.4
"col"                  5.96      18.9      5.96     18.9
"row"                  5.40      17.1      5.40     17.1
"hist.default"         5.24      16.6      0.86      2.7
"hist"                 5.24      16.6      0.00      0.0
".C"                   3.98      12.6      3.98     12.6
"<="                   2.02       6.4      2.02      6.4
"FUN"                  1.60       5.1      1.60      5.1
"as.numeric"           0.54       1.7      0.54      1.7
"max"                  0.22       0.7      0.22      0.7
"seq"                  0.22       0.7      0.00      0.0
"seq.default"          0.22       0.7      0.00      0.0
"is.finite"            0.16       0.5      0.16      0.5
".Call"                0.08       0.3      0.08      0.3
"isprime"              0.08       0.3      0.00      0.0
"sort.int"             0.02       0.1      0.02      0.1
"<Anonymous>"          0.02       0.1      0.00      0.0
"median.default"       0.02       0.1      0.00      0.0
"sort"                 0.02       0.1      0.00      0.0
"sort.default"         0.02       0.1      0.00      0.0

Or a graphical display (see the R wiki for the perl script that makes the graph):

goldbach4

goldbach4.png

goldbach5

goldbach5.png

The question now is: can we go further? I believe we can, because we still allocate a lot of things that we eventually trash. Any takers?

What functions are called by my function

Quite often, it is useful to identify which functions are called by an R function. There are many ways to achieve this, for example massaging the text representation of the function with regular expressions to find out what sits just before round brackets.
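
For comparison, the regular expression route can be sketched in a few lines of base R. The function name called_names is hypothetical, and the approach is deliberately naive: it would also pick up names followed by a bracket inside character strings.

```r
# Naive sketch: deparse the function and collect every name that is
# directly followed by an opening round bracket
called_names <- function(fun) {
  src <- paste(deparse(fun), collapse = "\n")
  m <- gregexpr("[A-Za-z.][A-Za-z0-9._]*(?=\\s*\\()", src, perl = TRUE)
  table(regmatches(src, m)[[1]])
}

f <- function(x) {
  if (length(x) == 0L)
    return(x)
  sqrt(abs(x))
}
called_names(f)
```

On this example it reports abs, function, if, length, return and sqrt once each, but it misses operators such as == entirely.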

The codetools package actually provides a much better way to do that, with the walkCode function.

# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
# 
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
# 
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see .

#' Gets the functions called by fun
#' 
#' @param fun a function, or a character string
#' @return a named vector of occurrences of each function, the values are 
#'         the number of times and the names are the functions
callees <- function( fun ){

    ## dump the function and read it back in the expression e
    # TODO: is there a better way
    #       If I just use body( fun ), I don't get the arguments
    fun <- match.fun( fun )
    con <- textConnection( NULL, open = "w" )
    dump( "fun", con )
    e <- parse( text = textConnectionValue(con) )[[1]]
    close( con )


    # initiate the functions vector which will be populated within
    # the code walker
    functions <- NULL
    env <- environment()

    # a code walker (see package codetools) that records function calls
    # this is inspired from the code walker used by showTree
    cw <- makeCodeWalker (
        call = function (e, w) {
            if( is.null(e) || length(e) == 0 ) return()

            # add the current function to the list
            env[["functions"]] <-
                    c( env[["functions"]], as.character(e[[1]]) )

            # process the list of expressions
            w$call.list( e[-1] , w )

        },
        leaf = function( e, w ){
            # deal with argument list of functions
            if( typeof( e ) == "pairlist" ){
                w$call.list( e, w )
            }
        },
        call.list = function( e, w ){
            for( a in as.list(e) ){
                if( !missing( a) ){
                    walkCode( a, w )
                }
            }
        },
        env = env # so that we can populate "functions"
    )

    # walk through the code with our code walker
    walkCode( e,  w = cw )

    # clean the output
    out <- table( functions )
    out[ order( names(out) ) ]

}

Let's try this on the jitter function:

> require( codetools )
> source("http://romainfrancois.blog.free.fr/public/posts/callees/callees.R")
> jitter
function (x, factor = 1, amount = NULL)
{
    if (length(x) == 0L)
        return(x)
    if (!is.numeric(x))
        stop("'x' must be numeric")
    z <- diff(r <- range(x[is.finite(x)]))
    if (z == 0)
        z <- abs(r[1L])
    if (z == 0)
        z <- 1
    if (is.null(amount)) {
        d <- diff(xx <- unique(sort.int(round(x, 3 - floor(log10(z))))))
        d <- if (length(d))
            min(d)
        else if (xx != 0)
            xx/10
        else z/10
        amount <- factor/5 * abs(d)
    }
    else if (amount == 0)
        amount <- factor * (z/50)
    x + stats::runif(length(x), -amount, amount)
}

> callees( jitter )
functions
        <-         ==          -         ::          !         !=          /
        10          4          2          1          1          1          4
         (          [          {          *          +        abs       diff
         1          2          2          2          1          2          2
     floor   function         if  is.finite    is.null is.numeric     length
         1          1          8          1          1          1          3
     log10        min      range     return      round      runif   sort.int
         1          1          1          1          1          1          1
     stats       stop     unique
         1          1          1

Tuesday, March 3 2009

Patching the R profiler so that it shows loops

The R profiler

The R profiler is an amazing way to find where in your code (or someone else's code) the inefficiency lies. For example, the profiler helped in this challenge on the R wiki. See also the profiling section of Writing R Extensions: http://cran.r-project.org/doc/manuals/R-exts.html#Profiling-R-code-for-speed

What is wrong with it

The profiler, however, is not able to trace uses of loops (for, while or repeat), and consequently will not identify loops as the cause of inefficiency in the code. This is a shame, because loops in R are usually closely related to inefficiency. For example, if we profile this code:

Rprof( )
x <- numeric( )
for( i in 1:10000){
  x <- c( x, rnorm(10) )
}
Rprof( NULL )
print( summaryRprof( ) )

we get:

$ time Rscript script1.R 
$by.self
        self.time self.pct total.time total.pct
"rnorm"      0.16      100       0.16       100

$by.total
        total.time total.pct self.time self.pct
"rnorm"       0.16       100      0.16      100

$sampling.time
[1] 0.16


real	0m7.043s
user	0m5.170s
sys	0m0.675s

So the profiler only reports about 0.16 seconds, when the actual time taken is closer to 5 seconds. We can show that by wrapping the entire for loop in a function:

Rprof( )
ffor <- function(){
    x <- numeric( )
    for( i in 1:10000){
      x <- c( x, rnorm(10) )
    }
} 
ffor()
Rprof( NULL )
print( summaryRprof( ) )

which gives this:

$ time Rscript script2.R 
$by.self
        self.time self.pct total.time total.pct
"ffor"       5.14     96.3       5.34     100.0
"rnorm"      0.20      3.7       0.20       3.7

$by.total
        total.time total.pct self.time self.pct
"ffor"        5.34     100.0      5.14     96.3
"rnorm"       0.20       3.7      0.20      3.7

$sampling.time
[1] 5.34


real	0m6.434s
user	0m5.151s
sys	0m0.698s

The ffor function accounts for 100 percent of the time, and rnorm for only 3.7 percent, instead of the 100 percent that the first example would suggest.
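
The self time of ffor is presumably dominated by c(), which copies the whole vector at every iteration. As a side illustration, and not part of the profiler patch, here is a sketch comparing the growing loop with a preallocated one; the names grow and prealloc are hypothetical.

```r
# Growing the vector: every iteration copies all of x into a new, longer vector
grow <- function(n) {
  x <- numeric()
  for (i in seq_len(n)) {
    x <- c(x, rnorm(10))
  }
  x
}

# Preallocating: the result is allocated once and filled block by block
prealloc <- function(n) {
  x <- numeric(10 * n)
  for (i in seq_len(n)) {
    x[(10 * (i - 1) + 1):(10 * i)] <- rnorm(10)
  }
  x
}

# With the same seed, both versions produce the same numbers
set.seed(42); a <- grow(1000)
set.seed(42); b <- prealloc(1000)
identical(a, b)   # TRUE
```

Wrapping each version in system.time() is a simple way to compare them on a given machine.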

But in real life, it is not possible to wrap every loop in a function, as this would break a lot of code. Instead, we could make the profiler aware of loops. This is the purpose of the patch I posted to R-devel.

The details of the implementation

The patch only touches the file eval.c, in several places.

In the do_for function, a context is created for the "for" loop, using the begincontext function:

    begincontext(&cntxt, CTXT_LOOP, R_NilValue, rho, R_BaseEnv, R_NilValue,
         mkChar("[for]") );

The change here appears on the second line and simply adds a bit of information to the context that is created. Similar changes are made in the functions do_repeat and do_while.

Next, we need to grab this information at each tick of the profiler, which is the job of the doprof function:

    if ((cptr->callflag & (CTXT_FUNCTION | CTXT_BUILTIN))
        && TYPEOF(cptr->call) == LANGSXP) {
        SEXP fun = CAR(cptr->call);
        if (!newline) newline = 1;
        fprintf(R_ProfileOutfile, "\"%s\" ",
            TYPEOF(fun) == SYMSXP ? CHAR(PRINTNAME(fun)) :
            "<Anonymous>");
    } else if( (cptr->callflag & CTXT_LOOP) ){
      if (!newline) newline = 1;
      fprintf(R_ProfileOutfile, "\"%s\" ", CHAR(cptr->callfun) );  
    }

The else branch is executed when the context is a loop context, and we just retrieve the callfun string we created in the do_for function.
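
Each tick therefore writes one line of quoted names, innermost context first, and a summary is then a matter of counting. The sketch below does this in base R on a few illustrative lines in the Rprof.out format; it is a simplified stand-in for what summaryRprof computes, not its actual implementation.

```r
# Illustrative profiler output: a header line, then one line per sample
rprof_lines <- c(
  'sample.interval=20000',
  '"[for]" ', '"[for]" ', '"[for]" ',
  '"rnorm" "[for]" ',
  '"[for]" ', '"[for]" ', '"[for]" ', '"[for]" ', '"[for]" '
)

# Sampling interval in seconds (the header gives it in microseconds)
interval <- as.numeric(sub("sample.interval=", "", rprof_lines[1], fixed = TRUE)) / 1e6

# Split each sample into its stack entries and drop the quotes
stacks <- regmatches(rprof_lines[-1], gregexpr('"[^"]*"', rprof_lines[-1]))
stacks <- lapply(stacks, function(s) gsub('"', "", s, fixed = TRUE))

# self: samples where the entry is innermost; total: samples where it appears
self_counts  <- table(vapply(stacks, `[`, character(1), 1))
total_counts <- table(unlist(lapply(stacks, unique)))

self_counts * interval    # self time in seconds
total_counts * interval   # total time in seconds
```

With these nine samples, "[for]" is innermost in eight and appears in all nine.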

Now, with R patched and compiled, Rprof is able to record loops:

[]$ /home/romain/workspace/R-trunk/bin/Rscript script1.R
$by.self
        self.time self.pct total.time total.pct
"[for]"      5.28     97.4       5.42     100.0
"rnorm"      0.14      2.6       0.14       2.6

$by.total
        total.time total.pct self.time self.pct
"[for]"       5.42     100.0      5.28     97.4
"rnorm"       0.14       2.6      0.14      2.6

$sampling.time
[1] 5.42

[]$ head Rprof.out 
sample.interval=20000
"[for]" 
"[for]" 
"[for]" 
"rnorm" "[for]" 
"[for]" 
"[for]" 
"[for]" 
"[for]" 
"[for]" 

Friday, February 27 2009

Abstract submitted to useR! 2009

I've just submitted an abstract about the power editor to useR! 2009. The abstract is co-signed by Karim Chine (the author of biocep), who provided a lot of support during the development of the plugin and also wrote parts of it.

Last day to submit abstract to useR!

Today is the last day to submit an abstract to the next useR! conference and also to register online with the cheap rate.

Tuesday, February 24 2009

Mode specific perspectives for the biocep workbench

Following on this previous post, here is how to set up jedit in the power editor to use mode specific perspectives, so that when you leave a file of a given mode (say R), the current perspective is saved, and when you load a file of a given mode (say sweave), the recorded perspective is used (if it exists)

You first need to start the workbench, with a recent version of the power editor plugin (svn revision >220)

startup.png

Then load the editor plugin. Plugin > Editor > Power Editor

powereditorloaded.png

Finally, you need to tell jedit that you want it to manage saving and loading perspectives automatically based on the mode of the file being edited. You can do that using the jedit menu jEdit > Utilities > Global Options, ... The following dialog is displayed, click the two checkboxes on top.

dockingdialog.png

That's it. You can also save a perspective by selecting jEdit > View > Docking > Save Docking Layout ...

Sunday, February 22 2009

Perspectives for the biocep workbench

The need for perspectives

The virtual R Workbench of the biocep project uses Info Node as a flexible docking framework, making it possible to move parts of the user interface (called views) anywhere. However, views have to be moved manually each time to reconstruct the layout you are using, which to me is one major usability misfeature of the workbench. At the moment, workbench plugins add views by calling the createView method of the RGui interface, which adds the requested view next to the Working Directory view of the workbench.

As an example, if I add the view supplied by this tutorial, it appears as this:

createView.png

and unless your plugin uses tricks, that is the only place where the view can be added, and you have to move it around to compose the layout you want to use. This is fine if your plugin only defines one view, but if you define many views (such as the power editor plugin), then asking the user to arrange the views each time is not so great.

Typically, programs using flexible docking frameworks (most notably eclipse) also use perspectives as a way to save and load the layout of the user interface.

Perspectives for the power editor

I have added support for perspectives in the power editor so that the many views supplied by the power editor are arranged in a useful way when the plugin is started, which can be configured by the user. Here is the default layout:

perspectives.png

That is stored in this xml file in the perspectives directory of the Editor plugin:

<RootWindow>
  <WindowBar direction="Up" />
  <WindowBar direction="Right" />
  <WindowBar direction="Down" />
  <WindowBar direction="Left" />
  <SplitWindow horizontal="true" dividerLocation="0.2">
    <SplitWindow horizontal="false" dividerLocation="0.5407166">
      <TabWindow selected="0" direction="Up" tabAreaOrientation="Left">
        <JEditDockableView name="vfs.browser" title="File Browser" />
        <JEditDockableView name="projectviewer" title="Project Viewer" />
      </TabWindow>
      <TabWindow selected="0" direction="Up" tabAreaOrientation="Left">
        <JEditDockableView name="robjectexplorer" title="R Objects Explorer" />
      </TabWindow>
    </SplitWindow>
    <SplitWindow horizontal="true" dividerLocation="0.75">
      <SplitWindow horizontal="false" dividerLocation="0.7654723">
        <TabWindow selected="0" direction="Right" tabAreaOrientation="Up">
          <JEditView />
          <View title="R Console" urls="/opt/biocep/biocep.jar" class="org.kchine.rpf.gui.ConsolePanel" />
          <View title="Main Graphic Device" class="javax.swing.JPanel" />
        </TabWindow>
        <TabWindow selected="0" direction="Right" tabAreaOrientation="Down">
          <JEditDockableView name="console" title="Console" />
          <View title="Working Directory" class="javax.swing.JPanel" />
        </TabWindow>
      </SplitWindow>
      <SplitWindow horizontal="false" dividerLocation="0.66">
        <SplitWindow horizontal="false" dividerLocation="0.5124378">
          <TabWindow selected="0" direction="Down" tabAreaOrientation="Right">
            <JEditDockableView name="sidekick-tree" title="Sidekick" />
          </TabWindow>
          <TabWindow selected="0" direction="Down" tabAreaOrientation="Right">
            <JEditDockableView name="hypersearch-results" title="HyperSearch Results" />
          </TabWindow>
        </SplitWindow>
        <TabWindow selected="0" direction="Down" tabAreaOrientation="Right">
          <JEditDockableView name="error-list" title="Error List" />
        </TabWindow>
      </SplitWindow>
    </SplitWindow>
  </SplitWindow>
</RootWindow>

Implementation of perspectives in the power editor

I have recently added support for perspectives in the power editor (this feature should really belong to the biocep project itself, but is not high priority to the author at the moment) by using XML. For example, the default layout that appears when the workbench is started

default.png

can be represented by this perspective

<RootWindow>
  <SplitWindow horizontal="true" dividerLocation="0.2">
    <View title="R Console" 
      urls="/opt/biocep/biocep.jar" 
      class="org.kchine.rpf.gui.ConsolePanel" />
    <SplitWindow horizontal="false" dividerLocation="0.5407166">
      <View title="Main Graphic Device" class="javax.swing.JPanel" />
      <TabWindow selected="0" direction="Up" tabAreaOrientation="Up">
        <View title="Working Directory" class="javax.swing.JPanel" />
      </TabWindow>
    </SplitWindow>
  </SplitWindow>
</RootWindow>

Basically, each node of the XML represents one InfoNode window below the root window. InfoNode defines a class for each type of window (see the API for the net.infonode.docking package). These are of particular interest:

  • RootWindow: top level container for docking windows
  • SplitWindow: A window with a split pane that contains two child windows
  • TabWindow: A docking window containing a tabbed panel
  • View: A docking window containing a component

The implementation of perspectives available in the Power Editor plugin relies on two classes for each docking window class: one that exports the window to an XML node (called FooExporter if the docking window class is Foo), and one that reads the XML representation and recreates the window from it (FooImporter).

The implementation provides importer and exporter classes for all InfoNode docking window classes, most importantly ViewExporter, which dumps enough information about the view into XML, and ViewImporter, which reads this information to recreate the view. Plugins are encouraged to create classes that inherit from View, say MyPluginView, together with the classes MyPluginViewExporter and MyPluginViewImporter, which store and read back whatever the view needs as attributes of the <MyPluginView> node.

The exporter is the easiest to implement: all the MyPluginViewExporter class needs to do is extend the DefaultExporter class and call the setAttribute method somewhere in its constructor. As an example, the constructor for JEditDockableViewExporter looks like this:

/**
 * Constructor for the ViewExporter
 *
 * @param window The View to stream to XML
 */
public JEditDockableViewExporter( JEditDockableView window){
  super( window ) ;
  setAttribute( "name", window.getName() ) ;
  setAttribute( "title", window.getTitle() ) ;
}

... so that the <JEditDockableView> node will have the attributes title and name

To implement a custom importer, the easiest way is to extend the ViewImporter class and define the newView method, which takes no parameters and builds the view from the attributes. DefaultImporter defines the getAttribute method, which can be used to retrieve an attribute from the XML node. See the implementation of newView for the JEditDockableView class below:

/** 
 * Creates the JEditDockableView
 *
 * @return the JEditDockableView for the &lt;JEditDockableView&gt; node
 */      
@Override
public View newView( ) throws Exception {
  return new JEditDockableView( name ) ;
}
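
To make the exporter/importer pairing concrete, here is a minimal, self-contained sketch of the round trip using only the JDK's DOM classes. It is not the Power Editor code: SketchView, SketchViewExporter, SketchViewImporter and PerspectiveRoundTrip are hypothetical names made up for this illustration, and the real classes build on DefaultExporter and ViewImporter instead.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical stand-in for a docking view; not an InfoNode class.
class SketchView {
    final String name;
    final String title;

    SketchView(String name, String title) {
        this.name = name;
        this.title = title;
    }
}

// Exporter side: copy everything needed to rebuild the view into attributes.
class SketchViewExporter {
    static Element export(Document doc, SketchView view) {
        Element node = doc.createElement("SketchView");
        node.setAttribute("name", view.name);
        node.setAttribute("title", view.title);
        return node;
    }
}

// Importer side: rebuild the view from the attributes of the node.
class SketchViewImporter {
    static SketchView newView(Element node) {
        return new SketchView(node.getAttribute("name"), node.getAttribute("title"));
    }
}

class PerspectiveRoundTrip {
    // Export a view to an XML element, then import it back.
    static SketchView roundTrip(SketchView view) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            return SketchViewImporter.newView(SketchViewExporter.export(doc, view));
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No XML parser available", e);
        }
    }

    public static void main(String[] args) {
        SketchView copy = roundTrip(new SketchView("console", "Console"));
        System.out.println(copy.name + " / " + copy.title);
    }
}
```

The point of the split is that the exporter decides which attributes are enough, and the importer only ever sees those attributes, so anything the view needs at creation time has to be written out by the exporter.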

Future

At the moment, this feature is implemented in the Power Editor only. Even if it could be used outside of it with some tricks, the best course of action is probably to add the feature to the biocep workbench itself, so that all plugins can take advantage of it and potentially arrange views from other plugins as well. We can also imagine these features:

  • user specific perspectives
  • restore the perspective of the previous session
  • plugin specific perspectives

Sunday, February 8 2009

Playing with QtJambi and Jedit

Qt Jambi

I've been looking for an excuse to learn Qt for some time now, but could not really justify to myself going back to C++. Now, with jambi, you can write Qt programs in java. More importantly, with the Qt Jambi to Awt bridge, you can mix swing components into Qt windows and Qt widgets into swing components. Here is a picture of some swing components in a QWidget

awtinqt.png

and another one with some Qt components in a swing frame

qtinawt.png

See this code snippet to see how easy it is to do:

QGridLayout layout = new QGridLayout(this);

// A few Qt widgets
layout.addWidget(new QLabel("First name:"), 0, 0);
layout.addWidget(new QLineEdit(), 0, 1);
layout.addWidget(new QLabel("Last name:"), 1, 0);
layout.addWidget(new QLineEdit(), 1, 1);

// The AWT part of the GUI
{
    JPanel panel = new JPanel();

    panel.setLayout(new GridLayout(1, 2));

    panel.add(new JLabel("Social security number:"));
    panel.add(new JTextField());

    // Add the AWT panel to Qt's layout
    layout.addWidget(new QComponentHost(panel), 2, 0, 1, 2);
}

beyond hello world

So I wanted to go beyond the hello world level and try to integrate jedit in a Qt window. If it works, this could lead to interesting things, such as distributing jedit dockable windows through Qt's QDockWidget system (which should be easy based on the new abstract docking window manager service in jedit) or using Qt widgets to extend jedit, ...

I managed to embed jedit in a Qt window, although I had to trick jedit into not showing a JFrame when a view is created. I used the same trick as in the biocep workbench: a small patch to the jEdit class so that the view (which is a JFrame) is never set to visible and its content pane is borrowed by some other component, in this case a Qt component. Here is what everything looks like:

$ tree
.
|-- build.properties
|-- build.xml
|-- jambidocking
|   |-- data
|   |   |-- JambiDockingPlugin.props
|   |   |-- actions.xml
|   |   `-- services.xml
|   `-- src
|       `-- JambiDocking
|           |-- JambiDockingDockingLayout.java
|           |-- JambiDockingWindowManager.java
|           |-- Plugin.java
|           `-- Provider.java
|-- src
|   |-- com
|   |   `-- addictedtor
|   |       `-- jambijedit
|   |           `-- JambiJedit.java
|   `-- org
|       `-- gjt
|           `-- sp
|               `-- jedit
|                   `-- jEdit.java
`-- src_qtjambiawtbridge
    `-- com
        `-- trolltech
            `-- research
                `-- qtjambiawtbridge
                    |-- QComponentHost.java
                    |-- QWidgetHost.java
                    |-- QWidgetWrapper.java
                    |-- QtJambiAwtBridge.java
                    |-- RedirectContainer.java
                    |-- examples
                    |   |-- AwtInQt.java
                    |   `-- QtInAwt.java
                    `-- generated
                        |-- QComponentHostNative.java
                        |-- QWidgetHostNative.java
                        `-- QtJambi_LibraryInitializer.java

19 directories, 21 files

Apart from the code of the Qt Jambi to Awt bridge, there is the patched jEdit.java, the JambiJedit.java file which basically creates a Qt main window and sets jedit as its central widget, and the jambidocking directory which contains the start of an implementation of jedit's shiny new DockableWindowManager system (more on that later)

jeditinjambi.png

The good news is that it works, the bad news is that it sort of works

Bad things started to happen when I tried to implement the DockableWindowManager system. Here is the kind of message I get; I suppose the issue is that jedit uses threading quite a lot and Qt is not happy about it:

     [java] Exception in thread "main" 7:25:23 PM [main] [error] main: QObject used from outside its own thread, object=com::trolltech::research::qtjambiawtbridge::QComponentHost(0xa305370) , objectThread=Thread[AWT-EventQueue-0,6,main], currentThread=Thread[main,5,main]
     [java] 7:25:23 PM [main] [error] main:  at com.trolltech.qt.GeneratorUtilities.threadCheck(GeneratorUtilities.java:56)
     [java] 7:25:23 PM [main] [error] main:  at com.trolltech.research.qtjambiawtbridge.generated.QComponentHostNative.event(QComponentHostNative.java:37)
     [java] 7:25:23 PM [main] [error] main:  at com.trolltech.research.qtjambiawtbridge.QComponentHost.event(QComponentHost.java:35)
     [java] 7:25:23 PM [main] [error] main:  at com.trolltech.qt.gui.QApplication.exec(Native Method)
     [java] 7:25:23 PM [main] [error] main:  at com.addictedtor.jambijedit.JambiJedit.main(Unknown Source)
     [java] QPixmap: It is not safe to use pixmaps outside the GUI thread
     [java] QPixmap: It is not safe to use pixmaps outside the GUI thread
     [java] QPixmap: It is not safe to use pixmaps outside the GUI thread
     [java] QPixmap: It is not safe to use pixmaps outside the GUI thread
     [java] QPixmap: It is not safe to use pixmaps outside the GUI thread
     [java] QPixmap: It is not safe to use pixmaps outside the GUI thread

Anyway, I zipped it up here in case someone else wants to have a go. It is not quite there yet but at least now I have my excuse to learn Qt, which was the original point ...

Saturday, February 7 2009

a "hello world" miner for the biocep workbench

JGraph X is the next generation of Java Swing Diagramming Library, factoring in 7 years of architectural improvements into a clean, concise design... See the rest of the quote in jgraphx homepage

This is an example of using jgraphx as a plugin for the biocep workbench: we are just going to integrate the hello world example from jgraphx as a view of the workbench. I hope people will find this useful and that more ideas will come later. Here is a screenshot:

Screenshot.png

As you might guess from the screenshot, this is not really useful and does not do anything apart from being able to move things around. jgraphx has more examples, and apparently you can use any swing component as a renderer to a graph vertex, so there is no limit to what can be achieved ...

The project looks like this:

.
|-- build.properties
|-- build.xml
|-- descriptor.xml
|-- lib
|   |-- dt.jar
|   |-- jaxx-runtime.jar
|   |-- jaxx-swing.jar
|   |-- jaxxc.jar
|   `-- jgraphx.jar
`-- src
    `-- com
        `-- addictedtor
            `-- workbench
                `-- plugin
                    `-- jgraphx
                        `-- HelloWorld.java

7 directories, 9 files

and the build.* files look pretty much the same as in this previous tutorial, except that this time we won't use jaxx for the user interface, because swing is not a pain when it comes to hello world. The HelloWorld.java file looks like this:

package com.addictedtor.workbench.plugin.jgraphx ;

import com.mxgraph.swing.mxGraphComponent;
import com.mxgraph.view.mxGraph;

import org.kchine.r.workbench.RGui;
import java.awt.BorderLayout ;
import javax.swing.JPanel ;

public class HelloWorld extends JPanel {
  
  private RGui rgui ;
  
  public HelloWorld(RGui rgui){
    super( new BorderLayout() ) ;
    this.rgui = rgui ;
    
    mxGraph graph = new mxGraph();
    Object parent = graph.getDefaultParent();

    graph.getModel().beginUpdate();
    try {
       Object v1 = graph.insertVertex(parent, null, "Hello", 
         20, 20, 80, 30);
       Object v2 = graph.insertVertex(parent, null, "World!",
         240, 150, 80, 30);
       graph.insertEdge(parent, null, "Edge", v1, v2);
    } finally {
       graph.getModel().endUpdate();
    }
    
    add(new mxGraphComponent(graph), BorderLayout.CENTER );
  }

}

Here is the source of the plugin and a zip you can deploy in your RWorkbench directory to start dragging around.

Nested 0.1 on jedit plugin central

Nested has been released on jedit plugin central, you can now install it via jedit's plugin manager

Screenshot-1.png

In short, nested is a jedit plugin that lets you see the nesting when you edit files that mix languages, for example XML inside R (see this post to find out what the code is about):

Screenshot.png

Note, you need to install my R edit mode for jedit to recognize the xml within R.

<!-- deal with R4X inline XML -->
<SPAN DELEGATE="xml::MAIN" >
  <BEGIN>'##((xml</BEGIN>
  <END>'##xml))</END>
</SPAN> 

Friday, February 6 2009

Tag cloud for the R Graph Gallery

This post has two goals: announcing that the graph gallery has gained a tag cloud, and showing how it is done.

Screenshot.png

The cloud is a simple tag cloud of the words in the titles of the graphics included in the gallery. For this purpose, I am using an XML dump of the main table of the gallery database. Here is, for example, the information for graph 12:

<graph>
    <id>12</id>
    <titre>Conditionning plots</titre>
    <titre_fr>graphique conditionnel</titre_fr>
    <comments>Conditioning plots</comments>
    <comments_fr>graphique conditionnel</comments_fr>
    <demo>graphics</demo>
    <notemoy>0.56769596199524</notemoy>
    <nbNote>421</nbNote>
    <nbKeywords>0</nbKeywords>
    <boolForum>0</boolForum>
    <px_w>500</px_w>
    <px_h>400</px_h>
</graph>
<graph>

We are interested in the titre tag of each graph tag. That is straightforward to get with the R4X package (I will do a post specifically on R4X soon).

x <- xmlTreeParse( "/tmp/rgraphgallery.xml" )$doc$children[[1]]
titles <- x["graph/titre/#"] 

Next, we want to extract the words of the titles. We need to be careful to remove the <br> tags that appear in some of the titles, to remove any character that is not a letter or a space, and then to separate by spaces. For that, we will use the operators package like this:

words <- gsub( "<br>", " ", titles ) 
words <- words %-~% "[^[:alpha:][:space:]]" %/~% "[[:space:]]"

Next, we convert everything to lower case and extract the 100 most used words:

words <- casefold( words )
w100 <- tail( sort( table( words ) ), 100 )

and finally generate the (fairly simple) html code:

w100 <- w100[ order( names( w100 ) ) ]
html <- sprintf( '
<a href="search.php?engine=RGG&q=%s">
    <span style="font-size:%dpt">%s</span>
</a>
', 
    names(w100), 
    round( 20*log(w100, base = 5) ), 
    names(w100) )
cat( html, file = "cloud.html"  )

and that's it. You can see it on the gallery frontpage. Here is the full script:

### load the packages used below
library( XML )       # xmlTreeParse
library( R4X )       # x["graph/titre/#"] and inline xml
library( operators ) # %-~% and %/~%

### read the xml dump
x <- xmlTreeParse( "rgraphgallery.xml" )$doc$children[[1]]

### extract the titles
titles <- x["graph/titre/#"] 

### clean them up
words <- gsub( "<br>", " ", titles ) 
words <- words %-~% "[^[:alpha:][:space:]]" %/~% "[[:space:]]"

### get the 100 most used words
words <- casefold( words )
w100 <- tail( sort( table( words ) ), 100 )
w100 <- w100[ order( names( w100 ) ) ]

### generate the html using sprintf
html <- sprintf( '
<a href="search.php?engine=RGG&q=%s">
    <span style="font-size:%dpt">%s</span>
</a>
', 
    names(w100), 
    round( 20*log(w100, base = 5) ), 
    names(w100) )
cat( html, file = "cloud.html"  )

### or using R4X again
# - we need an enclosing tag for that
# - note the &amp; instead of & to make the XML parser happy
w <- names(w100)
sizes <-  round( 20*log(w100, base = 5) )
xhtml <- '##((xml
    <div id="cloud">
        <@i|100>
            <a href="search.php?q={ w[i] }&amp;engine=RGG">
                <span style="font-size:{sizes[i]}pt" >{ w[i] }</span>
            </a>
        </@>
    </div>'##xml))
html <- xml( xhtml )
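
For readers who do not use R, the word counting and the sizing rule from the script above (20 times the base-5 logarithm of the count, rounded) can be sketched in plain Java. This is only an illustration under assumptions: the class and method names are made up, and `\p{Alpha}` matches ASCII letters only, whereas R's `[:alpha:]` depends on the locale.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TagCloudSketch {

    // Count lower-cased words in the titles, after replacing <br> by a space
    // and dropping every character that is neither a letter nor whitespace.
    static Map<String, Integer> countWords(List<String> titles) {
        Map<String, Integer> counts = new TreeMap<>();  // keeps words sorted
        for (String title : titles) {
            String cleaned = title.replace("<br>", " ")
                    .replaceAll("[^\\p{Alpha}\\s]", "")
                    .toLowerCase();
            for (String word : cleaned.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Font size in points: 20 * log base 5 of the count, rounded.
    static long fontSize(int count) {
        return Math.round(20 * Math.log(count) / Math.log(5));
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
                countWords(List.of("Conditioning plots", "Box<br>plots 3D"));
        counts.forEach((word, n) ->
                System.out.println(word + " " + n + " " + fontSize(n) + "pt"));
    }
}
```

One thing the sketch makes visible: a word that occurs exactly once gets a size of 0pt, since the logarithm of 1 is 0, so the rule only makes sense when every word that is kept occurs at least a few times.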

Wednesday, February 4 2009

Graphic literacy improving? Let's try (RGG#150)

Here is a proposed alternative to this bubble inferno pointed out in the revolutions blog

bubble.png

ft.png

and the R code behind it (here is the data). This is now item 150 in the graph gallery.

### read the data
d <- read.csv( "data.txt" )
d$bank <- ordered( d$bank, levels = d$bank )

### load lattice and grid
require( lattice )
require( grid )    # grid.text, unit and gpar are used in the panel function

### setup the key
k <- simpleKey( c( "Q2 2007",  "January 20th 2009" ) )
k$points$fill <- c("lightblue", "lightgreen")
k$points$pch <- 21
k$points$col <- "black"
k$points$cex <- 1

### create the plot
dotplot( bank ~ MV2007 + MV2009 , data = d, horiz = T, 
    par.settings = list( 
        superpose.symbol = list( 
            pch = 21, 
            fill = c( "lightblue", "lightgreen"), 
            cex = 4, 
            col = "black"  
        )
     ) , xlab = "Market value ($Bn)", key = k, 
     panel = function(x, y, ...){
       panel.dotplot( x, y, ... )
       grid.text( 
            unit( x, "native") , unit( y, "native") , 
            label = x, gp = gpar( cex = .7 ) )
     } ) 

Tutorial: A simple biocep plugin using JAXX

Background

Although it is not documented yet, making plugins for the biocep workbench is easy (well, if you are familiar with Swing). This tutorial presents the making of a really simple plugin that does not use swing directly for the user interface but uses jaxx to generate the appropriately verbose swing code. JAXX is a way to build swing user interfaces using XML tags to describe the user interface structure. This article will get you started on the concept of jaxx.

The application

The application we are demonstrating here is really simple and may not be useful beyond getting started with making other plugins for biocep. It adds a single view to the workbench that retrieves data from yahoo finance using the function get.hist.quote, from the beginning of a given year to today, and displays the result in a graphical device. It will look something like this:

Screenshot.png

Structure of the plugin

A workbench plugin is more or less just any java class that takes an RGui interface and does something with it. Here is how I've structured the plugin so that I can build and install using ant.

$ tree
.
|-- build.properties
|-- build.xml
|-- descriptor.xml
|-- lib
|   |-- dt.jar
|   |-- jaxx-runtime.jar
|   |-- jaxx-swing.jar
|   `-- jaxxc.jar
`-- src
    `-- com
        `-- addictedtor
            `-- workbench
                `-- plugin
                    `-- simple
                        |-- SimplePlugin.jaxx
                        `-- SimplePlugin.jaxxscript
7 directories, 10 files

build.properties

The build.properties file contains some properties to describe to ant where the workbench is installed and where we should install the plugin. Here is how it looks on my system:

install.dir=/home/romain/RWorkbench/plugins
biocep.dir=/opt/biocep
biocep.jar=biocep.jar

The build.xml file

The build.xml is a standard ant build file with a set of targets to compile the plugin, create a zip for distribution and install the plugin in the installation directory (as indicated in the build.properties file). Let's take a look at the steps specifically involving jaxx. To compile jaxx files in ant, you need to define an additional ant task

<target name="defineAntTask">
  <taskdef name="jaxxc" classname="jaxx.JaxxcAntTask" 
    classpath="lib/jaxxc.jar"/>
</target>

and then use this task to compile the jaxx files in your source tree, here we are compiling the SimplePlugin.jaxx file. After that, classes have been generated and you may compile the java files using the standard javac ant task.

<target name="compile" depends="clean,defineAntTask">
  <mkdir dir="build/classes"/>
  <jaxxc srcdir="src" keepJavaFiles='yes' 
    destdir="build/classes" classpath="${full.biocep.jar}" />
  <javac srcdir="src" destdir="build/classes" source="1.5" target="1.5" > 
    <classpath refid="simple.class.path"/>
  </javac>
</target>

Finally, we can make the jar file. I usually prefer one jar file per biocep plugin, so I unjar the content of the jaxx runtime classes to jar it back into a single jar file

<target name="build" depends="compile">
  <mkdir dir="build/lib"/>
  <unjar src="lib/jaxx-runtime.jar" dest="build/classes" />
  <jar jarfile="build/lib/simple.jar">
    <fileset dir="build/classes" />
    <fileset dir="src">
      <include name="*.xml" />
      <include name="**/*.props" />
      <include name="**/*.properties" />
      <include name="**/*.html" />
      <include name="**/*.gif" />
      <include name="**/*.png" />
    </fileset>
  </jar>  
</target>

The descriptor.xml file

The descriptor.xml file is used by the workbench to load the plugin, it identifies the plugin main class

<plugin>  
  <view name="Simple Plugin" 
    class="com.addictedtor.workbench.plugin.simple.SimplePlugin" />
</plugin>

We are pointing the workbench to the class com.addictedtor.workbench.plugin.simple.SimplePlugin, and the workbench will instantiate one object of that class using the constructor that takes an RGui interface, through which we will communicate with R.
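
I have not looked at how the workbench actually does this, but loading a class by name and calling its one-argument constructor can be sketched with plain reflection. Everything named here is hypothetical: GuiHandle stands in for RGui, DemoPlugin for the plugin class, and PluginLoaderSketch for whatever the workbench uses internally.

```java
import java.lang.reflect.Constructor;

// Hypothetical stand-in for the RGui interface.
interface GuiHandle {
    String name();
}

// Hypothetical plugin class: its only requirement is a public
// constructor taking the GUI handle.
class DemoPlugin {
    final GuiHandle gui;

    public DemoPlugin(GuiHandle gui) {
        this.gui = gui;
    }
}

class PluginLoaderSketch {

    // Look up the class named in the descriptor and call the constructor
    // that takes a GuiHandle.
    static Object load(String className, GuiHandle gui) {
        try {
            Constructor<?> ctor =
                    Class.forName(className).getConstructor(GuiHandle.class);
            return ctor.newInstance(gui);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load plugin " + className, e);
        }
    }

    public static void main(String[] args) {
        GuiHandle gui = () -> "workbench";
        Object plugin = load(DemoPlugin.class.getName(), gui);
        System.out.println(plugin.getClass().getName());
    }
}
```

This is why the constructor signature matters: a plugin class without a constructor taking the GUI handle would fail at the getConstructor step in a loader written this way.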

The SimplePlugin.* files

The SimplePlugin.jaxx file contains the description of the user interface using XML and the SimplePlugin.jaxxscript file contains java code to implement some of the logic of the application

If you are already familiar with Swing, it does not take too much effort to grasp what is going on in this jaxx file

<JPanel layout="{new BorderLayout()}">
    
    <script source="SimplePlugin.jaxxscript" />
    
    <JPanel constraints="BorderLayout.NORTH" id="toolbar">
        <JComboBox id ="instruments" >
            <item value='{null}' 
                label='Select an instrument'/>
            <item value='Nasdaq'   />
            <item value='Dow'      />
            <item value='SP 500'   />
            <item value='CAC 40'   />
            <item value='FTSE 100' />
            <item value='DAX'      />
        </JComboBox>
        <JLabel text="start year :" />
        <JTextField id="startyear" text="2003" 
            onActionPerformed='go()' />
        <JButton id="submit" text="go" 
            onActionPerformed='go()' />
        
    </JPanel>
    
    <JScrollPane constraints="BorderLayout.CENTER">
        <org.kchine.r.workbench.views.PDFPanel id = "pdf" />
    </JScrollPane>
    
    <JPanel constraints="BorderLayout.SOUTH" id="statusbar">
      <JLabel id="info"  text = " " />
    </JPanel>
    
</JPanel>

The SimplePlugin.jaxxscript file complements the jaxx code by implementing a set of java methods, including the constructor for the class, which needs to take an RGui interface as its only parameter

public SimplePlugin( RGui rgui){
    this.rgui = rgui ;
    initMap( ) ;
    loadRPackage( "tseries" ); 
}

... a simple utility method to load an R package. We can see here one of the main design decisions behind the RGui interface: the instance of R running in the background may only do one thing at a time, which is why, when you want to do something with it, you need to lock it, do whatever you need, and then unlock it

public void loadRPackage( String pack ){
    try{
        rgui.getRLock().lock() ;
        // load the package that was asked for, not a hard-coded one
        rgui.getR().evaluate( "require( '" + pack + "' )" ) ;
    } catch(Exception e) {
        JOptionPane.showMessageDialog(this, 
            "Please install the " + pack + " package");
    } finally{
        rgui.getRLock().unlock() ;
    }
}
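
The lock, work, unlock sequence is the standard java.util.concurrent idiom, and it can be wrapped once so that callers cannot forget the unlock. The sketch below is self-contained and does not use biocep: SingleTaskEngine and withLock are hypothetical names, and I am assuming the lock returned by getRLock() behaves like a ReentrantLock.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical stand-in for an engine that can only do one thing at a time.
class SingleTaskEngine {

    private final ReentrantLock lock = new ReentrantLock();

    // Run the task while holding the lock; the lock is released in the
    // finally block even if the task throws.
    <T> T withLock(Supplier<T> task) {
        lock.lock();  // acquired before the try, so finally never unlocks a lock we do not hold
        try {
            return task.get();
        } finally {
            lock.unlock();
        }
    }

    boolean isLocked() {
        return lock.isLocked();
    }

    public static void main(String[] args) {
        SingleTaskEngine engine = new SingleTaskEngine();
        String result = engine.withLock(() -> "locked: " + engine.isLocked());
        System.out.println(result + ", afterwards: " + engine.isLocked());
    }
}
```

Acquiring the lock just before the try block, rather than inside it, is the usual form of the idiom: if acquiring the lock itself fails, the finally block does not run and nothing tries to release a lock that was never taken.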

Finally, the go function does the actual work of retrieving data from yahoo about the chosen instrument since the start of the chosen year, and then plots the result in the PDF panel

public void go( ){
    Object instrument = instruments.getSelectedItem() ;
    if( instrument == null ) {
      JOptionPane.showMessageDialog(this, "Please select an instrument");
      return ;
    }
  
    int start = 2003 ; 
    try{ 
        start = Integer.parseInt( startyear.getText( ) ) ;
    } catch( Exception e ){
        JOptionPane.showMessageDialog(this, "Invalid start year");
        return ;
    }
    String ins = (String)map.get(instrument) ; 
    String cmd = 
        "x = get.hist.quote(instrument = '^" +   ins +
        "', start = '" +  
        start +  "-01-01', quote = 'Close' )" ;
    String plot = 
        "plot( x, ylab = '"+ instrument + "' )" ;
    try{
      rgui.getRLock().lock( ) ;
      rgui.getR().evaluate(cmd) ;
      pdf.setPDFContent( rgui.getR().getPdf( plot , 800, 400) );
    } catch(Exception e){
      e.printStackTrace( ) ;
    } finally{
      rgui.getRLock().unlock( ) ;
    }
  
}
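
The part of go that builds the R command can be pulled out of the GUI code, which makes it easy to check what is sent to R. Here is a minimal sketch of that idea. QuoteCommandSketch and quoteCommand are hypothetical names, and the two symbols in the map are my assumption about what initMap (not shown in this post) might contain.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class QuoteCommandSketch {

    // Assumed mapping from the combo box labels to yahoo symbols;
    // the real initMap() is not shown in the post.
    static final Map<String, String> SYMBOLS = new LinkedHashMap<>();
    static {
        SYMBOLS.put("SP 500", "GSPC");
        SYMBOLS.put("CAC 40", "FCHI");
    }

    // Build the get.hist.quote call for an instrument label and a start year
    // typed by the user. Throws NumberFormatException for a non-numeric year
    // and IllegalArgumentException for an unknown instrument.
    static String quoteCommand(String instrument, String startYearText) {
        String symbol = SYMBOLS.get(instrument);
        if (symbol == null) {
            throw new IllegalArgumentException("Unknown instrument: " + instrument);
        }
        int start = Integer.parseInt(startYearText.trim());
        return "x = get.hist.quote(instrument = '^" + symbol
                + "', start = '" + start + "-01-01', quote = 'Close' )";
    }

    public static void main(String[] args) {
        System.out.println(quoteCommand("SP 500", "2003"));
    }
}
```

Parsing the year to an int before pasting it into the command also means that arbitrary text from the text field never ends up inside the R code.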

Perspectives

Extend JAXX to make it more R-friendly

As opposed to other XML-based user interface toolkits in java, jaxx is not restricted to swing tags: any other class may be included in a jaxx tree. Moreover, we can extend jaxx to define how to interpret a given tag, so with a bit of work we could embed R code gently into jaxx files. I am thinking of something like this:

<JButton>
    <action language="R">
    cat( "hello world" ) 
    </action>
</JButton>

What about rgg

There is also the rgg package, which has similar ideas, except that as far as I can see the nesting is done the other way round: XML tags are embedded in R code within an rgg file.

<rgg>
file = <filechooser label="CSV" description="Csv Dateien" 
       extensions="csv" fileselection-mode="files-only"/>

myIris = read.table(file, header=<checkbox label="header" span="2"/>, 
  sep=<combobox items="\t,;" label="Seperator" selected="T"/>)
summary(myIris)
</rgg>

The XML is processed and the script is transformed into an R script. One of the advantages of rgg, though, is that it defines a set of R-related tags such as <matrix>.

Files

Here is the source of the plugin and the simple.zip, which you can simply unzip into your RWorkbench/plugins directory.

Tuesday, February 3 2009

RGG#149: correlation ellipses

As suggested by Gregor Gorjanc, I've added the correlation ellipses graph from the plotcorr function of the ellipse package. graph_149.png
