celebrating R commit #50000
By romain francois on Friday, October 9 2009, 11:38 - Permalink
Today, Brian Ripley committed revision 50 000 to the R svn repository.
------------------------------------------------------------------------
r50000 | ripley | 2009-10-09 10:34:17 +0200 (Fri, 09 Oct 2009) | 1 line
Changed paths:
   M /branches/R-2-10-branch/src/library/stats/R/plot.lm.R

port r49999 from trunk
------------------------------------------------------------------------
r49999 | ripley | 2009-10-09 10:33:28 +0200 (Fri, 09 Oct 2009) | 2 lines
Changed paths:
   M /trunk/src/library/stats/R/plot.lm.R

workaround for PR#13899
(that in the report is broken and fails make check!)
so it is time to celebrate and have some fun with the svn log to analyze the 50 000 commits ... with R of course.
data extraction
First we need to grab the full svn log, using command line svn, something like this:
$ svn log -v https://svn.r-project.org/R > rsvn.log
... or you can download it from my website if you don't have svn on your machine
now we need to read the data into R :
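As a minimal sketch of this step (not necessarily the code originally used here), the header line of each log entry can be picked out with a regular expression and turned into a data frame. The helper name `parse_svn_log` and the column names are assumptions made for illustration; the sample lines are copied from the two log entries shown above.

```r
# Hypothetical helper: extract revision, author and date from the
# header lines of `svn log` output (one header per commit).
parse_svn_log <- function(lines) {
  # Header lines look like:
  # r50000 | ripley | 2009-10-09 10:34:17 +0200 (Fri, 09 Oct 2009) | 1 line
  rx <- "^r([0-9]+) \\| ([^|]+) \\| ([0-9]{4}-[0-9]{2}-[0-9]{2}) ([0-9:]{8}) .*$"
  headers <- grep(rx, lines, value = TRUE)
  data.frame(
    revision = as.integer(sub(rx, "\\1", headers)),
    author   = sub(rx, "\\2", headers),
    date     = as.Date(sub(rx, "\\3", headers)),  # committer's local date
    stringsAsFactors = FALSE
  )
}

# Small inline sample; for the full log use: lines <- readLines("rsvn.log")
sample_log <- c(
  "------------------------------------------------------------------------",
  "r50000 | ripley | 2009-10-09 10:34:17 +0200 (Fri, 09 Oct 2009) | 1 line",
  "Changed paths:",
  "   M /branches/R-2-10-branch/src/library/stats/R/plot.lm.R",
  "",
  "port r49999 from trunk",
  "------------------------------------------------------------------------",
  "r49999 | ripley | 2009-10-09 10:33:28 +0200 (Fri, 09 Oct 2009) | 2 lines"
)
commits <- parse_svn_log(sample_log)
```

With the full log, `table(commits$author)` then gives the number of commits per author. The changed paths and commit messages are ignored in this sketch.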
we might also be interested in release date, version number and size of the distribution of each R release that is archived on CRAN, which we can get like this :
graphics
now we can do some graphics. I'm using lattice here because I am familiar with it, but I'm sure interesting plots could be done using ggplot2; in fact, check out this post from Yihui Xie using ggplot2.
First I need to define some helper panel functions I'll use in the plots below
Number of commits per day

... split by author

The number of commits per month

... split by author

blogroll
- Analyzing R's rate of change on revolution's blog
- 50000 Revisions Committed to R on Yihui Xie's blog
Comments
Good job!
A \Huge{Thank you} to the R core team for their great efforts in developing and improving R!
Great post and analysis.
Interesting to speculate why the activity in R-core has been declining. For sure it is compensated by the increase in activity in the contributed packages.
Romain,
Nice work, but you need to fold 'martyn' and 'plummer' and 'thomas' and 'lumley' into one each.
Dirk
Thanks.
I have also folded together "paul" and "murrell" as a unique person. Not quite sure who "r" is, and also who "mike" is (with one commit)
I am not sure we can conclude that activity is declining: the number of commits may not be the best metric. Maybe it should be the number of files modified in each commit, or the number of lines ...
romain
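The folding of author aliases discussed in this thread can be sketched with a named lookup vector. This is only an illustration: the helper name `fold_authors` is hypothetical, and using the surname as the canonical label is an arbitrary assumption. The unidentified "r" and "mike" are left untouched.

```r
# Alias -> canonical name. The pairs come from the comments above;
# which of the two names is kept is an arbitrary choice.
aliases <- c(martyn = "plummer", thomas = "lumley", paul = "murrell")

# Hypothetical helper: replace every alias by its canonical name,
# leaving all other authors unchanged.
fold_authors <- function(author, aliases) {
  hit <- author %in% names(aliases)
  author[hit] <- unname(aliases[author[hit]])
  author
}

authors <- c("ripley", "martyn", "plummer", "thomas", "lumley", "paul", "murrell")
folded <- fold_authors(authors, aliases)
table(folded)
```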
Thanks to Duncan Murdoch, who pointed out to me that the smooth lines did not take into account days with 0 commits. This had the effect of drawing the smooth lines too high.
I've also updated the plots so that the top axis shows the major R releases.
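To illustrate the zero-commit issue: counting commits with `table()` on the observed dates only yields days that have at least one commit, so any average or smoother computed from those counts is biased upwards. A minimal sketch of one way to fill in the missing days follows; the helper name `daily_counts` and the toy dates are assumptions, not the code behind the plots.

```r
# Hypothetical helper: number of commits for every calendar day between
# the first and last commit, including days with no commit at all.
daily_counts <- function(dates) {
  all_days <- seq(min(dates), max(dates), by = "day")
  # Using the full day sequence as factor levels makes table() report 0
  # for days that never appear in `dates`.
  counts <- table(factor(as.character(dates), levels = as.character(all_days)))
  data.frame(date = all_days, n = as.vector(counts))
}

# Toy data: two commits on one day, then one commit four days later.
dates <- as.Date(c("2009-10-05", "2009-10-05", "2009-10-09"))
daily <- daily_counts(dates)

mean(daily$n)               # average over all 5 days
mean(daily$n[daily$n > 0])  # average over days with commits only (too high)
```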
This uses grepl, which is available in R 2.9.2 or later.
A substitute is grepl2 found at:
http://scsys.co.uk:8002/35082
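grepl2 <- function(pattern, x) {
  # indices of the elements of x that match pattern
  matches <- grep(pattern, x)
  # start from all FALSE, then flag the matches
  ret <- rep(FALSE, length(x))
  ret[matches] <- TRUE
  return(ret)
}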
grepl2("abdz", letters)
Found with the kind help of mrflick of #R irc.