Code Snippet: List of CRAN packages
By romain francois on Wednesday, August 5 2009, 15:15 - Snippet
This is a really simple code snippet that shows how to get the list of CRAN packages and their titles from the HTML package index page of a CRAN mirror (the Toulouse mirror in this example).
...
Note that R has the available.packages function, but it does not return the titles of the packages.
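The scraping idea can be tried offline on a few hard-coded lines. This is a minimal sketch, not the original snippet: it assumes each package sits on one line of the form <tr><td><a href="../../web/packages/NAME/index.html">NAME</a></td><td>TITLE</td></tr>, and the sample rows, package names and the helper parse_package_rows() are all hypothetical. The real page layout may differ.

```r
# Hypothetical sample lines mimicking the assumed layout of the CRAN index page
html <- c(
  '<tr valign="top"><td><a href="../../web/packages/pkgA/index.html">pkgA</a></td><td>Tools for Doing A</td></tr>',
  '<tr valign="top"><td><a href="../../web/packages/pkgB/index.html">pkgB</a></td><td>Helpers for B</td></tr>',
  '<p>Some other line of the page</p>'
)

# Hypothetical helper: turn package rows into a two-column matrix (name, title)
parse_package_rows <- function(html) {
  # keep only the lines that link to a package page
  rows <- grep("../../web/packages/", html, value = TRUE, fixed = TRUE)
  # drop everything up to the end of the opening <a href=".../index.html"> tag
  rows <- sub('^.*index.html">', "", rows)
  # replace every remaining tag with one space: "</a></td><td>" becomes three spaces
  rows <- gsub("<[^>]+>", " ", rows)
  rows <- trimws(rows)
  # split the name from the title on the three-space gap
  out <- do.call(rbind, strsplit(rows, " {3}"))
  colnames(out) <- c("name", "title")
  out
}

packages <- parse_package_rows(html)
print(packages)
```

Replacing each tag by a single space (rather than deleting it) is what makes the name/title boundary recognisable: several adjacent tags leave a run of spaces that does not occur inside a normal title.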
Comments
Thanks for this! Quite a neat trick.
Unfortunately, the script didn't run so smoothly on Mac OS for the following reasons:
1. index <- sub( "src/contrib", "web/packages/index.html", repo ) didn't work because on Mac OS contrib.url(getOption("repos")) returns the path of the binary repository (under bin/macosx/) rather than src/contrib, so there was nothing for sub() to replace.
2. data <- sub( '<.*?>', '', data ) removed everything between the first "<" and the last ">" instead of between the first "<" and the next ">". As a result, the package description was erased as well.
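Problem #1 can be reproduced without downloading anything, because contrib.url() only builds strings: it returns a src/contrib path for source packages and a path under bin/ for binary ones. A minimal sketch, assuming a made-up mirror URL (cran.example.org) and using the Windows binary type as a stand-in for the Mac OS binary layout, since both sit under bin/ on CRAN mirrors:

```r
# Made-up mirror URL: contrib.url() only builds strings, nothing is downloaded
repos <- c(CRAN = "https://cran.example.org")

src <- contrib.url(repos, type = "source")      # .../src/contrib
bin <- contrib.url(repos, type = "win.binary")  # .../bin/windows/contrib/<R version>

# The original rewrite only matches the source layout
index_src <- sub("src/contrib", "web/packages/index.html", src)
index_bin <- sub("src/contrib", "web/packages/index.html", bin)  # no match: unchanged

# Cutting the path at "bin" works here, but would also cut a mirror URL
# whose host name happens to contain "bin"
index_cut <- gsub("bin.*", "web/packages/index.html", bin)

# Building from the repository root does not depend on the package type
index_root <- paste(repos, "/web/packages/index.html", sep = "")

print(c(index_src, index_bin, index_cut, index_root))
```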
The following fixed script worked fine on my machine:
```r
repo <- contrib.url(getOption("repos"))
index <- gsub( "bin.*", "web/packages/index.html", repo)
html <- readLines( index )
html <- grep( "./../web/packages/", html, value = TRUE )
data <- sub( '^.*index.html">(.*?)(.*?)$', "\\1 @@ \\2", html, perl = TRUE )
data <- gsub( '<^>+>', ' ', data )
data <- trim(gsub( '@@', "", data))
packages <- do.call( rbind, strsplit( data, " " ) )
head( packages, 20 )
```
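Joys of pasting into HTML! I spotted two errors in the script above that crept in when it was pasted. I also changed the statement that builds the URL of the HTML index page, so that it starts from the repository root instead of the contrib.url() path.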
Fixed script below:
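```r
# Build the URL of the package index page from the repository root;
# assumes getOption("repos") holds a single CRAN mirror URL
# (not the "@CRAN@" placeholder)
index <- paste(getOption("repos"), "/web/packages/index.html", sep = "")
html <- readLines( index )
# keep only the table rows that link to a package page
html <- grep( "./../web/packages/", html, value = TRUE )
# drop everything up to the end of the <a href=".../index.html"> tag
data <- sub( '^.*index.html">', '', html )
# replace each remaining HTML tag by a single space
data <- gsub( '<[^>]+>', ' ', data )
# base R has no trim(); trimws() is available from R 3.2.0 onwards
data <- trimws( data )
# the tags between name and title leave exactly three spaces
packages <- do.call( rbind, strsplit( data, " {3}" ) )
head( packages, 20 )
```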
Cool. Thanks.
I should have used perl = TRUE in the sub call; that fixes problem #2.
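The difference behind problem #2 shows up on a short string. A sketch with a made-up input; it contrasts a greedy pattern, the lazy .*? pattern with perl = TRUE (the PCRE engine), and the negated character class <[^>]+> used in the second fixed script. Note that sub() replaces only the first match, so removing every tag also needs gsub():

```r
# Made-up input: a package name and title wrapped in tags
x <- "<b>pkgA</b> Tools for Doing A <i>(beta)</i>"

# Greedy .* : a single match from the first "<" to the last ">", so everything goes
greedy <- sub("<.*>", "", x)

# Lazy .*? with perl = TRUE: the match stops at the first ">",
# but sub() still removes only that first tag
lazy_first <- sub("<.*?>", "", x, perl = TRUE)

# gsub() applies the lazy pattern to every tag
lazy_all <- gsub("<.*?>", "", x, perl = TRUE)

# A negated character class gives the same result without a lazy quantifier
neg_class <- gsub("<[^>]+>", "", x)

print(c(greedy, lazy_first, lazy_all, neg_class))
```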