Why write an R package?
As @drob said on Twitter:
"I wish I'd left this code across scattered .R files instead of combining it into a package" said no one ever #rstat (19/07/2015)
And that is basically it! If you put your code into a package you can:
- share it easily,
- document it easily,
- use it again in the future without wading through tens of R files/folders trying to work out which one you were using,
- and, if you put it on CRAN/GitHub and write a paper about it, people will use it and you may be invited to work on exciting new projects with them!
There are a few tools (packages) which make writing your own R package (almost!) painless:
- devtools (R package for development)
- roxygen2 (R package for documentation)
- testthat (R package for testing)
- rmarkdown (R package for writing vignettes)
Package design is an art! The first thing to realise is that writing an R package is not just a case of sharing your code: you may have to think about how you want to structure it. In my experience, it is easier to do any code rearrangement before you start the packaging.
- Begin by putting each function in its own file, named after the function (with a .R extension)
- Maybe you have a script which calls all these functions - can this be broken down into smaller functions?
- When you make a package you are basically sharing a lot of functions - think about how the user will want to use them
- Do you have input files? These may be a list of parameter values, or maybe a csv file.
- Think about what needs to be saved as R objects and loaded with the package
- Think about what access the user will need to different objects
- Think about which functions the user needs access to (these are marked with @export) and which functions will be internal
- When I release a complex mathematical model I need to allow the user to insert their own functions in some places so I put the functions in a list which the user can edit.
- I expect different people will have totally different types of package -- I am used to making process-based models into packages (basically one function which calls tens of other functions) but maybe you have lots of functions that are used independently
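To make the "functions in a list" idea concrete, here is a minimal sketch of how it could look. The model, the function names (default_processes, run_model) and the parameters are all made up for illustration; they are not taken from any of my packages.

```r
# Hypothetical example: the default process functions live in a list
default_processes <- function() {
  list(
    growth    = function(biomass, rate) biomass * rate,
    mortality = function(biomass, rate) biomass * rate
  )
}

# Top-level model function: steps through time using whatever
# functions are supplied in `processes`
run_model <- function(biomass0, n_steps, growth_rate, mortality_rate,
                      processes = default_processes()) {
  biomass <- numeric(n_steps + 1)
  biomass[1] <- biomass0
  for (t in seq_len(n_steps)) {
    gain <- processes$growth(biomass[t], growth_rate)
    loss <- processes$mortality(biomass[t], mortality_rate)
    biomass[t + 1] <- biomass[t] + gain - loss
  }
  biomass
}

# A user swaps in their own growth function without touching package code
my_processes <- default_processes()
my_processes$growth <- function(biomass, rate) {
  rate * biomass * (1 - biomass / 100)  # logistic growth, capacity of 100 assumed
}

default_run <- run_model(10, 5, growth_rate = 0.2, mortality_rate = 0.1)
custom_run  <- run_model(10, 5, growth_rate = 0.2, mortality_rate = 0.1,
                         processes = my_processes)
```

Here only run_model() and default_processes() would need to be exported; the user edits a copy of the list and passes it back in.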
Documentation is easy using the roxygen2 package. Basically, just add some specially formatted comments at the start of each function and these will be used to generate the help() text for that function.
One thing to bear in mind: anything under @examples is executed when you run the CRAN checks. If an example produces a plot, it can cause spurious files to be generated, which CRAN will not like!
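As a rough illustration, a roxygen2 comment block might look like the sketch below. The function celsius_to_fahrenheit() is a made-up example. Wrapping the plotting example in \dontrun{} is one common way to stop it being executed during the checks; I am assuming that is acceptable for your package, so check the current CRAN guidance before relying on it.

```r
#' Convert temperatures from Celsius to Fahrenheit
#'
#' @param celsius Numeric vector of temperatures in degrees Celsius.
#' @return Numeric vector of temperatures in degrees Fahrenheit.
#' @export
#' @examples
#' celsius_to_fahrenheit(c(0, 100))
#'
#' \dontrun{
#' # Plotting example, skipped when the examples are run automatically
#' plot(0:100, celsius_to_fahrenheit(0:100), type = "l")
#' }
celsius_to_fahrenheit <- function(celsius) {
  celsius * 9 / 5 + 32
}
```

Running devtools::document() then turns these comments into the help file and adds the function to the NAMESPACE exports.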
Testing using testthat
- Unit tests allow you to test one specific piece of code (e.g. a function)
- Sometimes it's really hard to test a function.
- Is your function too unwieldy? Break it down/refactor it.
- Writing tests can really help improve your code
- Often you need to run some code before you can use the function.
- If this is true for many functions, I write a script that runs the model for different scenarios and then make tests for these scenarios (using flags to avoid re-running code)
- When you update your code, running all your unit tests gives you peace of mind that you haven't broken anything
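For anyone who has not seen testthat before, a unit test could look something like this sketch. The clamp() function is a hypothetical stand-in for one of your own functions. The requireNamespace() guard is only there so the snippet also runs where testthat is not installed; a real test file in tests/testthat/ would not need it.

```r
# Hypothetical function under test: restrict values to [lower, upper]
clamp <- function(x, lower, upper) {
  pmin(pmax(x, lower), upper)
}

# In a package this part would sit in tests/testthat/test-clamp.R
if (requireNamespace("testthat", quietly = TRUE)) {
  library(testthat)

  test_that("clamp pulls out-of-range values back to the limits", {
    expect_equal(clamp(c(-1, 0.5, 2), 0, 1), c(0, 0.5, 1))
  })

  test_that("clamp leaves in-range values unchanged", {
    expect_equal(clamp(0.3, 0, 1), 0.3)
  })
}
```

Each test_that() block tests one behaviour, so when something breaks the failure message tells you which behaviour it was.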
Possibly the thing that takes the longest. But you will thank your past self so much!
R error messages can be cryptic when they are reporting an error in your package, so take some time to write informative error messages of your own.
Informative error messages will save you hours in the future!
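A minimal sketch of what I mean, using base R's stop(). The function run_scenario() and its parameter names are invented for the example.

```r
# Hypothetical function that checks its inputs before doing any work
run_scenario <- function(params) {
  required <- c("growth_rate", "n_steps")
  missing_names <- setdiff(required, names(params))
  if (length(missing_names) > 0) {
    stop("run_scenario(): `params` is missing required element(s): ",
         paste(missing_names, collapse = ", "), call. = FALSE)
  }
  if (!is.numeric(params$growth_rate) || params$growth_rate < 0) {
    stop("run_scenario(): `growth_rate` must be a non-negative number, not ",
         deparse(params$growth_rate), call. = FALSE)
  }
  params$growth_rate * params$n_steps
}

# Capture the message instead of halting, to see what the user would read
msg <- tryCatch(run_scenario(list(n_steps = 10)),
                error = function(e) conditionMessage(e))
```

The message names the function, the argument and the problem, which is far more helpful than the "non-numeric argument to binary operator" you would otherwise get several calls further down.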
You will need to add a vignette to your package - this is basically a 'Getting Started' type doc. Most sources (e.g. Hadley Wickham) tell you to write it in R Markdown, which compiles (knits) when you build your package.
I personally do not recommend doing this unless your code runs extremely fast, and even then I have had problems which I have not managed to resolve.
You can avoid this whole headache by creating your vignette separately and then putting the final PDF in the package (for completeness you can add the Rmd file to the package in a different folder).
Having written 4 R packages in the last year I have realized the following...
1/ CRAN checks have got a LOT stricter!
2/ A good compromise is to build a package and put it on GitHub/GitLab - people can then install it from there if they want it.
3/ If you are aiming to release your package to CRAN you need to specify all the packages you use in your code (even if they ship with R itself).
e.g. if you use plot() you have to make it clear it is from the graphics package; similarly, approx() is from the stats package.
4/ The devtools package is good but not perfect! Often it is better to do a command line build.
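To illustrate point 3/, one way of making the origin of each function explicit is the package::function() form, sketched below. The two functions are hypothetical; in a real package you would also list graphics and stats under Imports: in the DESCRIPTION file.

```r
# Hypothetical package functions with explicitly qualified calls

# Linear interpolation at the requested x values, via the stats package
interpolate_curve <- function(x, y, xout) {
  stats::approx(x, y, xout = xout)$y
}

# Plot the raw points and the interpolated line, via the graphics package
plot_curve <- function(x, y, xout) {
  graphics::plot(x, y, xlab = "x", ylab = "y")
  graphics::lines(xout, interpolate_curve(x, y, xout))
  invisible(NULL)
}

# Quick check of the interpolation (no plotting involved)
midpoint <- interpolate_curve(c(0, 10), c(0, 100), xout = 2.5)
```

An alternative is a roxygen2 tag such as @importFrom stats approx above the function, which lets you keep calling approx() without the prefix.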
R packages are great for sharing and bundling up your final code - a great output from a project.
There is a lot of help available online, and purpose-built packages make the process much easier.