R related Makefile definitions

Data analysis can involve many steps, including reading data, cleaning and transforming data, plotting data, carrying out statistical analysis and finally writing reports. While we can try to keep track of each step manually by using good documentation and being highly organised, it can prove more efficient to employ computer tools to augment these practices. One such tool is GNU Make. Using it does not obviate the need to be organised and to document the work, but it can certainly prove helpful, especially as a project grows in size. While it is far from perfect, make is widely used in software development and also proves useful for efficiently carrying out tasks in data analysis.

Unfortunately, make does not provide standard rules for producing .Rout files from .R files, .pdf files from .Rnw files, .docx files from .Rmd files and so on. It is straightforward to define a pattern rule to output a .Rout file from a .R syntax file by including the following two lines in a Makefile
%.Rout: %.R
<TAB> R CMD BATCH --vanilla $<
which runs the command R CMD BATCH --vanilla to produce the output file. Here <TAB> stands for a literal tab character, which must begin the command line. The left-hand side of the colon (:) is the target, which depends on the prerequisite file(s) to the right of the colon. Here, % is a wildcard. So, for any .R syntax file, say mySyntax.R, you can use ‘make mySyntax.Rout’ to produce the .Rout output file. Nothing happens if the target is newer than the prerequisite, since it is already ‘up to date’.

In practice, a target may have several prerequisite files, such as an R syntax file and several data files. In the Makefile we may specify the dependencies as
readData.Rout: readData.R data1.csv data2.csv oldData.RData
so we can run the syntax file by typing ‘make readData.Rout’ at the command prompt. If any of the files readData.R, data1.csv, data2.csv or oldData.RData has changed, and so is newer than the target file readData.Rout, then the predefined R batch command is run to produce a new output file; otherwise readData.Rout is ‘up to date’.

Similar rules can be set up for producing reports from markdown or Sweave files. The file common.mk contains many such rules and can be included in a standard Makefile to facilitate a more efficient workflow. You can obtain common.mk from GitHub: https://github.com/petebaker/r-makefile-definitions
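As a rough sketch of how the pattern rule and the dependency line described above fit together, and of what a "similar rule" for an R Markdown report might look like, consider the fragment below. The `%.html` rule is an illustrative assumption and not necessarily what common.mk defines; it calls `rmarkdown::render()` through `Rscript`, so it assumes the rmarkdown package is installed.

```make
# Recipe lines must begin with a tab character, not spaces.

# Pattern rule from the text: run an R syntax file in batch mode.
%.Rout: %.R
	R CMD BATCH --vanilla $<

# Extra prerequisites for one target; the recipe still comes from
# the pattern rule above.
readData.Rout: readData.R data1.csv data2.csv oldData.RData

# Assumed rule (illustration only): render an R Markdown file to HTML.
%.html: %.Rmd
	Rscript -e 'rmarkdown::render("$<", output_format = "html_document")'
```

With this in a Makefile, ‘make readData.Rout’ reruns readData.R only when it or one of the listed data files is newer than readData.Rout.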
Using common.mk
- Download the file to a directory you commonly use to store functions and definitions. Ideally, this would be something like:
  - ~/lib or C:\MyLibrary
- Put one of the following lines in your Makefile:
  - include ~/lib/common.mk where ~ will be expanded to be your HOME directory, or
  - include C:/MyLibrary/common.mk (in Windows)
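One possible way to keep the location of common.mk in a single place is sketched below. The variable name `R_MAKE_DEFS` is hypothetical and is not something common.mk itself looks for; `$(HOME)` assumes the HOME environment variable is set, which may not hold on Windows.

```make
# Hypothetical variable holding the path to common.mk. Using ?= means
# a value already set in the environment is kept rather than overwritten.
R_MAKE_DEFS ?= $(HOME)/lib/common.mk

include $(R_MAKE_DEFS)
```

The path can then be changed for a single run, for example with ‘make R_MAKE_DEFS=C:/MyLibrary/common.mk’, without editing the Makefile.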
Example Makefiles
.PHONY : all
all: test.pdf test.html test.docx test2.pdf test-stitch.Rout test-stitch.pdf

## produce pdf, html, docx from test.Rmd
test.pdf: ${@:.pdf=.Rmd}
test.html: ${@:.html=.Rmd}
test.docx: ${@:.docx=.Rmd}

## produce pdf from test2.rmd
test2.pdf: ${@:.pdf=.rmd}

## use stitch to produce pdf via rmarkdown (exactly as in RStudio)
test-stitch.pdf: ${@:.pdf=.R}

## if you have common.mk in ~/lib directory comment line below
## and uncomment the second line
include common.mk
##include ~/lib/common.mk
More usually, if there is a sequence of steps relying on a data file, say myData.csv, then your Makefile may look something like
.PHONY: all
all: report.pdf

## produce report from .Rmd once previous steps carried out
report.pdf: ${@:.pdf=.Rmd} summaryAndPlots.Rout

## summarise data
summaryAndPlots.Rout: ${@:.Rout=.R} read.Rout

## read data
read.Rout: ${@:.Rout=.R} myData.csv

include ~/lib/common.mk
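The same pipeline can also be written with the syntax files named explicitly, which some readers may find easier to follow. This is a sketch of an alternative layout, not the form used in the example above; it assumes that common.mk supplies the pattern rules that turn .R files into .Rout files and .Rmd files into .pdf files. In GNU Make, automatic variables such as `$@` normally only have values inside recipes, so spelling the names out makes each dependency visible in the prerequisite list itself.

```make
.PHONY: all
all: report.pdf

# The report depends on its .Rmd source and on the summary step.
report.pdf: report.Rmd summaryAndPlots.Rout

# The summary step depends on its syntax file and on the read step.
summaryAndPlots.Rout: summaryAndPlots.R read.Rout

# The read step depends on its syntax file and on the raw data.
read.Rout: read.R myData.csv

include ~/lib/common.mk
```

Changing myData.csv and then running ‘make’ would rerun read.R, then summaryAndPlots.R, and finally rebuild report.pdf, while leaving up-to-date steps alone.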
Prerequisites
- GNU Make http://www.gnu.org/software/make/
- R http://www.r-project.org/
- latexmk http://www.ctan.org/pkg/latexmk/
- R packages on CRAN: rmarkdown, knitr
Note that Windows users can install Rtools (available from CRAN) to get a working version of make. They may also need to install pandoc and LaTeX to produce PDF files if they haven’t already. MiKTeX is recommended, although TeX Live will also work well.