Abstract
This is a template mainly designed for data science lab projects. In this template, we review the most common components of a single R Markdown document with the power of the bookdown package and demonstrate their basic usage through examples.
This document is designed as a template for data science lab projects. However, it can also be used as a general R Markdown template for a single document.
The benefits of setting up a template in R Markdown are its simple syntax and its flexible output formats with the help of pandoc. In addition, R Markdown favors reproducible studies, which have been receiving increasing attention in modern research.
Cross-referencing mathematical equations, tables, and figures used to be a challenge in R Markdown. Extra packages, such as kfigr (Koohafkan 2015), and extra effort were usually needed for automatic and satisfactory cross-referencing. Fortunately, the arrival of the package bookdown (Xie 2017) provides a much easier and more consistent syntax for cross-referencing.
Instead of providing a minimal but non-informative template framework, we review most of the basic syntax of writing a single R Markdown document with the power of bookdown through examples. However, this is not intended as a tutorial on R Markdown or bookdown. Readers are encouraged to skim the PDF or HTML output and then take a closer look at the source document of this template directly.
The rest of this project template is organized as follows: In Section 2 and Section 3, we present examples of writing mathematical equations and mathematical environments, such as theorem, lemma, and definition, respectively. Some examples for reproducing figures and including existing figures are given in Section 4. The generation of tables and other R objects is discussed in Section 5. A brief demonstration of a code chunk is given in Section 6. Several example HTML widgets and Shiny applications are given in Section 7 and Section 8, respectively. Last but not least, in Section 9, we point readers to some external resources for further reading and more advanced usage of bookdown.
Inline math expressions are quoted by $ in the source document, which is consistent with the syntax of LaTeX. For instance, \(x_i^2\), \(\sin(x)\), and \(\theta\) are inline expressions. Displayed equations can simply be quoted by $$ if no cross-reference is needed, where regular LaTeX commands under the math environment can be used. For equations that need cross-referencing, LaTeX environments for mathematical equations, such as equation or align, can be used directly. For example, Equation (2.1) is the well-known Euler’s formula.
\[
e^{i\theta} = \cos(\theta) + i \sin(\theta). \tag{2.1}
\]
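In the source document, a numbered equation like the one above gets its number from a label rather than from a hand-written tag. A minimal sketch of what the source may look like is given below, assuming the label name eq:euler, which is hypothetical and not necessarily the one used in this template.

```latex
\begin{align}
  e^{i\theta} = \cos(\theta) + i \sin(\theta).
  (\#eq:euler)
\end{align}
```

With such a label in place, the equation can be referred to in the text by \@ref(eq:euler), and bookdown fills in the number automatically.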
A mathematical theorem can be put inside a theorem chunk followed by its label. For example, the Central Limit Theorem (CLT) is presented in Theorem 3.1.
Similarly, a lemma can be put inside a lemma chunk. For instance, the First Borel-Cantelli Lemma is given in Lemma 3.1.
All the available theorem environments and their label prefixes designed for cross-referencing are summarized in Table 3.1.
Environment | Printed Name | Label Prefix |
---|---|---|
theorem | Theorem | thm |
lemma | Lemma | lem |
definition | Definition | def |
corollary | Corollary | cor |
proposition | Proposition | prp |
example | Example | exm |
exercise | Exercise | exr |
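As an illustration of how an environment and its label prefix work together, a theorem chunk in the source may look roughly like the sketch below. The label clt and the exact wording of the statement are assumptions made for this example and may differ from the source of this template.

````markdown
```{theorem, clt, name="Central Limit Theorem"}
Let $X_1, \ldots, X_n$ be i.i.d. random variables with mean $\mu$ and
finite variance $\sigma^2 > 0$, and let $\bar{X}_n$ be their sample mean.
Then $\sqrt{n}(\bar{X}_n - \mu)$ converges in distribution to
$N(0, \sigma^2)$ as $n \to \infty$.
```

The CLT is presented in Theorem \@ref(thm:clt).
````

The reference combines the label prefix thm from the table with the chunk label. For a lemma chunk labeled, say, bc1, the reference would be \@ref(lem:bc1).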
Figures can be generated by a code chunk within the source document. For example, integrals and derivatives of cubic B-splines with three internal knots generated by the splines2 package (Wang and Yan 2017) are plotted by the following R code chunk. The resulting plot is shown in Figure 4.1.
library(splines2)  # provides ibs() and dbs()
x <- seq.int(0, 1, 0.01)
knots <- c(0.3, 0.5, 0.6)
## integrals and derivatives of cubic B-splines (degree 3 by default)
ibsMat <- ibs(x, knots = knots, intercept = TRUE)
dbsMat <- dbs(x, knots = knots, intercept = TRUE)
## two panels side by side with tight margins
par(mar = c(2.5, 2.5, 0.2, 0.2), mgp = c(1.5, 0.5, 0), mfrow = c(1, 2))
matplot(x, ibsMat, type = "l", ylab = "B-spline Integrals")
abline(v = knots, lty = 2, col = "gray")  # mark the internal knots
matplot(x, dbsMat, type = "l", ylab = "B-spline Derivatives")
abline(v = knots, lty = 2, col = "gray")
We may not always wish to regenerate a plot from R code. Instead of reproducing plots on the fly, we may include an existing figure in the document by the function knitr::include_graphics. Suppose we have already generated quadratic M-splines and I-splines (Ramsay 1988) with three internal knots by splines2 and saved the plots under the directory figs. Then we may skip the regeneration step and include the existing plots directly as follows:
knitr::include_graphics(c("figs/mSpline.png", "figs/iSpline.png"))
In the code chunk shown above, the chunk options out.width = '45%' and fig.show = 'hold' were set so that the plots were placed side by side. We may set the chunk option echo = FALSE so that the code chunk generating the plots is excluded from the output. Also, the chunk option cache can be set to TRUE for time-consuming code chunks once the code is unlikely to be modified.
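The chunk options discussed above are written in the chunk header of the source document. A minimal sketch is given below, where the chunk label existing-figs and the caption text are hypothetical and chosen only for illustration.

````markdown
```{r existing-figs, echo=FALSE, out.width='45%', fig.show='hold', fig.cap='Quadratic M-splines (left) and I-splines (right).'}
knitr::include_graphics(c("figs/mSpline.png", "figs/iSpline.png"))
```
````

With a caption set through fig.cap, the figure is numbered and can be referred to by \@ref(fig:existing-figs). For a time-consuming chunk, cache=TRUE may be added to the same header.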
Tables can be similarly generated by a code chunk within the source document. Table 3.1 was, in fact, generated by the function knitr::kable. Another simple example of table generation by knitr::kable is given in the following code chunk. Table 5.1 is the resulting table.
knitr::kable(head(iris), booktabs = TRUE,
caption = 'The first six rows of the iris dataset.')
Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
---|---|---|---|---|
5.1 | 3.5 | 1.4 | 0.2 | setosa |
4.9 | 3.0 | 1.4 | 0.2 | setosa |
4.7 | 3.2 | 1.3 | 0.2 | setosa |
4.6 | 3.1 | 1.5 | 0.2 | setosa |
5.0 | 3.6 | 1.4 | 0.2 | setosa |
5.4 | 3.9 | 1.7 | 0.4 | setosa |
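For the table to be numbered and cross-referenced, the chunk needs a label and knitr::kable needs a caption. A sketch of the corresponding source is given below, assuming the hypothetical chunk label iris-head.

````markdown
```{r iris-head}
knitr::kable(head(iris), booktabs = TRUE,
             caption = 'The first six rows of the iris dataset.')
```

Table \@ref(tab:iris-head) is the resulting table.
````

The prefix tab plays the same role for tables as fig does for figures.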
There are other R packages that can be of tremendous help in generating the Markdown source for various R objects. For example, the package xtable (Dahl 2016) provides more sophisticated support for generating table source for LaTeX and HTML; the package pander (Daróczi and Tsegelskyi 2015) provides functions for printing a variety of R objects in pandoc’s Markdown; the package stargazer (Hlavac 2015) produces LaTeX code, HTML code, and ASCII text for well-formatted tables of results from regression models. See the CRAN task view on reproducible research for a more comprehensive package list.
In addition to R, the code chunk can be written in a variety of other languages, such as Bash, Python, and SAS, by specifying the chunk option engine. The following code chunk is a toy example written in Python 3.
foo = "Hello " + "world!"
print("The length of '%s' is %d." % (foo, len(foo)))
>>> The length of 'Hello world!' is 12.
We may set the chunk option eval = FALSE if we only want to present the code without evaluation.
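As a sketch of how these options appear in the source, the Python example above may be written with the engine name given directly in the chunk header. The option eval=FALSE is included here only to illustrate presenting code without running it.

````markdown
```{python, eval=FALSE}
foo = "Hello " + "world!"
print("The length of '%s' is %d." % (foo, len(foo)))
```
````

The header {r, engine='python', eval=FALSE} is an equivalent way of specifying the same engine through the chunk option engine.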
The htmlwidgets package (Vaidyanathan et al. 2016) provides a framework for easily creating R bindings to JavaScript libraries. Several R packages built on it, such as leaflet (Cheng and Xie 2016) and DT (Xie 2016), enable us to embed interactive HTML widgets in the HTML output. For PDF output, a screenshot taken by the package webshot (Chang 2016) will be included instead.
For example, we embed a map of the location of the Department of Statistics at the University of Connecticut (UConn) by leaflet in Figure 7.1.
library(leaflet)  # also provides the pipe operator %>%
urlTemplate <- "https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png"
## center the map on the department and attach a popup at the same spot
leaflet(width = 900, height = 500) %>%
    addTiles(urlTemplate = urlTemplate) %>%
    setView(-72.251113, 41.810757, zoom = 17) %>%
    addPopups(-72.251113, 41.810757,
              '<b>Department of Statistics, UConn</b>')
Another example of using the package DT to display the mtcars data is given here. The result is shown in Figure 7.2.
DT::datatable(mtcars)
The package shiny (Chang et al. 2017) is a great tool providing readers with an interactive way to explore data and results. We may easily build Shiny applications on our own, deploy them, and share them online at shinyapps.io by the package rsconnect (Allaire 2016). In addition to building regular applications by Shiny, the package miniUI (Cheng 2016) provides layout functions designed for Shiny applications with appropriate size on small screens.
We may embed Shiny applications in the document by knitr::include_app, which is mainly designed for HTML output. Similarly, a screenshot taken by webshot will be embedded instead for PDF output. The package webshot provides the argument zoom for a possibly higher-resolution screenshot. However, if the resolution is still not satisfactory, we may take a screenshot and include it manually by knitr::include_graphics.
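One way to pass zoom to webshot is through the knitr chunk option screenshot.opts, which forwards a list of arguments to the screenshot function. The sketch below assumes a zoom factor of 2, a hypothetical chunk label shiny-app, and a hypothetical caption. It has not been checked against the source of this template.

````markdown
```{r shiny-app, screenshot.opts=list(zoom = 2), fig.cap='An example Shiny application.'}
knitr::include_app("https://wenjie-stat.shinyapps.io/minisplines2/", "500px")
```
````

The option only matters for output formats that need a static image, such as PDF. The HTML output embeds the application itself.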
An example Shiny application visualizing different kinds of spline bases is given in Figure 8.1.
knitr::include_app("https://wenjie-stat.shinyapps.io/minisplines2/", "500px")
In summary, we provided this project template and reviewed the most common components of a single R Markdown document and their syntax, with the power and love of bookdown and many other fantastic packages.
Xie (2017) provided a thorough introduction to bookdown including more advanced customization and other output formats. Additionally, the manual of Pandoc gives all the available options that can be specified through the YAML metadata section.
The template source and other associated files, such as the BibTeX and CSS files, are available at our GitHub repository dslab-templates: https://github.com/statds/dslab-templates.
We would like to thank Yihui Xie and all the other authors and contributors for the fabulous knitr, rmarkdown, and bookdown packages. It would also be impossible for this template to work without the fantastic open-source software: R, pandoc, etc.
Allaire, JJ. 2016. rsconnect: Deployment Interface for R Markdown Documents and Shiny Applications. https://CRAN.R-project.org/package=rsconnect.
Chang, Winston. 2016. webshot: Take Screenshots of Web Pages. https://CRAN.R-project.org/package=webshot.
Chang, Winston, Joe Cheng, JJ Allaire, Yihui Xie, and Jonathan McPherson. 2017. shiny: Web Application Framework for R. https://CRAN.R-project.org/package=shiny.
Cheng, Joe. 2016. miniUI: Shiny UI Widgets for Small Screens. https://CRAN.R-project.org/package=miniUI.
Cheng, Joe, and Yihui Xie. 2016. leaflet: Create Interactive Web Maps with the JavaScript ’Leaflet’ Library. https://CRAN.R-project.org/package=leaflet.
Dahl, David B. 2016. xtable: Export Tables to LaTeX or HTML. https://CRAN.R-project.org/package=xtable.
Daróczi, Gergely, and Roman Tsegelskyi. 2015. pander: An R Pandoc Writer. https://CRAN.R-project.org/package=pander.
Hlavac, Marek. 2015. stargazer: Well-Formatted Regression and Summary Statistics Tables. Cambridge, USA: Harvard University. https://CRAN.R-project.org/package=stargazer.
Koohafkan, Michael C. 2015. kfigr: Integrated Code Chunk Anchoring and Referencing for R Markdown Documents. https://CRAN.R-project.org/package=kfigr.
Ramsay, J. O. 1988. “Monotone Regression Splines in Action.” Statistical Science. JSTOR, 425–41.
Vaidyanathan, Ramnath, Yihui Xie, JJ Allaire, Joe Cheng, and Kenton Russell. 2016. htmlwidgets: HTML Widgets for R. https://CRAN.R-project.org/package=htmlwidgets.
Wang, Wenjie, and Jun Yan. 2017. splines2: Regression Spline Functions and Classes Too. https://CRAN.R-project.org/package=splines2.
Xie, Yihui. 2016. DT: A Wrapper of the JavaScript Library ’DataTables’. https://CRAN.R-project.org/package=DT.
Xie, Yihui. 2017. bookdown: Authoring Books and Technical Documents with R Markdown. Boca Raton, Florida: Chapman and Hall/CRC. https://github.com/rstudio/bookdown.
wenjie.2.wang@uconn.edu; Ph.D. student at the Department of Statistics, University of Connecticut.