Chapter 2 Using the Right Tools for Writing
Many people use MS Word when it comes to writing. Notwithstanding the importance of the invention of MS Office, it is not the right tool for writing statistical papers. Writing a statistical paper in MS Word would be as interesting as running a statistical data analysis in MS Excel. Simply put, MS Office is great for office staff doing routine office document tasks. For professional writing, one needs to be aware of the professional tools and invest time to master them.
The right, high-quality, professional typesetting system is LaTeX. LaTeX is a
typesetting language that makes it easier and cleaner to write documents
involving extensive mathematical content. It is the standard in Statistics,
Mathematics, Physics, Chemistry and other disciplines that require many
mathematical formulas.
It separates the appearance of a document from its content. This allows authors
to focus on writing the content without having to worry about
its appearance until the end. There are many different
professional-looking appearances one can choose or design, allowing for easy
adaptation to different formats and styles.
The document has a `.tex` extension and can be edited with your favorite text
editor. The final output of the document can have different formats, the most
popular of which is `pdf`, which stands for portable document format. It can be
opened on any platform (computer operating system).
The source `.tex` file is a plain text file. Just like the source code of any
programming language, a plain text file allows version control, which makes
tracking and managing the source easy and professional. The most popular version
control tool today is `git`.
2.1 Git for Version Control
Many tutorials are available in different formats. Here is a YouTube video, "Git and GitHub for Beginners - Crash Course". The video also covers GitHub, a cloud service for Git. Other similar services include Bitbucket and GitLab. A cloud service gives you a cloud backup of your work and makes collaboration with co-workers easy.
There are tools that make learning Git easy.
- Here is a collection of online Git exercises that I used for Git training in other courses that I taught.
- Here is a game called Oh My Git, an open source game about learning Git!
2.1.1 Set Up
- Download Git here.
- Make a GitHub Account here if you don’t have one yet.
- Get started with your GitHub account by following the help page.
- One important step is the set-up.
- The connection between your local and GitHub repositories needs to be set up only once. One easy way is with a personal access token, as illustrated in a YouTube video.
2.1.2 Most Frequently Used Git Commands
`git clone`:
- Clones a remote repository to a local folder.
- Requires either an HTTPS link or an SSH key to authenticate.
`git pull`:
- Downloads any updates made to the remote repository and automatically updates the local repository.
`git status`:
- Returns the state of the working directory.
- Lists the files that have been modified, and whether they are yet to be or have been staged and/or committed.
- Shows whether the local repository is behind or ahead of a remote branch.
`git add`:
- Adds new or modified files to the Git staging area.
- Lets you select which files go into the next commit, and hence which are later sent to the remote repository.
`git rm`:
- Removes files from the staging index or the local repository.
`git commit`:
- Commits staged changes to the local repository and saves them like a snapshot.
- A message is recommended with every commit to keep track of changes made.
`git push`:
- Pushes commits made on the local repository to the remote repository.
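The commands above usually appear together in one cycle. The following is a minimal sketch of that cycle, not a prescribed workflow. It assumes a throwaway sandbox in which a bare repository stands in for a GitHub remote; the identity, file name, and commit message are placeholders.

```shell
set -eu
# Hypothetical sandbox: a bare repository plays the role of the GitHub remote.
sandbox="$(mktemp -d)"
git init --quiet --bare "$sandbox/remote.git"

git clone --quiet "$sandbox/remote.git" "$sandbox/work"   # copy the remote locally
cd "$sandbox/work"
git config user.name "Demo Author"          # placeholder identity, sandbox only
git config user.email "demo@example.com"

echo "Draft of the paper" > notes.txt
git status --short                          # notes.txt is listed as untracked (??)
git add notes.txt                           # stage the new file
git commit --quiet -m "Add first draft of notes"
git push --quiet -u origin HEAD             # send the commit to the remote
git pull --quiet                            # fetch and merge any remote updates
git status --short                          # prints nothing: working tree is clean
```

With a real GitHub repository, the clone step would use the HTTPS or SSH link of that repository instead of a local path.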
2.1.3 Tips on using Git:
- Use the command line interface instead of the web interface (e.g., upload on GitHub).
- Make frequent small commits instead of rare large commits.
- Make commit messages informative and meaningful.
- Name your files/folders by some reasonable convention.
- Lowercase is better than uppercase.
- No blanks in file/folder names.
- Keep the repo clean by not tracking generated files.
- Create a `.gitignore` file for better output from `git status`.
- Keep line widths in source files under 80 characters for a better `git diff` view.
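As an illustration of the `.gitignore` tip, here is a small sketch for a LaTeX project. The list of patterns is an assumption about which generated files a typical `pdflatex`/`bibtex` run leaves behind, and should be adjusted for your own project; the scratch repository is hypothetical.

```shell
set -eu
repo="$(mktemp -d)"        # hypothetical scratch repository
cd "$repo"
git init --quiet

# Ignore files that pdflatex/bibtex regenerate on every compile.
cat > .gitignore <<'EOF'
*.aux
*.bbl
*.blg
*.log
*.out
EOF

touch statspaper.tex statspaper.aux statspaper.log
git status --short         # lists only .gitignore and statspaper.tex
```

The generated `.aux` and `.log` files no longer clutter the output of `git status`, so genuine changes are easier to spot.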
2.1.4 Pull Request
To contribute to an open source project (e.g., our class notes), use pull requests. Pull requests “let you tell others about changes you’ve pushed to a branch in a repository on GitHub. Once a pull request is opened, you can discuss and review the potential changes with collaborators and add follow-up commits before your changes are merged into the base branch.”
Watch this YouTube video: GitHub pull requests in 100 seconds.
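The local side of a pull request is ordinary Git work on a separate branch. Below is a minimal sketch under stated assumptions: a bare repository stands in for your fork on GitHub, and the branch name `fix-typo`, the file name, and the identity are all hypothetical.

```shell
set -eu
sandbox="$(mktemp -d)"
git init --quiet --bare "$sandbox/fork.git"          # stand-in for your fork
git clone --quiet "$sandbox/fork.git" "$sandbox/classnotes"
cd "$sandbox/classnotes"
git config user.name "Demo Author"                   # placeholder identity
git config user.email "demo@example.com"
git commit --quiet --allow-empty -m "Initial commit" # give the base branch a history
git push --quiet -u origin HEAD

git checkout --quiet -b fix-typo                     # work on a dedicated branch
echo "Corrected sentence." > chapter.txt
git add chapter.txt
git commit --quiet -m "Fix typo in chapter"
git push --quiet -u origin fix-typo                  # publish the branch
```

Once the branch is on GitHub, the pull request itself is opened from the web interface, where the changes can be discussed and follow-up commits added by pushing to the same branch.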
2.2 LaTeX for Typesetting
A source file has the extension `.tex`. It is a plain text file that can
be edited by any text editor. It can be tracked easily for differences between
any two versions. Different document classes are predefined, such as `letter`,
`article`, `report`, `beamer` (for presentations), and `book`. Customized
document classes can be defined once you know more about LaTeX. We focus on
`article` here.
The instructions in this section are practiced in a demo repo.
Anthony Zeimekakis was an undergraduate student who worked with us on a thesis. The tex sources, data, and code are in a GitHub repo, which can be used as a template too. This paper was published in The American Statistician, two years after Anthony graduated. Interested students should be aware that this is a serious commitment.
2.2.1 A beginner’s template
For the product to look like a paper, we need to have title, author, abstract,
sections, and references. Let us start from a very basic template in the
demo repo. Clone it to an appropriate location on your own computer. Go to the
`manuscript` folder and compile the `pdf` product with the following:
pdflatex statspaper
bibtex statspaper
pdflatex statspaper
pdflatex statspaper
It is the `bibtex` step that incorporates the references from the bib files. Two
rounds of `pdflatex` are necessary to get all the cross-references settled.
The whole process could be automated by:
latexmk -pdf statspaper
Advanced users may take a look at the `Makefile`, in which different targets can
be set up and the operations needed for each target are automated.
Tips on getting started with LaTeX:
- Read the compiling log and fix the errors/warnings.
- Googling the error/warning messages usually helps.
- Limit the preamble to include only what is necessary.
- Set up document margins with the `geometry` package.
- No manually controlling spaces.
- Familiarize yourself with LaTeX symbol tables.
- Keep line widths under 80 characters in source files.
- Separate paragraphs in source files by double blank lines.
- Define acronyms at their first occurrences and only once.
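For orientation, a skeleton with the elements listed at the start of this subsection might look like the sketch below. It is not the template from the demo repo; the title, authors, citation key `doe2020bayesian`, bib file name `refs`, and the choice of `plainnat` as the bib style are all placeholders.

```latex
\documentclass[11pt]{article}

% Preamble limited to what this sketch actually uses.
\usepackage[margin=1in]{geometry}  % document margins
\usepackage{natbib}                % provides \citep and \citet

\title{A Placeholder Title}
\author{First Author \and Second Author}
\date{}

\begin{document}
\maketitle

\begin{abstract}
One paragraph summarizing the problem, the method, and the findings.
\end{abstract}

\section{Introduction}
\label{sec:intro}

Text of the introduction, citing earlier work \citep{doe2020bayesian}.

\section{Methods}
\label{sec:methods}

Section~\ref{sec:intro} motivated the problem; this section describes the
approach.

\bibliographystyle{plainnat}
\bibliography{refs}  % reads refs.bib, a hypothetical file name
\end{document}
```

The citation resolves only if `refs.bib` contains an entry with the cited key; otherwise the compiling log reports an undefined citation.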
2.2.2 Math equations
For serious math typesetting, use packages `amsmath`, `amsthm`, and others.
Tips on using math:
- Punctuate equations, as they are always part of sentences.
- Add spaces between symbols for better readability in sources.
- Do not start a sentence with a math symbol; rephrase to avoid it.
- No fractions (`\frac`) in inline math expressions.
- No breaking inline math expressions into different lines in tex sources.
- No labeling equations that are not referenced.
- Reference labeled equations with `\eqref` instead of `(\ref)`.
- Keep fonts consistent for the same notations (e.g., \(n\) not n; AIC not \(AIC\)).
- Use appropriate sizes for parentheses.
- When multiple parentheses are needed in mathematical expressions, use the following ordering unless the journal specifies otherwise: \([\{(\mbox{math here})\}]\).
- Use predefined math functions (e.g., \(\exp\) not \(exp\); \(\Pr\) not \(P\)).
- Use `\allowdisplaybreaks` to allow page breaks in aligned equations.
- Use `\dd` for the differentiation operator (available from package `physics`).
- No breaking long equations arbitrarily in tex source; break them into short lines at appropriate places and add sufficient spaces to make the sources more readable.
- Align at appropriate places in multiline equations.
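Several of these tips can be seen together in a short worked example. The sketch below uses an exponential random variable purely as an illustration; the label name `eq:density` is a placeholder.

```latex
\documentclass{article}
\usepackage{amsmath}
\usepackage{physics}   % provides \dd
\allowdisplaybreaks    % let aligned displays break across pages

\begin{document}

Let \(X\) be an exponential random variable with rate \(\lambda > 0\), whose
density is
\begin{equation}
  \label{eq:density}
  f(x) = \lambda \exp(-\lambda x), \qquad x > 0.
\end{equation}
Integrating~\eqref{eq:density} gives, for \(t > 0\),
\begin{align*}
  \Pr(X > t)
  &= \int_t^\infty \lambda \exp(-\lambda x) \dd x \\
  &= \exp(-\lambda t).
\end{align*}
For nested delimiters, the ordering is
\(\left[ \left\{ \left( a + b \right) c \right\} d \right]\).

\end{document}
```

Only the density is labeled because it is the only equation referenced; the aligned display is unnumbered, punctuated as part of its sentence, and aligned at the equal signs.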
2.2.3 Tables
If you are manually typing a table source, consider whether you can generate the
source instead. There are multiple R packages that can generate the tex source
from a given dataset. See package `xtable` for example.
Tips on professional tables:
- Use `tbp` for floating locations; avoid `h`.
- Make it self-contained with an informative caption.
- Captions should be located above the table unless the journal specifies otherwise.
- Avoid vertical lines.
- Put negative signs in math mode.
- Use better top, middle, and bottom rules from package `booktabs`.
- Allow hierarchy by `\cmidrule`.
- Do not change font size for tables. Change table layout to fit instead of resizing it.
- Right-align numbers with decimal places.
- Use a consistent number of decimal places within a column or row of the same type of measurement.
- Avoid having many leading 0’s in decimal entries.
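A table that follows these tips might be laid out as in the sketch below. The estimators, sample sizes, and all numbers are made up purely to show the layout, and the label `tab:sim` is a placeholder.

```latex
\documentclass{article}
\usepackage{booktabs}

\begin{document}

\begin{table}[tbp]
  \caption{Placeholder summary of bias and standard error (SE) for two
    hypothetical estimators under two sample sizes; all numbers are made up
    for layout purposes.}
  \label{tab:sim}
  \centering
  \begin{tabular}{l rr rr}
    \toprule
    & \multicolumn{2}{c}{\(n = 100\)} & \multicolumn{2}{c}{\(n = 500\)} \\
    \cmidrule(lr){2-3} \cmidrule(lr){4-5}
    Estimator & Bias & SE & Bias & SE \\
    \midrule
    A & \(-0.12\) & 1.05 & \(-0.03\) & 0.47 \\
    B & 0.08 & 1.21 & 0.02 & 0.55 \\
    \bottomrule
  \end{tabular}
\end{table}

\end{document}
```

The caption sits above the table, there are no vertical lines, negative signs are in math mode, and the right-aligned columns with two decimal places keep the decimal points lined up.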
2.2.4 Figures
Use vector graphs, not raster graphs (unless you have to, e.g., screenshots). Save the code that generates the figures so the figures can be improved easily.
Tips on figures:
- Use `tbp` for floating locations; avoid `h`.
- Use the LaTeX package `graphicx`.
- Make it self-contained with an informative caption.
- Captions should be located below the figure unless the journal specifies otherwise.
- For line plots with different groups, use different line patterns to distinguish them, not only colors, so that readers can tell the difference when printed in black and white. The same applies to different dots (symbols) on plots.
- Use colorblind-friendly colors (especially avoid red/green).
- Keep the right aspect ratio when necessary (e.g., basketball court; map; pp-plot).
- Remove extra margins.
- Keep the aspect ratio when resizing (e.g., `width = \textwidth`).
- Name the figure files appropriately.
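In the tex source, a figure following these tips could be included as sketched below. The file name `fig_density_by_group` and the label `fig:density` are hypothetical; the sketch compiles only if a graphic with that name exists next to the source.

```latex
\documentclass{article}
\usepackage{graphicx}

\begin{document}

\begin{figure}[tbp]
  \centering
  % fig_density_by_group.pdf: a hypothetical vector graphic saved by a script.
  \includegraphics[width = \textwidth]{fig_density_by_group}
  \caption{Estimated densities by group. Solid, dashed, and dotted lines
    distinguish the groups so that the plot remains readable in black and
    white.}
  \label{fig:density}
\end{figure}

\end{document}
```

Specifying only the width lets `graphicx` scale the height proportionally, so the aspect ratio of the original graphic is preserved.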
2.2.5 References
BibTeX is a reference management tool for formatting lists of references that
can be used together with LaTeX to generate a reference list. References that
are not cited in the text are not listed; every cited reference is listed. This
nice feature comes from BibTeX working together with the package `natbib`. We
need to collect references in BibTeX format and save them in a bib database
(`.bib`) file. The display styles of the references are controlled by bib style
(`.bst`) files. Many journals have their own bib style files available for
download. One can construct a customized bib style easily with the help of
`custom-bib`.
An alternative to BibTeX and `natbib` is `biblatex`. Most journals, however, use
BibTeX and `natbib`, so we focus on that here.
A reference is cited in the manuscript through its key by `\citep{}` for
parenthetical citations or `\citet{}` for textual citations, where the key is
placed inside the curly braces. The key is used to cite or cross-reference the
bibliographic entry in a `.tex` document. The variations `\citep*{}` and
`\citet*{}` print all authors. Sometimes, `\citeauthor{}` and `\citeyear{}`
can be useful when only the author(s) or the year is needed.
For `\citep{}`, multiple keys separated by commas can be put in the same
braces for citing multiple references. Two optional arguments are allowed in
`\citep[][]{}`. For example, `\citep[see, e.g.,][p. 26]{}` could be useful when
a specific page (or section/chapter) is being referenced as an example.
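The citation commands can be compared side by side in a short fragment. This sketch assumes `\usepackage{natbib}` in the preamble and bib entries with the hypothetical keys `doe2020bayesian` and `roe2021markov`.

```latex
% Textual citation: the authors are part of the sentence.
\citet{doe2020bayesian} proposed the method.

% Parenthetical citation: authors and year go inside parentheses.
The method is well studied \citep{doe2020bayesian}.

% Several keys in one pair of braces give a single parenthetical citation.
Related approaches exist \citep{doe2020bayesian, roe2021markov}.

% Optional arguments: text before and after the reference.
Details are available \citep[see, e.g.,][p.~26]{doe2020bayesian}.

% Author(s) or year only.
The approach of \citeauthor{doe2020bayesian}, published in
\citeyear{doe2020bayesian}, is used here.
```

How each citation is rendered depends on the bib style file in use, so the same source adapts to author-year or numbered styles without changes.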
In general, to compile a tex file with BibTeX references into a pdf document,
one needs to run `pdflatex` first, then `bibtex`, and then `pdflatex` twice to
get the references correct. A simpler solution is `latexmk -pdf`. In my
practice, I always have a `Makefile` and use `make` to smartly automate the
compiling process. See, for example, Anthony’s thesis repo.
Tips on preparing BibTeX databases:
- Devise a good naming convention for reference keys and stick to it.
- Keep the bib database sorted and tidily formatted. (No repeated entries.)
- Title: Capitalize first letters of notional words (not form words).
- Use Google Scholar to get the BibTeX source of a reference, but be sure to check the Google output for missing fields and errors.
- Protect capitalization of words with special meanings in curly braces (e.g., `{B}ayesian`, `{M}arkov Chain {M}onte {C}arlo`).
- Protect capitalization of initial words after a colon in titles.
- Use title style for journal/book titles.
- For book chapters or proceedings articles, use `@incollection` instead of `@article`, and fill the `booktitle` and `editor` fields.
- Separate page numbers with double dashes and no spaces (e.g., `pages = {110--118}`).
- Books need to have publisher and address fields.
- For preprints, always check if they have been published recently.
- Use the `note` field to show information that should always be shown.
- All references without page numbers or volume number should be checked.
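Two entries written along these lines are sketched below. Both references are hypothetical placeholders, not real publications, and the key convention (first author, year, first notional word of the title) is just one possible choice. The `filecontents` environment is used here only so that the sketch stays within LaTeX; in practice the entries would live directly in a `.bib` file.

```latex
% Writes refs.bib when the document is compiled; place before \documentclass.
\begin{filecontents}{refs.bib}
@article{doe2020bayesian,
  author  = {Doe, Jane and Roe, Richard},
  title   = {A {B}ayesian Approach to Placeholder Problems: {A} Hypothetical
             Example},
  journal = {Journal of Placeholder Statistics},
  year    = {2020},
  volume  = {1},
  number  = {2},
  pages   = {110--118}
}

@incollection{roe2021markov,
  author    = {Roe, Richard},
  title     = {{M}arkov Chain {M}onte {C}arlo in a Hypothetical Setting},
  booktitle = {Handbook of Placeholder Methods},
  editor    = {Doe, Jane},
  publisher = {Placeholder Press},
  address   = {City},
  year      = {2021},
  pages     = {1--20}
}
\end{filecontents}
```

The braces around `{B}`, `{A}`, `{M}`, and `{C}` keep those capitals intact even when a bib style converts titles to sentence case.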
2.2.6 Cross-referencing
Define a label for each object and refer to it by its label.
Tips on cross-referencing:
- Devise a good naming convention for labels and stick to it.
- Use different label prefixes for different types of objects (e.g., `eq:` for equations, `sec:` for sections, `tab:` for tables, `fig:` for figures, `alg:` for algorithms, etc.).
- Labels within the source(s) for a single document must be unique.
- Watch warnings from compiling logs for undefined labels or multiply defined labels and fix them.
- Use package `xr` for cross-document referencing (and labels must be unique across documents).
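A short fragment shows the prefixes at work. The labels `sec:sim`, `sec:method`, `eq:est`, `tab:sim`, and `fig:sim` are hypothetical and would have to be defined elsewhere in the same document; `\eqref` assumes the `amsmath` package.

```latex
\section{Simulation Study}
\label{sec:sim}

The estimator is defined in~\eqref{eq:est} of Section~\ref{sec:method}, and
its performance is summarized in Table~\ref{tab:sim} and
Figure~\ref{fig:sim}.
```

The prefix tells the reader of the source what kind of object is being referenced, and the tie (`~`) keeps the number on the same line as the word before it.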
2.4 Command Line Interface
On Linux or macOS, simply open a terminal.
On Windows, several options can be considered.
- Cygwin (with X): https://x.cygwin.com
- Git Bash: https://www.gitkraken.com/blog/what-is-git-bash
Recent versions of Windows provide the Windows Subsystem for Linux. As the name suggests, it aims to provide a Linux system on a Windows computer. It might be worth trying out.
To jump start, here is a tutorial: Ubuntu Linux for beginners.
At a minimum, you need to know how to handle files and move between directories. Tab completion and introspection support are very useful.
Here are several commonly used shell commands:
- `cd`: change directory; `..` means parent directory.
- `pwd`: print the present working directory.
- `ls`: list the content of a folder; `-l` long version; `-a` show hidden files; `-t` ordered by modification time.
- `mkdir`: create a new directory.
- `cp`: copy file/folder from a source to a target.
- `mv`: move file/folder from a source to a target.
- `rm`: remove a file or a folder.
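A short session that strings these commands together might look like the sketch below. It runs inside a temporary scratch directory, and the folder and file names are placeholders.

```shell
set -eu
base="$(mktemp -d)"           # hypothetical scratch location
cd "$base"
mkdir paper                   # create a new directory
cd paper                      # move into it
pwd                           # show where we are
echo "draft" > notes.txt      # create a small file
cp notes.txt backup.txt       # copy a file
mv backup.txt old_notes.txt   # rename (move) the copy
ls -lat                       # long listing, hidden files included, newest first
rm old_notes.txt              # remove a file
cd ..                         # back to the parent directory
```

Removing a folder together with its content requires `rm -r`, which should be used with care because there is no undo.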