This chapter gives you the minimum essentials to start using R comfortably. It assumes no prior knowledge and emphasizes good habits from the very beginning. We cover how to start and quit R, get help, understand core object types, subset objects, use basic control structures, manage your working directory, and write clean code.
Rendering note. All code chunks use Quarto syntax and can be run via quarto render.

Write code in an R script (.R) or a Quarto document (.qmd). Send the current line or selection to the console with Ctrl-Enter (Win/Linux) or Cmd-Enter (Mac). The console runs one complete line at a time.

## End your R session programmatically
q()
Positron is organized into panes and a sidebar.
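- Editor: opens .R and .qmd files; supports tabs.
- Command palette: press Ctrl-Shift-P (Win/Linux) or Cmd-Shift-P (Mac) to search commands.

Working in a project: keep data in data/ and scripts in R/ or src/.

Running code: send the current line or selection to the console with Ctrl/Cmd-Enter. In a .qmd file, run a whole chunk with the Run Cell button.

Keep the Files and Console panes visible. Beginners benefit from constant feedback on where they are and what ran.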
R has built‑in help for every function in base R and in installed packages. Every command you type is, under the hood, a function call.
Search the help system on a topic:
help.search("linear model")
Get the documentation for a function whose name you know:
?mean
help(mean) ## same as ?mean
Quickly inspect a function's arguments:
args(mean)
function (x, ...)
NULL
Run the examples from a function's help page:
example(mean)
Practice: find out how sd() handles missing values.
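Once you have read ?sd, you can confirm what it says by experiment. A minimal sketch; the vector vals is an illustrative example, and the help page remains the authoritative answer:

```r
## Show the signature, including the na.rm argument and its default
args(sd)

## Illustrative data with one missing value
vals <- c(2, 4, 4, 4, 5, 5, 7, 9, NA)
sd(vals)                 ## NA: a missing value propagates by default
sd(vals, na.rm = TRUE)   ## NA is dropped before computing
```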
Almost everything you store is a vector or is built from vectors. Even a length‑one value is a vector.
Atomic vectors (every element has the same type):
## Atomic vectors (length one shown; still vectors)
num <- 3.14  ## double (numeric)
int <- 2L    ## integer
chr <- "Ann" ## character
lgc <- TRUE  ## logical

## A longer vector (same type throughout)
v <- c(1, 2, 3)
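You can check that length‑one values are vectors with length(), is.vector(), and typeof(). The same tools show what happens when types are mixed: c() coerces every element to the most flexible type present (logical, then integer, then double, then character; complex sits between double and character). A short sketch; mixed is an illustrative name:

```r
## A length-one value is a vector of length 1
length(3.14)      ## 1
is.vector("Ann")  ## TRUE
typeof(2L)        ## "integer"
typeof(3.14)      ## "double"

## Mixing types in c() coerces everything to one common type
mixed <- c(1, "two", TRUE)
typeof(mixed)     ## "character"
mixed             ## "1" "two" "TRUE"
```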
Higher‑level structures built from vectors:
## Matrix/array: same type, 2D or more
m <- matrix(1:6, nrow = 2)

## List: heterogeneous elements
lst <- list(name = "Bob", age = 25, scores = c(90, 88))

## Data frame: list of equal‑length columns
## (columns can be different atomic types)
df <- data.frame(name = c("Ann", "Bob"), age = c(20, 25))

## Function: also an object
sq <- function(x) x^2
Inspect objects:
## Class and structure
class(df)
[1] "data.frame"
str(df)
'data.frame': 2 obs. of 2 variables:
$ name: chr "Ann" "Bob"
$ age : num 20 25
Prefer str(x) for a compact view of what an object contains, its type, and its size.
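class() and a few companions (dim(), length(), is.list()) summarize the structures above at a glance. This sketch recreates them so it runs on its own. The matrix result assumes R 4.0.0 or later, where a matrix also carries the class "array":

```r
m   <- matrix(1:6, nrow = 2)
lst <- list(name = "Bob", age = 25, scores = c(90, 88))
df  <- data.frame(name = c("Ann", "Bob"), age = c(20, 25))
sq  <- function(x) x^2

class(m)     ## "matrix" "array" (R >= 4.0.0)
class(lst)   ## "list"
class(df)    ## "data.frame"
class(sq)    ## "function"

dim(m)       ## 2 3: rows, columns
length(lst)  ## 3 elements
is.list(df)  ## TRUE: a data frame is a list of columns
sq(4)        ## 16: a function object can be called
```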
Exercise. Create one example of each object above and check it with class() and str().
Use bracket notation consistently.
## Vectors
x <- c(2, 4, 6, 8)
x[2]     ## second element
[1] 4
x[1:3]   ## slice
[1] 2 4 6
x[x > 5] ## logical filter
[1] 6 8
## Matrices
m <- matrix(1:9, nrow = 3)
m[2, 3]  ## row 2, col 3
[1] 8
m[, 1]   ## first column
[1] 1 2 3
## Data frames
people <- data.frame(name = c("Ann", "Bob"), age = c(20, 25))
people$age       ## column by name
[1] 20 25
people[1, ]      ## first row
people[, "name"] ## column by string
[1] "Ann" "Bob"
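Lists follow the same bracket rules with one twist: single brackets return a smaller list, while double brackets (or $) return the element itself. Because a data frame is a list of columns, [[ ]] also pulls out one column as a plain vector, which is the form the for loop in this chapter uses with scores[[col]]. A short sketch with the lst and people objects from this chapter:

```r
lst <- list(name = "Bob", age = 25, scores = c(90, 88))

lst[["scores"]]         ## the element itself: 90 88
lst$age                 ## same as lst[["age"]]: 25
lst["scores"]           ## a list of length 1 that contains scores
class(lst["scores"])    ## "list"
class(lst[["scores"]])  ## "numeric"

people <- data.frame(name = c("Ann", "Bob"), age = c(20, 25))
people[["age"]]            ## 20 25
people[people$age > 21, ]  ## logical filter on rows: only Bob
```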
## Replace sentinel values with NA
x <- -999
if (x == -999) {
  x <- NA
}
print(x)
[1] NA
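The if statement above works because x holds a single value: if() needs exactly one TRUE or FALSE, and since R 4.2.0 a condition of length greater than one is an error. For a whole vector, combine a logical filter with assignment instead; for a single value, else if and else choose among several branches. A minimal sketch; y and msg are illustrative names:

```r
x <- c(3, -999, 7)
## if (x == -999) x <- NA   ## error in R >= 4.2.0: the condition has length > 1
x[x == -999] <- NA          ## vector-wide replacement via a logical filter
x                           ## 3 NA 7

## For a single value, if / else if / else picks one branch
y <- 5
if (is.na(y)) {
  msg <- "missing"
} else if (y > 0) {
  msg <- "positive"
} else {
  msg <- "zero or negative"
}
msg                         ## "positive"
```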
A for loop is useful when applying a simple rule across columns.
## Make a toy data frame with a sentinel value
scores <- data.frame(
  math = c(95, -999, 88, 91),
  eng  = c(87, 90, -999, 85),
  sci  = c(92, 88, 94, -999)
)
## Replace -999 with NA, then compute column means
for (col in names(scores)) {
  ## clean
  bad <- scores[[col]] == -999
  scores[[col]][bad] <- NA
  ## summarize
  m <- mean(scores[[col]], na.rm = TRUE)
  cat(col, "mean:", m, "\n")
}
math mean: 91.33333
eng mean: 87.33333
sci mean: 91.33333
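Printing inside a loop is fine for a quick look, but often you want to keep the results for later use. A minimal sketch: a named numeric vector (the hypothetical col_means) is preallocated with one slot per column and filled inside the loop. This version cleans a copy of each column (vals) and leaves scores itself unchanged:

```r
scores <- data.frame(
  math = c(95, -999, 88, 91),
  eng  = c(87, 90, -999, 85),
  sci  = c(92, 88, 94, -999)
)

## Preallocate one slot per column and label the slots with column names
col_means <- numeric(ncol(scores))
names(col_means) <- names(scores)

for (col in names(scores)) {
  vals <- scores[[col]]          ## copy of the column
  vals[vals == -999] <- NA       ## clean the copy
  col_means[col] <- mean(vals, na.rm = TRUE)
}
col_means
```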
A while loop is useful when you want to stop once an estimate is precise enough.
## Estimate P(X > 1.96) for N(0,1) via Monte Carlo
## Stop when stderr < 0.002
set.seed(1)
count <- 0
n <- 0
se <- Inf

while (se > 0.002) {
  ## simulate in small batches for responsiveness
  z <- rnorm(1000)
  n <- n + length(z)
  count <- count + sum(z > 1.96)
  p_hat <- count / n
  se <- sqrt(p_hat * (1 - p_hat) / n)
}
cat("p_hat:", p_hat, "n:", n, "se:", se, "\n")
p_hat: 0.0285 n: 8000 se: 0.001860368
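The loop above relies on se eventually falling below the target. A common defensive habit is to also cap the number of draws so a while loop cannot run forever. The sketch below adds a hypothetical cap, max_n (its value is an arbitrary choice), and compares the estimate with the exact tail probability from pnorm(). With the same seed it should follow the same path as the original loop:

```r
set.seed(1)
count <- 0
n <- 0
se <- Inf
max_n <- 1e6  ## hypothetical safety cap on the total number of draws

while (se > 0.002 && n < max_n) {
  z <- rnorm(1000)
  n <- n + length(z)
  count <- count + sum(z > 1.96)
  p_hat <- count / n
  se <- sqrt(p_hat * (1 - p_hat) / n)
}

exact <- pnorm(1.96, lower.tail = FALSE)  ## about 0.025
c(p_hat = p_hat, exact = exact, n = n)
```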
Exercise. Write a loop that, for each numeric column in a data frame, replaces -999 with NA, then reports the fraction of missing values.
Loops are fine for clarity. Later you will see vectorized and apply‑family solutions that are shorter and, in the vectorized case, usually faster.
## Working directory
getwd() ## where am I
[1] "/Users/junyan/work/teaching/1010-f25/1010f25"
## setwd("path/to/folder") ## set if necessary
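When a .qmd file is rendered, its code runs by default with the working directory set to the folder containing that file. Code sent to the console with Ctrl/Cmd-Enter runs in the console's current working directory, which may differ.
Use project‑relative paths and file.path() to build paths. This keeps code portable across operating systems.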
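A minimal sketch of building a project‑relative path and checking it before reading. It uses the data/india.csv path from this chapter, which may not exist on your machine; in that case the check reports where R looked instead of failing inside read.csv():

```r
## file.path() inserts the separator between path pieces
path <- file.path("data", "india.csv")
path  ## "data/india.csv"

## A relative path is resolved against the working directory, getwd()
if (file.exists(path)) {
  message("Found ", path)
} else {
  message("Not found: ", path, " (working directory: ", getwd(), ")")
}
```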
R can load data from text files and many other formats.
## Read a CSV file (comma-separated)
cars <- read.csv("data/india.csv")

## Read a general table with custom separators
survey <- read.table("data/survey.txt", header = TRUE, sep = " ")
Arguments to know:
- header = TRUE tells R the first row has column names.
- sep controls the field separator ("," for CSV, "\t" for tab‑delimited, " " for space‑separated).
Check the imported object with str() or head() immediately to ensure it loaded as expected.
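To practice reading and checking data without needing the chapter's data files, you can write a small table to a temporary file and read it back. A sketch using tempfile() (the file names are generated, and the people data is illustrative). It also shows sep = "\t" for a tab‑delimited file:

```r
## Write a small CSV to a temporary file, read it back, then inspect it
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(name = c("Ann", "Bob"), age = c(20, 25)),
          tmp, row.names = FALSE)

people <- read.csv(tmp)
str(people)   ## name is chr; whole numbers like age are read as int
head(people)

## Same data as a tab-delimited file, read with read.table()
tmp_tab <- tempfile(fileext = ".txt")
write.table(people, tmp_tab, sep = "\t", row.names = FALSE, quote = FALSE)
people_tab <- read.table(tmp_tab, header = TRUE, sep = "\t")
str(people_tab)

## Remove the temporary files
unlink(c(tmp, tmp_tab))
```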
The foreign package imports legacy statistical software formats (SAS, SPSS, Stata):
library(foreign)
data_spss <- read.spss("data/study.sav", to.data.frame = TRUE)
data_stata <- read.dta("data/study.dta")
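More modern workflows often use the haven package (part of the tidyverse) for these formats, but foreign ships with standard R installations as a recommended package. Note that read.dta() reads only Stata files up to version 12; files saved by newer Stata versions need haven.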
Adopt consistent style early. Follow the tidyverse guide: https://style.tidyverse.org/
Use <- for assignment.
## Your Name
## 2025-09-02
## Purpose: demonstrate basic R style
x <- 1 # inline note uses a single hash
Comment convention. Start‑of‑line comments use at least two hashes (##). Reserve a single # for end‑of‑line notes.
R is case‑sensitive: x and X are different objects. Forward slashes (/) in paths work on all platforms in R.
## Portable path building
file.path("data", "mtcars.csv")
[1] "data/mtcars.csv"
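Both points about names and paths are easy to check directly. file.path() joins its pieces with .Platform$file.sep, which is "/" on every platform R supports, including Windows. A small sketch; data/raw/survey.txt is a hypothetical path used only to show the result:

```r
## R is case-sensitive: these are two separate objects
x <- 1
X <- 2
identical(x, X)      ## FALSE

## The separator file.path() uses
.Platform$file.sep   ## "/"
file.path("data", "raw", "survey.txt")  ## "data/raw/survey.txt" (hypothetical path)
```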
## Floating‑point comparison
0.1 == 0.3 / 3
[1] FALSE
all.equal(0.1, 0.3 / 3)
[1] TRUE
## Reveal stored value with extra digits
print(0.1, digits = 20)
[1] 0.10000000000000000555
sprintf("%.17f", 0.1)
[1] "0.10000000000000001"
Use all.equal() (or an absolute/relative tolerance) rather than == for real‑number comparisons.
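all.equal() returns TRUE or a character string describing the difference, never FALSE, so inside if() wrap it in isTRUE(). A sketch of both approaches; the 1e-8 absolute tolerance is an illustrative choice, not a universal constant, and a and b are illustrative names:

```r
a <- 0.1
b <- 0.3 / 3

## isTRUE() turns all.equal()'s result into a single TRUE/FALSE
if (isTRUE(all.equal(a, b))) {
  msg <- "equal within tolerance"
} else {
  msg <- "different"
}
msg

## Explicit absolute tolerance
abs(a - b) < 1e-8            ## TRUE

## When values really differ, all.equal() returns a description, not FALSE
all.equal(0.1, 0.2)          ## a character string
isTRUE(all.equal(0.1, 0.2))  ## FALSE
```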
You should now be able to:
- inspect objects with class() and str();
- use if, for, and while in useful contexts.