Data Science and Society with R

DSDA/STAT 1010 - Quarto Book

Author

Jason Byers and Jun Yan

Preliminaries


Welcome! This Quarto book hosts all lecture notes, in-class activities, and weekly readings for DSDA/STAT 1010. It is designed for first-year students with no prerequisites.

The notes were developed with Quarto; for details about Quarto, visit https://quarto.org/docs/books.

This book is free and is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 United States License.

Sources at GitHub

These lecture notes for STAT/DSDA 1010 in Fall 2025 are being developed by Professor Jun Yan, with help from generative AI and the students enrolled in the course. This collaboration is facilitated by GitHub, a platform for collaborative coding and content development. To view these contributions and the complete lecture notes, visit our GitHub repository at https://github.com/statds/1010f25.

Students are welcome to contribute to the lecture notes by submitting pull requests to our GitHub repository. This process enriches the course material and gives students practical experience with collaborative software development and version control.
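One way to prepare such a pull request without leaving R is the usethis package. The sketch below is an assumption, not a workflow the course prescribes. It presumes that usethis is installed and that a GitHub personal access token is already configured. The branch name `fix-typo-preface` is hypothetical.

```r
# Sketch of one possible pull-request workflow with the usethis package
# (assumption: the course does not require this tool).
library(usethis)

# Fork statds/1010f25 to your own GitHub account and clone the fork.
# In an interactive session this opens the cloned project; run the
# remaining steps from inside that project.
create_from_github("statds/1010f25", fork = TRUE)

# Create and switch to a new branch (hypothetical branch name).
pr_init("fix-typo-preface")

# Edit the .qmd files, then commit the changes (for example, with the
# Git pane in RStudio or `git commit` on the command line).

# Push the branch to your fork; usethis offers to open the GitHub page
# where you can create the pull request.
pr_push()
```

The same steps can be done entirely with Git on the command line. usethis simply bundles the fork, branch, and push steps into a few R calls.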

Adapting to Rapid Skill Acquisition

In this course, students are expected to acquire new skills rapidly, a critical aspect of data science. To emphasize this point, consider the following quote from VanderPlas (2016):

When a technologically-minded person is asked to help a friend, family member, or colleague with a computer problem, most of the time it’s less a matter of knowing the answer as much as knowing how to quickly find an unknown answer. In data science it’s the same: searchable web resources such as online documentation, mailing-list threads, and StackOverflow answers contain a wealth of information, even (especially?) if it is a topic you’ve found yourself searching before. Being an effective practitioner of data science is less about memorizing the tool or command you should use for every possible situation, and more about learning to effectively find the information you don’t know, whether through a web search engine or another means.

This quote captures the essence of what we aim to develop in our students: the ability to navigate and use the vast resources available to solve complex problems in data science. Example tasks include installing needed software (or even hardware) and searching for solutions to problems as they arise.

Course Tools

  • R & RStudio for analysis
  • Quarto for reproducible documents and dashboards
  • Git & GitHub for version control and project management
  • Command line for automation and efficiency

Policies & Syllabus

See the course syllabus on HuskyCT.

Key reminders: academic integrity, no AI-generated text in graded submissions, and professional email etiquette.

Grading Rubrics

Baseline (C level work)

  • Your .qmd file renders to HTML without errors.
  • You answer questions correctly but do not use complete sentences.
  • There are typos and ‘junk code’ throughout the document.
  • You do not put much thought or effort into the reflection answers.
  • You do not follow good style conventions for R, Quarto, and Git.

Average (B level work)

  • You use complete sentences to answer questions.
  • You attempt every exercise/question.

Advanced (A level work)

  • Your code is simple and concise.
  • Unnecessary messages from R are hidden from the rendered HTML.
  • Your document is typo-free.
  • You consistently follow good style conventions for R, Quarto, and Git.
  • At the discretion of the instructor, you give exceptionally thoughtful or insightful responses.
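Quarto lets you hide unnecessary R messages with chunk options written as `#|` comments at the top of an R code chunk. The sketch below assumes a hypothetical homework `.qmd` file that loads the tidyverse; your assignment may load different packages. When this code is placed at the top of an `{r}` chunk, `message: false` suppresses the startup message that `library(tidyverse)` prints, and `warning: false` suppresses warnings in the rendered HTML.

```r
#| label: load-packages
#| message: false
#| warning: false
# The "#|" lines above are Quarto chunk options. They take effect only when
# this code sits at the top of an {r} chunk in a .qmd file.
# message: false hides the package startup message printed below.
library(tidyverse)
```

Use these options only for output that is genuinely unnecessary. A warning often signals a real problem in the analysis, so it is worth reading before you hide it.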

Schedule and Readings

  1. Computing environment
    • R4DS Ch 28-29
    • HGR Ch 20-23
  2. Jump start with R
    • R4DS Ch 4-8; Ch 20-24
  3. Visualization
  4. Data transformation
    • R4DS Ch 3