3. Project Management

3.1. Reproducible Data Science Project

3.1.1. Using Virtual Environment for Python

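It is good practice to set up and use a virtual environment for each Python (or R) project, so that the project's package versions are isolated from other projects and can be recreated later. See the tutorial on virtual environments at Python Docs.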
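As a minimal sketch of the setup step, the standard-library venv module can create an environment from Python itself. The function name create_project_env and the directory name .venv are illustrative assumptions, not part of the course materials.

```python
import venv
from pathlib import Path


def create_project_env(env_dir=".venv", with_pip=True):
    """Create a virtual environment in env_dir and return its path.

    The name create_project_env and the default ".venv" are hypothetical
    choices made for this illustration.
    """
    env_path = Path(env_dir)
    # venv.create builds the environment directory, including pyvenv.cfg.
    venv.create(env_path, with_pip=with_pip)
    return env_path
```

Calling create_project_env() should be roughly equivalent to running python -m venv .venv in the project folder. The environment is then activated with source .venv/bin/activate on Linux or macOS, or .venv\Scripts\activate on Windows.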

3.1.2. Using Jupyter Notebook

  • File name extension .ipynb

  • Keep code chunks and companion text in separate cells

  • Test and edit until the whole notebook runs as expected

  • Export the notebook to a PDF file (for GitHub project release)
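One rough way to check that "the whole notebook runs as expected" is to confirm that the saved code cells were executed from top to bottom. The sketch below relies only on the fact that an .ipynb file is JSON with a list of cells; the function names are hypothetical, and the check is a heuristic, not a substitute for actually rerunning the notebook.

```python
import json


def load_notebook(path):
    """Read an .ipynb file, which is stored as JSON, into a dict."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def code_cells_ran_in_order(notebook):
    """Return True if non-empty code cells have execution counts 1, 2, 3, ...

    A notebook saved right after "restart and run all" has this pattern;
    cells run out of order or never run do not.
    """
    counts = [
        cell.get("execution_count")
        for cell in notebook.get("cells", [])
        # "source" may be a string or a list of strings; join handles both.
        if cell.get("cell_type") == "code"
        and "".join(cell.get("source", "")).strip()
    ]
    return counts == list(range(1, len(counts) + 1))
```

For example, code_cells_ran_in_order(load_notebook("hw.ipynb")) could be run before a release, where hw.ipynb is a placeholder file name. For the PDF export itself, the command jupyter nbconvert --to pdf followed by the notebook file name is one option, assuming a LaTeX installation is available.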

3.1.3. Using the Python Engine in R Markdown

  • Find a Markdown cheatsheet

  • Install the R package reticulate

  • See the example in hw-rmkdn.Rmd in the repo

3.2. Setting up a Git Repo

  • See the documentation at GitHub Docs.

  • Demonstration with the homework template

3.3. Styles

3.3.1. Programming

  • Naming

    • file/folder

    • variables

    • functions

    • modules/packages

  • Spacing

  • Indentation

Google's Python style recommendations: https://code.google.com/archive/p/soc/wikis/PythonStyleGuide.wiki
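The naming, spacing, and indentation items above can be illustrated with a short sketch that follows common PEP 8 conventions. All names here (the module summary_stats.py, ColumnSummary, missing_fraction, MAX_MISSING_FRACTION) are hypothetical and chosen only for illustration.

```python
"""Summary helpers; a module file would be named like summary_stats.py."""

# Constants: upper case with underscores.
MAX_MISSING_FRACTION = 0.2


class ColumnSummary:
    """Class names use CapWords."""

    def __init__(self, name, values):
        self.name = name
        self.values = list(values)

    def mean(self):
        # Four-space indentation; single spaces around binary operators.
        return sum(self.values) / len(self.values)


def missing_fraction(values):
    """Functions and variables use lowercase_with_underscores."""
    n_missing = sum(1 for value in values if value is None)
    return n_missing / len(values)
```

Whatever convention is chosen, applying it consistently across files, variables, functions, and modules matters more than the particular choice.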

3.3.2. Git Repo

  • Commit frequently (more snapshots)

  • Write informative commit messages

  • Keep it clean (no temporary or generated files)

  • Make it reproducible (e.g., use relative paths)
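The relative-path point can be sketched as follows: paths are built from the project root, so the same code works on any machine that clones the repo. The helper name project_path and the file names in the usage note are assumptions made for this example.

```python
from pathlib import Path


def project_path(*parts, root=None):
    """Build a path under the project root from relative parts.

    root defaults to the current working directory, assumed here to be
    the top of the repo. Absolute paths are rejected because they tie
    the code to one machine.
    """
    base = Path(root) if root is not None else Path.cwd()
    relative = Path(*parts)
    if relative.is_absolute():
        raise ValueError(f"use a path relative to the project: {relative}")
    return base / relative
```

For instance, project_path("data", "raw.csv") points at data/raw.csv inside the repo regardless of where the repo lives on disk, whereas a hard-coded home-directory path would break for every other user.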