One widely accepted concept is the three pillars of data science: mathematics/statistics, computer science, and domain knowledge.
In her 2014 Presidential Address, Prof. Bin Yu, then President of the Institute of Mathematical Statistics, gave an interesting definition:
\[
\mbox{Data Science} = \mbox{S}\mbox{D}\mbox{C}^3,
\]
where S is statistics, D is domain/science knowledge, and the three C’s are computing, collaboration/teamwork, and communication to outsiders.
1.2 Expectations from This Course
Proficiency in project management with Git.
Proficiency in project reporting with Quarto.
Hands-on experience with a real-world data science project.
Competency in using Python and its extensions for data science.
Full grasp of the meaning of the results from data science algorithms.
Basic understanding of the principles behind data science methods.
1.3 Computing Environment
All setups are operating system dependent. If at all possible, stay away from Windows. Otherwise, good luck (you will need it).
At a minimum, you need to know how to handle files and navigate between directories. Tab completion and introspection support are very useful.
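The same file and directory operations can also be practiced from within Python. The following is a minimal sketch using the standard library `pathlib` module; the function name `demo_file_handling` and the file names are made up for illustration, and everything is created inside a temporary directory so nothing on your machine is touched.

```python
import tempfile
from pathlib import Path


def demo_file_handling(root: Path) -> list[str]:
    """Create a small directory tree under root and list its files."""
    data_dir = root / "project" / "data"         # build a nested path
    data_dir.mkdir(parents=True, exist_ok=True)  # similar to `mkdir -p`
    (data_dir / "raw.csv").write_text("x,y\n1,2\n")
    (root / "project" / "notes.txt").write_text("todo\n")
    # Walk the tree recursively and report paths relative to root.
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file()
    )


with tempfile.TemporaryDirectory() as tmp:
    print(demo_file_handling(Path(tmp)))
```

In IPython, typing `Path.` followed by the Tab key lists the available methods, and `Path.mkdir?` shows the documentation for one of them.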
1.3.2 Python
Set up Python on your computer:
Python 3.
Python package manager miniconda or pip.
Integrated Development Environment (IDE) (Jupyter Notebook; RStudio; VS Code; Emacs; etc.)
I will be using IPython and Jupyter Notebook in class.
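Once the setup is done, a quick check like the one below can confirm that the interpreter is Python 3 and that the packages you expect are importable. This is only a sketch: the helper name `check_environment` is hypothetical, and the package names passed to it are illustrative assumptions, not an official requirement list for the course.

```python
import importlib.util
import sys


def check_environment(packages):
    """Map each top-level package name to True if it can be imported."""
    if sys.version_info.major < 3:
        raise RuntimeError("Python 3 is required")
    return {
        name: importlib.util.find_spec(name) is not None
        for name in packages
    }


# Illustrative package names (assumed, not taken from the course notes).
status = check_environment(["json", "numpy", "pandas"])
print(sys.version.split()[0])
for name, available in status.items():
    print(f"{name}: {'installed' if available else 'missing'}")
```

Any package reported as missing can then be installed with whichever package manager you chose.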
Readability is important! Check your Python coding style against the recommended style guide, PEP 8: https://peps.python.org/pep-0008/. A good place to start is the section on “Code Lay-out”.
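As a small illustration of the layout conventions in PEP 8, the sketch below uses four-space indentation, imports at the top of the file, two blank lines between top-level functions, and line wrapping inside brackets. The functions themselves are invented for this example and have no connection to the course material.

```python
import math


def circle_area(radius: float) -> float:
    """Return the area of a circle with the given radius."""
    if radius < 0:
        raise ValueError("radius must be non-negative")
    return math.pi * radius**2


def total_area(radii: list[float]) -> float:
    """Sum the areas of several circles."""
    # Wrap long expressions inside brackets, not with backslashes.
    return sum(
        circle_area(radius)
        for radius in radii
    )


print(total_area([1.0, 2.0]))
```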
Softmax Regression & Neural Networks with TensorFlow
1.7 Final Project Presentation Schedule
Undergraduate final presentations follow the same order as the topic presentations.
| Date  | Presenter              |
|-------|------------------------|
| 04/17 | Ho, Garrick            |
| 04/17 | Mastrorilli, Ginamarie |
| 04/17 | Yi, Guanghong          |
| 04/17 | Karandikar, Shivaram   |
| 04/19 | Jones, Courtney        |
| 04/19 | Sullivan, Colin        |
| 04/19 | Bedard, Kaitlyn        |
| 04/19 | Nhan, Nathan           |
| 04/24 | Parchekani, Kian       |
| 04/24 | Noel, Luke             |
| 04/24 | Whitney, William       |
| 04/24 | Nguyen, Christine      |
| 04/26 | Cummins, Patrick       |
| 04/26 | Zheng, Michael         |
| 04/26 | Lunetta, Giovanni      |
I encourage you to work on NYC open data or other open data for your projects and submit an abstract to the Government Advances in Statistical Programming (GASP) 2023 conference, June 14-15, 2023. The deadline for abstract submission is April 1.
1.8 Contribute to the Class Notes
Start a new branch and switch to the new branch.
On the new branch, add a qmd file for your presentation.
Edit _quarto.yml and add a line for your qmd file to include it in the notes.
Work on your qmd file and test it with quarto render.
When satisfied, commit and make a pull request.
I have added a template file mysection.qmd and a new line to _quarto.yml as an example.
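The contribution steps above can be sketched as a short command sequence. The helper below only builds the commands as strings and prints them; it does not run anything. The function name, the branch name `my-presentation`, and the remote name `origin` are assumptions for illustration, `git switch -c` needs Git 2.23 or later, and the pull request itself is opened on the hosting site after the push.

```python
import shlex


def contribution_commands(branch: str, qmd_file: str) -> list[str]:
    """Return the workflow commands in order; nothing is executed."""
    b = shlex.quote(branch)
    f = shlex.quote(qmd_file)
    message = shlex.quote(f"Add {qmd_file}")
    return [
        f"git switch -c {b}",        # start a new branch and switch to it
        # (create the qmd file and edit _quarto.yml by hand here)
        "quarto render",             # test that the notes still build
        f"git add {f} _quarto.yml",  # stage the new file and the config
        f"git commit -m {message}",  # commit when satisfied
        f"git push -u origin {b}",   # assumed remote name: origin
    ]


for command in contribution_commands("my-presentation", "mysection.qmd"):
    print(command)
```

Printing the commands first, rather than running them from Python, makes it easy to review each step before typing it in a terminal.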