Introduction to Data Science

Overview

Data science is a multidisciplinary field that extracts meaningful insights from large datasets by combining techniques from mathematics, statistics, and computer science with domain expertise. Through data analysis, pattern recognition, and predictive modeling, it allows businesses and researchers to make informed, data-driven decisions.

You will start with an overview of the course, followed by an introduction to statistics and data science. You will also learn some of the tools of the trade in scientific Python, particularly numerical Python (NumPy), Jupyter notebooks, and data handling.
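As a small taste of handling and describing data with numerical Python, the sketch below builds a toy array and computes a few summary statistics. The values and variable names (`data`, `column_means`, `standardized`) are made up for illustration and are not taken from the course materials.

```python
import numpy as np

# Hypothetical toy dataset: 5 samples (rows) with 2 features (columns).
data = np.array([
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 30.0],
    [4.0, 40.0],
    [5.0, 50.0],
])

n_samples, n_features = data.shape

# Describe each feature (column) with its mean and sample standard deviation.
column_means = data.mean(axis=0)
column_stds = data.std(axis=0, ddof=1)  # ddof=1 gives the sample (n - 1) estimate

# Standardize: subtract the mean and divide by the standard deviation.
# Broadcasting applies the per-column statistics to every row.
standardized = (data - column_means) / column_stds

print("shape:", (n_samples, n_features))
print("means:", column_means)
print("stds:", column_stds)
```

After standardization every feature has zero mean and unit sample standard deviation, which puts features measured in different units on a comparable scale.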

Goals

Get an overview of the course, its activities, and its policies
Set up your environment on Google Colab
Gain familiarity with Jupyter notebooks and numerical Python
Learn about handling and describing data
Learn about the importance of clustering data in physics
Learn how to find structure in data (clustering)
- KMeans, Spectral Clustering, DBSCAN
Measure and reduce dimensionality
Adapt linear models to nonlinear problems
Learn about kernel functions
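To make the clustering goal above concrete, here is a minimal sketch that runs two of the listed algorithms, KMeans and DBSCAN, using scikit-learn. The synthetic data and the parameter choices (`n_clusters=2`, `eps=1.0`, `min_samples=5`) are assumptions picked for this toy example, not settings prescribed by the course.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs

# Hypothetical toy data: two well-separated 2D blobs of 100 points each.
X, y_true = make_blobs(
    n_samples=200,
    centers=[[-5.0, -5.0], [5.0, 5.0]],
    cluster_std=0.5,
    random_state=0,
)

# KMeans needs the number of clusters up front.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans_labels = kmeans.fit_predict(X)

# DBSCAN infers the number of clusters from density;
# points it considers noise receive the label -1.
dbscan = DBSCAN(eps=1.0, min_samples=5)
dbscan_labels = dbscan.fit_predict(X)

n_dbscan_clusters = len(set(dbscan_labels) - {-1})
print("KMeans cluster sizes:", np.bincount(kmeans_labels))
print("DBSCAN clusters found:", n_dbscan_clusters)
```

The cluster labels themselves are arbitrary integers, so two algorithms can agree on the grouping while numbering the clusters differently.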

Lecture Materials

Homework Assignment

Homework 01: Introduction to Data Science

Supplemental Readings

A Whirlwind Tour of Python, Jake VanderPlas: free PDF, notebooks online.
IPython: Beyond Normal Python
Python Data Science Handbook
Introduction to NumPy
Data Manipulation with Pandas
Scikit-learn
Whitening transformation
Eigenvalue/Eigenvector refresher
Principal Component Analysis
PCA Step-by-Step
Blind Signal Separation
Kernel Method
Mercer’s Theorem
Similarity Measure
Nonlinear Dimensionality Reduction by Locally Linear Embedding