Introduction to Data Science

Overview

Data science is a multidisciplinary field that extracts meaningful insights from large datasets by combining techniques from mathematics, statistics, and computer science with domain expertise. Through data analysis, pattern recognition, and predictive modeling, it allows businesses and researchers to make informed, data-driven decisions.

You will start with an overview of the course, followed by an introduction to statistics and data science. You will then learn some of the tools of the trade in scientific Python, particularly numerical Python (NumPy), Jupyter notebooks, and data handling.
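
As a small taste of what describing data with numerical Python looks like, here is a minimal sketch that summarizes a handful of measurements. The values and variable names are made up for illustration and are not course data.

```python
import numpy as np

# Hypothetical measurements (illustrative values, not course data).
measurements = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Basic descriptive statistics.
mean = measurements.mean()
median = np.median(measurements)
std = measurements.std()  # population standard deviation (ddof=0)

print(f"n = {measurements.size}")
print(f"mean = {mean:.2f}, median = {median:.2f}, std = {std:.2f}")
```

Note that `std()` defaults to the population formula; passing `ddof=1` would give the sample standard deviation instead.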

Goals

  • Get an overview of the course, activities, and policies

  • Set up your environment on Google Colab

  • Gain familiarity with Jupyter notebooks and numerical Python

  • Learn about handling and describing data

  • Learn about the importance of clustering data in physics

  • Learn how to find structure in data (clustering)

    • KMeans, Spectral Clustering, DBSCAN

  • Measure and reduce dimensionality

  • Adapt linear models to nonlinear problems

  • Learn about Kernel functions
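
To make the clustering goal concrete, here is a minimal sketch of the idea behind k-means (Lloyd's algorithm) written in plain NumPy. It is not the scikit-learn `KMeans` implementation; the function name `kmeans`, its parameters, and the synthetic two-blob data are all assumptions chosen for illustration.

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Minimal Lloyd's-algorithm sketch; not a substitute for a library implementation."""
    rng = np.random.default_rng(seed)
    # Initialize centers by picking k distinct data points at random.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean distance).
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its points; keep it if its cluster is empty.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # assignments have stabilized
        centers = new_centers
    return labels, centers

# Synthetic data: two well-separated blobs (an illustrative assumption).
rng = np.random.default_rng(42)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(50, 2))
points = np.vstack([blob_a, blob_b])

labels, centers = kmeans(points, k=2)
print(centers)
```

On data like this, the two recovered centers should land near the blob means. Which blob gets label 0 and which gets label 1 depends on the random initialization.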

Lecture Materials

Homework Assignment

Supplemental Readings