Machine Learning for Physics#

Physics 498 MLP   Spring 2024

  • Instructor:

  • Class Meetings:

    • Tuesdays and Thursdays from 1:00 pm to 2:20 pm

    • Room: 3101 Sidney Lu MEB

  • 4 credit hours

Calendar#

Note: This schedule will evolve throughout the semester.

Week    Topic                                                         Homework  Projects
Jan 15  Introduction to Data Science                                  HW 01
Jan 22  Probability Theory and Density Estimation                     HW 02
Jan 29  Bayesian Statistics I                                         HW 03
Feb 05  Bayesian Statistics II                                        HW 04
Feb 12  Introduction to Artificial Intelligence and Machine Learning  HW 05
Feb 19  Deep Learning and Generative Modeling
Feb 26  Convolutional and Recurrent Neural Networks                   HW 06     Project 01
Mar 04  Geometric Deep Learning and Graph Neural Networks
Mar 11  SPRING BREAK - NO CLASSES
Mar 18  Attention Mechanism and Transformers
Mar 25  Reinforcement Learning                                        HW 07
Apr 01  AI Explainability and Uncertainty Quantification              HW 08
Apr 08  Unsupervised Learning and Anomaly Detection
Apr 15  Physics Informed Neural Networks                                        Project 02
Apr 22  Learning from the Machines
Apr 29  Future of AI and Physics: What Lies Ahead?

Overview#

Course Overview#

Welcome! This course presents an introduction to modern data science, artificial intelligence (AI), and machine learning (ML) from a physics perspective. Students will learn the basic concepts, tools, and methods of AI/ML applied to scientific challenges, and will study how physics knowledge can be incorporated into AI/ML models to improve their learning efficiency, performance, and interpretability. Topics covered include artificial neural networks (NNs), AI/ML-enhanced modeling/simulation, deep generative models, simulation-free inference, variational inference, convolutional NNs, recurrent NNs, geometric deep learning, attention mechanisms and transformers, auto-encoders, and anomaly detection. Students will also explore the different types of learning from data, including supervised, semi-supervised, and unsupervised learning. Applications to physics will be emphasized.

The Calendar section lists the specific topics that will be covered in this course.

Learning Objectives#

Upon completion of the course students will be able to:

  1. Understand the basic concepts and tools of modern data science, artificial intelligence (AI), and machine learning (ML)

  2. AI/ML for Physics: Apply modern AI/ML methods to address scientific challenges using open data

  3. Physics for AI/ML: Learn how to incorporate physics knowledge into AI/ML models to improve their learning efficiency, performance, and interpretability

Course Logistics#

Format#

  • This course will consist of two meetings per week: one lecture period and one in-class practical session.

  • Lecture: Tuesday from 1:00 pm to 2:20 pm in 3101 Sidney Lu MEB

  • Practical Session: Thursday from 1:00 pm to 2:20 pm in 3101 Sidney Lu MEB

Instructor#

TA#

Online Tools#

There are several online tools you will need to use as part of this course.

Campuswire#

We will use Campuswire as a class forum, a way to message the course staff and each other, and a means to submit your attendance question.

Google Colab#

Using Google Colab, you will write your code in a Jupyter notebook and submit it for us to grade. Please sign in with your Illinois account. While working on an assignment, you will share each of your Colab notebooks with the professor and the TA (but no one else).

Gradescope#

On Gradescope, you will submit your assignments and find your graded assignments.

Coursework#

Homework Assignments#

You will be assigned weekly homework assignments that will put into practice what you learned in lecture for the week.

  • You will work on the assignments both during the in-class session on Thursdays and as homework.

  • You will submit your executed (i.e. with “Run all”) homework notebook via Gradescope.

  • Each assignment is due at the beginning of the next class unless otherwise noted. You may turn an assignment in up to one week late for 50% credit (except that all assignments are strictly due the day before Reading Day).

  • Solutions to the homeworks will not be given.

  • You may collaborate on assignments but must submit your own work.

  • Graded homework will be available through Gradescope.

Projects#

At appropriate times throughout the course, you will select from a list of projects that involve demonstrating and extending your work in class by doing something cool and interesting in data analysis. You must work alone on projects (i.e. without collaboration).

For each project you will put together a Jupyter notebook that demonstrates your project. The notebook should have code and demonstrate the task but also be written in an expository way that other students could, in principle, read and learn from. It is submitted in the same way as the regular course assignments.

Each project notebook must be submitted via Gradescope for grading.

Grading#

  • Class attendance and participation: 5%

  • Homework: 70%

  • Projects: 25%

Letter grades will be assigned as follows:

  • A+   [97.0 - 100.0]

  • A     [93.0 - 96.9]

  • A-   [90.0 - 92.9]

  • B+   [87.0 - 89.9]

  • B     [83.0 - 86.9]

  • B-   [80.0 - 82.9]

  • C+   [77.0 - 79.9]

  • C     [73.0 - 76.9]

  • C-   [70.0 - 72.9]

  • D+   [67.0 - 69.9]

  • D     [63.0 - 66.9]

  • D-   [60.0 - 62.9]

  • F     [00.0 - 59.9]
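The weights and letter bands above can be combined as in the following sketch. The weights and band edges are copied from this section; the function names are hypothetical, and the rule for a score that falls between two listed bands (for example 96.95) is an assumption, since the syllabus does not state a rounding convention.

```python
# Weights and lower band edges copied from the Grading section.
WEIGHTS = {"attendance": 0.05, "homework": 0.70, "projects": 0.25}
CUTOFFS = [
    (97.0, "A+"), (93.0, "A"), (90.0, "A-"),
    (87.0, "B+"), (83.0, "B"), (80.0, "B-"),
    (77.0, "C+"), (73.0, "C"), (70.0, "C-"),
    (67.0, "D+"), (63.0, "D"), (60.0, "D-"),
]

def course_score(scores):
    """Weighted course score in percent, from per-category percentages."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

def letter_grade(score):
    """Map a score to a letter using the lower edge of each band.

    Assumption: no rounding is applied, so 96.95 is treated as an A.
    """
    for lower_edge, letter in CUTOFFS:
        if score >= lower_edge:
            return letter
    return "F"

# Hypothetical category scores, for illustration only.
total = course_score({"attendance": 100.0, "homework": 88.0, "projects": 92.0})
print(round(total, 2), letter_grade(total))
```

Because homework carries 70% of the weight, the homework average dominates the result in this example.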

Datasets#

In this section we describe the datasets used in the lectures and homeworks. There are additional scientific datasets used for the projects that are described in the projects area of the course page.

Line#

A simple line with errors. Columns are x, y and dy. The reported errors are systematically too large by a constant factor, and are set to NaN for a fraction of the samples. Target is y_true.

Applications:

  • Reading CSV into a Pandas dataframe.

  • Straight line regression.

  • Handling missing values.

  • Handling (overestimated) input errors.
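The applications listed above can be sketched on synthetic data that mimics the description of this dataset. This is a minimal illustration, not the course's own notebook: the sample size, slope, intercept, noise level, inflation factor, and the choice to fill missing errors with the median are all assumptions, and the CSV is an in-memory string rather than the course file.

```python
import io

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n, true_slope, true_intercept, true_sigma = 200, 2.0, -1.0, 0.1
inflate = 3.0  # hypothetical constant factor by which dy is overestimated

x = rng.uniform(-1, 1, n)
y = true_slope * x + true_intercept + rng.normal(0, true_sigma, n)
dy = np.full(n, inflate * true_sigma)
dy[rng.random(n) < 0.1] = np.nan  # a fraction of the reported errors is missing

# Round-trip through CSV text to mimic reading a file with pd.read_csv.
csv_text = pd.DataFrame({"x": x, "y": y, "dy": dy}).to_csv(index=False)
df = pd.read_csv(io.StringIO(csv_text))

# Handle missing values: one option is to fill NaN errors with the median.
df["dy"] = df["dy"].fillna(df["dy"].median())

# Weighted straight-line fit; np.polyfit expects weights of 1/sigma.
slope, intercept = np.polyfit(
    df["x"].to_numpy(), df["y"].to_numpy(), deg=1, w=1.0 / df["dy"].to_numpy()
)

# A reduced chi-square well below 1 signals overestimated errors.
pulls = (df["y"] - (slope * df["x"] + intercept)) / df["dy"]
chi2_per_dof = float((pulls**2).sum() / (len(df) - 2))
error_scale = 1.0 / np.sqrt(chi2_per_dof)  # estimate of the inflation factor
```

Dropping the rows with missing errors (`df.dropna(subset=["dy"])`) is an equally reasonable alternative to filling them.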

Pong#

Each sample is a 2D trajectory of a ping-pong ball launched with different initial conditions. Trajectories are calculated with an analytic model that includes a linear drag term. There are three clusters of trajectories with similar initial conditions, identified by target ‘grp’. Target ‘th0’ gives the true initial launch angle in degrees. Target ‘hit’ identifies trajectories that pass through a fixed “hoop” at x=0.5.

Applications:

  • Reading HDF5 into a Pandas dataframe.

  • Dimensionality reduction (20D points lie on a 2D manifold).

  • Nonlinear regression (target ‘th0’).

  • Clustering (target ‘grp’).

  • Classification (target ‘hit’).
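For orientation, here is a sketch of what an analytic trajectory with a linear drag term looks like. It solves dv/dt = -g ŷ - k v for a launch from the origin. The drag coefficient, launch speed, sampling times, and the idea that a 20D sample stacks 10 (x, y) positions are assumptions for illustration; the source only states that the points are 20D and lie on a 2D manifold.

```python
import numpy as np

def trajectory(t, speed, angle_deg, drag=1.0, g=9.81):
    """Position (x, y) at times t for a launch from the origin.

    Closed-form solution of dv/dt = -g*y_hat - drag*v (drag force per unit mass).
    """
    theta = np.radians(angle_deg)
    vx0, vy0 = speed * np.cos(theta), speed * np.sin(theta)
    decay = (1.0 - np.exp(-drag * t)) / drag
    x = vx0 * decay
    y = (vy0 + g / drag) * decay - g * t / drag
    return x, y

# Hypothetical sampling: 10 time points give 10 x-values and 10 y-values.
t = np.linspace(0.05, 0.5, 10)
x, y = trajectory(t, speed=3.0, angle_deg=40.0)
sample = np.concatenate([x, y])  # one 20D feature vector
```

With the sampling times held fixed, each vector is determined by two launch parameters (speed and angle), which is one way a 20D sample can lie on a 2D manifold.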

Cosmo#

Each sample is an LCDM cosmology defined by input parameters ‘omega_b’, ‘omega_cdm’, ‘ln10^{10}A_s’ and ‘H0’. Corresponding targets are values of ‘sigma8’, ‘rd’, ‘DA(0.57)/rd’, ‘DH(0.57)/rd’, ‘DA(2.34)/rd’, and ‘DH(2.34)/rd’ calculated with CLASS. The CLASS calculations are relatively slow (~1 hr per 1K), so the goal of this dataset is to train a faster emulator. Input values are uniformly distributed on a grid centered on the Planck2015 best fit result and spanning +/-10 sigma.

Applications:

  • Dimensionality reduction.

  • Approximately linear regression.

Higgs#

Data from the 2014 Higgs Challenge which is now archived here.

This file is too large to include in the repo, so instead the Pandas notebook provides a function to generate higgs_data.hf5 and higgs_target.hf5 from the downloaded .csv.gz file and copy them into the installed data path.

Applications:

  • Dimensionality reduction.

  • Train/test split.

  • Classification.
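A train/test split can be sketched with NumPy alone, as below. The function name, test fraction, and seed are hypothetical choices for illustration; in practice a library helper such as scikit-learn's `train_test_split` does the same job.

```python
import numpy as np

def train_test_split_indices(n_samples, test_fraction=0.25, seed=0):
    """Return disjoint (train, test) index arrays from a random shuffle."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)
    n_test = int(round(test_fraction * n_samples))
    return shuffled[n_test:], shuffled[:n_test]

train_idx, test_idx = train_test_split_indices(1000)
# Use the same indices for features and targets so rows stay aligned,
# e.g. data.iloc[train_idx] and target.iloc[train_idx] for DataFrames.
```

Fixing the seed makes the split reproducible, so results can be compared between runs.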

Clusters#

Demo files for clustering: 4 in 2D with 2 clusters, and 1 in 3D with 3 clusters. Data features are ‘x0’, ‘x1’ (and ‘x2’ for the 3D file) and target is ‘y’.

Applications:

  • Clustering.

Spectra#

Spectra containing two peaks with variable flux and fixed locations and widths, over a constant background, with Poisson noise added. Data features are fluxes in wavelength bins (with un-named columns). Targets are the true fluxes in each peak (‘flux1’, ‘flux2’).

Applications:

  • Dimensionality reduction.

  • Clustering.

  • Regression.
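Because the peak locations and widths are fixed, the expected counts are linear in the two fluxes and the background, so the regression task can be sketched as an ordinary least-squares fit. The wavelength grid, peak positions, widths, flux values, and background level below are all made up for illustration and are not taken from the course files.

```python
import numpy as np

rng = np.random.default_rng(1)
wavelength = np.linspace(0.0, 1.0, 200)  # hypothetical bin centers

def peak(center, width):
    """Gaussian line profile normalized to unit sum over the bins."""
    profile = np.exp(-0.5 * ((wavelength - center) / width) ** 2)
    return profile / profile.sum()

# Design matrix: two fixed peak templates plus a flat background per bin.
templates = np.column_stack(
    [peak(0.3, 0.03), peak(0.7, 0.05), np.ones_like(wavelength)]
)

true_params = np.array([5000.0, 8000.0, 20.0])  # flux1, flux2, background
counts = rng.poisson(templates @ true_params)   # Poisson noise per bin

# Unweighted linear least squares for (flux1, flux2, background).
fit_params, *_ = np.linalg.lstsq(templates, counts.astype(float), rcond=None)
```

An unweighted fit ignores the fact that Poisson noise grows with the signal; a weighted or likelihood-based fit would be the natural refinement.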

Circles#

The circles files contain 500 2D points on two concentric circles with feature names ‘x0’, ‘x1’ and target integer ‘y’ = 0,1 indicating which circle they belong to.

Applications:

  • Linear clustering in higher dimensions.

  • Kernel trick.

  • Kernel PCA.
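The idea behind these applications can be sketched with an explicit feature map: two concentric circles cannot be separated by a line in 2D, but adding the squared radius as a third feature makes them separable by a plane. Note that this is an explicit lift, not the kernel trick proper, where a kernel function plays the same role implicitly. The radii, the seed, and the threshold are assumptions; the points are generated here rather than read from the circles files.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
y = rng.integers(0, 2, n)            # which circle each point belongs to
radius = np.where(y == 0, 1.0, 2.0)  # hypothetical radii
phi = rng.uniform(0, 2 * np.pi, n)
X = np.column_stack([radius * np.cos(phi), radius * np.sin(phi)])

# Explicit feature map (x0, x1) -> (x0, x1, x0^2 + x1^2).
lifted = np.column_stack([X, (X**2).sum(axis=1)])

# In the lifted space the plane "third feature = 2.5" separates the circles.
predicted = (lifted[:, 2] > 2.5).astype(int)
accuracy = (predicted == y).mean()
```

A kernel method reaches the same separation without ever building `lifted`, by working only with inner products in the lifted space.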

Ess#

The ess files contain 500 3D points on a 2D sheet bent into an S-shape with features named ‘x0’, ‘x1’, ‘x2’ and target value ‘y’ from 0-1 giving the coordinate along the sheet.

Applications:

  • Manifold learning.

  • Locally linear embedding (LLE).

Blobs#

The blobs files contain 2K 3D points sampled from 3 Gaussian blobs with features named ‘x0’, ‘x1’, ‘x2’ and target value ‘y’ = 0, 1, 2 giving their generated group membership.

Applications:

  • Clustering.

  • Density estimation.

Policies#

Covid#

  • Policies related to COVID-19 can be found at https://covid19.illinois.edu

  • If you feel ill or are unable to come to class or complete class assignments due to issues related to COVID-19, including but not limited to testing positive yourself, feeling ill, caring for a family member with COVID-19, or having unexpected child-care obligations, you should contact your instructor immediately, and you are encouraged to copy your academic advisor.

About using code you find on the web or generative AI for homework and projects#

The quickest way to deal with the arcana of programming is to ask Google or ChatGPT for examples of what you are seeking to accomplish. But you will need to use your own judgment about the value these technologies add to your learning. Your generation will need to learn how to work productively in concert with AI. That technological genie is out of the bottle, and its finding a way back in is about as likely as a broken glass spontaneously reassembling. As with any external resource, you must always credit the original source of code and other information that you paste into your own programs, notebooks, projects, etc., in a comment that includes the original source. If an author says that their code is not to be copied or incorporated into your programs, then DON’T.

Students must cite all references, including any code they have used that they did not write themselves. Failure to cite references will be considered an academic integrity violation and be pursued according to University policy, which may include receiving a failing grade on an assignment or in the entire course. Citations do not need to follow any specific format (such as ACM style, etc.) but should mention the author’s name and where the cited work can be found (including a URL, if applicable). In code, a citation can be left in a comment.
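As an illustration of a citation left in a comment, a snippet might look like the sketch below. The author, title, and URL are placeholders invented for this example, not a real source, and the helper function is hypothetical.

```python
# Adapted from: Jane Doe, "Binning helper" (hypothetical author and title)
# Source: https://example.com/binning-helper (placeholder URL)
# Changes: renamed variables and added a docstring.
def bin_centers(edges):
    """Return the midpoint of each pair of adjacent bin edges."""
    return [(lo + hi) / 2 for lo, hi in zip(edges[:-1], edges[1:])]

centers = bin_centers([0, 1, 3])
```

The comment names the author and says where the work can be found, which is all the policy above asks for.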

Academic Integrity#

You must never submit the work of someone else as your own. We understand that many of you will find it helpful to work with other students to master the course. But when you collaborate with your study group on homework assignments, you must be a full, active participant in developing the solutions that you submit for credit.

It is cheating to receive answers from another student and then use them as your own. It is cheating to submit as your own work solutions that you find by searching on the worldwide web (though see “About using code you find on the web”) or using online tools such as ChatGPT, or by subscribing to an online service that suborns cheating. It is cheating—and a violation of U.S. copyright law—to give (or sell) course material to someone else who intends to redistribute and/or sell it.

All activities in this course are subject to the Academic Integrity rules as described in Article 1, Part 4, Academic Integrity, of the Student Code.

Sexual Misconduct Reporting Obligation#

The University of Illinois is committed to combating sexual misconduct. Faculty and staff members are required to report any instances of sexual misconduct to the University’s Title IX Office. In turn, an individual with the Title IX Office will provide information about rights and options, including accommodations, support services, the campus disciplinary process, and law enforcement options.

A list of the designated University employees who, as counselors, confidential advisors, and medical professionals, do not have this reporting responsibility and can maintain confidentiality, can be found here: wecare.illinois.edu/resources/students/#confidential.

Other information about resources and reporting is available here: https://wecare.illinois.edu and https://wellness.illinois.edu.

Mental Health Services#

Significant stress, mood changes, excessive worry, substance/alcohol misuse or interferences in eating or sleep can have an impact on academic performance, social development, and emotional wellbeing. The University of Illinois offers a variety of confidential services including individual and group counseling, crisis intervention, psychiatric services, and specialized screenings which are covered through the Student Health Fee. If you or someone you know experiences any of the above mental health concerns, you are strongly encouraged to contact or visit any of the University’s resources provided below. Getting help is a smart and courageous thing to do for yourself and for those who care about you.

  • Counseling Center (217) 333-3704

  • McKinley Health Center (217) 333-2700

  • National Suicide Prevention Lifeline (800) 273-8255

  • Rosecrance Crisis Line (217) 359-4141 (available 24/7, 365 days a year)

If you are in immediate danger, call 911.

*This statement is approved by the University of Illinois Counseling Center.

Students with Disabilities#

To obtain disability-related academic adjustments and/or auxiliary aids, students with disabilities must contact the course instructor and the Disability Resources and Educational Services (DRES) as soon as possible. To contact DRES, you may visit 1207 S. Oak St., Champaign, call 333-4603, e-mail disability@illinois.edu or go to https://www.disability.illinois.edu. If you are concerned you have a disability-related condition that is impacting your academic progress, there are academic screening appointments available that can help diagnose a previously undiagnosed disability. You may access these by visiting the DRES website and selecting “Request an Academic Screening” at the bottom of the page.

Resources#

Useful references#

Quick guides#

Tools#

Git and GitHub#

Project Jupyter#

Acknowledgements#

This course was developed by Mark Neubauer. It was first taught by Mark Neubauer during the Spring 2024 semester.