Searching for Exotic Particles#
Overview#
A number of theories that aim to explain what happened in the very early universe (the first small fraction of a second), and that link elementary particle physics with cosmology, predict the existence of exotic particles that have yet to be discovered. If these particles exist, they could contribute significantly to the dark matter in the universe and/or explain other puzzles in particle physics.
Data Sources#
Original Source
https://archive.ics.uci.edu/ml/datasets/HEPMASS (top-level description)
File URLs
https://archive.ics.uci.edu/ml/machine-learning-databases/00347
Questions#
Question 01#
What is the Large Hadron Collider (LHC)? What is it about the LHC that makes it possible to produce heavy particles like the Higgs boson?
Question 02#
The Higgs boson was the last particle of the Standard Model (SM) to be discovered, and it completes the particle content of that theory. In what way(s) does the Higgs boson play a particularly important role in the SM?
Question 03#
Briefly describe the ATLAS and CMS experiments that collect proton-proton collision data at the LHC to study the Higgs boson.
Question 04#
Based on ref [1], can you describe what the exotic particles in the benchmark models (a) HIGGS and (b) SUSY are, and why they would be important for fundamental physics?
The remaining questions refer to the following data source: https://archive.ics.uci.edu/ml/machine-learning-databases/00347 (also linked from above)
Machine learning is used in high-energy physics experiments to search for the signatures of exotic particles. These signatures are learned from Monte Carlo simulations of the collisions that produce these particles and the resulting decay products. In each of the three data sets from the data source, the goal is to separate particle-producing collisions from a background source.
The mass of the new particle is unknown, so three separate data sets are provided. In each data set, 50% of the data is from a signal process, while 50% is from the background process. The data is separated into a training set of 7 million examples and a test set of 3.5 million for each.
In the 1000 dataset, the signal particle has mass=1000. (Note: this dataset does not include a mass feature since all signal examples have the same mass.)
In the not1000 dataset, the signal particle’s mass is drawn uniformly from the set {500, 750, 1250, 1500}. The mass is included as an input feature; for the background examples, the mass is selected randomly from this same set.
Download the not1000_train.csv.gz and 1000_train.csv.gz files from the data source.
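Once a file has been downloaded, a minimal loading sketch with pandas might look like the following. The helper names `load_hepmass` and `summarize` are hypothetical, and the sketch assumes that each CSV starts with a header row and that its first column is the class label; check both against the actual files before relying on it.

```python
import pandas as pd


def load_hepmass(source, nrows=None):
    """Read a HEPMASS CSV file into a DataFrame.

    pandas detects gzip compression from the ".gz" suffix of a file path.
    Assumptions: the file has a header row, and the first column holds
    the class label. That column is renamed to "label" for convenience.
    """
    df = pd.read_csv(source, nrows=nrows)
    df = df.rename(columns={df.columns[0]: "label"})
    df["label"] = df["label"].astype(int)
    return df


def summarize(df):
    """Return the table shape and the fraction of signal rows (label == 1)."""
    return {"shape": df.shape, "signal_fraction": float(df["label"].mean())}
```

Because the training files hold millions of rows, it can help to start with a sample, for example `load_hepmass("not1000_train.csv.gz", nrows=100_000)`, and to confirm with `summarize` that the signal fraction is close to the 50% stated above.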
Question 05#
What is the size and shape of each data set?
Question 06#
The data set’s first column is the class label (1 for signal, 0 for background), followed by the 27 normalized features (22 low-level features then 5 high-level features), and a 28th mass feature for dataset not1000. See the original paper (ref [2]) for more detailed information. Can you explain what those normalized features are?
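The column layout described above can be written down as a small helper. This is only an illustrative sketch: the names `low_0` ... `low_21`, `high_0` ... `high_4`, and `mass` are placeholders chosen here, not the names used in the files or in ref [2].

```python
N_LOW_LEVEL = 22   # low-level features, as stated in the text
N_HIGH_LEVEL = 5   # high-level features, as stated in the text


def hepmass_column_names(include_mass):
    """Build placeholder column names in the order described in the text.

    include_mass=True corresponds to the not1000 layout (extra mass column);
    include_mass=False corresponds to the 1000 layout.
    """
    names = ["label"]
    names += [f"low_{i}" for i in range(N_LOW_LEVEL)]
    names += [f"high_{i}" for i in range(N_HIGH_LEVEL)]
    if include_mass:
        names.append("mass")
    return names


def feature_groups(names):
    """Split column names into low-level, high-level, and other columns."""
    return {
        "low": [n for n in names if n.startswith("low_")],
        "high": [n for n in names if n.startswith("high_")],
        "other": [n for n in names if not n.startswith(("low_", "high_"))],
    }


print(len(hepmass_column_names(include_mass=False)))  # 28: label + 27 features
print(len(hepmass_column_names(include_mass=True)))   # 29: label + 27 features + mass
```

Keeping the low-level and high-level groups separate makes it easier to compare how well each group discriminates signal from background later on.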
Question 07#
In the 1000 data set, can you draw histograms of the 27 normalized features for signal and background separately? Can you describe the significant differences between these histograms?
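One possible way to build such histograms is sketched below with NumPy and Matplotlib. The function names, the default of 50 bins, and the five-column panel grid are choices made for this sketch. Both classes share the same bin edges and are normalized to unit area so that their shapes can be compared directly.

```python
import numpy as np


def class_histograms(values, labels, bins=50):
    """Histogram one feature for signal (1) and background (0) on shared bins."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    edges = np.histogram_bin_edges(values, bins=bins)
    signal, _ = np.histogram(values[labels == 1], bins=edges, density=True)
    background, _ = np.histogram(values[labels == 0], bins=edges, density=True)
    return edges, signal, background


def plot_feature_grid(features, labels, names, ncols=5, bins=50):
    """Draw one panel per feature, overlaying signal and background."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    nrows = -(-len(names) // ncols)  # ceiling division
    fig, axes = plt.subplots(
        nrows, ncols, figsize=(3 * ncols, 2.5 * nrows), squeeze=False
    )
    for ax, column, name in zip(axes.ravel(), np.asarray(features).T, names):
        edges, signal, background = class_histograms(column, labels, bins)
        ax.stairs(signal, edges, label="signal")
        ax.stairs(background, edges, label="background")
        ax.set_title(name)
    for ax in axes.ravel()[len(names):]:
        ax.axis("off")  # hide unused panels in the last row
    axes[0, 0].legend()
    fig.tight_layout()
    return fig
```

Here `features` is a two-dimensional array with one column per feature and `labels` is the class-label column. Features whose signal and background curves differ visibly are the ones most likely to help a classifier.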
Question 08#
Repeat the analysis of Question 07 for the not1000 data set.
Question 09#
What differences do you find between the not1000 data set and the 1000 data set?
Question 10#
The data has been produced using Monte Carlo simulations. The first 22 features (columns 2-23) are low-level kinematic properties of the kind measured by the particle detectors at the LHC. The next five features are functions of the low-level features; these are high-level features derived by physicists to help discriminate between the two classes. The not1000 data set adds the mass as a 28th feature. When you read through the reference paper [2], what particle properties do those 28 features represent?
Question 11#
Using the data sets in this project, can you draw histograms of the normalized features for signal and background separately? Can you describe the significant differences between these histograms?
Question 12#
Implement and train a shallow neural network (NN) as described in ref [1] for one of the exotic particle hypotheses. You should choose a NN that keeps the training time manageable, such as one of the shallow networks with the hyperparameters shown in Table 2 of ref [1], or an even smaller network. Can you generate and show the NN classification outputs for this network?
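As a starting point, a single-hidden-layer network can be sketched in plain NumPy. This is not the configuration from ref [1]: the tanh hidden layer, the hidden-layer width, the learning rate, the batch size, and the number of epochs are placeholder assumptions that should be replaced by values from Table 2 of ref [1] or tuned. The training loop is plain mini-batch gradient descent on the cross-entropy loss.

```python
import numpy as np


def init_params(n_in, n_hidden, rng):
    """Small random weights for one tanh hidden layer and a sigmoid output."""
    return {
        "W1": rng.normal(0.0, 0.1, size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, size=n_hidden),
        "b2": 0.0,
    }


def forward(params, X):
    """Return hidden activations and the predicted signal probability."""
    hidden = np.tanh(X @ params["W1"] + params["b1"])
    logits = hidden @ params["W2"] + params["b2"]
    return hidden, 1.0 / (1.0 + np.exp(-logits))


def predict_proba(params, X):
    """NN classification output: estimated probability of the signal class."""
    return forward(params, X)[1]


def train(X, y, n_hidden=32, lr=0.05, epochs=20, batch_size=100, seed=0):
    """Mini-batch gradient descent on the mean binary cross-entropy."""
    rng = np.random.default_rng(seed)
    params = init_params(X.shape[1], n_hidden, rng)
    for _ in range(epochs):
        order = rng.permutation(len(X))
        for start in range(0, len(X), batch_size):
            idx = order[start:start + batch_size]
            xb, yb = X[idx], y[idx]
            hidden, prob = forward(params, xb)
            # Gradient of the mean cross-entropy with respect to the logits.
            d_logit = (prob - yb) / len(idx)
            # Backpropagate through the tanh layer (uses W2 before its update).
            d_hidden = np.outer(d_logit, params["W2"]) * (1.0 - hidden ** 2)
            params["W2"] -= lr * (hidden.T @ d_logit)
            params["b2"] -= lr * d_logit.sum()
            params["W1"] -= lr * (xb.T @ d_hidden)
            params["b1"] -= lr * d_hidden.sum(axis=0)
    return params
```

After training on the feature columns and labels of the training file, `predict_proba` applied to the test file gives the NN classification outputs. Histogramming these outputs separately for signal and background events shows how well the network separates the two classes.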
References#
[1] P. Baldi, P. Sadowski, D. Whiteson, “Searching for Exotic Particles in High-Energy Physics with Deep Learning”, Nature Commun. 5 (2014) 4308, e-Print: 1402.4735 [hep-ph]
[2] P. Baldi, K. Cranmer, T. Faucett, P. Sadowski, D. Whiteson, “Parameterized Machine Learning for High-Energy Physics”, Eur.Phys.J.C 76 (2016) 5, 235, e-Print: 1601.07913 [hep-ex]
Acknowledgements#
Initial version: Mark Neubauer
© Copyright 2024