Dark Energy Survey#

https://raw.githubusercontent.com/illinois-ipaml/MachineLearningForPhysics/main/img/Project_DES-Figure.jpg

Overview#

The standard model of the universe, the \(Λ\)-CDM model, estimates that around 70% of the total energy-matter density of the universe is in the form of dark energy. The primary effect of dark energy is to accelerate the expansion of the universe, so measuring it accurately is vital to our understanding of the universe.

The Dark Energy Survey (DES) is an international, collaborative large sky survey designed to map hundreds of millions of galaxies, detect thousands of supernovae, and find patterns of cosmic structure in order to better understand dark energy and to measure the cosmic acceleration of our expanding universe with high precision.

The instrument used by DES is the Dark Energy Camera (DECam), which is mounted on the Victor M. Blanco 4-meter Telescope at the Cerro Tololo Inter-American Observatory (CTIO) in the Chilean Andes. DECam has several major components: five optical lenses, a Hexapod positioning and alignment system, a shutter, a set of color filters, and a digital imager.

Data Sources#

This project is based on Data Release 1 (DR1) of the DES. We will use the des_dr1.main table directly from NOIRLab’s Astro Data Lab server, which catalogs the DES data releases.

NOIRLab’s Astro Data Lab: https://datalab.noirlab.edu/des/access.php

Importing and Setup#

You will need to install and import the astro-datalab module for data acquisition.

Please note that you will have to restart the runtime after pip installing the astro-datalab module before you can run your code.

!pip install --ignore-installed --no-cache-dir astro-datalab
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()
from sklearn import cluster
%matplotlib inline

# Data Lab imports
from dl import queryClient as qc, storeClient as sc
from dl.helpers.utils import convert

Questions#

Question 01#

What is the Dark Energy Survey? What are its main goals? What part of the night sky does the DES cover?

Question 02#

Read the DES DR1 paper [1]. What are the four complementary measurements that DES makes to probe the universe? To achieve these goals, what surveys does DES conduct, and in which wave bands?

Question 03#

What variable or feature of the data does DES use to classify its objects morphologically (as stars or galaxies), as described in the DR1 paper [1]?

Accessing Data from Data Lab#

Below is a sample query statement that acquires the relevant data from the des_dr1.main table in Astro Data Lab [2].

# Write query statement (sql) as a string
# NOTE: triple quotes allow us to break the string on multiple lines
query = """
        SELECT mag_auto_g as gmag, mag_auto_r as rmag, mag_auto_z as zmag,
               mag_auto_i as imag, mag_auto_y as ymag,
               kron_radius, spread_model_g, class_star_g, spread_model_r, class_star_r,
               spread_model_z, class_star_z,
               snr_g, snr_r, snr_z, ra, dec
        FROM des_dr1.main
        WHERE (fluxerr_auto_g>0 and fluxerr_auto_r>0 and fluxerr_auto_z>0)
        LIMIT 1000"""

# mag_auto_g,r,i,z,y = AB magnitudes in DECam g,r,i,z,y bands
# kron_radius        = Kron radius from SExtractor (pixels)
# spread_model_g,r,z = star/galaxy classifier quantifying light profile relative to PSF
# class_star_g,r,z   = star/extended source classifier (from 0 to 1)
# snr_g,r,z          = computed signal-to-noise ratios (S/N) in g,r,z bands
# ra,dec             = celestial coordinates
#
# WHERE: requirement that error>0 (to avoid dividing by zero) in g,r,z bands
# LIMIT: returns 1,000 rows that satisfy the query
# Execute the query synchronously, if short (e.g., LIMIT<=300,000)
# We then convert the returned result, a CSV-formatted string, to a Pandas data frame,
# making sure that any potential 'Infinity' values are converted to NaN.
sample_res = qc.query(sql=query)
sample_df = convert(sample_res,'pandas',na_values=('Infinity','-Infinity'))
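To see what the 'Infinity' handling does without contacting the server, here is a minimal offline sketch. It uses pandas directly rather than the Data Lab convert helper, and the CSV text and the names csv_text, toy_df and n_missing are made up for illustration; we are only assuming the query result looks like ordinary CSV text.

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV-formatted string returned by a query.
csv_text = "gmag,rmag,snr_g\n21.5,20.9,12.3\nInfinity,22.1,-Infinity\n"

# Treat the literal strings 'Infinity' and '-Infinity' as missing values (NaN).
toy_df = pd.read_csv(io.StringIO(csv_text), na_values=["Infinity", "-Infinity"])

# Count how many entries ended up as NaN.
n_missing = int(toy_df.isna().sum().sum())
print(toy_df)
print("missing values:", n_missing)
```

Checking the NaN count right after loading is a quick way to see how many magnitudes or S/N values were not finite in the original table.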

Question 04#

Write a query to get the following data from the des_dr1.main table:

  • mag_auto_g,r,i,z,y = AB magnitudes in DECam g,r,i,z,y bands

  • kron_radius = Kron radius from SExtractor (pixels)

  • spread_model_g,r,z = star/galaxy classifier quantifying light profile relative to PSF

  • class_star_g,r,z = star/extended source classifier (from 0 to 1)

  • snr_g,r,z = computed signal-to-noise ratios (S/N) in g,r,z bands

  • ra,dec = celestial coordinates

Apply the following conditions and limit your query results to 100,000 samples:

  • fluxerr_auto_X > 0 for X in g, r, and z bands

  • -50 <= snr_X <= 50 for X in g, r, and z bands

Convert the acquired data into a Pandas data frame.
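Since the same conditions repeat for each band, one possible way to keep the query readable is to assemble the WHERE clause in Python before pasting it into the query string. The sketch below is only one way to do this; the names bands, conditions and where_clause are illustrative and not part of the Data Lab API.

```python
# Bands for which the conditions apply.
bands = ["g", "r", "z"]

# Two conditions per band: positive flux error and a bounded S/N.
conditions = []
for b in bands:
    conditions.append(f"fluxerr_auto_{b} > 0")
    conditions.append(f"(snr_{b} >= -50 AND snr_{b} <= 50)")

# Join all conditions so that every one of them must hold.
where_clause = " AND ".join(conditions)
print(where_clause)
```

The resulting string can then be placed after the WHERE keyword of your query, followed by the LIMIT you need.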

Question 05#

The images produced by DES are used to distinguish between stars (point sources), galaxies (resolved, if not too distant), and QSOs (which may have both point-like and resolved components).

  • kron_radius: Kron radius in pixels

  • spread_model_X: star-galaxy classifier comparing extended model to PSF model. Value is given for each band X = g, r, i, z, y

  • class_star_X: value ranging from 0.0 (not point-like) to 1.0 (point-like), for each band X = g, r, i, z, y

The value of spread_model_X is near zero for a point source (star or QSO), positive for an extended source (galaxy), negative for an artifact smaller than the PSF (e.g., bad pixel or cosmic ray).

Let’s define an object as a star if spread_model_g <= 0.5, and as a galaxy if spread_model_g > 0.5. In our sample data, count the number of objects that are stars and the number of objects that are galaxies.
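The counting step can be sketched with a boolean mask. The values below are invented for illustration and stand in for the spread_model_g column of your queried data frame; toy, is_star, n_stars and n_galaxies are hypothetical names.

```python
import pandas as pd

# Hypothetical spread_model_g values standing in for the queried sample.
toy = pd.DataFrame({"spread_model_g": [-0.01, 0.0, 0.2, 0.5, 0.7, 1.3]})

# Boolean mask using the threshold from the definition above.
is_star = toy["spread_model_g"] <= 0.5

n_stars = int(is_star.sum())
# Note: any NaN entries would compare as False and be counted here as galaxies.
n_galaxies = int((~is_star).sum())
print(n_stars, n_galaxies)
```

With real data it may be worth dropping rows where spread_model_g is NaN first, so that missing values are not silently counted in either class.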

Question 06#

Suppose we wish to divide the imaging data observed by the DES into roughly equal subsets of data for more efficient parallel data processing and analysis. We will do so by using K-Means clustering on the data points of the area of the night sky observed by DES.

In astronomy, RA (right ascension) and Dec (declination) are the coordinates on the sky that correspond to longitude and latitude on Earth and are used to describe the position of an object. Define a subset of the data with two columns: ra and dec. Using K-Means clustering, fit the data into 2 distinct clusters.

Here, each cluster will represent a subset of data, so we want the number of data points in each cluster to be roughly the same. Display the fit using the helper function display(). Is the result what you expected? Is this a suitable method to divide our data equally?

def display(data, fit):
    n_clusters = len(np.unique(fit.labels_))
    # Pick good colors to distinguish the different clusters.
    import matplotlib.colors
    cmap = matplotlib.colors.ListedColormap(
        sns.color_palette("husl", n_clusters).as_hex())
    plt.scatter(data.iloc[:, 0], data.iloc[:, 1], s=5, c=fit.labels_, cmap=cmap)
    # Use standard axes to match the plot above.
    # plt.xlim(0, 360)
    plt.ylim(-90, +90)
    plt.xlabel('RA [degrees]')
    plt.ylabel('Dec [degrees]')
    plt.gca().set_aspect(1.)

Question 07#

Modify the ra column so that, wherever ra >= 180 deg, we subtract 360 deg from ra; this makes our observed sky map continuous. Now, change the number of clusters so that each cluster covers a roughly equal area of the observed sky. For what n_clusters does the K-Means clustering make the most sense to you?

However, you shouldn’t increase n_clusters to an arbitrarily large number just to get equal coverage of the observed sky - each cluster would represent a subset of data we use for parallel processing, and requiring more processors is not always feasible or sensible.
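The RA shift described in this question can be sketched with NumPy on a few made-up values; ra and ra_wrapped are illustrative names, and with a data frame the same expression could be applied to the ra column.

```python
import numpy as np

# Hypothetical RA values in degrees, spanning the 0/360 boundary.
ra = np.array([0.5, 95.0, 179.9, 180.0, 310.0, 359.5])

# Subtract 360 deg wherever RA >= 180 deg, so the range becomes [-180, 180).
ra_wrapped = np.where(ra >= 180.0, ra - 360.0, ra)
print(ra_wrapped)
```

After the shift, points just below 360 deg sit next to points just above 0 deg, which is what lets a distance-based method such as K-Means treat them as neighbors.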

References#

Acknowledgements#

  • Initial version: Ferzem Khan with some guidance from Mark Neubauer

© Copyright 2024