Quickstart guide (GitHub)

Welcome to SuperNNova!

This is a quick start guide so you can start testing our framework. If you want to install SuperNNova as a module, please take a look at Quickstart guide (pip).

Installation

Clone the GitHub repository

git clone https://github.com/supernnova/supernnova.git

Setup your environment. 3 options

Please beware that SuperNNova only runs properly in Unix systems (Linux, MacOS).
  1. Create a docker image: Docker .

  2. Create a conda virtual env Environment configuration .

  1. Use misenvironment Mise .
    1. Install packages manually. Inspect env/conda_env.yml (or env/conda_gpu_env.yml when using cuda) and pyproject.toml for the list of packages we use.

Verify installation

This package provides its own bash command snn. Once the installation is completed successfully, you should be able to run the following line in the terminal:

snn --help
Usage: snn <command> <options> <arguments>

Available commands:

    make_data        create dataset for ML training
    train_rnn        train RNN model
    validate_rnn     validate RNN model
    show             vitualize different types of plot
    performance      get method performance and paper plots

Type snn <command> --help for usage help on a specific command.
For example, snn make_data --help will list all data creation options.

where make_data, train_rnn, validate_rnn, show and performance are the sub-commands.

Usage

For quick tests, a database that contains a limited number of light-curves is provided. It is located in tests/raw. For more information on the available data, check Data walkthrough.

Using command line

Build the database

snn make_data --dump_dir tests/dump --raw_dir tests/raw

an additional argument --fits_dir tests/fits can provide a SALT2 fits file for random forest training and interpretation.

Train an RNN

snn train_rnn --dump_dir tests/dump

With this command you are training and validating our Baseline RNN with the test database and generating test lightcurves as well. The trained model will be saved in a newly created model folder inside tests/dump/models.

The model folder has been named as follows: vanilla_S_0_CLF_2_R_None_photometry_DF_1.0_N_global_lstm_32x2_0.05_128_True_mean (See below for the naming conventions). This folder’s contents are:

  • saved model (*.pt): PyTorch RNN model.

  • statistics (METRICS*.pickle): pickled Pandas DataFrame with accuracy and other performance statistics for this model.

  • predictions (PRED*.pickle): pickled Pandas DataFrame with the predictions of our model on the test set.

  • figures (train_and_val_*.png): figures showing the evolution of the chosen metric at each training step.

Remember that our data is split in training, validation and test sets.

The test light-curves and their predictions can be inspected in tests/dump/lightcurves

You have trained, validated and tested your model.

Using Yaml

You can also save arguments of options in an YAML file, and load it:

snn <command> --config_file <yaml file>

Example YAML files can be found in the folder configs_yml, where classify.yml is an example of classification using existing model.

Notice: you can include options for different sub-commands in the same YAML file.

Build the database

snn make_data --config_file configs_yml/default.yml

Train an RNN

snn train_rnn --config_file configs_yml/default.yml

You can also update option specified in the YAML file by using command-line option:

snn make_data --config_file configs_yml/simple.yml --dump_dir tests/dump2
# or
snn make_data --dump_dir tests/dump2 --config_file configs_yml/simple.yml

The data will be dumpped to tests/dump2 instead of tests/dump specified in config_yml/simple.yml.

Notice: adding command-line options will update the arguments at runtime, not change the YAML file itself.

Reproduce SuperNNova paper

To reproduce the results of the paper please use the branch paper and run:

cd SuperNNova && python run_paper.py --debug --dump_dir tests/dump

--debug will train simplified models with a reduced number of epochs. Remove this flag for full reproducibility. With the --debug flag on, this should take between 15 and 30 minutes on the CPU.

Naming conventions

  • vanilla/variational/bayesian: The type of RNN to be trained. variational and bayesian are bayesian recurrent networks

  • S_0: seed used for training. Default is 0.

  • CLF_2: number of targets to be used in classification. This case uses two classes: type Ia supernovae vs. all others.

  • R_None: host-galaxy redshift provided. Options: zpho (photometric) or zspe (spectroscopic)

  • photometry: data used. In our database we split light-curves that have a succesful SALT2 fit (saltfit) and the complete dataset (photometry).

  • DF_1.0: data fraction used in training. With large datasets it is usefult to test training with a fraction of the available training set. In this case we use the whole dataset (1.0).

  • N_global: normalization used. Default: global.

  • lstm: type of layer used. Default lstm.

  • 32x2: hidden layer dimension x number the layers.

  • 0.05: dropout value.

  • 128: batch size.

  • True: if this model is bidirectional.

  • mean: output option. mean is mean pooling.

The naming convention is defined in python/supernnova/conf.py.