What is SuperNNova?
SuperNNova is a framework for lightcurve classification that uses supervised learning algorithms. Training these algorithms relies on large annotated databases; typically, we use simulations as the training set.
Do you have a paper describing SuperNNova? How can I cite you?
How can I install it?
What data do I need?
You only need lightcurves (photometric time series) to use SuperNNova. Additional information can be added as well; for example, we used supernova host-galaxy redshifts in the paper.
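As a minimal sketch of how such extra information can be attached to the photometric inputs (the column names `object_id` and `host_z` are made up for illustration, not SuperNNova's actual schema):

```python
import pandas as pd

# Hypothetical lightcurve table: one row per observation.
lightcurves = pd.DataFrame({
    "object_id": [1, 1, 2],
    "mjd": [56001.1, 56003.2, 56010.5],
    "flux": [12.3, 15.1, 8.7],
})

# Hypothetical host-galaxy metadata: one row per object.
hosts = pd.DataFrame({"object_id": [1, 2], "host_z": [0.21, 0.55]})

# Broadcast the redshift onto every observation of the object, so a
# classifier sees it as an additional input feature at each time step.
features = lightcurves.merge(hosts, on="object_id", how="left")
print(features["host_z"].tolist())
```

The same pattern works for any per-object metadata you want the classifier to see alongside the time series.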
Is the data used in the paper publicly available?
Yes, it is! See SuperNNovaSimulations.
We want to foster reproducibility, so you can copy the data and reproduce all our experiments with run_paper.py in the paper branch. Beware, it will take a while!
How did you create the simulations used in the paper?
We used SNANA to generate the supernova lightcurves. Our data is similar to the Supernova Photometric Classification Challenge (SPCC) data, with updated models used in the DES simulations.
Why use SuperNNova?
First, it is open source, so you can modify it for your science goals or just see for yourself what is inside the “black box”. Second, we have pretty good performance. Third, we also provide Bayesian interpretations of RNNs, which allow better uncertainty handling; this is useful for cosmological or any other statistical analyses.
Can I use SuperNNova for my classification problem?
Please do! But beware: you need a large number of lightcurves (simulated or real) per type of event you are trying to classify; otherwise, performance is poor.
How can I use SuperNNova for my classification problem?
It may require a little bit of code modification depending on your data. You can load data in SNANA formats (e.g. FITRES, an ASCII file) or from .csv files (like the ones from the Kaggle PLAsTiCC challenge). Observations are grouped per night, so if you are looking for fast transients, you may need to create your own data pipeline or modify SuperNNova's time grouping. Contact us at firstname.lastname@example.org if you have questions, and please report any issues!
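As a rough illustration of the per-night grouping mentioned above (a simplified pandas sketch with invented column names, not SuperNNova's actual pipeline):

```python
import pandas as pd

# Hypothetical observations of a single object; MJD is the observation time.
obs = pd.DataFrame({
    "mjd": [56000.1, 56000.3, 56001.2, 56003.8],
    "flux": [10.0, 12.0, 20.0, 5.0],
})

# Group by integer night (floor of the MJD) and average the flux per night.
# A fast transient varying within a single night would be smeared out by
# this step, which is why a finer grouping may be needed.
obs["night"] = obs["mjd"].astype(int)
per_night = obs.groupby("night", as_index=False)["flux"].mean()
print(per_night["flux"].tolist())
```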
What algorithms are available for classification?
Currently we have a baseline RNN and two Bayesian RNNs. The Bayesian RNNs are based on the work of Fortunato et al. 2017 and Gal and Ghahramani 2015 and allow us to estimate prediction uncertainty. These algorithms require only raw lightcurve data. We also have a Random Forest classifier that relies on lightcurve features, which you can obtain with fitters: an exponential rise-and-fall model or Type Ia supernova SALT2 fits.
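The MC-dropout idea of Gal and Ghahramani can be sketched in a few lines: keep dropout active at prediction time, run many stochastic forward passes, and use their spread as an uncertainty estimate. This toy NumPy model is purely illustrative, not SuperNNova's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "network": a single linear layer with dropout on the inputs.
weights = np.array([0.5, -0.2, 0.8])
x = np.array([1.0, 2.0, 3.0])

def stochastic_forward(x, p_drop=0.3):
    # Dropout stays active at prediction time (MC dropout): each pass
    # randomly zeroes inputs, giving a different output every call.
    mask = rng.random(x.shape) >= p_drop
    return float((x * mask / (1 - p_drop)) @ weights)

# Many stochastic passes; their mean is the prediction and their
# spread is a crude uncertainty estimate.
samples = [stochastic_forward(x) for _ in range(1000)]
print(np.mean(samples), np.std(samples))
```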
Why is training slow?
If you have a GPU, you can activate GPU training with the corresponding command-line flag. Alternatively, you may select a smaller data fraction, e.g. --data_fraction 0.1, to train on a smaller set.
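What a data-fraction option amounts to can be sketched as a simple random subsample (illustrative only; the variable names are invented):

```python
import random

random.seed(0)

# Hypothetical list of lightcurve identifiers in the training set.
all_ids = list(range(1000))

# Keep only a fraction of the data, mirroring what a flag such as
# --data_fraction 0.1 does: train on a random 10% subset.
data_fraction = 0.1
subset = random.sample(all_ids, int(len(all_ids) * data_fraction))
print(len(subset))
```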
OSError: Unable to open file (unable to open file: name = ‘/home/snndump/processed/DES_database.h5’
You have probably not set your dump directory correctly. Provide the correct --dump_dir argument.
Where do I find the model naming scheme?
You can find it in model_name. A starter guide can be found in our Quickstart guide on GitHub.
How do I change the directory where the data can be found?
You can add --dump_dir foldername to your terminal command. This folder should have the same structure as our data repositories (see the Data documentation).
If I trained several models, is there a way to see a summary of the statistics?
Yes: call python run.py --performance. A summary will be created in summary_stats.csv. It computes various metrics, which can be averaged over multiple random seeds. By default, this command will also generate all statistics (LaTeX tables as well as printed stats) and plots featured in our SuperNNova paper. To deactivate this, just comment out the two lines below # Stats and plots in paper in run.py.
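The seed-averaging step can be sketched with pandas (the model names, metric, and columns here are invented for illustration, not the actual contents of summary_stats.csv):

```python
import pandas as pd

# Hypothetical per-run metrics: one row per (model, seed) pair.
runs = pd.DataFrame({
    "model": ["rnn", "rnn", "rf", "rf"],
    "seed": [0, 1, 0, 1],
    "accuracy": [0.90, 0.92, 0.84, 0.86],
})

# Average each metric over random seeds, the kind of aggregation a
# summary-statistics file would hold.
summary = runs.groupby("model", as_index=False)["accuracy"].mean()
print([round(v, 2) for v in summary["accuracy"]])
```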