.. _Start:

Quickstart guide (GitHub)
============================

Welcome to SuperNNova!

This is a quick start guide so you can start testing our framework. If you want to install SuperNNova as a module, please take a look at :ref:`Start_module`.

Installation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Clone the GitHub repository
-----------------------------

.. code::

    git clone https://github.com/supernnova/supernnova.git

Set up your environment: 4 options
-----------------------------------

Please be aware that SuperNNova only runs properly on Unix systems (Linux, macOS).

a) Create a docker image: :ref:`DockerConfigurations`.

b) Create a conda virtual env: :ref:`CondaConfigurations`.

c) Use a mise environment: :ref:`MiseConfigurations`.

d) Install packages manually. Inspect ``env/conda_env.yml`` (or ``env/conda_gpu_env.yml`` when using CUDA) and ``pyproject.toml`` for the list of packages we use.

Verify installation
-----------------------------------

This package provides its own bash command, ``snn``. Once the installation has completed successfully, you should be able to run the following line in the terminal:

.. code-block:: bash

    snn --help

.. code-block:: none

    Usage: snn

    Available commands:
        make_data      create dataset for ML training
        train_rnn      train RNN model
        validate_rnn   validate RNN model
        show           visualize different types of plot
        performance    get method performance and paper plots

    Type snn <command> --help for usage help on a specific command.
    For example, snn make_data --help will list all data creation options.

where ``make_data``, ``train_rnn``, ``validate_rnn``, ``show`` and ``performance`` are the sub-commands.

Usage
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For quick tests, a database that contains a limited number of light-curves is provided. It is located in ``tests/raw``. For more information on the available data, check :ref:`DataStructure`.

Using command line
-----------------------

Build the database

.. code-block:: bash

    snn make_data --dump_dir tests/dump --raw_dir tests/raw

An additional argument, ``--fits_dir tests/fits``, can provide a SALT2 fits file for random forest training and interpretation.

Train an RNN

.. code-block:: bash

    snn train_rnn --dump_dir tests/dump

This command trains and validates our Baseline RNN on the test database and also generates test light-curves. The trained model is saved in a newly created model folder inside ``tests/dump/models``.

The model folder is named as follows: ``vanilla_S_0_CLF_2_R_None_photometry_DF_1.0_N_global_lstm_32x2_0.05_128_True_mean`` (see below for the naming conventions). This folder's contents are:

- **saved model** (``*.pt``): PyTorch RNN model.
- **statistics** (``METRICS*.pickle``): pickled Pandas DataFrame with accuracy and other performance statistics for this model.
- **predictions** (``PRED*.pickle``): pickled Pandas DataFrame with the predictions of our model on the test set.
- **figures** (``train_and_val_*.png``): figures showing the evolution of the chosen metric at each training step.

Remember that our data is split into training, validation and test sets.

The test light-curves and their predictions can be inspected in ``tests/dump/lightcurves``.

**You have trained, validated and tested your model.**

.. _UseYaml:

Using YAML
-----------------------

You can also save option arguments in a YAML file and load it:

.. code-block:: bash

    snn <command> --config_file <yaml_file>

Example YAML files can be found in the folder ``configs_yml``, where ``classify.yml`` is an example of classification using an existing model.

**Notice**: you can include options for different sub-commands in the same YAML file.

Build the database

.. code-block:: bash

    snn make_data --config_file configs_yml/default.yml

Train an RNN

.. code-block:: bash

    snn train_rnn --config_file configs_yml/default.yml

You can also override an option specified in the YAML file with a command-line option:

.. code-block:: bash

    snn make_data --config_file configs_yml/simple.yml --dump_dir tests/dump2
    # or
    snn make_data --dump_dir tests/dump2 --config_file configs_yml/simple.yml

The data will be dumped to ``tests/dump2`` instead of the ``tests/dump`` specified in ``configs_yml/simple.yml``.

**Notice**: command-line options override the arguments at runtime; they do not change the YAML file itself.

Reproduce SuperNNova paper
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To reproduce the results of the paper, please use the branch ``paper`` and run:

.. code::

    cd SuperNNova && python run_paper.py --debug --dump_dir tests/dump

``--debug`` trains simplified models with a reduced number of epochs. Remove this flag for full reproducibility. With the ``--debug`` flag on, this should take between 15 and 30 minutes on the CPU.

Naming conventions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- **vanilla/variational/bayesian**: the type of RNN to be trained. ``variational`` and ``bayesian`` are Bayesian recurrent networks.
- **S_0**: seed used for training. Default is 0.
- **CLF_2**: number of targets to be used in classification. This case uses two classes: type Ia supernovae vs. all others.
- **R_None**: host-galaxy redshift provided, if any. Options: ``zpho`` (photometric) or ``zspe`` (spectroscopic).
- **photometry**: data used. In our database we split light-curves that have a successful SALT2 fit (``saltfit``) and the complete dataset (``photometry``).
- **DF_1.0**: data fraction used in training. With large datasets it is useful to test training with a fraction of the available training set. In this case we use the whole dataset (``1.0``).
- **N_global**: normalization used. Default: ``global``.
- **lstm**: type of layer used. Default: ``lstm``.
- **32x2**: hidden layer dimension x number of layers.
- **0.05**: dropout value.
- **128**: batch size.
- **True**: whether the model is bidirectional.
- **mean**: output option. ``mean`` is mean pooling.

The naming convention is defined in ``python/supernnova/conf.py``.
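
As an illustration of the convention above, the following sketch splits a model folder name into its fields. The function ``parse_model_name`` and the regular expression are hypothetical helpers written for this guide, not part of SuperNNova. They assume the field order listed above and that no individual field value contains an underscore; names that follow a different pattern are rejected.

.. code-block:: python

    import re

    # Hypothetical helper (not part of SuperNNova): one named group per field
    # of the naming convention, in the order the fields are listed above.
    MODEL_NAME_RE = re.compile(
        r"(?P<model>vanilla|variational|bayesian)"
        r"_S_(?P<seed>\d+)"
        r"_CLF_(?P<nb_classes>\d+)"
        r"_R_(?P<redshift>None|zpho|zspe)"
        r"_(?P<source_data>photometry|saltfit)"
        r"_DF_(?P<data_fraction>[\d.]+)"
        r"_N_(?P<norm>[^_]+)"          # assumption: no underscore in the value
        r"_(?P<layer_type>[^_]+)"      # assumption: no underscore in the value
        r"_(?P<hidden_dim>\d+)x(?P<num_layers>\d+)"
        r"_(?P<dropout>[\d.]+)"
        r"_(?P<batch_size>\d+)"
        r"_(?P<bidirectional>True|False)"
        r"_(?P<output>[^_]+)"
    )


    def parse_model_name(name):
        """Split a model folder name into a dict of typed fields."""
        match = MODEL_NAME_RE.fullmatch(name)
        if match is None:
            raise ValueError(f"Unrecognised model folder name: {name!r}")
        fields = match.groupdict()
        # Convert numeric and boolean fields from their string form.
        for key in ("seed", "nb_classes", "hidden_dim", "num_layers", "batch_size"):
            fields[key] = int(fields[key])
        for key in ("data_fraction", "dropout"):
            fields[key] = float(fields[key])
        fields["bidirectional"] = fields["bidirectional"] == "True"
        # "R_None" in the folder name becomes a Python None.
        if fields["redshift"] == "None":
            fields["redshift"] = None
        return fields


    name = "vanilla_S_0_CLF_2_R_None_photometry_DF_1.0_N_global_lstm_32x2_0.05_128_True_mean"
    info = parse_model_name(name)
    print(info["model"], info["hidden_dim"], info["num_layers"], info["bidirectional"])

A parser like this can be handy when comparing several trained models in ``tests/dump/models``, for example to group their ``METRICS*.pickle`` files by seed or data fraction.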