Validation documentation

Validation RandomForest

supernnova.validation.validate_randomforest.get_predictions(settings, model_file=None)[source]

Test random forest models on an independent test set

Features are stored in a .FITRES file found in data_dir. Predefined splits are used to select the test set. The predicted target and probabilities are saved to preds_dir.

Parameters:
  • settings (ExperimentSettings) – custom class to hold hyperparameters

  • model_file (str) – path to saved randomforest model

Validation RNN

supernnova.validation.validate_rnn.find_idx(array, value)[source]

Utility to find the index of the element of array that most closely matches value

Parameters:
  • array (np.array) – The array in which to search

  • value (float) – The value for which we are looking for a match

Returns:

(int) the index of the element of array that most closely matches value
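The behaviour described above can be sketched in a few lines of NumPy (an illustrative reimplementation, not the library's actual code):

```python
import numpy as np

def find_idx(array, value):
    # Index of the element of `array` that most closely matches `value`.
    return int(np.argmin(np.abs(np.asarray(array) - value)))

idx = find_idx(np.array([10.0, 20.0, 30.0]), 22.0)  # → 1 (20.0 is closest)
```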

supernnova.validation.validate_rnn.get_batch_predictions(rnn, X, target)[source]

Utility to obtain predictions for a given batch

Parameters:
  • rnn (torch.nn) – The RNN model

  • X (torch.Tensor) – The batch on which to carry out predictions

  • target (torch.LongTensor) – The true class of each element in the batch

Returns:

Tuple containing

  • arr_preds (np.array): predictions

  • arr_target (np.array): actual targets

supernnova.validation.validate_rnn.get_batch_predictions_MFE(rnn, X, target)[source]

Utility to obtain predictions for a given batch

Parameters:
  • rnn (torch.nn) – The RNN model

  • X (torch.Tensor) – The batch on which to carry out predictions

  • target (torch.LongTensor) – The true class of each element in the batch

Returns:

Tuple containing

  • arr_preds (np.array): predictions

  • arr_target (np.array): actual targets

supernnova.validation.validate_rnn.get_predictions(settings, model_file=None)[source]

Obtain predictions for a given RNN model, specified either by the settings argument or by a model_file

  • Models are benchmarked on the test data set

  • Batch size can be controlled to speed up predictions

  • For Bayesian models, multiple predictions are carried out to obtain a distribution of predictions

  • Predictions are computed for full lightcurves, and around the peak light

  • Predictions are saved to a pickle file (for faster loading)

Parameters:
  • settings (ExperimentSettings) – custom class to hold hyperparameters

  • model_file (str) – Path to saved model weights. Default: None
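The Bayesian multi-pass behaviour described above can be sketched as below; `stochastic_forward` is a hypothetical stand-in for a forward pass with stochastic layers (e.g. dropout) kept active at inference time:

```python
import numpy as np

rng = np.random.default_rng(0)

def bayesian_predictions(stochastic_forward, X, num_inference_samples=50):
    # Run several stochastic forward passes and stack the results to
    # obtain a distribution of predictions per lightcurve.
    samples = np.stack(
        [stochastic_forward(X) for _ in range(num_inference_samples)]
    )
    # Summarise the distribution with its mean and spread.
    return samples.mean(axis=0), samples.std(axis=0)

# Toy stochastic model: fixed scores plus dropout-like noise.
mean, std = bayesian_predictions(
    lambda x: x + rng.normal(0.0, 0.1, size=x.shape), np.zeros((3, 2))
)
```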

supernnova.validation.validate_rnn.get_predictions_for_speed_benchmark(settings)[source]

Test RNN models inference speed

  • Models are benchmarked on the test data set

  • Batch size can be controlled to speed up predictions

  • For Bayesian models, multiple predictions are carried out to obtain a distribution of predictions

  • Results are saved to a .csv for future use

Parameters:

settings (ExperimentSettings) – custom class to hold hyperparameters

Metrics

supernnova.validation.metrics.aggregate_metrics(settings)[source]

Aggregate all pre-computed METRICS files into a single dataframe for analysis

Save the aggregated metrics to a CSV file

Parameters:

settings (ExperimentSettings) – custom class to hold hyperparameters
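In spirit, the aggregation amounts to concatenating per-model metrics tables into one dataframe (a hedged sketch using pandas; the file discovery, column names, and CSV writing step of the real function are omitted):

```python
import pandas as pd

def aggregate_metrics(metric_frames):
    # Stack per-model metrics dataframes into a single table for analysis;
    # the real function would also write the result to a CSV file.
    return pd.concat(metric_frames, ignore_index=True)

df = aggregate_metrics([
    pd.DataFrame({"model": ["rnn"], "accuracy": [0.95]}),
    pd.DataFrame({"model": ["randomforest"], "accuracy": [0.91]}),
])
```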

supernnova.validation.metrics.get_metrics_singlemodel(settings, prediction_file=None, model_type='rnn')[source]

Launch computation of all evaluation metrics for a given model, specified by the settings object or by a model file

Save a pickled dataframe (we pickle because we’re saving numpy arrays, which are not easily saved with the to_csv method).

Parameters:
  • settings (ExperimentSettings) – custom class to hold hyperparameters

  • prediction_file (str) – Path to saved predictions. Default: None

  • model_type (str) – one of rnn or randomforest. Default: 'rnn'

Returns:

(pandas.DataFrame) holds the performance metrics for this model

supernnova.validation.metrics.get_rnn_performance_metrics_singlemodel(settings, df, host_zspe_list)[source]

Compute performance metrics (accuracy, AUC, purity etc) for an RNN model

  • Compute metrics around peak light (i.e. PEAKMJD) and for the full lightcurve.

  • For Bayesian models, compute multiple predictions per lightcurve and then take the median

Parameters:
  • settings (ExperimentSettings) – custom class to hold hyperparameters

  • df (pandas.DataFrame) – dataframe containing a model’s predictions

  • host_zspe_list (list) – available host galaxy spectroscopic redshifts

Returns:

(pandas.DataFrame) holds the performance metrics for this model
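A minimal sketch of the simplest metric computed here, accuracy from predicted class probabilities (AUC, purity, and the peak-light/full-lightcurve split are omitted; the function name is illustrative):

```python
import numpy as np

def accuracy(probs, targets):
    # Fraction of lightcurves whose highest-probability class is the true one.
    return float((probs.argmax(axis=1) == np.asarray(targets)).mean())

acc = accuracy(np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]), [0, 1, 1])
```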

supernnova.validation.metrics.get_randomforest_performance_metrics_singlemodel(settings, df, host_zspe_list)[source]

Compute performance metrics (accuracy, AUC, purity etc) for a randomforest model

Parameters:
  • settings (ExperimentSettings) – custom class to hold hyperparameters

  • df (pandas.DataFrame) – dataframe containing a model’s predictions

  • host_zspe_list (list) – available host galaxy spectroscopic redshifts

Returns:

(pandas.DataFrame) holds the performance metrics for this model

supernnova.validation.metrics.get_uncertainty_metrics_singlemodel(df)[source]

For any lightcurve, compute the standard deviation of the model’s predictions (this is only valid for bayesian models which yield a distribution of predictions).

Then, compute the mean and standard deviation of these per-lightcurve standard deviations across all lightcurves. A higher mean indicates a model that is less confident in its predictions.

Parameters:

df (pandas.DataFrame) – dataframe containing a model’s predictions

Returns:

(pandas.DataFrame) holds the uncertainty metrics for this model
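The two-level aggregation described above (spread per lightcurve, then statistics of those spreads) can be sketched as follows; the array layout is an assumption for illustration:

```python
import numpy as np

def uncertainty_metrics(pred_samples):
    # pred_samples: (num_inference_samples, num_lightcurves) predicted
    # probabilities from a Bayesian model.
    per_lc_std = pred_samples.std(axis=0)  # spread per lightcurve
    # Higher mean spread indicates a less confident model.
    return float(per_lc_std.mean()), float(per_lc_std.std())

rng = np.random.default_rng(1)
mean_std, std_std = uncertainty_metrics(rng.uniform(size=(20, 5)))
```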

supernnova.validation.metrics.get_entropy_metrics_singlemodel(df, nb_classes)[source]

Compute the entropy of the predictions. Low entropy indicates a model that is very confident in its predictions.

Parameters:
  • df (pandas.DataFrame) – dataframe containing a model’s predictions

  • nb_classes (int) – the number of classes in the classification task

Returns:

(pandas.DataFrame) holds the entropy metrics for this model
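The entropy computation can be sketched with NumPy (an illustrative implementation of the standard Shannon entropy, not the library's code; the `eps` clipping guards against log(0)):

```python
import numpy as np

def prediction_entropy(probs, eps=1e-12):
    # Shannon entropy of each row of class probabilities;
    # low entropy means the model is confident.
    probs = np.clip(probs, eps, 1.0)
    return -(probs * np.log(probs)).sum(axis=1)

confident = prediction_entropy(np.array([[0.99, 0.01]]))
uncertain = prediction_entropy(np.array([[0.5, 0.5]]))  # maximal: log(2)
```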

supernnova.validation.metrics.get_calibration_metrics_singlemodel(df)[source]

Compute probability calibration dataframe. If the calibration curve is close to identity, the model is considered well-calibrated.

Parameters:

df (pandas.DataFrame) – dataframe containing a model’s predictions

Returns:

(pandas.DataFrame) holds the calibration metrics for this model
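A reliability-curve sketch of what "close to identity" means, for the binary case (an illustrative implementation under assumed inputs, akin to scikit-learn's calibration_curve, not the library's code):

```python
import numpy as np

def calibration_curve(y_true, y_prob, n_bins=10):
    # Bin predicted probabilities and compare the mean predicted
    # probability in each bin with the observed fraction of positives;
    # points on the identity line indicate good calibration.
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(y_prob, bins) - 1, 0, n_bins - 1)
    mean_pred, frac_pos = [], []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            mean_pred.append(float(y_prob[mask].mean()))
            frac_pos.append(float(y_true[mask].mean()))
    return np.array(mean_pred), np.array(frac_pos)

# Perfectly calibrated toy predictions lie on the identity line.
y_true = np.array([0, 0, 1, 1])
mean_pred, frac_pos = calibration_curve(y_true, y_true.astype(float), n_bins=2)
```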

supernnova.validation.metrics.get_classification_stats_singlemodel(df, nb_classes)[source]

Count how many lightcurves are classified into each class

Parameters:
  • df (pandas.DataFrame) – dataframe containing a model’s predictions

  • nb_classes (int) – the number of classes in the classification task

Returns:

(pandas.DataFrame) holds the classification statistics for this model
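The per-class count is essentially a histogram over predicted class labels; a minimal sketch with NumPy (the function name is illustrative):

```python
import numpy as np

def classification_counts(predicted_classes, nb_classes):
    # Number of lightcurves assigned to each of the nb_classes classes.
    return np.bincount(np.asarray(predicted_classes), minlength=nb_classes)

counts = classification_counts([0, 1, 1, 2], nb_classes=4)  # → [1, 2, 1, 0]
```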