Computes a suite of evaluation metrics for mutational signature fitting. This function compares an "observed" model (from signature fitting) to the "truth" (the true underlying signature contributions) and reports standard accuracy metrics.

Usage

sig_model_correctness(observed, truth, all_signatures = NULL, validate = TRUE)

Arguments

observed

Numeric named vector. The fitted model (signature weights). Names must be signature IDs; values are fractional contributions (typically sum to 1). See sigshared::example_model().

truth

Numeric named vector. The true signature contributions; same format as observed.

all_signatures

Optional character vector. The complete set of possible signature IDs to be evaluated. If NULL, inferred as the union of names from observed and truth.

validate

Logical. If TRUE (the default), input vectors are checked for correct formatting and expanded/reordered as needed. If FALSE, the user must ensure that observed and truth are already aligned and complete.

Value

Named list of metrics quantifying model correctness (see Details).

Details

The function returns a named list with:

fitting_error

Sum of absolute differences between observed and truth, divided by 2 (range: 0–1).

RMSE

Root mean squared error between observed and truth.

n_false_positives

Number of signatures called present in observed but not present in truth.

n_false_negatives

Number of signatures present in truth but missed in observed.

n_true_positives

Number of signatures correctly called as present.

n_true_negatives

Number of signatures correctly called as absent.

total_false_positive_contributions

Sum of weights assigned to false positive signatures.

precision

Proportion of detected signatures that are truly present (TP / (TP + FP)).

recall

Proportion of true signatures that are detected (TP / (TP + FN)).

specificity

Proportion of truly absent signatures that are not detected (TN / (TN + FP)).

mathews_correlation_coeff

Matthews Correlation Coefficient (MCC), a balanced measure even for imbalanced classes.

f1

F1 score, the harmonic mean of precision and recall.

balanced_accuracy

Average of recall and specificity.
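
The two error metrics above follow directly from their definitions. A minimal base-R sketch reproducing them by hand (using the vectors from the Examples section; this is not the package's internal code):

```r
observed <- c(SBS1 = 0.7, SBS5 = 0.3, SBS18 = 0)
truth    <- c(SBS1 = 0.6, SBS5 = 0.4, SBS18 = 0)

# fitting_error: half the L1 distance between the two weight vectors
fitting_error <- sum(abs(observed - truth)) / 2

# RMSE: root mean squared error across all signatures
rmse <- sqrt(mean((observed - truth)^2))

fitting_error  # 0.1
rmse           # 0.08164966
```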

For metric calculation, the observed and truth vectors are expanded and reordered (if needed) to include all signatures in all_signatures. Any signatures not present in observed or truth are assumed to have zero contribution.
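
The expansion step can be sketched in base R. The helper below (hypothetical, not the package's internal code) pads a named weight vector with zeros for any signature it does not mention:

```r
# Expand a named weight vector to cover every signature ID,
# filling signatures absent from x with zero contribution.
expand_to <- function(x, all_signatures) {
  out <- setNames(numeric(length(all_signatures)), all_signatures)
  out[names(x)] <- x
  out
}

observed <- c(SBS1 = 0.7, SBS5 = 0.3)
truth    <- c(SBS1 = 0.6, SBS18 = 0.4)

# When all_signatures is NULL, the union of names is used
all_sigs <- union(names(observed), names(truth))

expand_to(observed, all_sigs)
#  SBS1  SBS5 SBS18
#   0.7   0.3   0.0
```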

Presence/absence for each signature is defined as "present" if weight > 0, and "absent" if weight == 0.

For each signature, possible outcomes are:

True Positive (TP)

Observed > 0 and Truth > 0 (signature fitted and truly present).

False Positive (FP)

Observed > 0 and Truth == 0 (signature fitted but not truly present).

False Negative (FN)

Observed == 0 and Truth > 0 (signature missed by fitting but truly present).

True Negative (TN)

Observed == 0 and Truth == 0 (signature correctly called absent).
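
The four outcomomes above reduce to simple logical comparisons on the aligned vectors. A base-R sketch with illustrative (hypothetical) weights, one signature per outcome:

```r
observed <- c(SBS1 = 0.7, SBS5 = 0.3, SBS18 = 0.0, SBS40 = 0.0)
truth    <- c(SBS1 = 0.6, SBS5 = 0.0, SBS18 = 0.4, SBS40 = 0.0)

n_true_positives  <- sum(observed > 0  & truth > 0)   # SBS1
n_false_positives <- sum(observed > 0  & truth == 0)  # SBS5
n_false_negatives <- sum(observed == 0 & truth > 0)   # SBS18
n_true_negatives  <- sum(observed == 0 & truth == 0)  # SBS40

# Weight wrongly assigned to truly absent signatures
total_false_positive_contributions <- sum(observed[truth == 0])  # 0.3
```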

Counts of TP, FP, TN, and FN are used for all classification metrics. Metrics whose denominator is zero (e.g. precision when TP + FP = 0) are returned as NA.

Precision, recall, F1, and MCC may not be comparable across datasets with different class balances.
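
The NA behaviour for undefined metrics can be mimicked with a guarded division. A sketch (an assumption about the behaviour described above, not the package's implementation), for the case where no signatures were called present:

```r
# Return NA rather than NaN/Inf when the denominator is zero
safe_div <- function(num, den) if (den == 0) NA_real_ else num / den

tp <- 0; fp <- 0; fn <- 2; tn <- 3

precision <- safe_div(tp, tp + fp)  # NA: nothing was called present
recall    <- safe_div(tp, tp + fn)  # 0: both true signatures were missed
```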

Examples

observed <- c(SBS1 = 0.7, SBS5 = 0.3, SBS18 = 0)
truth <- c(SBS1 = 0.6, SBS5 = 0.4, SBS18 = 0)
sig_model_correctness(observed, truth)
#> $fitting_error
#> [1] 0.1
#> 
#> $RMSE
#> [1] 0.08164966
#> 
#> $n_false_positives
#> [1] 0
#> 
#> $n_true_positives
#> [1] 2
#> 
#> $n_false_negatives
#> [1] 0
#> 
#> $n_true_negatives
#> [1] 1
#> 
#> $total_false_positive_contributions
#> [1] 0
#> 
#> $precision
#> [1] 1
#> 
#> $recall
#> [1] 1
#> 
#> $specificity
#> [1] 1
#> 
#> $mathews_correlation_coeff
#> [1] 1
#> 
#> $f1
#> [1] 1
#> 
#> $balanced_accuracy
#> [1] 1
#>