Quantify model correctness for signature deconvolution
Source: R/statistics.R
sig_model_correctness.Rd
Computes a suite of evaluation metrics for mutational signature fitting. This function compares an "observed" model (from signature fitting) to the "truth" (the true underlying signature contributions) and reports standard accuracy metrics.
Arguments
- observed
Numeric named vector. The fitted model (signature weights). Names must be signature IDs; values are fractional contributions (typically summing to 1). See sigshared::example_model().
- truth
Numeric named vector. The true signature contributions, in the same format as observed.
- all_signatures
Optional character vector. The complete set of possible signature IDs to be evaluated. If NULL, inferred as the union of names from observed and truth.
- validate
Logical. If TRUE (default), input vectors are checked for correct formatting, and expanded/reordered as needed. If FALSE, the user must ensure observed and truth are already aligned and complete.
Details
The function returns a named list with:
- fitting_error
Sum of absolute differences between observed and truth, divided by 2 (range: 0–1).
- RMSE
Root mean squared error between observed and truth.
- n_false_positives
Number of signatures called present in observed but not present in truth.
- n_false_negatives
Number of signatures present in truth but missed in observed.
- n_true_positives
Number of signatures correctly called as present.
- n_true_negatives
Number of signatures correctly called as absent.
- total_false_positive_contributions
Sum of weights assigned to false positive signatures.
- precision
Proportion of detected signatures that are truly present (TP / (TP + FP)).
- recall
Proportion of true signatures that are detected (TP / (TP + FN)).
- specificity
Proportion of truly absent signatures that are not detected (TN / (TN + FP)).
- mathews_correlation_coeff
Matthews Correlation Coefficient (MCC), a balanced measure even for imbalanced classes.
- f1
F1 score, the harmonic mean of precision and recall.
- balanced_accuracy
Average of recall and specificity.
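The two error metrics at the top of the list follow directly from their definitions. A minimal sketch (the helper names `fitting_error` and `rmse` are hypothetical, not the package's internals, and assume already-aligned vectors):

```r
# Sketch of the two error metrics as defined above; `observed` and
# `truth` are assumed to be aligned named vectors over the same signatures.
fitting_error <- function(observed, truth) {
  sum(abs(observed - truth)) / 2   # half the L1 distance, range 0-1
}

rmse <- function(observed, truth) {
  sqrt(mean((observed - truth)^2))  # root mean squared error
}

obs <- c(SBS1 = 0.7, SBS5 = 0.3, SBS18 = 0)
tru <- c(SBS1 = 0.6, SBS5 = 0.4, SBS18 = 0)
fitting_error(obs, tru)  # 0.1
rmse(obs, tru)           # ~0.0816
```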
For metric calculation, the observed and truth vectors are expanded and reordered (if needed) to include all signatures in all_signatures. Any signatures not present in observed or truth are assumed to have zero contribution.
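The expansion step can be sketched as follows (the helper `expand_to_all` is a hypothetical illustration of the behaviour described above, not the package's implementation):

```r
# Expand a named weight vector to cover all_signatures, filling
# missing signatures with a zero contribution.
expand_to_all <- function(x, all_signatures) {
  out <- setNames(numeric(length(all_signatures)), all_signatures)
  out[names(x)] <- x
  out
}

observed <- c(SBS1 = 0.7, SBS5 = 0.3)
all_sigs <- c("SBS1", "SBS5", "SBS18")
expand_to_all(observed, all_sigs)  # SBS18 gains an explicit 0
```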
Presence/absence for each signature is defined as "present" if weight > 0, and "absent" if weight == 0.
For each signature, possible outcomes are:
- True Positive (TP)
Observed > 0 and Truth > 0 (signature fitted and truly present).
- False Positive (FP)
Observed > 0 and Truth == 0 (signature fitted but not truly present).
- False Negative (FN)
Observed == 0 and Truth > 0 (signature missed by fitting but truly present).
- True Negative (TN)
Observed == 0 and Truth == 0 (signature correctly called absent).
Counts of TP, FP, TN, and FN are used for all classification metrics. Division by zero results in NA for undefined metrics.
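The confusion-matrix bookkeeping and the NA-on-zero-denominator behaviour described above can be sketched as (assumed logic with hypothetical helper names, not the package source; inputs are assumed to be aligned vectors covering the same signatures):

```r
# Tally TP/FP/FN/TN using the presence rule: weight > 0 means "present".
confusion_counts <- function(observed, truth) {
  obs_present   <- observed > 0
  truth_present <- truth > 0
  list(
    tp = sum(obs_present & truth_present),
    fp = sum(obs_present & !truth_present),
    fn = sum(!obs_present & truth_present),
    tn = sum(!obs_present & !truth_present)
  )
}

# Guarded division: NA when the denominator is zero, matching the
# NA behaviour noted above for undefined metrics.
safe_div <- function(num, den) if (den == 0) NA_real_ else num / den

counts <- confusion_counts(c(SBS1 = 0.7, SBS5 = 0.3, SBS18 = 0),
                           c(SBS1 = 0.6, SBS5 = 0.4, SBS18 = 0))
precision   <- safe_div(counts$tp, counts$tp + counts$fp)  # 1
recall      <- safe_div(counts$tp, counts$tp + counts$fn)  # 1
specificity <- safe_div(counts$tn, counts$tn + counts$fp)  # 1
```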
Precision, recall, F1, and MCC may not be comparable across datasets with different class balances.
Examples
observed <- c(SBS1 = 0.7, SBS5 = 0.3, SBS18 = 0)
truth <- c(SBS1 = 0.6, SBS5 = 0.4, SBS18 = 0)
sig_model_correctness(observed, truth)
#> $fitting_error
#> [1] 0.1
#>
#> $RMSE
#> [1] 0.08164966
#>
#> $n_false_positives
#> [1] 0
#>
#> $n_true_positives
#> [1] 2
#>
#> $n_false_negatives
#> [1] 0
#>
#> $n_true_negatives
#> [1] 1
#>
#> $total_false_positive_contributions
#> [1] 0
#>
#> $precision
#> [1] 1
#>
#> $recall
#> [1] 1
#>
#> $specificity
#> [1] 1
#>
#> $mathews_correlation_coeff
#> [1] 1
#>
#> $f1
#> [1] 1
#>
#> $balanced_accuracy
#> [1] 1
#>