Quantify model correctness for signature deconvolution
Source: R/statistics.R
sig_model_correctness.Rd
Computes a suite of evaluation metrics for mutational signature fitting. This function compares an "observed" model (from signature fitting) to the "truth" (the true underlying signature contributions) and reports standard accuracy metrics.
Arguments
- observed
Numeric named vector. The fitted model (signature weights). Names must be signature IDs; values are fractional contributions (typically summing to 1). See sigshared::example_model().
- truth
Numeric named vector. The true signature contributions, in the same format as observed.
- all_signatures
Optional character vector. The complete set of possible signature IDs to be evaluated. If NULL, inferred as the union of names from observed and truth.
- validate
Logical. If TRUE (default), input vectors are checked for correct formatting, and expanded/reordered as needed. If FALSE, the user must ensure observed and truth are already aligned and complete.
Details
The function returns a named list with:
- fitting_error
Sum of absolute differences between observed and truth, divided by 2 (range: 0–1).
- RMSE
Root mean squared error between observed and truth.
- n_false_positives
Number of signatures called present in observed but not present in truth.
- n_false_negatives
Number of signatures present in truth but missed in observed.
- n_true_positives
Number of signatures correctly called as present.
- n_true_negatives
Number of signatures correctly called as absent.
- total_false_positive_contributions
Sum of weights assigned to false positive signatures.
- precision
Proportion of detected signatures that are truly present (TP / (TP + FP)).
- recall
Proportion of true signatures that are detected (TP / (TP + FN)).
- specificity
Proportion of truly absent signatures that are not detected (TN / (TN + FP)).
- mathews_correlation_coeff
Matthews Correlation Coefficient (MCC), a balanced measure even for imbalanced classes.
- f1
F1 score, the harmonic mean of precision and recall.
- balanced_accuracy
Average of recall and specificity.
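The two error metrics at the top of the list follow directly from their definitions. A minimal sketch (the helper names `fitting_error` and `rmse` are hypothetical, not the package's internals, and assume already-aligned vectors):

```r
# Sketch of the two error metrics as defined above; `observed` and
# `truth` are assumed to be aligned named vectors over the same signatures.
fitting_error <- function(observed, truth) {
  sum(abs(observed - truth)) / 2   # half the L1 distance, range 0-1
}

rmse <- function(observed, truth) {
  sqrt(mean((observed - truth)^2))  # root mean squared error
}

obs <- c(SBS1 = 0.7, SBS5 = 0.3, SBS18 = 0)
tru <- c(SBS1 = 0.6, SBS5 = 0.4, SBS18 = 0)
fitting_error(obs, tru)  # 0.1
rmse(obs, tru)           # ~0.0816
```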
For metric calculation, the observed and truth vectors are expanded and reordered (if needed) to include all signatures in all_signatures. Any signatures not present in observed or truth are assumed to have zero contribution.
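The expansion step can be sketched as follows (the helper `expand_to_all` is a hypothetical illustration of the behaviour described above, not the package's implementation):

```r
# Expand a named weight vector to cover all_signatures, filling
# missing signatures with a zero contribution.
expand_to_all <- function(x, all_signatures) {
  out <- setNames(numeric(length(all_signatures)), all_signatures)
  out[names(x)] <- x
  out
}

observed <- c(SBS1 = 0.7, SBS5 = 0.3)
all_sigs <- c("SBS1", "SBS5", "SBS18")
expand_to_all(observed, all_sigs)  # SBS18 gains an explicit 0
```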
Presence/absence for each signature is defined as "present" if weight > 0, and "absent" if weight == 0.
For each signature, possible outcomes are:
- True Positive (TP)
Observed > 0 and Truth > 0 (signature fitted and truly present).
- False Positive (FP)
Observed > 0 and Truth == 0 (signature fitted but not truly present).
- False Negative (FN)
Observed == 0 and Truth > 0 (signature missed by fitting but truly present).
- True Negative (TN)
Observed == 0 and Truth == 0 (signature correctly called absent).
Counts of TP, FP, TN, and FN are used for all classification metrics. Division by zero results in NA for undefined metrics.
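The confusion-matrix bookkeeping and the NA-on-zero-denominator behaviour described above can be sketched as (assumed logic with hypothetical helper names, not the package source; inputs are assumed to be aligned vectors covering the same signatures):

```r
# Tally TP/FP/FN/TN using the presence rule: weight > 0 means "present".
confusion_counts <- function(observed, truth) {
  obs_present   <- observed > 0
  truth_present <- truth > 0
  list(
    tp = sum(obs_present & truth_present),
    fp = sum(obs_present & !truth_present),
    fn = sum(!obs_present & truth_present),
    tn = sum(!obs_present & !truth_present)
  )
}

# Guarded division: NA when the denominator is zero, matching the
# NA behaviour noted above for undefined metrics.
safe_div <- function(num, den) if (den == 0) NA_real_ else num / den

counts <- confusion_counts(c(SBS1 = 0.7, SBS5 = 0.3, SBS18 = 0),
                           c(SBS1 = 0.6, SBS5 = 0.4, SBS18 = 0))
precision   <- safe_div(counts$tp, counts$tp + counts$fp)  # 1
recall      <- safe_div(counts$tp, counts$tp + counts$fn)  # 1
specificity <- safe_div(counts$tn, counts$tn + counts$fp)  # 1
```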
Precision, recall, F1, and MCC may not be comparable across datasets with different class balances.
Examples
observed <- c(SBS1 = 0.7, SBS5 = 0.3, SBS18 = 0)
truth <- c(SBS1 = 0.6, SBS5 = 0.4, SBS18 = 0)
sig_model_correctness(observed, truth)
#> $fitting_error
#> [1] 0.1
#>
#> $RMSE
#> [1] 0.08164966
#>
#> $n_false_positives
#> [1] 0
#>
#> $n_true_positives
#> [1] 2
#>
#> $n_false_negatives
#> [1] 0
#>
#> $n_true_negatives
#> [1] 1
#>
#> $total_false_positive_contributions
#> [1] 0
#>
#> $precision
#> [1] 1
#>
#> $recall
#> [1] 1
#>
#> $specificity
#> [1] 1
#>
#> $mathews_correlation_coeff
#> [1] 1
#>
#> $f1
#> [1] 1
#>
#> $balanced_accuracy
#> [1] 1
#>