Compute Summary Statistics for a Signature Collection
Source: R/statistics.R
Calculates a panel of descriptive statistics for each signature in a sigverse
collection.
These metrics quantify different aspects of a mutational signature's shape (inequality,
diversity, concentration, and sparsity) and are useful for comparing how "flat", "focal", or
"distinctive" different signatures are.
Details
For each signature, the following metrics are reported:
gini
: Measures inequality (0 = perfectly flat; 1 = all weight in one context).
shannon_index
: Entropy of the distribution (higher = more uncertain/diverse).
shannon_index_exp
: Effective number of active contexts (e.g., for a 96-context signature, 96 = perfectly flat; 1 = all weight in one context).
shannon_index_exp_scaled
: Fraction of maximum possible diversity (0–1 scale).
kl_divergence_from_uniform
: Divergence from a uniform (flat) distribution (0 = perfectly flat; larger = more focal).
l1_norm
: Total absolute weight (sum of absolute values). This equals 1 for any signature whose weights sum to 1, as in the example output below, so on its own it does not distinguish flat from focal signatures.
l2_norm
: Magnitude of the vector; emphasizes focal peaks.
l3_norm
: Amplifies concentration even more than L2.
l0_norm
: Number of non-zero contexts (also known as the L0 "norm"). This is not a true mathematical norm but is commonly used as a measure of sparsity: how many mutation channels contribute at all. A value of 0 means the signature is completely empty; a higher value indicates more active contexts.
*_scaled variants
: Norms divided by the number of contexts to allow cross-signature comparisons.
max_channel_fraction
: Highest single-context weight (equivalent to the infinity norm).
This function is optimised for speed: it is faster than computing each norm independently for each signature. It returns a data.frame with one row per signature and one column per computed metric.
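To make the definitions above concrete, here is a minimal base-R sketch that recomputes most of the metrics for a single signature supplied as a numeric vector of channel fractions. The helper `signature_shape_metrics()` is hypothetical and is not part of sigverse; it only illustrates the textbook formulas and may differ from the package's internals. It assumes natural-log entropy, which is consistent with the example output below (for SBS1, exp(1.856082) ≈ 6.3986 and log(96) - 1.856082 ≈ 2.708266). The Gini coefficient is left out because the source does not say which variant of the formula is used.

```r
# Hypothetical helper (not part of sigverse): recompute several shape metrics
# for one signature given as a numeric vector of channel fractions.
signature_shape_metrics <- function(fraction) {
  n <- length(fraction)

  # Shannon entropy in nats; zero-weight channels are dropped so that
  # 0 * log(0) is treated as 0.
  nonzero <- fraction[fraction > 0]
  shannon <- -sum(nonzero * log(nonzero))

  # Generic Lp norm: (sum of |x|^p)^(1/p)
  lp_norm <- function(p) sum(abs(fraction)^p)^(1 / p)

  c(
    shannon_index              = shannon,
    shannon_index_exp          = exp(shannon),      # effective number of contexts
    shannon_index_exp_scaled   = exp(shannon) / n,  # fraction of maximum diversity
    kl_divergence_from_uniform = log(n) - shannon,  # KL(p || uniform), in nats
    l0_norm                    = sum(fraction != 0),
    l1_norm                    = lp_norm(1),
    l2_norm                    = lp_norm(2),
    l3_norm                    = lp_norm(3),
    max_channel_fraction       = max(fraction)
  )
}

# Two toy 96-context signatures (assumed data, for illustration only)
flat   <- rep(1 / 96, 96)     # perfectly flat
peaked <- c(1, rep(0, 95))    # all weight in one context

round(rbind(flat   = signature_shape_metrics(flat),
            peaked = signature_shape_metrics(peaked)), 4)
```

For the flat toy signature the effective number of contexts is 96 and the divergence from uniform is 0. For the peaked one the effective number is 1 and the divergence is log(96), roughly 4.56. In both cases `l1_norm` is 1, which is why the L2, L3, and maximum-fraction columns are the ones that separate focal signatures from flat ones.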
Examples
library(sigstash)
signatures <- sig_load("COSMIC_v3.3.1_SBS_GRCh38")
# Compute statistics for all signatures
stats <- sig_collection_stats(signatures)
head(stats)
#> id gini shannon_index shannon_index_exp kl_divergence_from_uniform
#> 1 SBS1 0.9480089 1.856082 6.398621 2.7082657
#> 2 SBS2 0.9798792 1.218777 3.383048 3.3455711
#> 3 SBS3 0.3268209 4.385754 80.298771 0.1785939
#> 4 SBS4 0.6456680 3.809528 45.129134 0.7548202
#> 5 SBS5 0.4063016 4.296474 73.440415 0.2678738
#> 6 SBS6 0.8851745 2.718273 15.154126 1.8460754
#> l3_norm l2_norm l1_norm l0_norm max_channel_fraction
#> 1 0.41142780 0.4844887 1 96 0.37062390
#> 2 0.56614330 0.6231289 1 96 0.53513019
#> 3 0.06042455 0.1174484 1 96 0.02499156
#> 4 0.12333330 0.1852696 1 96 0.08026888
#> 5 0.07475926 0.1306525 1 96 0.04597922
#> 6 0.23115412 0.3114406 1 96 0.17879063
#> shannon_index_exp_scaled l3_norm_scaled l2_norm_scaled l1_norm_scaled
#> 1 0.06665230 0.0042857062 0.005046757 0.01041667
#> 2 0.03524008 0.0058973261 0.006490926 0.01041667
#> 3 0.83644553 0.0006294224 0.001223421 0.01041667
#> 4 0.47009514 0.0012847218 0.001929892 0.01041667
#> 5 0.76500432 0.0007787423 0.001360964 0.01041667
#> 6 0.15785548 0.0024078554 0.003244173 0.01041667
#> l0_norm_scaled
#> 1 1
#> 2 1
#> 3 1
#> 4 1
#> 5 1
#> 6 1
# Examine metrics for a single signature
stats[stats$id == "SBS1", ]
#> id gini shannon_index shannon_index_exp kl_divergence_from_uniform
#> 1 SBS1 0.9480089 1.856082 6.398621 2.708266
#> l3_norm l2_norm l1_norm l0_norm max_channel_fraction
#> 1 0.4114278 0.4844887 1 96 0.3706239
#> shannon_index_exp_scaled l3_norm_scaled l2_norm_scaled l1_norm_scaled
#> 1 0.0666523 0.004285706 0.005046757 0.01041667
#> l0_norm_scaled
#> 1 1