
Calculates a panel of descriptive statistics for each signature in a sigverse collection. These metrics quantify different aspects of a mutational signature's shape (inequality, diversity, concentration, and sparsity) and are useful for comparing how "flat", "focal", or "distinctive" different signatures are.

Usage

sig_collection_stats(signatures)

Arguments

signatures

A sigverse signature collection (named list of signature data.frames).

Value

A data.frame with one row per signature and columns for each computed metric.

Details

For each signature, the following metrics are reported:

  • gini: Measures inequality (0 = perfectly flat; 1 = all weight in one context).

  • shannon_index: Entropy of the distribution (higher = more uncertain/diverse).

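  • shannon_index_exp: Effective number of active contexts, i.e. exp(shannon_index). For a 96-context signature, 96 = perfectly flat and 1 = all weight in one context.

  • shannon_index_exp_scaled: shannon_index_exp divided by the number of contexts, giving the fraction of maximum possible diversity (1 = perfectly flat; values near 0 = highly peaked).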

  • kl_divergence_from_uniform: Divergence from a uniform (flat) distribution.

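  • l1_norm: Total absolute weight (sum of absolute values). For a signature whose fractions sum to 1 this is always 1, so it does not distinguish flat from focal signatures.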

  • l2_norm: Magnitude of the vector; emphasizes focal peaks.

  • l3_norm: Amplifies concentration even more than L2.

  • l0_norm: Number of non-zero contexts (also known as the L0 "norm"). This is not a true mathematical norm but is commonly used as a measure of sparsity: how many mutation channels contribute at all. A value of 0 means the signature is completely empty; a higher value indicates more active contexts.

  • *_scaled variants: Norms divided by number of contexts to allow cross-signature comparisons.

  • max_channel_fraction: Highest single-context weight (equivalent to the infinity norm).
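The entropy-based metrics above can be reproduced by hand for a single signature. The following is a minimal base-R sketch, not the package's implementation: the `fractions` vector is a hypothetical 4-context signature, and the natural logarithm is assumed because it is consistent with the example output on this page (for SBS1, exp(1.856082) ≈ 6.3986).

```r
# Hypothetical 4-context signature (illustration only); fractions sum to 1.
fractions <- c(0.7, 0.1, 0.1, 0.1)
n <- length(fractions)

# Shannon entropy in nats. Zero-weight contexts are dropped because
# p * log(p) tends to 0 as p tends to 0.
p <- fractions[fractions > 0]
shannon_index <- -sum(p * log(p))

# Effective number of contexts, and the same value as a fraction of n.
shannon_index_exp <- exp(shannon_index)
shannon_index_exp_scaled <- shannon_index_exp / n

# KL divergence from the uniform distribution (1/n per context).
kl_divergence_from_uniform <- sum(p * log(p * n))

print(c(
  shannon_index = shannon_index,
  shannon_index_exp = shannon_index_exp,
  shannon_index_exp_scaled = shannon_index_exp_scaled,
  kl_divergence_from_uniform = kl_divergence_from_uniform
))
```

Under these assumptions the KL divergence equals log(n) minus the Shannon index, so the two columns carry the same information on different scales.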

This function is optimised for speed: it is faster than computing each metric independently for each signature. It returns a data.frame with one row per signature and one column per computed metric.
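The norm-based metrics described in the list above can likewise be sketched in base R. This is an illustration under stated assumptions rather than the package's code: `lp_norm()` is a hypothetical helper, and `fractions` is a made-up 4-context signature.

```r
# Hypothetical 4-context signature (illustration only); fractions sum to 1.
fractions <- c(0.7, 0.1, 0.1, 0.1)
n <- length(fractions)

# Hypothetical helper: the general Lp norm, (sum |x|^p)^(1/p).
lp_norm <- function(x, p) sum(abs(x)^p)^(1 / p)

l0_norm <- sum(fractions != 0)          # number of non-zero contexts
l1_norm <- lp_norm(fractions, 1)        # sum of absolute weights
l2_norm <- lp_norm(fractions, 2)        # Euclidean magnitude
l3_norm <- lp_norm(fractions, 3)        # weights large peaks more heavily
max_channel_fraction <- max(fractions)  # infinity norm

# Scaled variants divide each norm by the number of contexts.
l2_norm_scaled <- l2_norm / n

print(c(l0 = l0_norm, l1 = l1_norm, l2 = l2_norm, l3 = l3_norm,
        max = max_channel_fraction, l2_scaled = l2_norm_scaled))
```

For any non-negative vector the norms shrink as p grows, so max_channel_fraction ≤ l3_norm ≤ l2_norm ≤ l1_norm. The example output on this page follows the same ordering.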

Examples

library(sigstash)
signatures <- sig_load("COSMIC_v3.3.1_SBS_GRCh38")

# Compute statistics for all signatures
stats <- sig_collection_stats(signatures)
head(stats)
#>     id      gini shannon_index shannon_index_exp kl_divergence_from_uniform
#> 1 SBS1 0.9480089      1.856082          6.398621                  2.7082657
#> 2 SBS2 0.9798792      1.218777          3.383048                  3.3455711
#> 3 SBS3 0.3268209      4.385754         80.298771                  0.1785939
#> 4 SBS4 0.6456680      3.809528         45.129134                  0.7548202
#> 5 SBS5 0.4063016      4.296474         73.440415                  0.2678738
#> 6 SBS6 0.8851745      2.718273         15.154126                  1.8460754
#>      l3_norm   l2_norm l1_norm l0_norm max_channel_fraction
#> 1 0.41142780 0.4844887       1      96           0.37062390
#> 2 0.56614330 0.6231289       1      96           0.53513019
#> 3 0.06042455 0.1174484       1      96           0.02499156
#> 4 0.12333330 0.1852696       1      96           0.08026888
#> 5 0.07475926 0.1306525       1      96           0.04597922
#> 6 0.23115412 0.3114406       1      96           0.17879063
#>   shannon_index_exp_scaled l3_norm_scaled l2_norm_scaled l1_norm_scaled
#> 1               0.06665230   0.0042857062    0.005046757     0.01041667
#> 2               0.03524008   0.0058973261    0.006490926     0.01041667
#> 3               0.83644553   0.0006294224    0.001223421     0.01041667
#> 4               0.47009514   0.0012847218    0.001929892     0.01041667
#> 5               0.76500432   0.0007787423    0.001360964     0.01041667
#> 6               0.15785548   0.0024078554    0.003244173     0.01041667
#>   l0_norm_scaled
#> 1              1
#> 2              1
#> 3              1
#> 4              1
#> 5              1
#> 6              1

# Examine metrics for a single signature
stats[stats$id == "SBS1", ]
#>     id      gini shannon_index shannon_index_exp kl_divergence_from_uniform
#> 1 SBS1 0.9480089      1.856082          6.398621                   2.708266
#>     l3_norm   l2_norm l1_norm l0_norm max_channel_fraction
#> 1 0.4114278 0.4844887       1      96            0.3706239
#>   shannon_index_exp_scaled l3_norm_scaled l2_norm_scaled l1_norm_scaled
#> 1                0.0666523    0.004285706    0.005046757     0.01041667
#>   l0_norm_scaled
#> 1              1
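Because the result is an ordinary data.frame, signatures can be ranked on any metric with base R. The sketch below rebuilds a small data.frame from two columns of the output shown above, so that it runs without sigstash installed; with a real result, the same `order()` call would be applied to `stats` directly.

```r
# Values copied from the head(stats) output above (two columns only).
stats_subset <- data.frame(
  id = c("SBS1", "SBS2", "SBS3", "SBS4", "SBS5", "SBS6"),
  gini = c(0.9480089, 0.9798792, 0.3268209, 0.6456680, 0.4063016, 0.8851745),
  shannon_index_exp_scaled = c(0.06665230, 0.03524008, 0.83644553,
                               0.47009514, 0.76500432, 0.15785548),
  stringsAsFactors = FALSE
)

# Rank from flattest to most focal using scaled effective diversity.
flattest_first <- stats_subset[
  order(stats_subset$shannon_index_exp_scaled, decreasing = TRUE),
]
print(flattest_first$id)

# Ranking by ascending Gini gives the same order for these six signatures.
by_gini <- stats_subset[order(stats_subset$gini), ]
print(identical(flattest_first$id, by_gini$id))
```

For these six signatures the two metrics agree on the ranking, with SBS3 the flattest and SBS2 the most focal. Agreement is not guaranteed in general, since the metrics weight the tails of the distribution differently.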