Calculates the Kullback–Leibler (KL) divergence between a mutational signature and a uniform distribution. KL divergence quantifies how much the observed mutation context distribution (the signature) deviates from an equal-weight, flat profile across all contexts.

Usage

sig_kl_divergence(signature, base = exp(1), pseudocount = 1e-12)

Arguments

signature

A sigverse signature data.frame. Must contain a fraction column.

base

The base of the logarithm. The default, exp(1), uses the natural log and gives the result in nats; use 2 for bits.

pseudocount

A small positive number added to each term to prevent log(0).

Value

A single numeric value representing the KL divergence from uniform.

Details

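A value of 0 indicates a perfectly uniform signature; higher values indicate more peaked or biased signatures. For a signature with n contexts, the divergence from uniform cannot exceed log(n) (apart from the negligible effect of the pseudocount), a limit reached when all weight falls on a single context. KL divergence is commonly used as a measure of "non-uniformity" or "distinctiveness" of mutation profiles.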

A small pseudocount is added to avoid taking the log of zero when any context has zero weight.
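
As a rough sketch of the calculation described above, and not the package's actual source, the divergence from uniform can be written in a few lines of base R. The helper name kl_from_uniform is hypothetical, it takes the fraction column as a plain numeric vector, and adding the pseudocount to each fraction before taking the log is an assumption about where the pseudocount enters.

```r
# Hypothetical helper (not part of sigverse): KL divergence of a vector of
# context fractions from a flat profile over the same number of contexts.
kl_from_uniform <- function(fraction, base = exp(1), pseudocount = 1e-12) {
  n <- length(fraction)
  # Assumption: the pseudocount is added to every fraction so log() never sees 0.
  p <- fraction + pseudocount
  # Uniform reference: every context carries weight 1 / n.
  q <- rep(1 / n, n)
  sum(p * log(p / q, base = base))
}

kl_from_uniform(rep(0.25, 4))               # flat profile
kl_from_uniform(c(1, 0, 0, 0), base = 2)    # all weight on one of four contexts
```

Under these assumptions the flat profile gives a value of approximately 0, and the single-context profile gives approximately 2 bits, which is log2(4), the largest value possible with four contexts.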

Examples

library(sigstash)
signatures <- sig_load("COSMIC_v3.3.1_SBS_GRCh38")
sbs3 <- signatures[["SBS3"]]

# Compute KL divergence (how far is SBS3 from flat?)
sig_kl_divergence(sbs3)
#> [1] 0.1785939

# Use base 2 (bits)
sig_kl_divergence(sbs3, base = 2)
#> [1] 0.2576565
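
The two results above differ only by a constant factor, because changing the logarithm base rescales the divergence. A quick check in base R, with the printed value copied in as a literal so that no package is needed:

```r
# Value printed above for SBS3 with the default base (nats).
nats <- 0.1785939

# Change of base: log2(x) = log(x) / log(2), so nats / log(2) gives bits.
bits <- nats / log(2)
bits
```

The converted value agrees with the base = 2 result above to the printed precision.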