Calculates the Gini coefficient, a measure of inequality or concentration, for a sigverse signature or catalogue. It quantifies how unevenly the mutation probability mass is distributed across contexts.

Usage

sig_gini(signature)

Arguments

signature

A sigverse signature or catalogue data.frame.

Value

A single numeric value between 0 and 1: the Gini coefficient of the signature or catalogue.

Details

The two extremes of the scale are:

  • 0: perfectly uniform distribution (e.g. all 96 contexts have equal probability)

  • 1: total concentration in a single context

The Gini coefficient complements entropy-based measures like the Shannon index by capturing the inequality of the distribution, rather than its uncertainty or diversity.

This function uses the unbiased version of the Gini coefficient, in which the raw coefficient is scaled by K / (K - 1), where K is the number of mutation contexts. This adjustment:

  • Ensures the Gini spans the full range from 0 to 1 for any number of contexts (without it, the maximum attainable value is (K - 1) / K)

  • Makes the measure comparable across signatures with different numbers of mutation types

  • Is appropriate for full probability distributions (as in mutation signatures)
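The scaling described above can be illustrated with a minimal sketch in base R. This is not the source of sig_gini(); it is one common way to compute a Gini coefficient with the K / (K - 1) correction, written for a plain numeric vector of per-context probabilities. The function name gini_sketch and the choice to take a bare vector instead of a sigverse data.frame are assumptions made for illustration.

```r
# Hypothetical helper, not part of any sigverse package.
# p: non-negative numeric vector of per-context probabilities (or counts).
gini_sketch <- function(p) {
  stopifnot(is.numeric(p), length(p) > 1, all(p >= 0), sum(p) > 0)
  K <- length(p)
  p_sorted <- sort(p)  # ascending order
  # Raw Gini via the rank-weighted sum; its maximum is (K - 1) / K.
  raw <- sum((2 * seq_len(K) - K - 1) * p_sorted) / (K * sum(p_sorted))
  # Rescale so that total concentration in one context gives exactly 1.
  raw * K / (K - 1)
}

uniform <- rep(1 / 96, 96)    # all 96 contexts equal
single  <- c(1, rep(0, 95))   # all mass in one context

gini_sketch(uniform)  # 0, up to floating-point error
gini_sketch(single)   # 1, up to floating-point error
```

Because the rank-weighted sum is divided by the total of p, the sketch gives the same result for raw counts as for normalised fractions.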

Examples

library(sigstash)
sigs <- sig_load("COSMIC_v3.3.1_SBS_GRCh38")

sig_gini(sigs[["SBS1"]])  # moderately peaked
#> [1] 0.9480089
sig_gini(sigs[["SBS48"]]) # highly peaked, close to 1
#> [1] 0.9780847