
Calculates the Lp distance (also known as the Minkowski distance) between two sigverse signatures or catalogues. Depending on the choice of p, this reduces to several common distance metrics, such as the Manhattan (p = 1), Euclidean (p = 2), and Chebyshev (p = ∞) distances.

Usage

sig_lp_distance(
  signature1,
  signature2,
  p,
  value = c("fraction", "count"),
  scale = FALSE
)

Arguments

signature1, signature2

Two sigverse signature or catalogue data.frames.

p

A numeric value ≥ 0 indicating the order of the Lp norm to compute.

  • p = 0: counts the number of channels in which the two profiles differ, i.e. the non-zero entries of their difference (not a true norm; useful for sparsity).

  • p = 1: Manhattan distance (sum of absolute differences).

  • p = 2: Euclidean (L2) distance.

  • p = ∞: Chebyshev distance (maximum absolute difference).
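
For reference, the cases listed above correspond to the standard Minkowski definitions, written here for two profiles x and y with n channels. This is the textbook formulation, assumed (not confirmed from the package source) to match what the function computes; the symbols x, y, n and d_p are notation introduced for this note only.

\[
d_p(x, y) = \Bigl( \sum_{i=1}^{n} \lvert x_i - y_i \rvert^{p} \Bigr)^{1/p} \quad (0 < p < \infty),
\qquad
d_0(x, y) = \#\{\, i : x_i \neq y_i \,\},
\qquad
d_\infty(x, y) = \max_{1 \le i \le n} \lvert x_i - y_i \rvert .
\]

For 0 < p < 1 the expression is still well defined but, like the p = 0 case, it does not satisfy the triangle inequality and so is not a true metric.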

value

Either "fraction" (default) or "count". Determines which column is used for comparison.

scale

Logical. If TRUE, distance is divided by the number of contexts (i.e. channels).

Value

A non-negative numeric value representing the Lp distance between the two profiles.

Details

This function is useful for flexible distance computations when comparing mutational signatures or catalogues. Both inputs must contain the same channels, listed in the same order.

By default (scale = FALSE), the distance is returned unscaled. If scale = TRUE, the distance is divided by the number of mutation contexts (channels) to allow comparisons across signature types with different numbers of channels (e.g. SBS vs DBS).
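
To make the role of p and scale concrete, here is a minimal base-R sketch that works on plain numeric vectors. The helper `lp_distance_sketch()` and the toy vectors `x` and `y` are hypothetical names introduced for illustration. This is not the sigverse implementation: it skips the data.frame handling and channel checks that `sig_lp_distance()` performs, and it assumes the standard Minkowski definition.

```r
# Hypothetical helper for illustration only; not the sigverse implementation.
# x, y: numeric vectors of equal length (one value per channel, same order).
lp_distance_sketch <- function(x, y, p, scale = FALSE) {
  stopifnot(is.numeric(x), is.numeric(y), length(x) == length(y), p >= 0)
  d <- abs(x - y)
  dist <- if (p == 0) {
    sum(d != 0)            # number of channels that differ
  } else if (is.infinite(p)) {
    max(d)                 # Chebyshev: largest absolute difference
  } else {
    sum(d^p)^(1 / p)       # general Minkowski form
  }
  # Scaling divides by the number of channels, as described above.
  if (scale) dist / length(x) else dist
}

# Toy three-channel profiles (made-up values, not real signatures).
x <- c(0.5, 0.3, 0.2)
y <- c(0.2, 0.3, 0.5)

lp_distance_sketch(x, y, p = 1)                # Manhattan
lp_distance_sketch(x, y, p = 2)                # Euclidean
lp_distance_sketch(x, y, p = Inf)              # Chebyshev
lp_distance_sketch(x, y, p = 0)                # count of differing channels
lp_distance_sketch(x, y, p = 1, scale = TRUE)  # Manhattan divided by 3 channels
```

The same relationship can be checked against the examples below: the scaled L1 distance between SBS1 and SBS5 equals the unscaled value divided by the 96 SBS channels (1.677304 / 96 is roughly 0.01747).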

Examples

library(sigstash)
sigs <- sig_load("COSMIC_v3.3.1_SBS_GRCh38")

sig_lp_distance(sigs[["SBS1"]], sigs[["SBS5"]], p = 1)  # L1 (Manhattan)
#> [1] 1.677304
sig_lp_distance(sigs[["SBS1"]], sigs[["SBS5"]], p = 2)  # L2 (Euclidean)
#> [1] 0.4766606
sig_lp_distance(sigs[["SBS1"]], sigs[["SBS5"]], p = 1, scale = TRUE)  # L1 divided by the number of channels
#> [1] 0.01747191