
Calculates the L2 (Euclidean) distance between two sigverse signatures or catalogues. The metric measures how far apart the two profiles are, channel by channel, in their fraction or count values.

Usage

sig_l2_distance(
  signature1,
  signature2,
  value = c("fraction", "count"),
  scale = FALSE
)

Arguments

signature1, signature2

Two sigverse signatures or catalogues. See sigshared::example_signature() or sigshared::example_catalogue(). Must contain matching channel values in identical order.
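Because the distance is computed element by element, the channel-order requirement matters. The following is a minimal sketch of how one might check and fix channel order before comparing. It uses toy data frames rather than real sigverse objects, and the column names `channel` and `fraction` are assumptions made for illustration.

```r
# Toy signatures (hypothetical data; column names `channel` and `fraction` are assumed)
sig_a <- data.frame(
  channel = c("A[C>A]A", "A[C>A]C", "A[C>A]G"),
  fraction = c(0.5, 0.3, 0.2),
  stringsAsFactors = FALSE
)
sig_b <- data.frame(
  channel = c("A[C>A]C", "A[C>A]A", "A[C>A]G"),
  fraction = c(0.1, 0.6, 0.3),
  stringsAsFactors = FALSE
)

# Same set of channels, but in a different order
same_set <- setequal(sig_a$channel, sig_b$channel)
same_order <- identical(sig_a$channel, sig_b$channel)

# Reorder sig_b to follow sig_a so that rows line up channel by channel
sig_b_aligned <- sig_b[match(sig_a$channel, sig_b$channel), ]
rownames(sig_b_aligned) <- NULL
aligned <- identical(sig_a$channel, sig_b_aligned$channel)
```

After alignment, row i of each data frame refers to the same channel, which is what an element-wise distance needs.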

value

A character string selecting which values to compare: "fraction" (the default) for normalised signatures, or "count" for raw catalogues.

scale

Logical. If TRUE, divides the L2 distance by the number of elements (channels). This can help make distances comparable across signatures with different numbers of channels. Defaults to FALSE.

Value

A single numeric value representing the L2 distance.

Details

For vectors \( x \) and \( y \), the L2 distance is: $$\|x - y\|_2 = \sqrt{\sum_i (x_i - y_i)^2}$$

Smaller values indicate more similar signatures, and larger values indicate greater dissimilarity; a distance of 0 means the two are identical in every channel. L2 distance is often used as a simple, fast alternative to cosine similarity or KL divergence.
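The formula can be sketched in a few lines of base R. This is an illustration on plain numeric vectors, not the package's implementation: `l2_distance()` is a hypothetical helper, and its `scale` branch follows the description of the scale argument (division by the number of elements).

```r
# Hypothetical helper illustrating the formula; not part of sigverse
l2_distance <- function(x, y, scale = FALSE) {
  stopifnot(is.numeric(x), is.numeric(y), length(x) == length(y))
  d <- sqrt(sum((x - y)^2))
  # Optional scaling by the number of elements, as described for `scale`
  if (scale) d <- d / length(x)
  d
}

x <- c(0.5, 0.3, 0.2)
y <- c(0.2, 0.3, 0.5)

l2_distance(x, y)               # sqrt(0.09 + 0 + 0.09)
#> [1] 0.4242641
l2_distance(x, y, scale = TRUE) # the same distance divided by 3
#> [1] 0.1414214
```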

Examples

library(sigstash)
library(sigstats) # assumed to be the package providing sig_l2_distance() and sig_reconstruct()

signatures <- sig_load("COSMIC_v3.3.1_SBS_GRCh38")
s1 <- signatures[["SBS1"]]
s2 <- signatures[["SBS5"]]

# Compute distance between two fractional signatures
sig_l2_distance(s1, s2)
#> [1] 0.4766606

# Compare catalogue-level distance (on raw counts)
cat1 <- sig_reconstruct(s1, n = 100)
cat2 <- sig_reconstruct(s2, n = 100)
sig_l2_distance(cat1, cat2, value = "count")
#> [1] 47.66606
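The count-level distance above is exactly 100 times the fractional distance. That follows from the L2 distance scaling linearly when both inputs are multiplied by the same constant. The sketch below shows this on toy vectors in base R, assuming that the reconstructed catalogues are the fractions multiplied by n with no rounding (an assumption about sig_reconstruct(), not something stated above).

```r
# Plain Euclidean distance on numeric vectors (illustrative helper)
l2 <- function(x, y) sqrt(sum((x - y)^2))

x <- c(0.5, 0.3, 0.2) # toy fractions
y <- c(0.2, 0.3, 0.5)
n <- 100              # total count used to turn fractions into counts

d_fraction <- l2(x, y)
d_count <- l2(n * x, n * y)

# Multiplying both vectors by n multiplies the distance by n
all.equal(d_count, n * d_fraction)
#> [1] TRUE
```

One practical consequence is that count-based distances depend on the total number of mutations, so they are only comparable between catalogues of similar size.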