Compute the L2 Distance Between Two Signatures or Catalogues
Source: R/statistics.R
Calculates the L2 distance (Euclidean distance) between two sigverse
signatures or catalogues.
The metric quantifies how much the two distributions differ in their per-channel values.
Usage
sig_l2_distance(
signature1,
signature2,
value = c("fraction", "count"),
scale = FALSE
)
Arguments
- signature1, signature2
  Two sigverse signatures or catalogues. See sigshared::example_signature() or sigshared::example_catalogue(). Must contain matching channel values in identical order.
- value
  A character string: "fraction" for normalised signatures or "count" for raw catalogues.
- scale
  Logical. If TRUE, divides the L2 distance by the number of elements. This can help normalise distance values across signatures with different dimensions.
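The requirement that both inputs carry the same channel values in the same order can be checked before calling the function. The following is a minimal base-R sketch, assuming the inputs are data frames with a `channel` column and a `fraction` column (an assumption about the sigverse format that this page does not spell out). The objects `toy_sig1` and `toy_sig2` and the helper `channels_match()` are hypothetical names used only for illustration.

```r
# Hypothetical toy signatures; the column names are assumptions.
toy_sig1 <- data.frame(
  channel = c("A[C>A]A", "A[C>A]C", "A[C>A]G"),
  fraction = c(0.5, 0.3, 0.2)
)
toy_sig2 <- data.frame(
  channel = c("A[C>A]A", "A[C>A]C", "A[C>A]G"),
  fraction = c(0.1, 0.6, 0.3)
)

# TRUE only when both inputs list the same channels in the same order.
channels_match <- function(a, b) {
  identical(as.character(a$channel), as.character(b$channel))
}

channels_match(toy_sig1, toy_sig2)

# Same channels in reversed row order: the check fails.
toy_sig2_reversed <- toy_sig2[c(3, 2, 1), ]
channels_match(toy_sig1, toy_sig2_reversed)
```

If the check fails only because of row order, reordering one input to follow the other's channel column (for example with `match()`) should restore alignment.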
Details
For vectors \( x \) and \( y \), the L2 distance is: $$\|x - y\|_2 = \sqrt{\sum_i (x_i - y_i)^2}$$
Smaller values indicate more similar signatures and larger values indicate greater dissimilarity; a distance of 0 means the two inputs are identical. The L2 distance is often used as a simple and fast alternative to cosine similarity or KL divergence.
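The formula above can be reproduced in a few lines of base R on plain numeric vectors. This is an illustrative sketch, not the package's implementation: `l2_distance()` is a hypothetical helper, the vectors are made-up data, and the `scale` branch assumes that "number of elements" means the vector length.

```r
# Made-up vectors standing in for the values of two signatures.
x <- c(0.5, 0.3, 0.2)
y <- c(0.1, 0.6, 0.3)

# Hypothetical helper: Euclidean (L2) distance between equal-length vectors.
l2_distance <- function(x, y, scale = FALSE) {
  stopifnot(length(x) == length(y))
  d <- sqrt(sum((x - y)^2))
  # Assumed meaning of scaling: divide by the number of elements.
  if (scale) d <- d / length(x)
  d
}

d_raw <- l2_distance(x, y)                   # sqrt(0.16 + 0.09 + 0.01)
d_scaled <- l2_distance(x, y, scale = TRUE)  # d_raw divided by 3
```

Under these assumptions the scaled value is the unscaled distance divided by the vector length, so it changes the magnitude of the result but not the ranking of pairs that share the same dimension.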
Examples
library(sigstash)
signatures <- sig_load("COSMIC_v3.3.1_SBS_GRCh38")
s1 <- signatures[["SBS1"]]
s2 <- signatures[["SBS5"]]
# Compute distance between two fractional signatures
sig_l2_distance(s1, s2)
#> [1] 0.4766606
# Compare catalogue-level distance (on raw counts)
cat1 <- sig_reconstruct(s1, n = 100)
cat2 <- sig_reconstruct(s2, n = 100)
sig_l2_distance(cat1, cat2, value = "count")
#> [1] 47.66606
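The count-level result above is 100 times the fraction-level result. That is what the L2 formula predicts if each count equals the corresponding fraction multiplied by `n`, which is an assumption about `sig_reconstruct()` inferred from the two outputs rather than stated on this page. A small base-R sketch with made-up vectors shows the relationship:

```r
# Made-up fractional profiles (each sums to 1).
x <- c(0.5, 0.3, 0.2)
y <- c(0.1, 0.6, 0.3)
n <- 100

# Distance on fractions, and on counts assumed to be n * fraction.
d_fraction <- sqrt(sum((x - y)^2))
d_count <- sqrt(sum((n * x - n * y)^2))

# The count-level distance equals n times the fraction-level distance.
isTRUE(all.equal(d_count, n * d_fraction))
```

Because of this scaling, count-level distances are only comparable between catalogues with similar total counts; fraction-level distances do not depend on the total.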