Visualize relationships between numeric variables and categorical groupings using parallel coordinate plots.

Usage

ggparallel(
  data,
  col_id = NULL,
  col_colour = NULL,
  highlight = NULL,
  interactive = TRUE,
  order_columns_by = c("appearance", "random", "auto"),
  order_observations_by = c("frequency", "original"),
  verbose = TRUE,
  palette_colour = palette.colors(palette = "Set2"),
  palette_highlight = c("red", "grey90"),
  convert_binary_numeric_to_factor = TRUE,
  scaling = c("uniminmax", "none"),
  return = c("plot", "data"),
  options = ggparallel_options()
)

Arguments

data

A data frame containing the variables to plot.

col_id

The name of the column to use as an identifier. If NULL, artificial IDs will be generated based on row numbers. (character)

col_colour

Name of the column to use for coloring lines in the plot. If NULL, no coloring is applied. (character)

highlight

A level from col_colour to emphasize in the plot. Ignored if col_colour is not set. (character)

interactive

Whether to produce an interactive ggiraph visualisation. (flag)

order_columns_by

Strategy for ordering columns in the plot. Options include:

  • "appearance": Order columns by their order in data (default).

  • "random": Randomly order columns.

  • "auto": Automatically order columns based on context:

    • If highlight is set, columns are ordered to maximize separation between the highlighted level and all others, using mutual information.

    • If col_colour is set but highlight is not, columns are ordered based on mutual information with all classes in col_colour.

    • If neither highlight nor col_colour is set, columns are ordered to minimize the estimated number of crossings, using a repetitive nearest neighbour approach with two-opt refinement.
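The mutual-information ordering described above can be illustrated with a minimal base R sketch. This is not the package's implementation: the helper `mutual_info()`, the choice of five equal-width bins, and the use of the built-in `iris` data are all assumptions made for illustration.

```r
# Hypothetical helper (not part of the package): mutual information, in nats,
# between two discrete vectors of equal length.
mutual_info <- function(x, y) {
  joint <- table(x, y)
  joint <- joint / sum(joint)          # joint probabilities
  expected <- outer(rowSums(joint), colSums(joint))  # product of marginals
  nz <- joint > 0                      # skip empty cells to avoid log(0)
  sum(joint[nz] * log(joint[nz] / expected[nz]))
}

# Assumption: numeric columns are discretised into 5 equal-width bins
# before scoring them against the class column.
features <- iris[, 1:4]
scores <- vapply(
  features,
  function(col) mutual_info(cut(col, breaks = 5), iris$Species),
  numeric(1)
)

# Columns with the highest mutual information come first.
column_order <- names(sort(scores, decreasing = TRUE))
column_order
```

For the highlighted case, the same sketch would score each column against a one-versus-rest indicator such as `iris$Species == "setosa"` instead of the full class column.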

order_observations_by

Strategy for ordering lines in the plot. Options include:

  • "frequency": Draw the largest groups first.

  • "original": Preserve the original order in data.

Ignored if highlight is set.

verbose

Logical; whether to display informative messages during execution. (default: TRUE)

palette_colour

A named vector of colors for categorical levels in col_colour. (default: Set2 palette)

palette_highlight

A two-color vector for highlighting (highlight and others). (default: c("red", "grey90"))

convert_binary_numeric_to_factor

Logical; whether to convert numeric columns containing only 0, 1, and NA to factors. (default: TRUE)
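As a rough illustration of what this conversion means, the base R sketch below detects numeric columns that contain only 0, 1, and NA and turns them into factors. The helper `is_binary_numeric()` and the example data frame are hypothetical, and the package's own logic may differ.

```r
# Hypothetical helper (not part of the package): TRUE when a column is numeric
# and every value is 0, 1, or NA.
is_binary_numeric <- function(x) {
  is.numeric(x) && all(x %in% c(0, 1, NA))
}

# Made-up example data: `flag` is binary, `size` is not.
df <- data.frame(flag = c(0, 1, NA, 1), size = c(1.2, 3.4, 0, 1))

# Convert only the binary columns to factors with levels 0 and 1.
binary_cols <- vapply(df, is_binary_numeric, logical(1))
df[binary_cols] <- lapply(df[binary_cols], factor, levels = c(0, 1))

str(df)
```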

scaling

Method for scaling numeric variables. Options include:

  • "uniminmax": Rescale each variable to range [0, 1].

  • "none": No rescaling. Use raw values.
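A minimal sketch of min-max rescaling to [0, 1], which is what "uniminmax" is described as doing. The helper `rescale_minmax()` is hypothetical, and its handling of constant columns (mapping them to 0.5) is an assumption, not documented package behaviour.

```r
# Hypothetical helper (not part of the package): rescale a numeric vector so
# its minimum maps to 0 and its maximum maps to 1. NA values stay NA.
rescale_minmax <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) {
    # Assumption: a constant column has no spread, so place it mid-axis.
    return(ifelse(is.na(x), NA_real_, 0.5))
  }
  (x - rng[1]) / (rng[2] - rng[1])
}

scaled <- rescale_minmax(c(10, 15, 20, NA))
scaled
```

Rescaling puts every variable on a shared vertical axis, so columns with large raw ranges do not visually dominate the plot.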

return

What to return. Options include:

  • "plot": Return the ggplot object (default).

  • "data": Return the processed data used for plotting.

options

A list of additional visualization parameters created by ggparallel_options().

Value

A ggplot object or a processed data frame, depending on the return parameter.

Examples

ggparallel(
  data = minibeans,
  col_colour = "Class",
  order_columns_by = "auto"
)
#>  Ordering columns based on mutual information with [Class]
#>  Making plot interactive since `interactive = TRUE`
ggparallel(
  data = minibeans,
  col_colour = "Class",
  highlight = "DERMASON",
  order_columns_by = "auto"
)
#> Ordering columns based on how well they differentiate 1 group from the rest [DERMASON] (based on mutual information)
#> Making plot interactive since `interactive = TRUE`