This function acts as a drop-in replacement for the base rank() function with the added option to:
Rank categorical factors based on frequency instead of alphabetically
Rank in descending or ascending order
Arguments
- x
A numeric, character, or factor vector
- sort_by
Sort ranking either by "alphabetical" or "frequency". Default is "alphabetical".
- desc
A logical indicating whether the ranking should be in descending (TRUE) or ascending (FALSE) order. When input is numeric, ranking is always based on numeric order.
- ties.method
A character string specifying how ties are treated, see 'Details'; can be abbreviated.
- na.last
A logical or character string controlling the treatment of NAs. If TRUE, missing values in the data are put last; if FALSE, they are put first; if NA, they are removed; if "keep", they are kept with rank NA.
- freq_tiebreak
Controls how alphabetical tie-breaking works when sort_by = "frequency" and x is character/factor/logical. Must be one of:
"match_desc" (default): alphabetical tie-breaking direction follows desc (ascending when desc = FALSE, descending when desc = TRUE).
"asc": ties are always broken by ascending alphabetical order, regardless of desc.
"desc": ties are always broken by descending alphabetical order, regardless of desc.
- verbose
verbose (flag)
Details
If x includes ‘ties’ (equal values), the ties.method argument determines how the rank value is decided. Must be one of:
average: replaces integer ranks of tied values with their average (default)
first: first-occurring value is assumed to be the lower rank (closer to one)
last: last-occurring value is assumed to be the lower rank (closer to one)
max or min: integer ranks of tied values are replaced with their maximum and minimum respectively (latter is typical in sports-ranking)
random: which of the tied values receives the higher or lower rank is decided at random.
NA values are never considered to be equal: for na.last = TRUE and na.last = FALSE they are given distinct ranks in the order in which they occur in x.
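The tie and NA handling described above can be tried out with base rank() alone. This is a minimal sketch that assumes smartrank() treats ties.method and na.last the same way as base rank(), as the "drop-in replacement" description suggests; the vectors x and y are made up for illustration.

```r
# Base R only. Assumption: smartrank() handles ties.method and na.last
# the same way base rank() does.
x <- c(10, 20, 10, 30)  # the two 10s are tied for ranks 1 and 2

r_average <- rank(x, ties.method = "average")  # 1.5 3.0 1.5 4.0
r_first   <- rank(x, ties.method = "first")    # 1 3 2 4
r_last    <- rank(x, ties.method = "last")     # 2 3 1 4
r_min     <- rank(x, ties.method = "min")      # 1 3 1 4
r_max     <- rank(x, ties.method = "max")      # 2 3 2 4

y <- c(3, NA, 1)  # one missing value

na_last  <- rank(y, na.last = TRUE)    # 2 3 1   (NA ranked last)
na_first <- rank(y, na.last = FALSE)   # 3 1 2   (NA ranked first)
na_drop  <- rank(y, na.last = NA)      # 2 1     (NA removed)
na_keep  <- rank(y, na.last = "keep")  # 2 NA 1  (NA kept with rank NA)
```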
Note
When sort_by = "frequency", ties based on frequency are broken by
alphabetical order of the terms. Use freq_tiebreak to control whether
that alphabetical tie-breaking is ascending, descending, or follows
desc.
When sort_by = "frequency" and input is character, ties.method is ignored. Each distinct element level gets its own rank, and each rank is 1 unit away from the next element, irrespective of how many duplicates there are.
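The level ordering described in this note (frequency first, then alphabetical tie-breaking) can be sketched in base R. The helper rank_levels_by_frequency() and its tiebreak_desc argument are hypothetical names used only for illustration; this is not the package's implementation, and its numeric output need not match what smartrank() returns.

```r
# Hypothetical helper for illustration only; NOT the package's implementation.
# Orders distinct values by frequency, breaks frequency ties alphabetically,
# and gives each distinct value one rank, 1 unit apart from the next.
rank_levels_by_frequency <- function(x, desc = FALSE, tiebreak_desc = desc) {
  counts <- table(x)         # frequency of each distinct value
  lv <- names(counts)        # the distinct values
  n <- as.integer(counts)    # their counts
  # method = "radix" allows a separate sort direction per key:
  # first key is frequency, second key is the value itself.
  ord <- order(n, lv, decreasing = c(desc, tiebreak_desc), method = "radix")
  match(x, lv[ord])          # position of each element's level in that order
}

fruits <- c("Apple", "Orange", "Apple", "Pear", "Orange")

asc_ranks  <- rank_levels_by_frequency(fruits)               # 2 3 2 1 3
desc_ranks <- rank_levels_by_frequency(fruits, desc = TRUE)  # 2 1 2 3 1
# Descending frequency, but ties always broken in ascending alphabetical order
desc_asc_tb <- rank_levels_by_frequency(fruits, desc = TRUE,
                                        tiebreak_desc = FALSE)  # 1 2 1 3 2
```

Here tiebreak_desc = desc mirrors the "match_desc" behaviour described for freq_tiebreak, while fixing it to FALSE or TRUE mirrors "asc" and "desc".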
Examples
# ------------------
## CATEGORICAL INPUT
# ------------------
fruits <- c("Apple", "Orange", "Apple", "Pear", "Orange")
# rank alphabetically
smartrank(fruits)
#> [1] 1.5 3.5 1.5 5.0 3.5
# rank based on frequency
smartrank(fruits, sort_by = "frequency")
#> [1] 2.5 4.5 2.5 1.0 4.5
# rank based on descending order of frequency
smartrank(fruits, sort_by = "frequency", desc = TRUE)
#> [1] 3.5 1.5 3.5 5.0 1.5
# sort fruits vector based on rank
ranks <- smartrank(fruits, sort_by = "frequency", desc = TRUE)
fruits[order(ranks)]
#> [1] "Orange" "Orange" "Apple" "Apple" "Pear"
# ------------------
## NUMERICAL INPUT
# ------------------
# rank numerically
smartrank(c(1, 3, 2))
#> [1] 1 3 2
# rank numerically based on descending order
smartrank(c(1, 3, 2), desc = TRUE)
#> [1] 3 1 2
# always rank numeric vectors based on values, irrespective of sort_by
smartrank(c(1, 3, 2), sort_by = "frequency")
#> smartrank: Sorting a non-categorical variable. Ignoring `sort_by` and sorting numerically
#> [1] 1 3 2
