Aim
To build a sunburst chart that represents the microbial composition of a microbiome sample. Our input will be a vector of NCBI taxonomy IDs (taxids).
A taxid’s abundance is its frequency in the input vector.
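To make that definition concrete, here is a small base R sketch using a made-up vector (the values are illustrative, not real sample data). It only uses `table()` and `prop.table()`, so it does not depend on either package:

```r
# Hypothetical input: one entry per observed read, values are NCBI taxids
example_taxids <- c(561, 561, 561, 1639, 1639, 529731)

# Abundance of each taxid = number of times it occurs in the vector
abundance <- table(example_taxids)

# Relative abundance = share of all entries belonging to each taxid
relative_abundance <- prop.table(abundance)

abundance
relative_abundance
```

So a taxid that appears three times in a six-element vector accounts for half of the sample.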
Libraries
We’ll need two R packages: sunburst and taxizedbextra.
# remotes::install_github("selkamand/sunburst")
library(sunburst)
# remotes::install_github("selkamand/taxizedbextra")
library(taxizedbextra)
Download ncbi taxonomy database
First, let’s use `db_download_ncbi()` from taxizedb (exposed by taxizedbextra) to download the NCBI taxonomy database locally, which helps us build sunburst plots from taxonomy IDs at blistering speed. It’s just over 2 GB, so the download might take a while, but it’ll be worth it down the road.
# Download ncbi taxonomy database.
db_download_ncbi(overwrite = TRUE)
On my MacBook it saves the db to ~/Library/Caches/R/taxizedb. You can check where it’s downloaded for you by running locate_taxonomy_cache().
# Where is my taxonomy database downloaded to?
locate_taxonomy_cache()
#> <hoard>
#> path: taxizedb
#> cache path: ~/.cache/R/taxizedb
Get data for sunburst plot
We need to get the data into the required format: a numeric vector of NCBI taxids. You can use taxid = -1 for ‘unclassified sequences’.
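Real classifier output often arrives as a per-taxon count table rather than a flat vector. As a minimal sketch, assuming a hypothetical data frame `counts` with columns `taxid` and `n_reads` (both names are illustrative and not part of either package), and assuming unclassified reads are recorded as `NA`, the table can be expanded into the required vector with base R:

```r
# Hypothetical count table; column names and values are illustrative only.
# NA stands in for reads the classifier could not assign (an assumption).
counts <- data.frame(
  taxid   = c(561, 1639, NA),
  n_reads = c(2, 3, 1)
)

# Recode unclassified reads as -1, the value used for 'unclassified sequences'
counts$taxid[is.na(counts$taxid)] <- -1

# Repeat each taxid once per read so that frequency encodes abundance
taxids_from_counts <- rep(counts$taxid, times = counts$n_reads)
taxids_from_counts
```

The resulting vector has one element per read, which matches the frequency-based definition of abundance used here.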
# Here we're simulating our data
taxids = c(rep(561, times = 10), rep(1639, times = 20), rep(529731, times = 10))
taxids
#> [1] 561 561 561 561 561 561 561 561 561 561
#> [11] 1639 1639 1639 1639 1639 1639 1639 1639 1639 1639
#> [21] 1639 1639 1639 1639 1639 1639 1639 1639 1639 1639
#> [31] 529731 529731 529731 529731 529731 529731 529731 529731 529731 529731
Create sunburst plot
# generate sunburst plot
microbial_sunburst(
  taxids = taxids,
  ranks_to_include = c("species", "genus", "family")
)
#> [ℹ] Getting taxid lineages
#> [ℹ] Constructing sunburst plot
#> Registered S3 method overwritten by 'httr':
#> method from
#> print.cache_info hoardr
Run ?microbial_sunburst() to learn how to customise this plot.