| Title: | Integrated Analysis and Visualization of Microbiome Data |
|---|---|
| Description: | A toolkit for working with Biological Observation Matrix ('BIOM') files. Read/write all 'BIOM' formats. Compute rarefaction, alpha diversity, and beta diversity (including 'UniFrac'). Summarize counts by taxonomic level. Subset based on metadata. Generate visualizations and statistical analyses. |
| Authors: | Daniel P. Smith [aut, cre] (ORCID: <https://orcid.org/0000-0002-2479-2044>), Alkek Center for Metagenomics and Microbiome Research [cph, fnd] |
| Maintainer: | Daniel P. Smith <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 3.1.0 |
| Built: | 2026-06-03 23:49:09 UTC |
| Source: | https://github.com/cmmr/rbiom |
Visualize alpha diversity with boxplots.
adiv_boxplot( biom, x = NULL, adiv = "Shannon", layers = "x", stat.by = x, facet.by = NULL, colors = TRUE, shapes = TRUE, patterns = FALSE, flip = FALSE, stripe = NULL, ci = "ci", level = 0.95, p.adj = "fdr", outliers = NULL, xlab.angle = "auto", p.label = 0.05, transform = "none", caption = TRUE, ... )adiv_boxplot( biom, x = NULL, adiv = "Shannon", layers = "x", stat.by = x, facet.by = NULL, colors = TRUE, shapes = TRUE, patterns = FALSE, flip = FALSE, stripe = NULL, ci = "ci", level = 0.95, p.adj = "fdr", outliers = NULL, xlab.angle = "auto", p.label = 0.05, transform = "none", caption = TRUE, ... )
biom |
An rbiom object, or any value accepted by |
x |
A categorical metadata column name to use for the x-axis. Or
|
adiv |
Alpha diversity metric(s) to use. Options are:
|
layers |
One or more of
|
stat.by |
Dataset field with the statistical groups. Must be
categorical. Default: |
facet.by |
Dataset field(s) to use for faceting. Must be categorical.
Default: |
colors |
How to color the groups. Options are:
See "Aesthetics" section below for additional information.
Default: |
shapes |
Shapes for each group.
Options are similar to |
patterns |
Patterns for each group.
Options are similar to |
flip |
Transpose the axes, so that taxa are present as rows instead
of columns. Default: |
stripe |
Shade every other x position. Default: same as flip |
ci |
How to calculate min/max of the crossbar,
errorbar, linerange, and pointrange layers.
Options are: |
level |
The confidence level for calculating a confidence interval.
Default: |
p.adj |
Method to use for multiple comparisons adjustment of
p-values. Run |
outliers |
Show boxplot outliers? |
xlab.angle |
Angle of the labels at the bottom of the plot.
Options are |
p.label |
Minimum adjusted p-value to display on the plot with a bracket.
If a numeric vector with more than one value is
provided, they will be used as breaks for asterisk notation.
Default: |
transform |
Transformation to apply to calculated values. Options are:
|
caption |
Add methodology caption beneath the plot.
Default: |
... |
Additional parameters to pass along to ggplot2 functions.
Prefix a parameter name with a layer name to pass it to only that
layer. For instance, |
A ggplot2 plot. The computed data points, ggplot2 command,
stats table, and stats table commands are available as $data,
$code, $stats, and $stats$code, respectively.
All built-in color palettes are colorblind-friendly. The available
categorical palette names are: "okabe", "carto", "r4",
"polychrome", "tol", "bright", "light",
"muted", "vibrant", "tableau", "classic",
"alphabet", "tableau20", "kelly", and "fishy".
Patterns are added using the fillpattern R package. Options are "brick",
"chevron", "fish", "grid", "herringbone", "hexagon", "octagon",
"rain", "saw", "shingle", "rshingle", "stripe", and "wave",
optionally abbreviated and/or suffixed with modifiers. For example,
"hex10_sm" for the hexagon pattern rotated 10 degrees and shrunk by 2x.
See fillpattern::fill_pattern() for complete documentation of options.
Shapes can be given as per base R - numbers 0 through 17 for various shapes, or the decimal value of an ascii character, e.g. a-z = 65:90; A-Z = 97:122 to use letters instead of shapes on the plot. Character strings may used as well.
Other alpha_diversity:
adiv_corrplot(),
adiv_stats(),
adiv_table()
Other visualization:
adiv_corrplot(),
bdiv_boxplot(),
bdiv_corrplot(),
bdiv_heatmap(),
bdiv_ord_plot(),
plot_heatmap(),
rare_corrplot(),
rare_multiplot(),
rare_stacked(),
stats_boxplot(),
stats_corrplot(),
taxa_boxplot(),
taxa_corrplot(),
taxa_heatmap(),
taxa_stacked()
library(rbiom) biom <- rarefy(hmp50) adiv_boxplot(biom, x="Body Site", stat.by="Body Site") adiv_boxplot(biom, x="Sex", stat.by="Body Site", adiv=c("otu", "shan"), layers = "bld") adiv_boxplot(biom, x="body", stat.by="sex", adiv=c('sha', 'sim'), flip=TRUE, layers="p") # Each plot object includes additional information. fig <- adiv_boxplot(biom, x="Body Site") ## Computed Data Points ------------------- fig$data ## Statistics Table ----------------------- fig$stats ## ggplot2 Command ------------------------ fig$codelibrary(rbiom) biom <- rarefy(hmp50) adiv_boxplot(biom, x="Body Site", stat.by="Body Site") adiv_boxplot(biom, x="Sex", stat.by="Body Site", adiv=c("otu", "shan"), layers = "bld") adiv_boxplot(biom, x="body", stat.by="sex", adiv=c('sha', 'sim'), flip=TRUE, layers="p") # Each plot object includes additional information. fig <- adiv_boxplot(biom, x="Body Site") ## Computed Data Points ------------------- fig$data ## Statistics Table ----------------------- fig$stats ## ggplot2 Command ------------------------ fig$code
Visualize alpha diversity with scatterplots and trendlines.
adiv_corrplot( biom, x, adiv = "Shannon", layers = "tc", stat.by = NULL, facet.by = NULL, colors = TRUE, shapes = TRUE, test = "emmeans", fit = "gam", at = NULL, level = 0.95, p.adj = "fdr", transform = "none", alt = "!=", mu = 0, caption = TRUE, check = FALSE, ... )adiv_corrplot( biom, x, adiv = "Shannon", layers = "tc", stat.by = NULL, facet.by = NULL, colors = TRUE, shapes = TRUE, test = "emmeans", fit = "gam", at = NULL, level = 0.95, p.adj = "fdr", transform = "none", alt = "!=", mu = 0, caption = TRUE, check = FALSE, ... )
biom |
An rbiom object, or any value accepted by |
x |
Dataset field with the x-axis values. Equivalent to the |
adiv |
Alpha diversity metric(s) to use. Options are:
|
layers |
One or more of
|
stat.by |
Dataset field with the statistical groups. Must be
categorical. Default: |
facet.by |
Dataset field(s) to use for faceting. Must be categorical.
Default: |
colors |
How to color the groups. Options are:
See "Aesthetics" section below for additional information.
Default: |
shapes |
Shapes for each group.
Options are similar to |
test |
Method for computing p-values: |
fit |
How to fit the trendline. |
at |
Position(s) along the x-axis where the means or slopes should be
evaluated. Default: |
level |
The confidence level for calculating a confidence interval.
Default: |
p.adj |
Method to use for multiple comparisons adjustment of
p-values. Run |
transform |
Transformation to apply to calculated values. Options are:
|
alt |
Alternative hypothesis direction. Options are |
mu |
Reference value to test against. Default: |
caption |
Add methodology caption beneath the plot.
Default: |
check |
Generate additional plots to aid in assessing data normality.
Default: |
... |
Additional parameters to pass along to ggplot2 functions.
Prefix a parameter name with a layer name to pass it to only that
layer. For instance, |
A ggplot2 plot. The computed data points, ggplot2 command,
stats table, and stats table commands are available as $data,
$code, $stats, and $stats$code, respectively.
All built-in color palettes are colorblind-friendly. The available
categorical palette names are: "okabe", "carto", "r4",
"polychrome", "tol", "bright", "light",
"muted", "vibrant", "tableau", "classic",
"alphabet", "tableau20", "kelly", and "fishy".
Shapes can be given as per base R - numbers 0 through 17 for various shapes, or the decimal value of an ascii character, e.g. a-z = 65:90; A-Z = 97:122 to use letters instead of shapes on the plot. Character strings may used as well.
Other alpha_diversity:
adiv_boxplot(),
adiv_stats(),
adiv_table()
Other visualization:
adiv_boxplot(),
bdiv_boxplot(),
bdiv_corrplot(),
bdiv_heatmap(),
bdiv_ord_plot(),
plot_heatmap(),
rare_corrplot(),
rare_multiplot(),
rare_stacked(),
stats_boxplot(),
stats_corrplot(),
taxa_boxplot(),
taxa_corrplot(),
taxa_heatmap(),
taxa_stacked()
library(rbiom) p <- adiv_corrplot(babies, "age", stat.by = "deliv", fit = "gam") p p$stats p$codelibrary(rbiom) p <- adiv_corrplot(babies, "age", stat.by = "deliv", fit = "gam") p p$stats p$code
A convenience wrapper for adiv_table() + stats_table().
adiv_stats( biom, regr = NULL, stat.by = NULL, adiv = "Shannon", split.by = NULL, transform = "none", test = "emmeans", fit = "gam", at = NULL, level = 0.95, alt = "!=", mu = 0, p.adj = "fdr" )adiv_stats( biom, regr = NULL, stat.by = NULL, adiv = "Shannon", split.by = NULL, transform = "none", test = "emmeans", fit = "gam", at = NULL, level = 0.95, alt = "!=", mu = 0, p.adj = "fdr" )
biom |
An rbiom object, or any value accepted by |
regr |
Dataset field with the x-axis (independent; predictive)
values. Must be numeric. Default: |
stat.by |
Dataset field with the statistical groups. Must be
categorical. Default: |
adiv |
Alpha diversity metric(s) to use. Options are:
|
split.by |
Dataset field(s) that the data should be split by prior to
any calculations. Must be categorical. Default: |
transform |
Transformation to apply to calculated values. Options are:
|
test |
Method for computing p-values: |
fit |
How to fit the trendline. |
at |
Position(s) along the x-axis where the means or slopes should be
evaluated. Default: |
level |
The confidence level for calculating a confidence interval.
Default: |
alt |
Alternative hypothesis direction. Options are |
mu |
Reference value to test against. Default: |
p.adj |
Method to use for multiple comparisons adjustment of
p-values. Run |
A tibble data.frame with fields from the table below. This tibble
object provides the $code operator to print the R code used to generate
the statistics.
| Field | Description |
| .mean | Estimated marginal mean. See emmeans::emmeans(). |
| .mean.diff | Difference in means. |
| .slope | Trendline slope. See emmeans::emtrends(). |
| .slope.diff | Difference in slopes. |
| .h1 | Alternate hypothesis. |
| .p.val | Probability that null hypothesis is correct. |
| .adj.p | .p.val after adjusting for multiple comparisons. |
| .effect.size | Effect size. See emmeans::eff_size(). |
| .lower | Confidence interval lower bound. |
| .upper | Confidence interval upper bound. |
| .se | Standard error. |
| .n | Number of samples. |
| .df | Degrees of freedom. |
| .stat | Wilcoxon or Kruskal-Wallis rank sum statistic. |
| .t.ratio | .mean / .se |
| .r.sqr | Percent of variation explained by the model. |
| .adj.r | .r.sqr, taking degrees of freedom into account. |
| .aic | Akaike Information Criterion (predictive models). |
| .bic | Bayesian Information Criterion (descriptive models). |
| .loglik | Log-likelihood goodness-of-fit score. |
| .fit.p | P-value for observing this fit by chance. |
Other alpha_diversity:
adiv_boxplot(),
adiv_corrplot(),
adiv_table()
Other stats_tables:
bdiv_stats(),
distmat_stats(),
stats_table(),
taxa_stats()
library(rbiom) biom <- rarefy(hmp50) adiv_stats(biom, stat.by = "Sex")[,1:6] adiv_stats(biom, stat.by = "Sex", split.by = "Body Site")[,1:6] adiv_stats(biom, stat.by = "Body Site", test = "kruskal")library(rbiom) biom <- rarefy(hmp50) adiv_stats(biom, stat.by = "Sex")[,1:6] adiv_stats(biom, stat.by = "Sex", split.by = "Body Site")[,1:6] adiv_stats(biom, stat.by = "Body Site", test = "kruskal")
Calculate the alpha diversity of each sample.
adiv_table( biom, adiv = "shannon", md = ".all", tree = NULL, transform = "none", ties = "random", seed = 0, cpus = n_cpus() ) adiv_matrix( biom, adiv = c("observed", "shannon", "simpson"), tree = NULL, transform = "none", ties = "random", seed = 0, cpus = n_cpus() ) adiv_vector( biom, adiv = "shannon", tree = NULL, transform = "none", ties = "random", seed = 0, cpus = n_cpus() )adiv_table( biom, adiv = "shannon", md = ".all", tree = NULL, transform = "none", ties = "random", seed = 0, cpus = n_cpus() ) adiv_matrix( biom, adiv = c("observed", "shannon", "simpson"), tree = NULL, transform = "none", ties = "random", seed = 0, cpus = n_cpus() ) adiv_vector( biom, adiv = "shannon", tree = NULL, transform = "none", ties = "random", seed = 0, cpus = n_cpus() )
biom |
An rbiom object, or any value accepted by |
adiv |
Alpha diversity metric(s) to use. Options are:
|
md |
Dataset field(s) to include in the output data frame, or |
tree |
A |
transform |
Transformation to apply to calculated values. Options are:
|
ties |
When |
seed |
Random seed for permutations. Must be a non-negative integer.
Default: |
cpus |
The number of CPUs to use. Set to |
adiv_vector() - A named numeric vector.
adiv_matrix() - A matrix of samples x metric. The first column, 'depth', is never transformed.
adiv_table() - A tibble data.frame of alpha diversity values.
Each combination of sample/adiv has its own row.
Column names are .sample, .depth, .adiv,
and .diversity, followed by any metadata fields requested by
md.
sample_sums() for sample depths.
Other alpha_diversity:
adiv_boxplot(),
adiv_corrplot(),
adiv_stats()
library(rbiom) biom <- hmp50[1:5] adiv_table(biom) biom <- rarefy(biom) adiv_table(biom, md = NULL) adiv_vector(biom, 'faith') adiv_matrix(biom)library(rbiom) biom <- hmp50[1:5] adiv_table(biom) biom <- rarefy(biom) adiv_table(biom, md = NULL) adiv_vector(biom, 'faith') adiv_matrix(biom)
Construct an rbiom object. The returned object is an R6 reference class.
Use b <- a$clone() to create copies, not b <- a.
as_rbiom(biom, ...)as_rbiom(biom, ...)
biom |
Object which can be coerced to an rbiom-class object. For example:
|
... |
Properties to overwrite in biom: |
An rbiom object.
library(rbiom) # create a simple matrix ------------------------ mtx <- matrix( data = floor(runif(24) * 1000), nrow = 6, dimnames = list(paste0("OTU", 1:6), paste0("Sample", 1:4)) ) mtx # and some sample metadata ---------------------- df <- data.frame( .sample = paste0("Sample", 1:4), treatment = c("A", "B", "A", "B"), days = c(12, 3, 7, 8) ) # convert data set to rbiom --------------------- biom <- as_rbiom(mtx, metadata = df, id = "My BIOM") biomlibrary(rbiom) # create a simple matrix ------------------------ mtx <- matrix( data = floor(runif(24) * 1000), nrow = 6, dimnames = list(paste0("OTU", 1:6), paste0("Sample", 1:4)) ) mtx # and some sample metadata ---------------------- df <- data.frame( .sample = paste0("Sample", 1:4), treatment = c("A", "B", "A", "B"), days = c(12, 3, 7, 8) ) # convert data set to rbiom --------------------- biom <- as_rbiom(mtx, metadata = df, id = "My BIOM") biom
Convert an rbiom object to a base R list.
## S3 method for class 'rbiom' as.list(x, ...)## S3 method for class 'rbiom' as.list(x, ...)
x |
An rbiom object, such as from |
... |
Not used. |
A list with names
c('counts', 'metadata', 'taxonomy', 'tree', 'sequences', 'id', 'comment', 'date', 'generated_by').
Other conversion:
as.matrix.rbiom()
Identical to running as.matrix(biom$counts).
## S3 method for class 'rbiom' as.matrix(x, ...)## S3 method for class 'rbiom' as.matrix(x, ...)
x |
An rbiom object, such as from |
... |
Not used. |
A base R matrix with OTUs as rows and samples as columns.
Other conversion:
as.list.rbiom()
library(rbiom) as.matrix(hmp50)[1:5,1:5]library(rbiom) as.matrix(hmp50)[1:5,1:5]
Longitudinal Stool Samples from Infants (n = 2,684)
babiesbabies
An rbiom object with 2,684 samples. Includes metadata and taxonomy.
ID1, ID2, ..., ID12
Male or Female
1 - 266
"Breast milk", "Breast milk and formula", or "Formula"
"Frozen upon collection" or "Stored in alcohol"
Yes or No
Yes or No
Cesarean or Vaginal
116 - 247
https://www.nature.com/articles/s41467-018-04641-7 and doi:10.1038/s41467-017-01973-8
Other Built-In Datasets:
gems,
hmp50
babies head(babies$metadata$Age)babies head(babies$metadata$Age)
Visualize BIOM data with boxplots.
bdiv_boxplot( biom, x = NULL, bdiv = "bray", layers = "x", weighted = NULL, tree = NULL, within = NULL, between = NULL, stat.by = x, facet.by = NULL, colors = TRUE, shapes = TRUE, patterns = FALSE, flip = FALSE, stripe = NULL, ci = "ci", level = 0.95, p.adj = "fdr", outliers = NULL, xlab.angle = "auto", p.label = 0.05, transform = "none", caption = TRUE, alpha = 0.5, cpus = n_cpus(), ... )bdiv_boxplot( biom, x = NULL, bdiv = "bray", layers = "x", weighted = NULL, tree = NULL, within = NULL, between = NULL, stat.by = x, facet.by = NULL, colors = TRUE, shapes = TRUE, patterns = FALSE, flip = FALSE, stripe = NULL, ci = "ci", level = 0.95, p.adj = "fdr", outliers = NULL, xlab.angle = "auto", p.label = 0.05, transform = "none", caption = TRUE, alpha = 0.5, cpus = n_cpus(), ... )
biom |
An rbiom object, or any value accepted by |
x |
A categorical metadata column name to use for the x-axis. Or
|
bdiv |
Beta diversity distance algorithm(s) to use. Options are:
|
layers |
One or more of
|
weighted |
(Deprecated - weighting is now inherent in bdiv metric name.)
Take relative abundances into account. When |
tree |
A |
within, between
|
Dataset field(s) for intra- or inter- sample
comparisons. Alternatively, dataset field names given elsewhere can
be prefixed with |
stat.by |
Dataset field with the statistical groups. Must be
categorical. Default: |
facet.by |
Dataset field(s) to use for faceting. Must be categorical.
Default: |
colors |
How to color the groups. Options are:
See "Aesthetics" section below for additional information.
Default: |
shapes |
Shapes for each group.
Options are similar to |
patterns |
Patterns for each group.
Options are similar to |
flip |
Transpose the axes, so that taxa are present as rows instead
of columns. Default: |
stripe |
Shade every other x position. Default: same as flip |
ci |
How to calculate min/max of the crossbar,
errorbar, linerange, and pointrange layers.
Options are: |
level |
The confidence level for calculating a confidence interval.
Default: |
p.adj |
Method to use for multiple comparisons adjustment of
p-values. Run |
outliers |
Show boxplot outliers? |
xlab.angle |
Angle of the labels at the bottom of the plot.
Options are |
p.label |
Minimum adjusted p-value to display on the plot with a bracket.
If a numeric vector with more than one value is
provided, they will be used as breaks for asterisk notation.
Default: |
transform |
Transformation to apply to calculated values. Options are:
|
caption |
Add methodology caption beneath the plot.
Default: |
alpha |
The alpha term to use in Generalized UniFrac. How much weight
to give to relative abundances; a value between 0 and 1, inclusive.
Setting |
cpus |
The number of CPUs to use. Set to |
... |
Additional parameters to pass along to ggplot2 functions.
Prefix a parameter name with a layer name to pass it to only that
layer. For instance, |
A ggplot2 plot. The computed data points, ggplot2 command,
stats table, and stats table commands are available as $data,
$code, $stats, and $stats$code, respectively.
All built-in color palettes are colorblind-friendly. The available
categorical palette names are: "okabe", "carto", "r4",
"polychrome", "tol", "bright", "light",
"muted", "vibrant", "tableau", "classic",
"alphabet", "tableau20", "kelly", and "fishy".
Patterns are added using the fillpattern R package. Options are "brick",
"chevron", "fish", "grid", "herringbone", "hexagon", "octagon",
"rain", "saw", "shingle", "rshingle", "stripe", and "wave",
optionally abbreviated and/or suffixed with modifiers. For example,
"hex10_sm" for the hexagon pattern rotated 10 degrees and shrunk by 2x.
See fillpattern::fill_pattern() for complete documentation of options.
Shapes can be given as per base R - numbers 0 through 17 for various shapes, or the decimal value of an ascii character, e.g. a-z = 65:90; A-Z = 97:122 to use letters instead of shapes on the plot. Character strings may used as well.
Other beta_diversity:
bdiv_clusters(),
bdiv_corrplot(),
bdiv_heatmap(),
bdiv_ord_plot(),
bdiv_ord_table(),
bdiv_stats(),
bdiv_table(),
distmat_stats()
Other visualization:
adiv_boxplot(),
adiv_corrplot(),
bdiv_corrplot(),
bdiv_heatmap(),
bdiv_ord_plot(),
plot_heatmap(),
rare_corrplot(),
rare_multiplot(),
rare_stacked(),
stats_boxplot(),
stats_corrplot(),
taxa_boxplot(),
taxa_corrplot(),
taxa_heatmap(),
taxa_stacked()
library(rbiom) biom <- rarefy(hmp50) bdiv_boxplot(biom, x="==Body Site", bdiv="unweighted_unifrac", stat.by="Body Site")library(rbiom) biom <- rarefy(hmp50) bdiv_boxplot(biom, x="==Body Site", bdiv="unweighted_unifrac", stat.by="Body Site")
Cluster samples by beta diversity k-means.
bdiv_clusters( biom, bdiv = "bray", weighted = NULL, normalized = NULL, tree = NULL, k = 5, alpha = 0.5, cpus = n_cpus(), ... )bdiv_clusters( biom, bdiv = "bray", weighted = NULL, normalized = NULL, tree = NULL, k = 5, alpha = 0.5, cpus = n_cpus(), ... )
biom |
An rbiom object, or any value accepted by |
bdiv |
Beta diversity distance algorithm(s) to use. Options are:
|
weighted |
(Deprecated - weighting is now inherent in bdiv metric name.)
Take relative abundances into account. When |
normalized |
(Deprecated - normalization is now inherent in bdiv metric
name.) Only changes the "Weighted UniFrac" calculation. Divides result by
the total branch weights. Default: |
tree |
A |
k |
Number of clusters. Default: |
alpha |
The alpha term to use in Generalized UniFrac. How much weight
to give to relative abundances; a value between 0 and 1, inclusive.
Setting |
cpus |
The number of CPUs to use. Set to |
... |
Passed on to |
A numeric factor assigning samples to clusters.
Other beta_diversity:
bdiv_boxplot(),
bdiv_corrplot(),
bdiv_heatmap(),
bdiv_ord_plot(),
bdiv_ord_table(),
bdiv_stats(),
bdiv_table(),
distmat_stats()
Other clustering:
taxa_clusters()
library(rbiom) biom <- rarefy(hmp50) biom$metadata$bray_cluster <- bdiv_clusters(biom) pull(biom, 'bray_cluster')[1:10] bdiv_ord_plot(biom, stat.by = "bray_cluster")library(rbiom) biom <- rarefy(hmp50) biom$metadata$bray_cluster <- bdiv_clusters(biom) pull(biom, 'bray_cluster')[1:10] bdiv_ord_plot(biom, stat.by = "bray_cluster")
Visualize beta diversity with scatterplots and trendlines.
bdiv_corrplot( biom, x, bdiv = "bray", layers = "tc", weighted = NULL, tree = NULL, within = NULL, between = NULL, stat.by = NULL, facet.by = NULL, colors = TRUE, shapes = TRUE, test = "emmeans", fit = "gam", at = NULL, level = 0.95, p.adj = "fdr", transform = "none", ties = "random", seed = 0, alt = "!=", mu = 0, caption = TRUE, check = FALSE, alpha = 0.5, cpus = n_cpus(), ... )bdiv_corrplot( biom, x, bdiv = "bray", layers = "tc", weighted = NULL, tree = NULL, within = NULL, between = NULL, stat.by = NULL, facet.by = NULL, colors = TRUE, shapes = TRUE, test = "emmeans", fit = "gam", at = NULL, level = 0.95, p.adj = "fdr", transform = "none", ties = "random", seed = 0, alt = "!=", mu = 0, caption = TRUE, check = FALSE, alpha = 0.5, cpus = n_cpus(), ... )
biom |
An rbiom object, or any value accepted by |
x |
Dataset field with the x-axis values. Equivalent to the |
bdiv |
Beta diversity distance algorithm(s) to use. Options are:
|
layers |
One or more of
|
weighted |
(Deprecated - weighting is now inherent in bdiv metric name.)
Take relative abundances into account. When |
tree |
A |
within, between
|
Dataset field(s) for intra- or inter- sample
comparisons. Alternatively, dataset field names given elsewhere can
be prefixed with |
stat.by |
Dataset field with the statistical groups. Must be
categorical. Default: |
facet.by |
Dataset field(s) to use for faceting. Must be categorical.
Default: |
colors |
How to color the groups. Options are:
See "Aesthetics" section below for additional information.
Default: |
shapes |
Shapes for each group.
Options are similar to |
test |
Method for computing p-values: |
fit |
How to fit the trendline. |
at |
Position(s) along the x-axis where the means or slopes should be
evaluated. Default: |
level |
The confidence level for calculating a confidence interval.
Default: |
p.adj |
Method to use for multiple comparisons adjustment of
p-values. Run |
transform |
Transformation to apply to calculated values. Options are:
|
ties |
When |
seed |
Random seed for permutations. Must be a non-negative integer.
Default: |
alt |
Alternative hypothesis direction. Options are |
mu |
Reference value to test against. Default: |
caption |
Add methodology caption beneath the plot.
Default: |
check |
Generate additional plots to aid in assessing data normality.
Default: |
alpha |
The alpha term to use in Generalized UniFrac. How much weight
to give to relative abundances; a value between 0 and 1, inclusive.
Setting |
cpus |
The number of CPUs to use. Set to |
... |
Additional parameters to pass along to ggplot2 functions.
Prefix a parameter name with a layer name to pass it to only that
layer. For instance, |
A ggplot2 plot. The computed data points, ggplot2 command,
stats table, and stats table commands are available as $data,
$code, $stats, and $stats$code, respectively.
All built-in color palettes are colorblind-friendly. The available
categorical palette names are: "okabe", "carto", "r4",
"polychrome", "tol", "bright", "light",
"muted", "vibrant", "tableau", "classic",
"alphabet", "tableau20", "kelly", and "fishy".
Shapes can be given as per base R - numbers 0 through 17 for various shapes, or the decimal value of an ascii character, e.g. a-z = 65:90; A-Z = 97:122 to use letters instead of shapes on the plot. Character strings may used as well.
Other beta_diversity:
bdiv_boxplot(),
bdiv_clusters(),
bdiv_heatmap(),
bdiv_ord_plot(),
bdiv_ord_table(),
bdiv_stats(),
bdiv_table(),
distmat_stats()
Other visualization:
adiv_boxplot(),
adiv_corrplot(),
bdiv_boxplot(),
bdiv_heatmap(),
bdiv_ord_plot(),
plot_heatmap(),
rare_corrplot(),
rare_multiplot(),
rare_stacked(),
stats_boxplot(),
stats_corrplot(),
taxa_boxplot(),
taxa_corrplot(),
taxa_heatmap(),
taxa_stacked()
library(rbiom) biom <- rarefy(hmp50) bdiv_corrplot(biom, "Age", stat.by = "Sex", layers = "tcp")library(rbiom) biom <- rarefy(hmp50) bdiv_corrplot(biom, "Age", stat.by = "Sex", layers = "tcp")
Display beta diversities in an all vs all grid.
bdiv_heatmap( biom, bdiv = "bray", tree = NULL, tracks = NULL, grid = "devon", label = TRUE, label_size = NULL, rescale = "none", clust = "complete", trees = TRUE, asp = 1, tree_height = 10, track_height = 10, legend = "right", title = TRUE, xlab.angle = "auto", alpha = 0.5, cpus = n_cpus(), ... )bdiv_heatmap( biom, bdiv = "bray", tree = NULL, tracks = NULL, grid = "devon", label = TRUE, label_size = NULL, rescale = "none", clust = "complete", trees = TRUE, asp = 1, tree_height = 10, track_height = 10, legend = "right", title = TRUE, xlab.angle = "auto", alpha = 0.5, cpus = n_cpus(), ... )
biom |
An rbiom object, or any value accepted by |
bdiv |
Beta diversity distance algorithm(s) to use. Options are:
|
tree |
A |
tracks |
A character vector of metadata fields to display as tracks
at the top of the plot. Or, a list as expected by the |
grid |
Color palette name, or a list with entries for |
label |
Label the matrix rows and columns. You can supply a list
or logical vector of length two to control row labels and column
labels separately, for example
|
label_size |
The font size to use for the row and column labels. You
can supply a numeric vector of length two to control row label sizes
and column label sizes separately, for example
|
rescale |
Rescale rows or columns to all have a common min/max.
Options: |
clust |
Clustering algorithm for reordering the rows and columns by
similarity. You can supply a list or character vector of length two to
control the row and column clustering separately, for example
Default: |
trees |
Draw a dendrogram for rows (left) and columns (top). You can
supply a list or logical vector of length two to control the row tree
and column tree separately, for example
|
asp |
Aspect ratio (height/width) for entire grid.
Default: |
tree_height, track_height
|
The height of the dendrogram or annotation
tracks as a percentage of the overall grid size. Use a numeric vector
of length two to assign |
legend |
Where to place the legend. Options are: |
title |
Plot title. Set to |
xlab.angle |
Angle of the labels at the bottom of the plot.
Options are |
alpha |
The alpha term to use in Generalized UniFrac. How much weight
to give to relative abundances; a value between 0 and 1, inclusive.
Setting |
cpus |
The number of CPUs to use. Set to |
... |
Additional arguments to pass on to ggplot2::theme().
For example, |
A ggplot2 plot. The computed data points and ggplot
command are available as $data and $code,
respectively.
Metadata can be displayed as colored tracks above the heatmap. Common use cases are provided below, with more thorough documentation available at https://cmmr.github.io/rbiom .
## Categorical ----------------------------
tracks = "Body Site"
tracks = list('Body Site' = "bright")
tracks = list('Body Site' = c('Stool' = "blue", 'Saliva' = "green"))
## Numeric --------------------------------
tracks = "Age"
tracks = list('Age' = "reds")
## Multiple Tracks ------------------------
tracks = c("Body Site", "Age")
tracks = list('Body Site' = "bright", 'Age' = "reds")
tracks = list(
'Body Site' = c('Stool' = "blue", 'Saliva' = "green"),
'Age' = list('colors' = "reds") )
The following entries in the track definitions are understood:
colors - A pre-defined palette name or custom set of colors to map to.
range - The c(min,max) to use for scale values.
label - Label for this track. Defaults to the name of this list element.
side - Options are "top" (default) or "left".
na.color - The color to use for NA values.
bins - Bin a gradient into this many bins/steps.
guide - A list of arguments for guide_colorbar() or guide_legend().
All built-in color palettes are colorblind-friendly.
Categorical palette names: "okabe", "carto", "r4",
"polychrome", "tol", "bright", "light",
"muted", "vibrant", "tableau", "classic",
"alphabet", "tableau20", "kelly", and "fishy".
Numeric palette names: "reds", "oranges", "greens",
"purples", "grays", "acton", "bamako",
"batlow", "bilbao", "buda", "davos",
"devon", "grayC", "hawaii", "imola",
"lajolla", "lapaz", "nuuk", "oslo",
"tokyo", "turku", "bam", "berlin",
"broc", "cork", "lisbon", "roma",
"tofino", "vanimo", and "vik".
Other beta_diversity:
bdiv_boxplot(),
bdiv_clusters(),
bdiv_corrplot(),
bdiv_ord_plot(),
bdiv_ord_table(),
bdiv_stats(),
bdiv_table(),
distmat_stats()
Other visualization:
adiv_boxplot(),
adiv_corrplot(),
bdiv_boxplot(),
bdiv_corrplot(),
bdiv_ord_plot(),
plot_heatmap(),
rare_corrplot(),
rare_multiplot(),
rare_stacked(),
stats_boxplot(),
stats_corrplot(),
taxa_boxplot(),
taxa_corrplot(),
taxa_heatmap(),
taxa_stacked()
library(rbiom) # Subset to 10 samples and rarefy them. hmp10 <- rarefy(hmp50[1:10]) bdiv_heatmap(hmp10, tracks=c("Body Site", "Age")) bdiv_heatmap(hmp10, bdiv=c("u_unifrac", "n_unifrac"), tracks="sex")library(rbiom) # Subset to 10 samples and rarefy them. hmp10 <- rarefy(hmp50[1:10]) bdiv_heatmap(hmp10, tracks=c("Body Site", "Age")) bdiv_heatmap(hmp10, bdiv=c("u_unifrac", "n_unifrac"), tracks="sex")
Ordinate samples and taxa on a 2D plane based on beta diversity distances.
bdiv_ord_plot( biom, bdiv = "bray", ord = "PCoA", layers = "petm", stat.by = NULL, facet.by = NULL, colors = TRUE, shapes = TRUE, tree = NULL, test = "adonis2", seed = 0, permutations = 999, rank = -1, taxa = 4, p.top = Inf, p.adj = "fdr", unc = "singly", caption = TRUE, alpha = 0.5, cpus = n_cpus(), ... )bdiv_ord_plot( biom, bdiv = "bray", ord = "PCoA", layers = "petm", stat.by = NULL, facet.by = NULL, colors = TRUE, shapes = TRUE, tree = NULL, test = "adonis2", seed = 0, permutations = 999, rank = -1, taxa = 4, p.top = Inf, p.adj = "fdr", unc = "singly", caption = TRUE, alpha = 0.5, cpus = n_cpus(), ... )
biom |
An rbiom object, or any value accepted by |
bdiv |
Beta diversity distance algorithm(s) to use. Options are:
|
ord |
Method for reducing dimensionality. Options are:
Multiple/abbreviated values allowed. Default: |
layers |
One or more of
|
stat.by |
The categorical or numeric metadata field over which statistics should be calculated. Required. |
facet.by |
Dataset field(s) to use for faceting. Must be categorical.
Default: |
colors |
How to color the groups. Options are:
See "Aesthetics" section below for additional information.
Default: |
shapes |
Shapes for each group.
Options are similar to |
tree |
A |
test |
Permutational test for accessing significance. Options are:
Abbreviations are allowed. Default: |
seed |
Random seed for permutations. Must be a non-negative integer.
Default: |
permutations |
Number of random permutations to use.
Default: |
rank |
What rank(s) of taxa to display. E.g. |
taxa |
Which taxa to display. An integer value will show the top n
most abundant taxa. A value 0 <= n < 1 will show any taxa with that
mean abundance or greater (e.g. |
p.top |
Only display taxa with the most significant differences in
abundance. If |
p.adj |
Method to use for multiple comparisons adjustment of
p-values. Run |
unc |
How to handle unclassified, uncultured, and similarly ambiguous taxa names. Options are:
Abbreviations are allowed. Default: |
caption |
Add methodology caption beneath the plot.
Default: |
alpha |
The alpha term to use in Generalized UniFrac. How much weight
to give to relative abundances; a value between 0 and 1, inclusive.
Setting |
cpus |
The number of CPUs to use. Set to |
... |
Parameters for layer geoms (e.g. |
A ggplot2 plot.
The computed sample coordinates and ggplot command
are available as $data and $code respectively.
If stat.by is given, then $stats and
$stats$code are set.
If rank is given, then $data$taxa_coords,
$taxa_stats, and $taxa_stats$code are set.
Other beta_diversity:
bdiv_boxplot(),
bdiv_clusters(),
bdiv_corrplot(),
bdiv_heatmap(),
bdiv_ord_table(),
bdiv_stats(),
bdiv_table(),
distmat_stats()
Other ordination:
bdiv_ord_table(),
distmat_ord_table()
Other visualization:
adiv_boxplot(),
adiv_corrplot(),
bdiv_boxplot(),
bdiv_corrplot(),
bdiv_heatmap(),
plot_heatmap(),
rare_corrplot(),
rare_multiplot(),
rare_stacked(),
stats_boxplot(),
stats_corrplot(),
taxa_boxplot(),
taxa_corrplot(),
taxa_heatmap(),
taxa_stacked()
library(rbiom) biom <- rarefy(hmp50) bdiv_ord_plot(biom, layers="pemt", stat.by="Body Site", rank="g")library(rbiom) biom <- rarefy(hmp50) bdiv_ord_plot(biom, layers="pemt", stat.by="Body Site", rank="g")
The biplot parameters (taxa, unc, p.top, and
p.adj) only only have an effect when rank is not NULL.
bdiv_ord_table( biom, bdiv = "bray", ord = "PCoA", weighted = NULL, md = NULL, k = 2, stat.by = NULL, split.by = NULL, tree = NULL, test = "adonis2", seed = 0, permutations = 999, rank = NULL, taxa = 6, p.top = Inf, p.adj = "fdr", unc = "singly", alpha = 0.5, cpus = n_cpus(), ... )bdiv_ord_table( biom, bdiv = "bray", ord = "PCoA", weighted = NULL, md = NULL, k = 2, stat.by = NULL, split.by = NULL, tree = NULL, test = "adonis2", seed = 0, permutations = 999, rank = NULL, taxa = 6, p.top = Inf, p.adj = "fdr", unc = "singly", alpha = 0.5, cpus = n_cpus(), ... )
biom |
An rbiom object, or any value accepted by |
bdiv |
Beta diversity distance algorithm(s) to use. Options are:
|
ord |
Method for reducing dimensionality. Options are:
Multiple/abbreviated values allowed. Default: |
weighted |
(Deprecated - weighting is now inherent in bdiv metric name.)
Take relative abundances into account. When |
md |
Dataset field(s) to include in the output data frame, or |
k |
Number of ordination dimensions to return. Either |
stat.by |
The categorical or numeric metadata field over which statistics should be calculated. Required. |
split.by |
Dataset field(s) that the data should be split by prior to
any calculations. Must be categorical. Default: |
tree |
A |
test |
Permutational test for accessing significance. Options are:
Abbreviations are allowed. Default: |
seed |
Random seed for permutations. Must be a non-negative integer.
Default: |
permutations |
Number of random permutations to use.
Default: |
rank |
What rank(s) of taxa to compute biplot coordinates and
statistics for, or |
taxa |
Which taxa to display. An integer value will show the top n
most abundant taxa. A value 0 <= n < 1 will show any taxa with that
mean abundance or greater (e.g. |
p.top |
Only display taxa with the most significant differences in
abundance. If |
p.adj |
Method to use for multiple comparisons adjustment of
p-values. Run |
unc |
How to handle unclassified, uncultured, and similarly ambiguous taxa names. Options are:
Abbreviations are allowed. Default: |
alpha |
The alpha term to use in Generalized UniFrac. How much weight
to give to relative abundances; a value between 0 and 1, inclusive.
Setting |
cpus |
The number of CPUs to use. Set to |
... |
Additional arguments to pass on to |
A data.frame with columns .sample,
.bdiv, .ord, .x, .y, and (optionally)
.z. Any columns given by md, split.by, and
stat.by are included as well.
If stat.by is given, then $stats and
$stats$code) are set.
If rank is given, then $taxa_coords,
$taxa_stats, and $taxa_stats$code are set.
Other beta_diversity:
bdiv_boxplot(),
bdiv_clusters(),
bdiv_corrplot(),
bdiv_heatmap(),
bdiv_ord_plot(),
bdiv_stats(),
bdiv_table(),
distmat_stats()
Other ordination:
bdiv_ord_plot(),
distmat_ord_table()
library(rbiom) ord <- bdiv_ord_table(hmp50, "bray", "pcoa", stat.by="Body Site", rank="g") head(ord) ord$stats ord$taxa_statslibrary(rbiom) ord <- bdiv_ord_table(hmp50, "bray", "pcoa", stat.by="Body Site", rank="g") head(ord) ord$stats ord$taxa_stats
A convenience wrapper for bdiv_table() + stats_table().
bdiv_stats( biom, regr = NULL, stat.by = NULL, bdiv = "bray", weighted = NULL, tree = NULL, within = NULL, between = NULL, split.by = NULL, transform = "none", test = "emmeans", fit = "gam", at = NULL, level = 0.95, alt = "!=", mu = 0, p.adj = "fdr", alpha = 0.5, cpus = n_cpus() )bdiv_stats( biom, regr = NULL, stat.by = NULL, bdiv = "bray", weighted = NULL, tree = NULL, within = NULL, between = NULL, split.by = NULL, transform = "none", test = "emmeans", fit = "gam", at = NULL, level = 0.95, alt = "!=", mu = 0, p.adj = "fdr", alpha = 0.5, cpus = n_cpus() )
biom |
An rbiom object, or any value accepted by |
regr |
Dataset field with the x-axis (independent; predictive)
values. Must be numeric. Default: |
stat.by |
Dataset field with the statistical groups. Must be
categorical. Default: |
bdiv |
Beta diversity distance algorithm(s) to use. Options are:
|
weighted |
(Deprecated - weighting is now inherent in bdiv metric name.)
Take relative abundances into account. When |
tree |
A |
within, between
|
Dataset field(s) for intra- or inter- sample
comparisons. Alternatively, dataset field names given elsewhere can
be prefixed with |
split.by |
Dataset field(s) that the data should be split by prior to
any calculations. Must be categorical. Default: |
transform |
Transformation to apply to calculated values. Options are:
|
test |
Method for computing p-values: |
fit |
How to fit the trendline. |
at |
Position(s) along the x-axis where the means or slopes should be
evaluated. Default: |
level |
The confidence level for calculating a confidence interval.
Default: |
alt |
Alternative hypothesis direction. Options are |
mu |
Reference value to test against. Default: |
p.adj |
Method to use for multiple comparisons adjustment of
p-values. Run |
alpha |
The alpha term to use in Generalized UniFrac. How much weight
to give to relative abundances; a value between 0 and 1, inclusive.
Setting |
cpus |
The number of CPUs to use. Set to |
A tibble data.frame with fields from the table below. This tibble
object provides the $code operator to print the R code used to generate
the statistics.
| Field | Description |
| .mean | Estimated marginal mean. See emmeans::emmeans(). |
| .mean.diff | Difference in means. |
| .slope | Trendline slope. See emmeans::emtrends(). |
| .slope.diff | Difference in slopes. |
| .h1 | Alternate hypothesis. |
| .p.val | Probability that null hypothesis is correct. |
| .adj.p | .p.val after adjusting for multiple comparisons. |
| .effect.size | Effect size. See emmeans::eff_size(). |
| .lower | Confidence interval lower bound. |
| .upper | Confidence interval upper bound. |
| .se | Standard error. |
| .n | Number of samples. |
| .df | Degrees of freedom. |
| .stat | Wilcoxon or Kruskal-Wallis rank sum statistic. |
| .t.ratio | .mean / .se |
| .r.sqr | Percent of variation explained by the model. |
| .adj.r | .r.sqr, taking degrees of freedom into account. |
| .aic | Akaike Information Criterion (predictive models). |
| .bic | Bayesian Information Criterion (descriptive models). |
| .loglik | Log-likelihood goodness-of-fit score. |
| .fit.p | P-value for observing this fit by chance. |
Other beta_diversity:
bdiv_boxplot(),
bdiv_clusters(),
bdiv_corrplot(),
bdiv_heatmap(),
bdiv_ord_plot(),
bdiv_ord_table(),
bdiv_table(),
distmat_stats()
Other stats_tables:
adiv_stats(),
distmat_stats(),
stats_table(),
taxa_stats()
library(rbiom) biom <- rarefy(hmp50) bdiv_stats(biom, stat.by = "Sex", bdiv = c("bray", "w_unifrac"))[,1:7] biom <- subset(biom, `Body Site` %in% c('Saliva', 'Stool', 'Buccal mucosa')) bdiv_stats(biom, stat.by = "Body Site", split.by = "==Sex")[,1:6]library(rbiom) biom <- rarefy(hmp50) bdiv_stats(biom, stat.by = "Sex", bdiv = c("bray", "w_unifrac"))[,1:7] biom <- subset(biom, `Body Site` %in% c('Saliva', 'Stool', 'Buccal mucosa')) bdiv_stats(biom, stat.by = "Body Site", split.by = "==Sex")[,1:6]
Distance / dissimilarity between samples.
bdiv_table( biom, bdiv = "bray", weighted = NULL, normalized = NULL, tree = NULL, md = ".all", within = NULL, between = NULL, delta = ".all", norm = "none", pseudocount = NULL, power = 1.5, alpha = 0.5, transform = "none", ties = "random", seed = 0, cpus = n_cpus(), ... ) bdiv_matrix( biom, bdiv = "bray", weighted = NULL, normalized = NULL, tree = NULL, within = NULL, between = NULL, norm = "none", pseudocount = NULL, power = 1.5, alpha = 0.5, transform = "none", ties = "random", seed = 0, cpus = n_cpus() ) bdiv_distmat( biom, bdiv = "bray", weighted = NULL, normalized = NULL, tree = NULL, within = NULL, between = NULL, norm = "none", pseudocount = NULL, power = 1.5, alpha = 0.5, transform = "none", ties = "random", seed = 0, cpus = n_cpus() )bdiv_table( biom, bdiv = "bray", weighted = NULL, normalized = NULL, tree = NULL, md = ".all", within = NULL, between = NULL, delta = ".all", norm = "none", pseudocount = NULL, power = 1.5, alpha = 0.5, transform = "none", ties = "random", seed = 0, cpus = n_cpus(), ... ) bdiv_matrix( biom, bdiv = "bray", weighted = NULL, normalized = NULL, tree = NULL, within = NULL, between = NULL, norm = "none", pseudocount = NULL, power = 1.5, alpha = 0.5, transform = "none", ties = "random", seed = 0, cpus = n_cpus() ) bdiv_distmat( biom, bdiv = "bray", weighted = NULL, normalized = NULL, tree = NULL, within = NULL, between = NULL, norm = "none", pseudocount = NULL, power = 1.5, alpha = 0.5, transform = "none", ties = "random", seed = 0, cpus = n_cpus() )
biom |
An rbiom object, or any value accepted by |
bdiv |
Beta diversity distance algorithm(s) to use. Options are:
|
weighted |
(Deprecated - weighting is now inherent in bdiv metric name.)
Take relative abundances into account. When |
normalized |
(Deprecated - normalization is now inherent in bdiv metric
name.) Only changes the "Weighted UniFrac" calculation. Divides result by
the total branch weights. Default: |
tree |
A |
md |
Dataset field(s) to include in the output data frame, or |
within, between
|
Dataset field(s) for intra- or inter- sample
comparisons. Alternatively, dataset field names given elsewhere can
be prefixed with |
delta |
For numeric metadata, report the absolute difference in values
for the two samples, for instance |
norm |
Normalize the incoming counts. Options are:
Default: |
pseudocount |
Value added to counts to handle zeros when
|
power |
Scaling factor for the magnitude of differences between
communities ( |
alpha |
The alpha term to use in Generalized UniFrac. How much weight
to give to relative abundances; a value between 0 and 1, inclusive.
Setting |
transform |
Transformation to apply to calculated values. Options are:
|
ties |
When |
seed |
Random seed for permutations. Must be a non-negative integer.
Default: |
cpus |
The number of CPUs to use. Set to |
... |
Not used. |
bdiv_matrix() - An R matrix of samples x samples.
bdiv_distmat() - A dist-class distance matrix.
bdiv_table() - A tibble data.frame with columns named .sample1, .sample2,
.bdiv, .distance, and any fields requested by md. Numeric metadata
fields will be returned as abs(x - y); categorical metadata fields as
"x", "y", or "x vs y".
Prefix metadata fields with == or != to limit comparisons to within or
between groups, respectively. For example, stat.by = '==Sex' will
run calculations only for intra-group comparisons, returning "Male" and
"Female", but NOT "Female vs Male". Similarly, setting
stat.by = '!=Body Site' will only show the inter-group comparisons, such
as "Saliva vs Stool", "Anterior nares vs Buccal mucosa", and so on.
The same effect can be achieved by using the within and between
parameters. stat.by = '==Sex' is equivalent to
stat.by = 'Sex', within = 'Sex'.
Other beta_diversity:
bdiv_boxplot(),
bdiv_clusters(),
bdiv_corrplot(),
bdiv_heatmap(),
bdiv_ord_plot(),
bdiv_ord_table(),
bdiv_stats(),
distmat_stats()
library(rbiom) # Subset to four samples biom <- hmp50$clone() biom$counts <- biom$counts[,c("HMP18", "HMP19", "HMP20", "HMP21")] # Return in long format with metadata bdiv_table(biom, 'w_unifrac', md = ".all") # Only look at distances among the stool samples bdiv_table(biom, 'w_unifrac', md = c("==Body Site", "Sex")) # Or between males and females bdiv_table(biom, 'w_unifrac', md = c("Body Site", "!=Sex")) # All-vs-all matrix bdiv_matrix(biom, 'w_unifrac') # All-vs-all distance matrix dm <- bdiv_distmat(biom, 'w_unifrac') dm plot(hclust(dm))library(rbiom) # Subset to four samples biom <- hmp50$clone() biom$counts <- biom$counts[,c("HMP18", "HMP19", "HMP20", "HMP21")] # Return in long format with metadata bdiv_table(biom, 'w_unifrac', md = ".all") # Only look at distances among the stool samples bdiv_table(biom, 'w_unifrac', md = c("==Body Site", "Sex")) # Or between males and females bdiv_table(biom, 'w_unifrac', md = c("Body Site", "!=Sex")) # All-vs-all matrix bdiv_matrix(biom, 'w_unifrac') # All-vs-all distance matrix dm <- bdiv_distmat(biom, 'w_unifrac') dm plot(hclust(dm))
blply() and bdply() let you divide your biom dataset into smaller
pieces, run a function on those smaller rbiom objects, and return the
results as a data.frame or list.
bdply(biom, vars, FUN, ..., iters = list(), prefix = FALSE) blply(biom, vars, FUN, ..., iters = list(), prefix = FALSE)bdply(biom, vars, FUN, ..., iters = list(), prefix = FALSE) blply(biom, vars, FUN, ..., iters = list(), prefix = FALSE)
biom |
An rbiom object, or any value accepted by |
vars |
A character vector of metadata fields. Each unique combination
of values in these columns will be used to create a subsetted
rbiom object to pass to |
FUN |
The function to execute on each subset of |
... |
Additional arguments to pass on to |
iters |
A named list of values to pass to |
prefix |
When |
You can also specify additional variables for your function to iterate over in unique combinations.
Calls plyr::ddply() or plyr::dlply() internally.
For bdply(), a tibble data.frame comprising the accumulated
outputs of FUN, along with the columns specified by
vars and iters. For blply(), a named list that has details
about vars and iters in attr(,'split_labels').
Other metadata:
glimpse.rbiom()
Other biom:
biom_merge()
library(rbiom) bdply(hmp50, "Sex", `$`, 'n_samples') blply(hmp50, "Sex", `$`, 'n_samples') %>% unlist() bdply(hmp50, c("Body Site", "Sex"), function (b) { adm <- adiv_matrix(b)[,c("shannon", "simpson")] apply(adm, 2L, mean) }) iters <- list(d = c("bray", "euclid")) bdply(hmp50, "Sex", iters = iters, function (b, d) { r <- range(bdiv_distmat(biom = b, bdiv = d)) round(data.frame(min = r[[1]], max = r[[2]])) })library(rbiom) bdply(hmp50, "Sex", `$`, 'n_samples') blply(hmp50, "Sex", `$`, 'n_samples') %>% unlist() bdply(hmp50, c("Body Site", "Sex"), function (b) { adm <- adiv_matrix(b)[,c("shannon", "simpson")] apply(adm, 2L, mean) }) iters <- list(d = c("bray", "euclid")) bdply(hmp50, "Sex", iters = iters, function (b, d) { r <- range(bdiv_distmat(biom = b, bdiv = d)) round(data.frame(min = r[[1]], max = r[[2]])) })
Scaling a matrix of proportions (or counts) to a new target depth, rounding to integers while preserving the original total abundance sum exactly.
biom_inflate(biom, depth = NULL, clone = TRUE)biom_inflate(biom, depth = NULL, clone = TRUE)
biom |
An rbiom object, or any value accepted by |
depth |
The target library size (sum) for each sample. Must be an
integer greater than 0. If |
clone |
Create a copy of |
An rbiom object.
To ensure the sum of the resulting counts equals the target depth exactly
(avoiding drift caused by simple rounding), this function uses the Largest
Remainder Method (also known as the Hare-Niemeyer method).
It assigns the integer part of the scaled value to each feature, and then distributes the remaining counts to the features with the largest fractional parts.
suggest_inflate_depths() for details on how target depths are
estimated when depth = NULL.
Other transformations:
biom_relativize(),
biom_rescale(),
modify_metadata,
rarefy(),
slice_metadata,
subset(),
with()
library(rbiom) biom <- hmp50[1:5] sample_sums(biom) biom <- biom_relativize(biom) sample_sums(biom) biom <- biom_inflate(biom) sample_sums(biom)library(rbiom) biom <- hmp50[1:5] sample_sums(biom) biom <- biom_relativize(biom) sample_sums(biom) biom <- biom_inflate(biom) sample_sums(biom)
WARNING: It is generally ill-advised to merge BIOM datasets, as OTUs mappings are dependent on upstream clustering and are not equivalent between BIOM files.
biom_merge( ..., metadata = NA, taxonomy = NA, tree = NULL, sequences = NA, id = NA, comment = NA )biom_merge( ..., metadata = NA, taxonomy = NA, tree = NULL, sequences = NA, id = NA, comment = NA )
... |
Any number of rbiom objects (e.g. from |
metadata, taxonomy, tree, sequences, id, comment
|
Replace the corresponding
data in the merged rbiom object with these values. Set to |
An rbiom object.
Other biom:
bdply()
library(rbiom) b1 <- as_rbiom(hmp50$counts[,1:4]) b2 <- as_rbiom(hmp50$counts[,5:8]) biom <- biom_merge(b1, b2) print(biom) biom$tree <- hmp50$tree biom$metadata <- hmp50$metadata print(biom)library(rbiom) b1 <- as_rbiom(hmp50$counts[,1:4]) b2 <- as_rbiom(hmp50$counts[,5:8]) biom <- biom_merge(b1, b2) print(biom) biom$tree <- hmp50$tree biom$metadata <- hmp50$metadata print(biom)
This function normalizes the data by dividing each observation by the total library size of its sample. The resulting values represent the proportion (0 to 1) of the sample composed of that specific feature.
This is a common transformation for microbiome data, as it accounts for differences in sequencing depth across samples, allowing for comparison of community composition.
biom_relativize(biom, clone = TRUE)biom_relativize(biom, clone = TRUE)
biom |
An rbiom object, or any value accepted by |
clone |
Create a copy of |
Convert absolute counts to relative abundances (proportions) where each sample sums to 1.
An rbiom object.
Other transformations:
biom_inflate(),
biom_rescale(),
modify_metadata,
rarefy(),
slice_metadata,
subset(),
with()
library(rbiom) biom <- hmp50[1:5] # Raw counts sum to different library sizes sample_sums(biom) # Relativized counts sum to 1 biom_rel <- biom_relativize(biom) sample_sums(biom_rel)library(rbiom) biom <- hmp50[1:5] # Raw counts sum to different library sizes sample_sums(biom) # Relativized counts sum to 1 biom_rel <- biom_relativize(biom) sample_sums(biom_rel)
This function performs a min-max scaling on each sample independently.
It is useful for normalization techniques that require data to be within a specific bounded range, or for visualization purposes where maintaining the relative distances between values is important but the absolute magnitude needs adjustment.
biom_rescale(biom, range = c(0, 1), clone = TRUE)biom_rescale(biom, range = c(0, 1), clone = TRUE)
biom |
An rbiom object, or any value accepted by |
range |
Numeric vector of length 2. Target min and max.
Default: |
clone |
Create a copy of |
Linearly rescale each sample's values to lie between a specified minimum and maximum.
An rbiom object.
The rescaling is performed in two steps:
Normalize: Divide values by the maximum value in that sample,
scaling them to a [0, 1] range relative to the sample's peak.
Scale and Shift: Apply the target range using the formula:
If range starts at a non-zero value (e.g., c(1, 10)), the sparsity of the
matrix will be destroyed because all zero counts will be shifted to the
minimum value. This can significantly increase memory usage for large datasets.
Other transformations:
biom_inflate(),
biom_relativize(),
modify_metadata,
rarefy(),
slice_metadata,
subset(),
with()
library(rbiom) biom <- hmp50[1:5] # Original range range(as.matrix(biom)) # Rescaled to 0-1 biom_01 <- biom_rescale(biom) range(as.matrix(biom_01)) # Rescaled to 0-100 (Percentages) biom_100 <- biom_rescale(biom, range = c(0, 100)) range(as.matrix(biom_100))library(rbiom) biom <- hmp50[1:5] # Original range range(as.matrix(biom)) # Rescaled to 0-1 biom_01 <- biom_rescale(biom) range(as.matrix(biom_01)) # Rescaled to 0-100 (Percentages) biom_100 <- biom_rescale(biom, range = c(0, 100)) range(as.matrix(biom_100))
Converts your rbiom object into other common Bioconductor data structures.
Each function requires the corresponding target package to be installed.
convert_to_animalcules(biom, ...) convert_to_biomformat(biom, ...) convert_to_phyloseq(biom, ...) convert_to_SE(biom, ...) convert_to_TSE(biom, ...)convert_to_animalcules(biom, ...) convert_to_biomformat(biom, ...) convert_to_phyloseq(biom, ...) convert_to_SE(biom, ...) convert_to_TSE(biom, ...)
biom |
An rbiom object, or any value accepted by |
... |
Not Used. |
convert_to_animalcules(): Converts to a MultiAssayExperiment object tailored for the
animalcules interactive microbiome analysis toolkit.
Includes: counts, metadata, and taxonomy.
convert_to_biomformat(): Converts to a biom object used by the
biomformat package, the standard Bioconductor
class for reading and writing BIOM data.
Includes: counts, metadata, and taxonomy.
convert_to_phyloseq(): Converts to a phyloseq object for use with the comprehensive
phyloseq ecosystem.
Includes: counts, metadata, taxonomy, phylogenetic tree, and sequences.
convert_to_SE(): Converts to a SummarizedExperiment object, a core
SummarizedExperiment Bioconductor
container for matrix-like data and annotations.
Includes: counts, metadata, and taxonomy.
convert_to_TSE(): Converts to a TreeSummarizedExperiment object. This extends the SE class
to natively support hierarchical TreeSummarizedExperiment
relationships.
Includes: counts, metadata, taxonomy, phylogenetic tree, and sequences.
An animalcules (MultiAssayExperiment class), biomformat (biom class), phyloseq,
SummarizedExperiment, or TreeSummarizedExperiment object.
## Not run: library(rbiom) print(hmp50) # Requires 'animalcules', a Bioconductor R package if (nzchar(system.file(package = "animalcules"))) { ani <- convert_to_animalcules(hmp50) print(ani) } # Requires 'biomformat', a Bioconductor R package if (nzchar(system.file(package = "biomformat"))) { bio <- convert_to_biomformat(hmp50) print(bio) } # Requires 'phyloseq', a Bioconductor R package if (nzchar(system.file(package = "phyloseq"))) { phy <- convert_to_phyloseq(hmp50) print(phy) } # Requires 'SummarizedExperiment', a Bioconductor R package if (nzchar(system.file(package = "SummarizedExperiment"))) { se <- convert_to_SE(hmp50) print(se) } # Requires 'TreeSummarizedExperiment', a Bioconductor R package if (nzchar(system.file(package = "TreeSummarizedExperiment"))) { tse <- convert_to_TSE(hmp50) print(tse) } ## End(Not run)## Not run: library(rbiom) print(hmp50) # Requires 'animalcules', a Bioconductor R package if (nzchar(system.file(package = "animalcules"))) { ani <- convert_to_animalcules(hmp50) print(ani) } # Requires 'biomformat', a Bioconductor R package if (nzchar(system.file(package = "biomformat"))) { bio <- convert_to_biomformat(hmp50) print(bio) } # Requires 'phyloseq', a Bioconductor R package if (nzchar(system.file(package = "phyloseq"))) { phy <- convert_to_phyloseq(hmp50) print(phy) } # Requires 'SummarizedExperiment', a Bioconductor R package if (nzchar(system.file(package = "SummarizedExperiment"))) { se <- convert_to_SE(hmp50) print(se) } # Requires 'TreeSummarizedExperiment', a Bioconductor R package if (nzchar(system.file(package = "TreeSummarizedExperiment"))) { tse <- convert_to_TSE(hmp50) print(tse) } ## End(Not run)
Run ordinations on a distance matrix.
distmat_ord_table(dm, ord = "PCoA", k = 2L, ...)distmat_ord_table(dm, ord = "PCoA", k = 2L, ...)
dm |
A |
ord |
Method for reducing dimensionality. Options are:
Multiple/abbreviated values allowed. Default: |
k |
Number of ordination dimensions to return. Either |
... |
Additional arguments for |
A data.frame with columns .sample, .ord, .x,
.y, and (optionally) .z.
Other ordination:
bdiv_ord_plot(),
bdiv_ord_table()
library(rbiom) dm <- bdiv_distmat(hmp50, "bray") ord <- distmat_ord_table(dm, "PCoA") head(ord)library(rbiom) dm <- bdiv_distmat(hmp50, "bray") ord <- distmat_ord_table(dm, "PCoA") head(ord)
Run statistics on a distance matrix vs a categorical or numeric variable.
distmat_stats(dm, groups, test = "adonis2", seed = 0, permutations = 999)distmat_stats(dm, groups, test = "adonis2", seed = 0, permutations = 999)
dm |
A |
groups |
A named vector of grouping values. The names should
correspond to |
test |
Permutational test for accessing significance. Options are:
Abbreviations are allowed. Default: |
seed |
Random seed for permutations. Must be a non-negative integer.
Default: |
permutations |
Number of random permutations to use.
Default: |
A data.frame with summary statistics from vegan::permustats().
The columns are:
The size of the distance matrix.
The observed statistic. For mrpp, this is the overall weighted mean of group mean distances.
The difference of observed statistic and mean of permutations divided by the standard deviation of permutations (also known as z-values). Evaluated from permuted values without observed statistic.
Probability calculated by test.
R commands for reproducing the results are in $code.
Other beta_diversity:
bdiv_boxplot(),
bdiv_clusters(),
bdiv_corrplot(),
bdiv_heatmap(),
bdiv_ord_plot(),
bdiv_ord_table(),
bdiv_stats(),
bdiv_table()
Other stats_tables:
adiv_stats(),
bdiv_stats(),
stats_table(),
taxa_stats()
library(rbiom) hmp10 <- hmp50$clone() hmp10$counts <- hmp10$counts[,1:10] dm <- bdiv_distmat(hmp10, 'w_unifrac') distmat_stats(dm, groups = pull(hmp10, 'Body Site')) distmat_stats(dm, groups = pull(hmp10, 'Age')) # See the R code used to calculate these statistics: stats <- distmat_stats(dm, groups = pull(hmp10, 'Age')) stats$codelibrary(rbiom) hmp10 <- hmp50$clone() hmp10$counts <- hmp10$counts[,1:10] dm <- bdiv_distmat(hmp10, 'w_unifrac') distmat_stats(dm, groups = pull(hmp10, 'Body Site')) distmat_stats(dm, groups = pull(hmp10, 'Age')) # See the R code used to calculate these statistics: stats <- distmat_stats(dm, groups = pull(hmp10, 'Age')) stats$code
Populates a directory with the following files, formatted according to QIIME 2 or mothur's specifications.
biom_counts.tsv
biom_metadata.tsv
biom_taxonomy.tsv
biom_tree.nwk
biom_seqs.fna
biom_counts.tsv will always be created. The others are dependent on
whether the content is present in the biom argument.
write_mothur(biom, dir = tempfile(), prefix = "biom_") write_qiime2(biom, dir = tempfile(), prefix = "biom_")write_mothur(biom, dir = tempfile(), prefix = "biom_") write_qiime2(biom, dir = tempfile(), prefix = "biom_")
biom |
An rbiom object, or any value accepted by |
dir |
Where to save the files. If the directory doesn't exist, it will
be created. Default: |
prefix |
A string to prepend to each file name. Default: |
The normalized directory path that was written to (invisibly).
library(rbiom) tdir <- tempfile() write_qiime2(hmp50, tdir, 'qiime2_') write_mothur(hmp50, tdir, 'mothur_') list.files(tdir) readLines(file.path(tdir, 'qiime2_metadata.tsv'), n = 4) readLines(file.path(tdir, 'mothur_taxonomy.tsv'), n = 3) unlink(tdir, recursive = TRUE)library(rbiom) tdir <- tempfile() write_qiime2(hmp50, tdir, 'qiime2_') write_mothur(hmp50, tdir, 'mothur_') list.files(tdir) readLines(file.path(tdir, 'qiime2_metadata.tsv'), n = 4) readLines(file.path(tdir, 'mothur_taxonomy.tsv'), n = 3) unlink(tdir, recursive = TRUE)
Global Enteric Multicenter Study (n = 1,006)
gemsgems
An rbiom object with 1,006 samples. Includes metadata and taxonomy.
Case or Control
0 - 4.8 (years old)
Bangladesh, Gambia, Kenya, or Mali
doi:10.1186/gb-2014-15-6-r76 and doi:10.1093/nar/gkx1027
Other Built-In Datasets:
babies,
hmp50
gems table(gems$metadata$country)gems table(gems$metadata$country)
Get a glimpse of your metadata.
## S3 method for class 'rbiom' glimpse(x, width = NULL, ...)## S3 method for class 'rbiom' glimpse(x, width = NULL, ...)
x |
An rbiom object, such as from |
width |
Width of output. See |
... |
Not used. |
The original biom, invisibly.
Other metadata:
bdply()
library(rbiom) glimpse(hmp50)library(rbiom) glimpse(hmp50)
Human Microbiome Project - demo dataset (n = 50)
hmp50hmp50
An rbiom object with 50 samples. Includes metadata, taxonomy, phylogeny, and sequences.
Male or Female
Anterior nares, Buccal mucosa, Mid vagina, Saliva, or Stool
21 - 40
19 - 32
Other Built-In Datasets:
babies,
gems
hmp50 hmp50$metadatahmp50 hmp50$metadata
A collection of transformations that operate directly on matrices.
mtx_rarefy( mtx, margin = 2L, depth = 0.1, n = NULL, seed = 0L, upsample = NULL, cpus = NULL ) mtx_percent(mtx, margin = 2L) mtx_rescale(mtx, margin = 2L, range = c(0, 1)) rarefy_cols(mtx, depth = 0.1, n = NULL, seed = 0L, cpus = NULL) rescale_rows(mtx) rescale_cols(mtx)mtx_rarefy( mtx, margin = 2L, depth = 0.1, n = NULL, seed = 0L, upsample = NULL, cpus = NULL ) mtx_percent(mtx, margin = 2L) mtx_rescale(mtx, margin = 2L, range = c(0, 1)) rarefy_cols(mtx, depth = 0.1, n = NULL, seed = 0L, cpus = NULL) rescale_rows(mtx) rescale_cols(mtx)
mtx |
A numeric matrix or sparse matrix of counts. |
margin |
Apply the transformation to the matrix's rows ( |
depth |
How many observations to keep per sample. |
n |
Deprecated. The number of samples to keep. This argument is ignored in the current version. |
seed |
An integer seed for randomizing which observations to keep or drop. |
upsample |
If the count data is in percentages, provide an integer
value here to scale each sample's observations to integers that sum
to this value. Maps to |
cpus |
The number of CPUs to use. Set to |
range |
When rescaling, what should the minimum and maximum values
be? Default: |
A A numeric matrix or sparse matrix, depending on the input type,
with the same dimensions as mtx.
mtx <- matrix(1:12, ncol = 3, dimnames = list(paste0("OTU", 1:4), paste0("Sample", 1:3))) mtx suppressWarnings({ mtx_rarefy(mtx) rarefy_cols(mtx) mtx_percent(mtx) mtx_rescale(mtx) rescale_rows(mtx) rescale_cols(mtx) })mtx <- matrix(1:12, ncol = 3, dimnames = list(paste0("OTU", 1:4), paste0("Sample", 1:3))) mtx suppressWarnings({ mtx_rarefy(mtx) rarefy_cols(mtx) mtx_percent(mtx) mtx_rescale(mtx) rescale_rows(mtx) rescale_cols(mtx) })
mutate() creates new fields in $metadata that are functions of existing
metadata fields. It can also modify (if the name is the same as an existing
field) and delete fields (by setting their value to NULL).
## S3 method for class 'rbiom' mutate(.data, ..., clone = TRUE) ## S3 method for class 'rbiom' rename(.data, ..., clone = TRUE)## S3 method for class 'rbiom' mutate(.data, ..., clone = TRUE) ## S3 method for class 'rbiom' rename(.data, ..., clone = TRUE)
.data |
An rbiom object, such as from |
... |
Passed on to |
clone |
Create a copy of |
An rbiom object.
Other transformations:
biom_inflate(),
biom_relativize(),
biom_rescale(),
rarefy(),
slice_metadata,
subset(),
with()
library(rbiom) biom <- slice_max(hmp50, BMI, n = 6) biom$metadata # Add a new field to the metadata biom <- mutate(biom, Obsese = BMI >= 30) biom$metadata # Rename a metadata field biom <- rename(biom, 'Age (years)' = "Age") biom$metadatalibrary(rbiom) biom <- slice_max(hmp50, BMI, n = 6) biom$metadata # Add a new field to the metadata biom <- mutate(biom, Obsese = BMI >= 30) biom$metadata # Rename a metadata field biom <- rename(biom, 'Age (years)' = "Age") biom$metadata
Create a heatmap with tracks and dendrograms from any matrix.
plot_heatmap( mtx, grid = list(label = "Grid Value", colors = "imola"), tracks = NULL, label = TRUE, label_size = NULL, rescale = "none", trees = TRUE, clust = "complete", dist = "euclidean", asp = 1, tree_height = 10, track_height = 10, legend = "right", title = NULL, xlab.angle = "auto", ... )plot_heatmap( mtx, grid = list(label = "Grid Value", colors = "imola"), tracks = NULL, label = TRUE, label_size = NULL, rescale = "none", trees = TRUE, clust = "complete", dist = "euclidean", asp = 1, tree_height = 10, track_height = 10, legend = "right", title = NULL, xlab.angle = "auto", ... )
mtx |
A numeric |
grid |
Color palette name, or a list with entries for |
tracks |
List of track definitions. See details below.
Default: |
label |
Label the matrix rows and columns. You can supply a list
or logical vector of length two to control row labels and column
labels separately, for example
|
label_size |
The font size to use for the row and column labels. You
can supply a numeric vector of length two to control row label sizes
and column label sizes separately, for example
|
rescale |
Rescale rows or columns to all have a common min/max.
Options: |
trees |
Draw a dendrogram for rows (left) and columns (top). You can
supply a list or logical vector of length two to control the row tree
and column tree separately, for example
|
clust |
Clustering algorithm for reordering the rows and columns by
similarity. You can supply a list or character vector of length two to
control the row and column clustering separately, for example
Default: |
dist |
Distance algorithm to use when reordering the rows and columns
by similarity. You can supply a list or character vector of length
two to control the row and column clustering separately, for example
Default: |
asp |
Aspect ratio (height/width) for entire grid.
Default: |
tree_height, track_height
|
The height of the dendrogram or annotation
tracks as a percentage of the overall grid size. Use a numeric vector
of length two to assign |
legend |
Where to place the legend. Options are: |
title |
Plot title. Default: |
xlab.angle |
Angle of the labels at the bottom of the plot.
Options are |
... |
Additional arguments to pass on to ggplot2::theme(). |
A ggplot2 plot. The computed data points and ggplot
command are available as $data and $code,
respectively.
One or more colored tracks can be placed on the left and/or top of the heatmap grid to visualize associated metadata values.
## Categorical ----------------------------
cat_vals <- sample(c("Male", "Female"), 10, replace = TRUE)
tracks <- list('Sex' = cat_vals)
tracks <- list('Sex' = list(values = cat_vals, colors = "bright"))
tracks <- list('Sex' = list(
values = cat_vals,
colors = c('Male' = "blue", 'Female' = "red")) )
## Numeric --------------------------------
num_vals <- sample(25:40, 10, replace = TRUE)
tracks <- list('Age' = num_vals)
tracks <- list('Age' = list(values = num_vals, colors = "greens"))
tracks <- list('Age' = list(values = num_vals, range = c(0,50)))
tracks <- list('Age' = list(
label = "Age (Years)",
values = num_vals,
colors = c("azure", "darkblue", "darkorchid") ))
## Multiple Tracks ------------------------
tracks <- list('Sex' = cat_vals, 'Age' = num_vals)
tracks <- list(
list(label = "Sex", values = cat_vals, colors = "bright"),
list(label = "Age", values = num_vals, colors = "greens") )
mtx <- matrix(sample(1:50), ncol = 10)
dimnames(mtx) <- list(letters[1:5], LETTERS[1:10])
plot_heatmap(mtx = mtx, tracks = tracks)
The following entries in the track definitions are understood:
values - The metadata values. When unnamed, order must match matrix.
range - The c(min,max) to use for scale values.
label - Label for this track. Defaults to the name of this list element.
side - Options are "top" (default) or "left".
colors - A pre-defined palette name or custom set of colors to map to.
na.color - The color to use for NA values.
bins - Bin a gradient into this many bins/steps.
guide - A list of arguments for guide_colorbar() or guide_legend().
All built-in color palettes are colorblind-friendly. See Mapping Metadata to Aesthetics for images of the palettes.
Categorical palette names: "okabe", "carto", "r4",
"polychrome", "tol", "bright", "light",
"muted", "vibrant", "tableau", "classic",
"alphabet", "tableau20", "kelly", and "fishy".
Numeric palette names: "reds", "oranges", "greens",
"purples", "grays", "acton", "bamako",
"batlow", "bilbao", "buda", "davos",
"devon", "grayC", "hawaii", "imola",
"lajolla", "lapaz", "nuuk", "oslo",
"tokyo", "turku", "bam", "berlin",
"broc", "cork", "lisbon", "roma",
"tofino", "vanimo", and "vik".
Other visualization:
adiv_boxplot(),
adiv_corrplot(),
bdiv_boxplot(),
bdiv_corrplot(),
bdiv_heatmap(),
bdiv_ord_plot(),
rare_corrplot(),
rare_multiplot(),
rare_stacked(),
stats_boxplot(),
stats_corrplot(),
taxa_boxplot(),
taxa_corrplot(),
taxa_heatmap(),
taxa_stacked()
library(rbiom) set.seed(123) mtx <- matrix(runif(5*8), nrow = 5, dimnames = list(LETTERS[1:5], letters[1:8])) plot_heatmap(mtx) plot_heatmap(mtx, grid="oranges") plot_heatmap(mtx, grid=list(colors = "oranges", label = "Some %", bins = 5)) tracks <- list( 'Number' = sample(1:ncol(mtx)), 'Person' = list( values = factor(sample(c("Alice", "Bob"), ncol(mtx), TRUE)), colors = c('Alice' = "purple", 'Bob' = "darkcyan") ), 'State' = list( side = "left", values = sample(c("TX", "OR", "WA"), nrow(mtx), TRUE), colors = "bright" ) ) plot_heatmap(mtx, tracks=tracks)library(rbiom) set.seed(123) mtx <- matrix(runif(5*8), nrow = 5, dimnames = list(LETTERS[1:5], letters[1:8])) plot_heatmap(mtx) plot_heatmap(mtx, grid="oranges") plot_heatmap(mtx, grid=list(colors = "oranges", label = "Some %", bins = 5)) tracks <- list( 'Number' = sample(1:ncol(mtx)), 'Person' = list( values = factor(sample(c("Alice", "Bob"), ncol(mtx), TRUE)), colors = c('Alice' = "purple", 'Bob' = "darkcyan") ), 'State' = list( side = "left", values = sample(c("TX", "OR", "WA"), nrow(mtx), TRUE), colors = "bright" ) ) plot_heatmap(mtx, tracks=tracks)
Map sample names to metadata field values.
## S3 method for class 'rbiom' pull(.data, var = -1, name = ".sample", ...)## S3 method for class 'rbiom' pull(.data, var = -1, name = ".sample", ...)
.data |
An rbiom object, such as from |
var |
The metadata field name specified as:
Default: |
name |
The column to be used as names for a named vector.
Specified in a similar manner as var. Default: |
... |
Not used. |
A vector of metadata values, named with sample names.
taxa_map()
Other samples:
sample_sums()
library(rbiom) pull(hmp50, 'Age') %>% head() pull(hmp50, 'bod') %>% head(4)library(rbiom) pull(hmp50, 'Age') %>% head() pull(hmp50, 'bod') %>% head(4)
Visualize rarefaction curves with scatterplots and trendlines.
rare_corrplot( biom, adiv = "Shannon", layers = "tc", rline = TRUE, stat.by = NULL, facet.by = NULL, colors = TRUE, shapes = TRUE, test = "none", fit = "log", at = NULL, level = 0.95, p.adj = "fdr", transform = "none", alt = "!=", mu = 0, caption = TRUE, check = FALSE, cpus = n_cpus(), ... )rare_corrplot( biom, adiv = "Shannon", layers = "tc", rline = TRUE, stat.by = NULL, facet.by = NULL, colors = TRUE, shapes = TRUE, test = "none", fit = "log", at = NULL, level = 0.95, p.adj = "fdr", transform = "none", alt = "!=", mu = 0, caption = TRUE, check = FALSE, cpus = n_cpus(), ... )
biom |
An rbiom object, or any value accepted by |
adiv |
Alpha diversity metric(s) to use. Options are:
|
layers |
One or more of
|
rline |
Where to draw a horizontal line on the plot, intended to show
a particular rarefaction depth. Set to |
stat.by |
Dataset field with the statistical groups. Must be
categorical. Default: |
facet.by |
Dataset field(s) to use for faceting. Must be categorical.
Default: |
colors |
How to color the groups. Options are:
See "Aesthetics" section below for additional information.
Default: |
shapes |
Shapes for each group.
Options are similar to |
test |
Method for computing p-values: |
fit |
How to fit the trendline.
Options are |
at |
Position(s) along the x-axis where the means or slopes should be
evaluated. Default: |
level |
The confidence level for calculating a confidence interval.
Default: |
p.adj |
Method to use for multiple comparisons adjustment of
p-values. Run |
transform |
Transformation to apply to calculated values. Options are:
|
alt |
Alternative hypothesis direction. Options are |
mu |
Reference value to test against. Default: |
caption |
Add methodology caption beneath the plot.
Default: |
check |
Generate additional plots to aid in assessing data normality.
Default: |
cpus |
The number of CPUs to use. Set to |
... |
Additional parameters to pass along to ggplot2 functions.
Prefix a parameter name with a layer name to pass it to only that
layer. For instance, |
A ggplot2 plot. The computed data points, ggplot2 command,
stats table, and stats table commands are available as $data,
$code, $stats, and $stats$code, respectively.
All built-in color palettes are colorblind-friendly. The available
categorical palette names are: "okabe", "carto", "r4",
"polychrome", "tol", "bright", "light",
"muted", "vibrant", "tableau", "classic",
"alphabet", "tableau20", "kelly", and "fishy".
Shapes can be given as per base R - numbers 0 through 17 for various shapes, or the decimal value of an ascii character, e.g. a-z = 65:90; A-Z = 97:122 to use letters instead of shapes on the plot. Character strings may used as well.
Other rarefaction:
rare_multiplot(),
rare_stacked(),
sample_sums()
Other visualization:
adiv_boxplot(),
adiv_corrplot(),
bdiv_boxplot(),
bdiv_corrplot(),
bdiv_heatmap(),
bdiv_ord_plot(),
plot_heatmap(),
rare_multiplot(),
rare_stacked(),
stats_boxplot(),
stats_corrplot(),
taxa_boxplot(),
taxa_corrplot(),
taxa_heatmap(),
taxa_stacked()
library(rbiom) biom <- subset(hmp50, `Body Site` %in% c('Saliva', 'Stool')) rare_corrplot(biom, stat.by = "body", adiv = c("sh", "o"), facet.by = "Sex")library(rbiom) biom <- subset(hmp50, `Body Site` %in% c('Saliva', 'Stool')) rare_corrplot(biom, stat.by = "body", adiv = c("sh", "o"), facet.by = "Sex")
Combines rare_corrplot and rare_stacked into a single figure.
rare_multiplot( biom, adiv = "Shannon", layers = "tc", rline = TRUE, stat.by = NULL, facet.by = NULL, colors = TRUE, shapes = TRUE, test = "none", fit = "log", at = NULL, level = 0.95, p.adj = "fdr", transform = "none", alt = "!=", mu = 0, caption = TRUE, check = FALSE, ... )rare_multiplot( biom, adiv = "Shannon", layers = "tc", rline = TRUE, stat.by = NULL, facet.by = NULL, colors = TRUE, shapes = TRUE, test = "none", fit = "log", at = NULL, level = 0.95, p.adj = "fdr", transform = "none", alt = "!=", mu = 0, caption = TRUE, check = FALSE, ... )
biom |
An rbiom object, or any value accepted by |
adiv |
Alpha diversity metric(s) to use. Options are:
|
layers |
One or more of
|
rline |
Where to draw a horizontal line on the plot, intended to show
a particular rarefaction depth. Set to |
stat.by |
Dataset field with the statistical groups. Must be
categorical. Default: |
facet.by |
Dataset field(s) to use for faceting. Must be categorical.
Default: |
colors |
How to color the groups. Options are:
See "Aesthetics" section below for additional information.
Default: |
shapes |
Shapes for each group.
Options are similar to |
test |
Method for computing p-values: |
fit |
How to fit the trendline.
Options are |
at |
Position(s) along the x-axis where the means or slopes should be
evaluated. Default: |
level |
The confidence level for calculating a confidence interval.
Default: |
p.adj |
Method to use for multiple comparisons adjustment of
p-values. Run |
transform |
Transformation to apply to calculated values. Options are:
|
alt |
Alternative hypothesis direction. Options are |
mu |
Reference value to test against. Default: |
caption |
Add methodology caption beneath the plot.
Default: |
check |
Generate additional plots to aid in assessing data normality.
Default: |
... |
Additional parameters to pass along to ggplot2 functions.
Prefix a parameter name with a layer name to pass it to only that
layer. For instance, |
A ggplot2 plot. The computed data points, ggplot2 command,
stats table, and stats table commands are available as $data,
$code, $stats, and $stats$code, respectively.
All built-in color palettes are colorblind-friendly. The available
categorical palette names are: "okabe", "carto", "r4",
"polychrome", "tol", "bright", "light",
"muted", "vibrant", "tableau", "classic",
"alphabet", "tableau20", "kelly", and "fishy".
Shapes can be given as per base R - numbers 0 through 17 for various shapes, or the decimal value of an ascii character, e.g. a-z = 65:90; A-Z = 97:122 to use letters instead of shapes on the plot. Character strings may used as well.
Other rarefaction:
rare_corrplot(),
rare_stacked(),
sample_sums()
Other visualization:
adiv_boxplot(),
adiv_corrplot(),
bdiv_boxplot(),
bdiv_corrplot(),
bdiv_heatmap(),
bdiv_ord_plot(),
plot_heatmap(),
rare_corrplot(),
rare_stacked(),
stats_boxplot(),
stats_corrplot(),
taxa_boxplot(),
taxa_corrplot(),
taxa_heatmap(),
taxa_stacked()
library(rbiom) rare_multiplot(hmp50, stat.by = "Body Site")library(rbiom) rare_multiplot(hmp50, stat.by = "Body Site")
Visualize the number of observations per sample.
rare_stacked( biom, rline = TRUE, counts = TRUE, labels = TRUE, y.transform = "log10", ... )rare_stacked( biom, rline = TRUE, counts = TRUE, labels = TRUE, y.transform = "log10", ... )
biom |
An rbiom object, or any value accepted by |
rline |
Where to draw a horizontal line on the plot, intended to show
a particular rarefaction depth. Set to |
counts |
Display the number of samples and reads remaining after
rarefying to |
labels |
Show sample names under each bar. Default: |
y.transform |
Y-axis transformation. Options are |
... |
Additional parameters to pass along to ggplot2 functions.
Prefix a parameter name with |
A ggplot2 plot. The computed data points and ggplot
command are available as $data and $code,
respectively.
Other rarefaction:
rare_corrplot(),
rare_multiplot(),
sample_sums()
Other visualization:
adiv_boxplot(),
adiv_corrplot(),
bdiv_boxplot(),
bdiv_corrplot(),
bdiv_heatmap(),
bdiv_ord_plot(),
plot_heatmap(),
rare_corrplot(),
rare_multiplot(),
stats_boxplot(),
stats_corrplot(),
taxa_boxplot(),
taxa_corrplot(),
taxa_heatmap(),
taxa_stacked()
library(rbiom) rare_stacked(hmp50) rare_stacked(hmp50, rline = 500, r.linewidth = 2, r.linetype = "twodash") fig <- rare_stacked(hmp50, counts = FALSE) fig$codelibrary(rbiom) rare_stacked(hmp50) rare_stacked(hmp50, rline = 500, r.linewidth = 2, r.linetype = "twodash") fig <- rare_stacked(hmp50, counts = FALSE) fig$code
This function reduces the number of observations (reads) in each sample to
a fixed integer value (depth). Samples with fewer observations than the
specified depth are discarded.
Rarefaction is a common technique in microbiome analysis used to account for uneven sequencing effort across samples. By standardizing the library size, it allows for fair comparisons of alpha and beta diversity metrics.
rarefy( biom, depth = NULL, seed = 0L, inflate = FALSE, clone = TRUE, cpus = n_cpus() )rarefy( biom, depth = NULL, seed = 0L, inflate = FALSE, clone = TRUE, cpus = n_cpus() )
biom |
An rbiom object, or any value accepted by |
depth |
The number of observations to keep per sample. Must be an integer greater than 0.
|
seed |
Random seed for permutations. Must be a non-negative integer.
Default: |
inflate |
Logical. Handling for non-integer data (e.g. relative abundances).
|
clone |
Create a copy of |
cpus |
The number of CPUs to use. Set to |
Normalizes the library sizes of a dataset by randomly sub-sampling observations from each sample to a specific depth.
An rbiom object.
suggest_rarefy_depth() for details on the default depth selection.
Other transformations:
biom_inflate(),
biom_relativize(),
biom_rescale(),
modify_metadata,
slice_metadata,
subset(),
with()
library(rbiom) biom <- hmp50[1:5] sample_sums(biom) # Rarefy to the lowest sample depth # (All samples are kept, but counts are reduced) biom_rare <- rarefy(biom, depth = min(sample_sums(biom))) sample_sums(biom_rare) # Auto-select depth (may drop samples with low coverage) biom_auto <- rarefy(biom) sample_sums(biom_auto)library(rbiom) biom <- hmp50[1:5] sample_sums(biom) # Rarefy to the lowest sample depth # (All samples are kept, but counts are reduced) biom_rare <- rarefy(biom, depth = min(sample_sums(biom))) sample_sums(biom_rare) # Auto-select depth (may drop samples with low coverage) biom_auto <- rarefy(biom) sample_sums(biom_auto)
Parse counts, metadata, taxonomy, and phylogeny from a BIOM file.
read_biom(src, ...)read_biom(src, ...)
src |
Input data as either a file path, URL, or JSON string.
BIOM files can be formatted according to
version 1.0 (JSON) or 2.1 (HDF5)
specifications, or as
classical tabular format. URLs must begin with |
... |
Properties to set in the new rbiom object, for example,
|
An rbiom object.
as_rbiom()
library(rbiom) infile <- system.file("extdata", "hmp50.bz2", package = "rbiom") biom <- read_biom(infile) print(biom) # Taxa Abundances biom$counts[1:4,1:10] %>% as.matrix() biom$taxonomy %>% head() # Metadata biom$metadata %>% head() table(biom$metadata$Sex, biom$metadata$`Body Site`) sprintf("Mean age: %.1f", mean(biom$metadata$Age)) # Phylogenetic tree biom$tree %>% tree_subset(1:10) %>% plot()library(rbiom) infile <- system.file("extdata", "hmp50.bz2", package = "rbiom") biom <- read_biom(infile) print(biom) # Taxa Abundances biom$counts[1:4,1:10] %>% as.matrix() biom$taxonomy %>% head() # Metadata biom$metadata %>% head() table(biom$metadata$Sex, biom$metadata$`Body Site`) sprintf("Mean age: %.1f", mean(biom$metadata$Age)) # Phylogenetic tree biom$tree %>% tree_subset(1:10) %>% plot()
Parse a fasta file into a named character vector.
read_fasta(file, ids = NULL)read_fasta(file, ids = NULL)
file |
A file/URL with fasta-formatted sequences. Can optionally be compressed with gzip, bzip2, xz, or lzma. |
ids |
Character vector of IDs to retrieve. The default, |
A named character vector in which names are the fasta headers and values are the sequences.
A phylogenetic tree is required for computing UniFrac distance matrices. You can load a tree from a file or by providing the tree string directly. This tree must be in Newick format, also known as parenthetic format and New Hampshire format.
read_tree(src, underscores = FALSE)read_tree(src, underscores = FALSE)
src |
Input data as either a file path, URL, or Newick string. Compressed (gzip or bzip2) files are also supported. |
underscores |
When parsing the tree, should underscores be kept as
is? By default they will be converted to spaces (unless the entire ID
is quoted). Default |
A phylo class object representing the tree.
Other phylogeny:
tree_subset()
library(rbiom) infile <- system.file("extdata", "newick.tre", package = "rbiom") tree <- read_tree(infile) print(tree) tree <- read_tree(" (A:0.99,((B:0.87,C:0.89):0.51,(((D:0.16,(E:0.83,F:0.96) :0.94):0.69,(G:0.92,(H:0.62,I:0.85):0.54):0.23):0.74,J:0.1 2):0.43):0.67);") plot(tree)library(rbiom) infile <- system.file("extdata", "newick.tre", package = "rbiom") tree <- read_tree(infile) print(tree) tree <- read_tree(" (A:0.99,((B:0.87,C:0.89):0.51,(((D:0.16,(E:0.83,F:0.96) :0.94):0.69,(G:0.92,(H:0.62,I:0.85):0.54):0.23):0.74,J:0.1 2):0.43):0.67);") plot(tree)
Summarize the taxa observations in each sample.
sample_sums(biom, rank = -1, sort = NULL, unc = "singly") sample_apply(biom, FUN, rank = -1, sort = NULL, unc = "singly", ...)sample_sums(biom, rank = -1, sort = NULL, unc = "singly") sample_apply(biom, FUN, rank = -1, sort = NULL, unc = "singly", ...)
biom |
An rbiom object, or any value accepted by |
rank |
What rank(s) of taxa to display. E.g. |
sort |
Sort the result. Options: |
unc |
How to handle unclassified, uncultured, and similarly ambiguous taxa names. Options are:
Abbreviations are allowed. Default: |
FUN |
The function to apply to each column of |
... |
Optional arguments to |
For sample_sums, A named numeric vector of the number of
observations in each sample. For sample_apply, a named vector or
list with the results of FUN. The names are the taxa IDs.
Other samples:
pull.rbiom()
Other rarefaction:
rare_corrplot(),
rare_multiplot(),
rare_stacked()
Other taxa_abundance:
taxa_boxplot(),
taxa_clusters(),
taxa_corrplot(),
taxa_heatmap(),
taxa_stacked(),
taxa_stats(),
taxa_sums(),
taxa_table()
library(rbiom) library(ggplot2) sample_sums(hmp50, sort = 'asc') %>% head() # Unique OTUs and "cultured" classes per sample nnz <- function (x) sum(x > 0) # number of non-zeroes sample_apply(hmp50, nnz, 'otu') %>% head() sample_apply(hmp50, nnz, 'class', unc = 'drop') %>% head() # Number of reads in each sample's most abundant family sample_apply(hmp50, base::max, 'f', sort = 'desc') %>% head() ggplot() + geom_histogram(aes(x=sample_sums(hmp50)), bins = 20)library(rbiom) library(ggplot2) sample_sums(hmp50, sort = 'asc') %>% head() # Unique OTUs and "cultured" classes per sample nnz <- function (x) sum(x > 0) # number of non-zeroes sample_apply(hmp50, nnz, 'otu') %>% head() sample_apply(hmp50, nnz, 'class', unc = 'drop') %>% head() # Number of reads in each sample's most abundant family sample_apply(hmp50, base::max, 'f', sort = 'desc') %>% head() ggplot() + geom_histogram(aes(x=sample_sums(hmp50)), bins = 20)
Subset to a specific number of samples.
## S3 method for class 'rbiom' slice(.data, ..., .by = NULL, .preserve = FALSE, clone = TRUE) ## S3 method for class 'rbiom' slice_head(.data, n, prop, by = NULL, clone = TRUE, ...) ## S3 method for class 'rbiom' slice_tail(.data, n, prop, by = NULL, clone = TRUE, ...) ## S3 method for class 'rbiom' slice_min( .data, order_by, n, prop, by = NULL, with_ties = TRUE, na_rm = FALSE, clone = TRUE, ... ) ## S3 method for class 'rbiom' slice_max( .data, order_by, n, prop, by = NULL, with_ties = TRUE, na_rm = FALSE, clone = TRUE, ... ) ## S3 method for class 'rbiom' slice_sample( .data, n, prop, by = NULL, weight_by = NULL, replace = FALSE, clone = TRUE, ... )## S3 method for class 'rbiom' slice(.data, ..., .by = NULL, .preserve = FALSE, clone = TRUE) ## S3 method for class 'rbiom' slice_head(.data, n, prop, by = NULL, clone = TRUE, ...) ## S3 method for class 'rbiom' slice_tail(.data, n, prop, by = NULL, clone = TRUE, ...) ## S3 method for class 'rbiom' slice_min( .data, order_by, n, prop, by = NULL, with_ties = TRUE, na_rm = FALSE, clone = TRUE, ... ) ## S3 method for class 'rbiom' slice_max( .data, order_by, n, prop, by = NULL, with_ties = TRUE, na_rm = FALSE, clone = TRUE, ... ) ## S3 method for class 'rbiom' slice_sample( .data, n, prop, by = NULL, weight_by = NULL, replace = FALSE, clone = TRUE, ... )
.data |
An rbiom object, such as from |
... |
For |
.by, by
|
< |
.preserve |
Relevant when the |
clone |
Create a copy of |
n, prop
|
Provide either A negative value of |
order_by |
< |
with_ties |
Should ties be kept together? The default, |
na_rm |
Should missing values in |
weight_by |
< |
replace |
Should sampling be performed with ( |
An rbiom object.
Other transformations:
biom_inflate(),
biom_relativize(),
biom_rescale(),
modify_metadata,
rarefy(),
subset(),
with()
library(rbiom) # The last 3 samples in the metadata table. biom <- slice_tail(hmp50, n = 3) biom$metadata # The 3 oldest subjects sampled. biom <- slice_max(hmp50, Age, n = 3) biom$metadata # Pick 3 samples at random. biom <- slice_sample(hmp50, n = 3) biom$metadatalibrary(rbiom) # The last 3 samples in the metadata table. biom <- slice_tail(hmp50, n = 3) biom$metadata # The 3 oldest subjects sampled. biom <- slice_max(hmp50, Age, n = 3) biom$metadata # Pick 3 samples at random. biom <- slice_sample(hmp50, n = 3) biom$metadata
Visualize categorical metadata effects on numeric values.
stats_boxplot( df, x = NULL, y = attr(df, "response"), layers = "x", stat.by = x, facet.by = NULL, colors = TRUE, shapes = TRUE, patterns = FALSE, test = "auto", flip = FALSE, stripe = NULL, ci = "ci", level = 0.95, p.adj = "fdr", p.top = Inf, outliers = NULL, xlab.angle = "auto", p.label = 0.05, caption = TRUE, ... )stats_boxplot( df, x = NULL, y = attr(df, "response"), layers = "x", stat.by = x, facet.by = NULL, colors = TRUE, shapes = TRUE, patterns = FALSE, test = "auto", flip = FALSE, stripe = NULL, ci = "ci", level = 0.95, p.adj = "fdr", p.top = Inf, outliers = NULL, xlab.angle = "auto", p.label = 0.05, caption = TRUE, ... )
df |
The dataset (data.frame or tibble object). "Dataset fields"
mentioned below should match column names in |
x |
A categorical metadata column name to use for the x-axis. Or
|
y |
A numeric metadata column name to use for the y-axis.
Default: |
layers |
One or more of
|
stat.by |
Dataset field with the statistical groups. Must be
categorical. Default: |
facet.by |
Dataset field(s) to use for faceting. Must be categorical.
Default: |
colors |
How to color the groups. Options are:
See "Aesthetics" section below for additional information.
Default: |
shapes |
Shapes for each group.
Options are similar to |
patterns |
Patterns for each group.
Options are similar to |
test |
Method for computing p-values: |
flip |
Transpose the axes, so that taxa are present as rows instead
of columns. Default: |
stripe |
Shade every other x position. Default: same as flip |
ci |
How to calculate min/max of the crossbar,
errorbar, linerange, and pointrange layers.
Options are: |
level |
The confidence level for calculating a confidence interval.
Default: |
p.adj |
Method to use for multiple comparisons adjustment of
p-values. Run |
p.top |
Only display taxa with the most significant differences in
abundance. If |
outliers |
Show boxplot outliers? |
xlab.angle |
Angle of the labels at the bottom of the plot.
Options are |
p.label |
Minimum adjusted p-value to display on the plot with a bracket.
If a numeric vector with more than one value is
provided, they will be used as breaks for asterisk notation.
Default: |
caption |
Add methodology caption beneath the plot.
Default: |
... |
Additional parameters to pass along to ggplot2 functions.
Prefix a parameter name with a layer name to pass it to only that
layer. For instance, |
A ggplot2 plot. The computed data points, ggplot2 command,
stats table, and stats table commands are available as $data,
$code, $stats, and $stats$code, respectively.
All built-in color palettes are colorblind-friendly. The available
categorical palette names are: "okabe", "carto", "r4",
"polychrome", "tol", "bright", "light",
"muted", "vibrant", "tableau", "classic",
"alphabet", "tableau20", "kelly", and "fishy".
Patterns are added using the fillpattern R package. Options are "brick",
"chevron", "fish", "grid", "herringbone", "hexagon", "octagon",
"rain", "saw", "shingle", "rshingle", "stripe", and "wave",
optionally abbreviated and/or suffixed with modifiers. For example,
"hex10_sm" for the hexagon pattern rotated 10 degrees and shrunk by 2x.
See fillpattern::fill_pattern() for complete documentation of options.
Shapes can be given as per base R - numbers 0 through 17 for various shapes, or the decimal value of an ascii character, e.g. a-z = 65:90; A-Z = 97:122 to use letters instead of shapes on the plot. Character strings may used as well.
Other visualization:
adiv_boxplot(),
adiv_corrplot(),
bdiv_boxplot(),
bdiv_corrplot(),
bdiv_heatmap(),
bdiv_ord_plot(),
plot_heatmap(),
rare_corrplot(),
rare_multiplot(),
rare_stacked(),
stats_corrplot(),
taxa_boxplot(),
taxa_corrplot(),
taxa_heatmap(),
taxa_stacked()
library(rbiom) df <- adiv_table(rarefy(hmp50)) stats_boxplot(df, x = "Body Site") stats_boxplot(df, x = "Sex", stat.by = "Body Site", layers = "be")library(rbiom) df <- adiv_table(rarefy(hmp50)) stats_boxplot(df, x = "Body Site") stats_boxplot(df, x = "Sex", stat.by = "Body Site", layers = "be")
Visualize regression with scatterplots and trendlines.
stats_corrplot( df, x, y = attr(df, "response"), layers = "tc", stat.by = NULL, facet.by = NULL, colors = TRUE, shapes = TRUE, test = "emmeans", fit = "gam", at = NULL, level = 0.95, p.adj = "fdr", p.top = Inf, alt = "!=", mu = 0, caption = TRUE, check = FALSE, ... )stats_corrplot( df, x, y = attr(df, "response"), layers = "tc", stat.by = NULL, facet.by = NULL, colors = TRUE, shapes = TRUE, test = "emmeans", fit = "gam", at = NULL, level = 0.95, p.adj = "fdr", p.top = Inf, alt = "!=", mu = 0, caption = TRUE, check = FALSE, ... )
df |
The dataset (data.frame or tibble object). "Dataset fields"
mentioned below should match column names in |
x |
Dataset field with the x-axis values. Equivalent to the |
y |
A numeric metadata column name to use for the y-axis.
Default: |
layers |
One or more of
|
stat.by |
Dataset field with the statistical groups. Must be
categorical. Default: |
facet.by |
Dataset field(s) to use for faceting. Must be categorical.
Default: |
colors |
How to color the groups. Options are:
See "Aesthetics" section below for additional information.
Default: |
shapes |
Shapes for each group.
Options are similar to |
test |
Method for computing p-values: |
fit |
How to fit the trendline. |
at |
Position(s) along the x-axis where the means or slopes should be
evaluated. Default: |
level |
The confidence level for calculating a confidence interval.
Default: |
p.adj |
Method to use for multiple comparisons adjustment of
p-values. Run |
p.top |
Only display taxa with the most significant differences in
abundance. If |
alt |
Alternative hypothesis direction. Options are |
mu |
Reference value to test against. Default: |
caption |
Add methodology caption beneath the plot.
Default: |
check |
Generate additional plots to aid in assessing data normality.
Default: |
... |
Additional parameters to pass along to ggplot2 functions.
Prefix a parameter name with a layer name to pass it to only that
layer. For instance, |
A ggplot2 plot. The computed data points, ggplot2 command,
stats table, and stats table commands are available as $data,
$code, $stats, and $stats$code, respectively.
All built-in color palettes are colorblind-friendly. The available
categorical palette names are: "okabe", "carto", "r4",
"polychrome", "tol", "bright", "light",
"muted", "vibrant", "tableau", "classic",
"alphabet", "tableau20", "kelly", and "fishy".
Shapes can be given as per base R - numbers 0 through 17 for various shapes, or the decimal value of an ascii character, e.g. a-z = 65:90; A-Z = 97:122 to use letters instead of shapes on the plot. Character strings may used as well.
Other visualization:
adiv_boxplot(),
adiv_corrplot(),
bdiv_boxplot(),
bdiv_corrplot(),
bdiv_heatmap(),
bdiv_ord_plot(),
plot_heatmap(),
rare_corrplot(),
rare_multiplot(),
rare_stacked(),
stats_boxplot(),
taxa_boxplot(),
taxa_corrplot(),
taxa_heatmap(),
taxa_stacked()
library(rbiom) biom <- subset(hmp50, `Body Site` %in% c('Saliva', 'Stool')) df <- adiv_table(rarefy(biom)) stats_corrplot(df, "age", stat.by = "body") stats_corrplot( df = df, x = "Age", stat.by = "Body Site", facet.by = "Sex", layers = "trend" )library(rbiom) biom <- subset(hmp50, `Body Site` %in% c('Saliva', 'Stool')) df <- adiv_table(rarefy(biom)) stats_corrplot(df, "age", stat.by = "body") stats_corrplot( df = df, x = "Age", stat.by = "Body Site", facet.by = "Sex", layers = "trend" )
A simple interface to lower-level statistics functions, including
stats::wilcox.test(), stats::kruskal.test(), emmeans::emmeans(),
and emmeans::emtrends().
stats_table( df, regr = NULL, resp = attr(df, "response"), stat.by = NULL, split.by = NULL, test = "emmeans", fit = "gam", at = NULL, level = 0.95, alt = "!=", mu = 0, p.adj = "fdr" )stats_table( df, regr = NULL, resp = attr(df, "response"), stat.by = NULL, split.by = NULL, test = "emmeans", fit = "gam", at = NULL, level = 0.95, alt = "!=", mu = 0, p.adj = "fdr" )
df |
The dataset (data.frame or tibble object). "Dataset fields"
mentioned below should match column names in |
regr |
Dataset field with the x-axis (independent; predictive)
values. Must be numeric. Default: |
resp |
Dataset field with the y-axis (dependent; response) values,
such as taxa abundance or alpha diversity.
Default: |
stat.by |
Dataset field with the statistical groups. Must be
categorical. Default: |
split.by |
Dataset field(s) that the data should be split by prior to
any calculations. Must be categorical. Default: |
test |
Method for computing p-values: |
fit |
How to fit the trendline. |
at |
Position(s) along the x-axis where the means or slopes should be
evaluated. Default: |
level |
The confidence level for calculating a confidence interval.
Default: |
alt |
Alternative hypothesis direction. Options are |
mu |
Reference value to test against. Default: |
p.adj |
Method to use for multiple comparisons adjustment of
p-values. Run |
A tibble data.frame with fields from the table below. This tibble
object provides the $code operator to print the R code used to generate
the statistics.
| Field | Description |
| .mean | Estimated marginal mean. See emmeans::emmeans(). |
| .mean.diff | Difference in means. |
| .slope | Trendline slope. See emmeans::emtrends(). |
| .slope.diff | Difference in slopes. |
| .h1 | Alternate hypothesis. |
| .p.val | Probability that null hypothesis is correct. |
| .adj.p | .p.val after adjusting for multiple comparisons. |
| .effect.size | Effect size. See emmeans::eff_size(). |
| .lower | Confidence interval lower bound. |
| .upper | Confidence interval upper bound. |
| .se | Standard error. |
| .n | Number of samples. |
| .df | Degrees of freedom. |
| .stat | Wilcoxon or Kruskal-Wallis rank sum statistic. |
| .t.ratio | .mean / .se |
| .r.sqr | Percent of variation explained by the model. |
| .adj.r | .r.sqr, taking degrees of freedom into account. |
| .aic | Akaike Information Criterion (predictive models). |
| .bic | Bayesian Information Criterion (descriptive models). |
| .loglik | Log-likelihood goodness-of-fit score. |
| .fit.p | P-value for observing this fit by chance. |
Other stats_tables:
adiv_stats(),
bdiv_stats(),
distmat_stats(),
taxa_stats()
library(rbiom) biom <- rarefy(hmp50) df <- taxa_table(biom, rank = "Family") stats_table(df, stat.by = "Body Site")[,1:6] df <- adiv_table(biom) stats_table(df, stat.by = "Sex", split.by = "Body Site")[,1:7]library(rbiom) biom <- rarefy(hmp50) df <- taxa_table(biom, rank = "Family") stats_table(df, stat.by = "Body Site")[,1:6] df <- adiv_table(biom) stats_table(df, stat.by = "Sex", split.by = "Body Site")[,1:7]
Dropping samples or OTUs will lead to observations being removed from the
OTU matrix (biom$counts). OTUs and samples with zero observations are
automatically removed from the rbiom object.
## S3 method for class 'rbiom' subset(x, subset, clone = TRUE, ...) ## S3 method for class 'rbiom' x[i, j, ..., clone = TRUE, drop = FALSE] ## S3 method for class 'rbiom' na.omit(object, fields = ".all", clone = TRUE, ...) subset_taxa(x, subset, clone = TRUE, ...)## S3 method for class 'rbiom' subset(x, subset, clone = TRUE, ...) ## S3 method for class 'rbiom' x[i, j, ..., clone = TRUE, drop = FALSE] ## S3 method for class 'rbiom' na.omit(object, fields = ".all", clone = TRUE, ...) subset_taxa(x, subset, clone = TRUE, ...)
x |
An rbiom object, such as from |
subset |
Logical expression for rows to keep. See |
clone |
Create a copy of |
... |
Not used. |
i, j
|
The sample or OTU names to keep. Or a logical/integer vector
indicating which sample names from |
drop |
Not used |
object |
An rbiom object, such as from |
fields |
Which metadata field(s) to check for |
An rbiom object.
Other transformations:
biom_inflate(),
biom_relativize(),
biom_rescale(),
modify_metadata,
rarefy(),
slice_metadata,
with()
library(rbiom) library(dplyr) # Subset to specific samples biom <- hmp50[c('HMP20', 'HMP42', 'HMP12')] biom$metadata # Subset to specific OTUs biom <- hmp50[c('LtbAci52', 'UncO2012'),] # <- Trailing , biom$taxonomy # Subset to specific samples and OTUs biom <- hmp50[c('LtbAci52', 'UncO2012'), c('HMP20', 'HMP42', 'HMP12')] as.matrix(biom) # Subset samples according to metadata biom <- subset(hmp50, `Body Site` %in% c('Saliva') & Age < 25) biom$metadata # Subset OTUs according to taxonomy biom <- subset_taxa(hmp50, Phylum == 'Cyanobacteria') biom$taxonomy # Remove samples with NA metadata values biom <- mutate(hmp50, BS2 = na_if(`Body Site`, 'Saliva')) biom$metadata biom <- na.omit(biom) biom$metadatalibrary(rbiom) library(dplyr) # Subset to specific samples biom <- hmp50[c('HMP20', 'HMP42', 'HMP12')] biom$metadata # Subset to specific OTUs biom <- hmp50[c('LtbAci52', 'UncO2012'),] # <- Trailing , biom$taxonomy # Subset to specific samples and OTUs biom <- hmp50[c('LtbAci52', 'UncO2012'), c('HMP20', 'HMP42', 'HMP12')] as.matrix(biom) # Subset samples according to metadata biom <- subset(hmp50, `Body Site` %in% c('Saliva') & Age < 25) biom$metadata # Subset OTUs according to taxonomy biom <- subset_taxa(hmp50, Phylum == 'Cyanobacteria') biom$taxonomy # Remove samples with NA metadata values biom <- mutate(hmp50, BS2 = na_if(`Body Site`, 'Saliva')) biom$metadata biom <- na.omit(biom) biom$metadata
Estimates the optimal sequencing depth for each sample in a matrix by leveraging the global abundance distribution structure.
suggest_inflate_depths(biom, adjust = 1.5)suggest_inflate_depths(biom, adjust = 1.5)
biom |
An rbiom object, or any value accepted by |
adjust |
Numeric. Bandwidth adjustment for the kernel density
estimation. Default: |
A named integer vector of recommended depths for each sample.
When depth = NULL, biom_inflate() calls this function to estimate the original
sequencing depth for each sample. The underlying assumption is that in
typical microbiome datasets, the most frequent count value (the mode of the
abundance distribution) is 1 (a singleton).
The algorithm works as follows:
Log-Transformation: Non-zero relative abundances are log10-transformed.
Global Consensus: To overcome sparsity in individual samples, distributions are centered by their medians and aggregated across all samples.
Peak Detection: Kernel Density Estimation (KDE) is used to identify the peak (mode) of this aggregated distribution.
Scaling: A scaling factor is calculated for each sample that shifts this peak to correspond to an integer count of 1.
This approach effectively "shoehorns" relative abundance data into integer formats required by diversity metrics (like rarefaction or Chao1) by maximizing the number of singletons in the resulting matrix.
biom_inflate() which uses this heuristic when depth = NULL.
library(rbiom) depths <- suggest_inflate_depths(hmp50) head(depths)library(rbiom) depths <- suggest_inflate_depths(hmp50) head(depths)
Calculates a rarefaction depth that balances retaining samples against retaining total observations.
suggest_rarefy_depth(biom)suggest_rarefy_depth(biom)
biom |
An rbiom object, or any value accepted by |
A single integer representing the suggested rarefaction depth.
This function selects a depth by analyzing the trade-off between dropping samples (to increase depth) and lowering depth (to keep samples).
Calculate Yields: For every distinct sample depth in the dataset, calculate the total number of observations that would remain if the dataset were rarefied to that level.
Where is the depth and is the number of samples
with at least that many reads.
Define Threshold: Calculate 10% of the total observations in the original un-rarefied dataset.
Select Depth: Find the lowest depth where the
exceeds this 10% threshold.
This approach prioritizes keeping as many samples as possible, provided that doing so doesn't discard more than 90% of the dataset's total information.
rarefy() which uses this heuristic when depth = NULL.
library(rbiom) suggest_rarefy_depth(hmp50)library(rbiom) suggest_rarefy_depth(hmp50)
Visualize BIOM data with boxplots.
taxa_boxplot( biom, x = NULL, rank = -1, layers = "x", taxa = 6, unc = "singly", other = FALSE, p.top = Inf, stat.by = x, facet.by = NULL, colors = TRUE, shapes = TRUE, patterns = FALSE, flip = FALSE, stripe = NULL, ci = "ci", level = 0.95, p.adj = "fdr", outliers = NULL, xlab.angle = "auto", p.label = 0.05, transform = "none", y.transform = "sqrt", caption = TRUE, ... )taxa_boxplot( biom, x = NULL, rank = -1, layers = "x", taxa = 6, unc = "singly", other = FALSE, p.top = Inf, stat.by = x, facet.by = NULL, colors = TRUE, shapes = TRUE, patterns = FALSE, flip = FALSE, stripe = NULL, ci = "ci", level = 0.95, p.adj = "fdr", outliers = NULL, xlab.angle = "auto", p.label = 0.05, transform = "none", y.transform = "sqrt", caption = TRUE, ... )
biom |
An rbiom object, or any value accepted by |
x |
A categorical metadata column name to use for the x-axis. Or
|
rank |
What rank(s) of taxa to display. E.g. |
layers |
One or more of
|
taxa |
Which taxa to display. An integer value will show the top n
most abundant taxa. A value 0 <= n < 1 will show any taxa with that
mean abundance or greater (e.g. |
unc |
How to handle unclassified, uncultured, and similarly ambiguous taxa names. Options are:
Abbreviations are allowed. Default: |
other |
Sum all non-itemized taxa into an "Other" taxa. When
|
p.top |
Only display taxa with the most significant differences in
abundance. If |
stat.by |
Dataset field with the statistical groups. Must be
categorical. Default: |
facet.by |
Dataset field(s) to use for faceting. Must be categorical.
Default: |
colors |
How to color the groups. Options are:
See "Aesthetics" section below for additional information.
Default: |
shapes |
Shapes for each group.
Options are similar to |
patterns |
Patterns for each group.
Options are similar to |
flip |
Transpose the axes, so that taxa are present as rows instead
of columns. Default: |
stripe |
Shade every other x position. Default: same as flip |
ci |
How to calculate min/max of the crossbar,
errorbar, linerange, and pointrange layers.
Options are: |
level |
The confidence level for calculating a confidence interval.
Default: |
p.adj |
Method to use for multiple comparisons adjustment of
p-values. Run |
outliers |
Show boxplot outliers? |
xlab.angle |
Angle of the labels at the bottom of the plot.
Options are |
p.label |
Minimum adjusted p-value to display on the plot with a bracket.
If a numeric vector with more than one value is
provided, they will be used as breaks for asterisk notation.
Default: |
transform |
Transformation to apply to calculated values. Options are:
|
y.transform |
The transformation to apply to the y-axis. Visualizing differences of both high- and low-abundance taxa is best done with a non-linear axis. Options are:
These methods allow visualization of both high- and low-abundance
taxa simultaneously, without complaint about 'zero' count
observations. Default: |
caption |
Add methodology caption beneath the plot.
Default: |
... |
Additional parameters to pass along to ggplot2 functions.
Prefix a parameter name with a layer name to pass it to only that
layer. For instance, |
A ggplot2 plot. The computed data points, ggplot2 command,
stats table, and stats table commands are available as $data,
$code, $stats, and $stats$code, respectively.
All built-in color palettes are colorblind-friendly. The available
categorical palette names are: "okabe", "carto", "r4",
"polychrome", "tol", "bright", "light",
"muted", "vibrant", "tableau", "classic",
"alphabet", "tableau20", "kelly", and "fishy".
Patterns are added using the fillpattern R package. Options are "brick",
"chevron", "fish", "grid", "herringbone", "hexagon", "octagon",
"rain", "saw", "shingle", "rshingle", "stripe", and "wave",
optionally abbreviated and/or suffixed with modifiers. For example,
"hex10_sm" for the hexagon pattern rotated 10 degrees and shrunk by 2x.
See fillpattern::fill_pattern() for complete documentation of options.
Shapes can be given as per base R - numbers 0 through 17 for various shapes, or the decimal value of an ascii character, e.g. a-z = 65:90; A-Z = 97:122 to use letters instead of shapes on the plot. Character strings may used as well.
Other taxa_abundance:
sample_sums(),
taxa_clusters(),
taxa_corrplot(),
taxa_heatmap(),
taxa_stacked(),
taxa_stats(),
taxa_sums(),
taxa_table()
Other visualization:
adiv_boxplot(),
adiv_corrplot(),
bdiv_boxplot(),
bdiv_corrplot(),
bdiv_heatmap(),
bdiv_ord_plot(),
plot_heatmap(),
rare_corrplot(),
rare_multiplot(),
rare_stacked(),
stats_boxplot(),
stats_corrplot(),
taxa_corrplot(),
taxa_heatmap(),
taxa_stacked()
library(rbiom) biom <- rarefy(hmp50) taxa_boxplot(biom, stat.by = "Body Site", stripe = TRUE) taxa_boxplot(biom, layers = "bed", rank = c("Phylum", "Genus"), flip = TRUE) taxa_boxplot( biom = subset(biom, `Body Site` %in% c('Saliva', 'Stool')), taxa = 3, layers = "ps", stat.by = "Body Site", colors = c('Saliva' = "blue", 'Stool' = "red") )library(rbiom) biom <- rarefy(hmp50) taxa_boxplot(biom, stat.by = "Body Site", stripe = TRUE) taxa_boxplot(biom, layers = "bed", rank = c("Phylum", "Genus"), flip = TRUE) taxa_boxplot( biom = subset(biom, `Body Site` %in% c('Saliva', 'Stool')), taxa = 3, layers = "ps", stat.by = "Body Site", colors = c('Saliva' = "blue", 'Stool' = "red") )
Cluster samples by taxa abundances k-means.
taxa_clusters(biom, rank = ".otu", k = 5, ...)taxa_clusters(biom, rank = ".otu", k = 5, ...)
biom |
An rbiom object, or any value accepted by |
rank |
Which taxa rank to use. E.g. |
k |
Number of clusters. Default: |
... |
Passed on to |
A numeric factor assigning samples to clusters.
Other taxa_abundance:
sample_sums(),
taxa_boxplot(),
taxa_corrplot(),
taxa_heatmap(),
taxa_stacked(),
taxa_stats(),
taxa_sums(),
taxa_table()
Other clustering:
bdiv_clusters()
library(rbiom) biom <- rarefy(hmp50) biom$metadata$otu_cluster <- taxa_clusters(biom) pull(biom, 'otu_cluster')[1:10] bdiv_ord_plot(biom, layers = "p", stat.by = "otu_cluster")library(rbiom) biom <- rarefy(hmp50) biom$metadata$otu_cluster <- taxa_clusters(biom) pull(biom, 'otu_cluster')[1:10] bdiv_ord_plot(biom, layers = "p", stat.by = "otu_cluster")
Visualize taxa abundance with scatterplots and trendlines.
taxa_corrplot( biom, x, rank = -1, layers = "tc", taxa = 6, lineage = FALSE, unc = "singly", other = FALSE, stat.by = NULL, facet.by = NULL, colors = TRUE, shapes = TRUE, test = "emmeans", fit = "gam", at = NULL, level = 0.95, p.adj = "fdr", transform = "none", ties = "random", seed = 0, alt = "!=", mu = 0, caption = TRUE, check = FALSE, ... )taxa_corrplot( biom, x, rank = -1, layers = "tc", taxa = 6, lineage = FALSE, unc = "singly", other = FALSE, stat.by = NULL, facet.by = NULL, colors = TRUE, shapes = TRUE, test = "emmeans", fit = "gam", at = NULL, level = 0.95, p.adj = "fdr", transform = "none", ties = "random", seed = 0, alt = "!=", mu = 0, caption = TRUE, check = FALSE, ... )
biom |
An rbiom object, or any value accepted by |
x |
Dataset field with the x-axis values. Equivalent to the |
rank |
What rank(s) of taxa to display. E.g. |
layers |
One or more of
|
taxa |
Which taxa to display. An integer value will show the top n
most abundant taxa. A value 0 <= n < 1 will show any taxa with that
mean abundance or greater (e.g. |
lineage |
Include all ranks in the name of the taxa. For instance,
setting to |
unc |
How to handle unclassified, uncultured, and similarly ambiguous taxa names. Options are:
Abbreviations are allowed. Default: |
other |
Sum all non-itemized taxa into an "Other" taxa. When
|
stat.by |
Dataset field with the statistical groups. Must be
categorical. Default: |
facet.by |
Dataset field(s) to use for faceting. Must be categorical.
Default: |
colors |
How to color the groups. Options are:
See "Aesthetics" section below for additional information.
Default: |
shapes |
Shapes for each group.
Options are similar to |
test |
Method for computing p-values: |
fit |
How to fit the trendline. |
at |
Position(s) along the x-axis where the means or slopes should be
evaluated. Default: |
level |
The confidence level for calculating a confidence interval.
Default: |
p.adj |
Method to use for multiple comparisons adjustment of
p-values. Run |
transform |
Transformation to apply to calculated values. Options are:
|
ties |
When |
seed |
Random seed for permutations. Must be a non-negative integer.
Default: |
alt |
Alternative hypothesis direction. Options are |
mu |
Reference value to test against. Default: |
caption |
Add methodology caption beneath the plot.
Default: |
check |
Generate additional plots to aid in assessing data normality.
Default: |
... |
Additional parameters to pass along to ggplot2 functions.
Prefix a parameter name with a layer name to pass it to only that
layer. For instance, |
A ggplot2 plot. The computed data points, ggplot2 command,
stats table, and stats table commands are available as $data,
$code, $stats, and $stats$code, respectively.
All built-in color palettes are colorblind-friendly. The available
categorical palette names are: "okabe", "carto", "r4",
"polychrome", "tol", "bright", "light",
"muted", "vibrant", "tableau", "classic",
"alphabet", "tableau20", "kelly", and "fishy".
Shapes can be given as per base R - numbers 0 through 17 for various shapes, or the decimal value of an ascii character, e.g. a-z = 65:90; A-Z = 97:122 to use letters instead of shapes on the plot. Character strings may used as well.
Other taxa_abundance:
sample_sums(),
taxa_boxplot(),
taxa_clusters(),
taxa_heatmap(),
taxa_stacked(),
taxa_stats(),
taxa_sums(),
taxa_table()
Other visualization:
adiv_boxplot(),
adiv_corrplot(),
bdiv_boxplot(),
bdiv_corrplot(),
bdiv_heatmap(),
bdiv_ord_plot(),
plot_heatmap(),
rare_corrplot(),
rare_multiplot(),
rare_stacked(),
stats_boxplot(),
stats_corrplot(),
taxa_boxplot(),
taxa_heatmap(),
taxa_stacked()
library(rbiom) biom <- rarefy(subset(hmp50, `Body Site` %in% c('Buccal mucosa', 'Saliva'))) taxa_corrplot(biom, x = "BMI", stat.by = "Body Site", taxa = 'Streptococcus')library(rbiom) biom <- rarefy(subset(hmp50, `Body Site` %in% c('Buccal mucosa', 'Saliva'))) taxa_corrplot(biom, x = "BMI", stat.by = "Body Site", taxa = 'Streptococcus')
Display taxa abundances as a heatmap.
taxa_heatmap( biom, rank = -1, taxa = 6, tracks = NULL, grid = "bilbao", other = FALSE, unc = "singly", lineage = FALSE, label = TRUE, label_size = NULL, rescale = "none", trees = TRUE, clust = "complete", dist = "euclidean", asp = 1, tree_height = 10, track_height = 10, legend = "right", title = TRUE, xlab.angle = "auto", ... )taxa_heatmap( biom, rank = -1, taxa = 6, tracks = NULL, grid = "bilbao", other = FALSE, unc = "singly", lineage = FALSE, label = TRUE, label_size = NULL, rescale = "none", trees = TRUE, clust = "complete", dist = "euclidean", asp = 1, tree_height = 10, track_height = 10, legend = "right", title = TRUE, xlab.angle = "auto", ... )
biom |
An rbiom object, or any value accepted by |
rank |
What rank(s) of taxa to display. E.g. |
taxa |
Which taxa to display. An integer value will show the top n
most abundant taxa. A value 0 <= n < 1 will show any taxa with that
mean abundance or greater (e.g. |
tracks |
A character vector of metadata fields to display as tracks
at the top of the plot. Or, a list as expected by the |
grid |
Color palette name, or a list as expected |
other |
Sum all non-itemized taxa into an "Other" taxa. When
|
unc |
How to handle unclassified, uncultured, and similarly ambiguous taxa names. Options are:
Abbreviations are allowed. Default: |
lineage |
Include all ranks in the name of the taxa. For instance,
setting to |
label |
Label the matrix rows and columns. You can supply a list
or logical vector of length two to control row labels and column
labels separately, for example
|
label_size |
The font size to use for the row and column labels. You
can supply a numeric vector of length two to control row label sizes
and column label sizes separately, for example
|
rescale |
Rescale rows or columns to all have a common min/max.
Options: |
trees |
Draw a dendrogram for rows (left) and columns (top). You can
supply a list or logical vector of length two to control the row tree
and column tree separately, for example
|
clust |
Clustering algorithm for reordering the rows and columns by
similarity. You can supply a list or character vector of length two to
control the row and column clustering separately, for example
Default: |
dist |
Distance algorithm to use when reordering the rows and columns
by similarity. You can supply a list or character vector of length
two to control the row and column clustering separately, for example
Default: |
asp |
Aspect ratio (height/width) for entire grid.
Default: |
tree_height, track_height
|
The height of the dendrogram or annotation
tracks as a percentage of the overall grid size. Use a numeric vector
of length two to assign |
legend |
Where to place the legend. Options are: |
title |
Plot title. Set to |
xlab.angle |
Angle of the labels at the bottom of the plot.
Options are |
... |
Additional arguments to pass on to ggplot2::theme(). |
A ggplot2 plot. The computed data points and ggplot
command are available as $data and $code,
respectively.
Metadata can be displayed as colored tracks above the heatmap. Common use cases are provided below, with more thorough documentation available at https://cmmr.github.io/rbiom .
## Categorical ----------------------------
tracks = "Body Site"
tracks = list('Body Site' = "bright")
tracks = list('Body Site' = c('Stool' = "blue", 'Saliva' = "green"))
## Numeric --------------------------------
tracks = "Age"
tracks = list('Age' = "reds")
## Multiple Tracks ------------------------
tracks = c("Body Site", "Age")
tracks = list('Body Site' = "bright", 'Age' = "reds")
tracks = list(
'Body Site' = c('Stool' = "blue", 'Saliva' = "green"),
'Age' = list('colors' = "reds") )
The following entries in the track definitions are understood:
colors - A pre-defined palette name or custom set of colors to map to.
range - The c(min,max) to use for scale values.
label - Label for this track. Defaults to the name of this list element.
side - Options are "top" (default) or "left".
na.color - The color to use for NA values.
bins - Bin a gradient into this many bins/steps.
guide - A list of arguments for guide_colorbar() or guide_legend().
All built-in color palettes are colorblind-friendly.
Categorical palette names: "okabe", "carto", "r4",
"polychrome", "tol", "bright", "light",
"muted", "vibrant", "tableau", "classic",
"alphabet", "tableau20", "kelly", and "fishy".
Numeric palette names: "reds", "oranges", "greens",
"purples", "grays", "acton", "bamako",
"batlow", "bilbao", "buda", "davos",
"devon", "grayC", "hawaii", "imola",
"lajolla", "lapaz", "nuuk", "oslo",
"tokyo", "turku", "bam", "berlin",
"broc", "cork", "lisbon", "roma",
"tofino", "vanimo", and "vik".
Other taxa_abundance:
sample_sums(),
taxa_boxplot(),
taxa_clusters(),
taxa_corrplot(),
taxa_stacked(),
taxa_stats(),
taxa_sums(),
taxa_table()
Other visualization:
adiv_boxplot(),
adiv_corrplot(),
bdiv_boxplot(),
bdiv_corrplot(),
bdiv_heatmap(),
bdiv_ord_plot(),
plot_heatmap(),
rare_corrplot(),
rare_multiplot(),
rare_stacked(),
stats_boxplot(),
stats_corrplot(),
taxa_boxplot(),
taxa_corrplot(),
taxa_stacked()
library(rbiom) # Subset to 10 samples and rarefy them. hmp10 <- rarefy(hmp50[1:10]) taxa_heatmap(hmp10, rank = "Phylum", tracks = "Body Site") taxa_heatmap(hmp10, rank = "Genus", tracks = c("sex", "bo")) taxa_heatmap(hmp10, rank = "Phylum", tracks = list( 'Sex' = list(colors = c(m = "#0000FF", f = "violetred")), 'Body Site' = list(colors = "muted", label = "Source") ))library(rbiom) # Subset to 10 samples and rarefy them. hmp10 <- rarefy(hmp50[1:10]) taxa_heatmap(hmp10, rank = "Phylum", tracks = "Body Site") taxa_heatmap(hmp10, rank = "Genus", tracks = c("sex", "bo")) taxa_heatmap(hmp10, rank = "Phylum", tracks = list( 'Sex' = list(colors = c(m = "#0000FF", f = "violetred")), 'Body Site' = list(colors = "muted", label = "Source") ))
Map OTUs names to taxa names at a given rank.
taxa_map( biom, rank = NULL, taxa = Inf, unc = "singly", lineage = FALSE, other = FALSE )taxa_map( biom, rank = NULL, taxa = Inf, unc = "singly", lineage = FALSE, other = FALSE )
biom |
An rbiom object, or any value accepted by |
rank |
When |
taxa |
Which taxa to display. An integer value will show the top n
most abundant taxa. A value 0 <= n < 1 will show any taxa with that
mean abundance or greater (e.g. |
unc |
How to handle unclassified, uncultured, and similarly ambiguous taxa names. Options are:
Abbreviations are allowed. Default: |
lineage |
Include all ranks in the name of the taxa. For instance,
setting to |
other |
Sum all non-itemized taxa into an "Other" taxa. When
|
A tibble data.frame when rank=NULL, or a character vector named
with the OTU names.
pull.rbiom()
library(rbiom) library(dplyr, warn.conflicts = FALSE) # In $taxonomy, .otu is the first column (like a row identifier) ----- hmp50$taxonomy %>% head(4) # In taxa_map, .otu is the last column (most precise rank) ----------- taxa_map(hmp50) %>% head(4) # Generate an OTU to Genus mapping ----------------------------------- taxa_map(hmp50, "Genus") %>% head(4) # Sometimes taxonomic names are incomplete ---------------------------- otus <- c('GemAsacc', 'GcbBacte', 'Unc58411') taxa_map(hmp50, unc = "asis") %>% filter(.otu %in% otus) %>% select(Phylum:.otu) # rbiom can replace them with unique placeholders --------------------- taxa_map(hmp50, unc = "singly") %>% filter(.otu %in% otus) %>% select(Class:.otu) # Or collapse them into groups ---------------------------------------- taxa_map(hmp50, unc = "grouped") %>% filter(.otu %in% otus) %>% select(Class:Genus)library(rbiom) library(dplyr, warn.conflicts = FALSE) # In $taxonomy, .otu is the first column (like a row identifier) ----- hmp50$taxonomy %>% head(4) # In taxa_map, .otu is the last column (most precise rank) ----------- taxa_map(hmp50) %>% head(4) # Generate an OTU to Genus mapping ----------------------------------- taxa_map(hmp50, "Genus") %>% head(4) # Sometimes taxonomic names are incomplete ---------------------------- otus <- c('GemAsacc', 'GcbBacte', 'Unc58411') taxa_map(hmp50, unc = "asis") %>% filter(.otu %in% otus) %>% select(Phylum:.otu) # rbiom can replace them with unique placeholders --------------------- taxa_map(hmp50, unc = "singly") %>% filter(.otu %in% otus) %>% select(Class:.otu) # Or collapse them into groups ---------------------------------------- taxa_map(hmp50, unc = "grouped") %>% filter(.otu %in% otus) %>% select(Class:Genus)
Display taxa abundances as a stacked bar graph.
taxa_stacked( biom, rank = -1, taxa = 6, colors = TRUE, patterns = FALSE, label.by = NULL, order.by = NULL, facet.by = NULL, dist = "euclidean", clust = "complete", other = TRUE, unc = "singly", lineage = FALSE, xlab.angle = 90, ... )taxa_stacked( biom, rank = -1, taxa = 6, colors = TRUE, patterns = FALSE, label.by = NULL, order.by = NULL, facet.by = NULL, dist = "euclidean", clust = "complete", other = TRUE, unc = "singly", lineage = FALSE, xlab.angle = 90, ... )
biom |
An rbiom object, or any value accepted by |
rank |
What rank(s) of taxa to display. E.g. |
taxa |
Which taxa to display. An integer value will show the top n
most abundant taxa. A value 0 <= n < 1 will show any taxa with that
mean abundance or greater (e.g. |
colors, patterns
|
A character vector of colors or patterns to use in
the graph. A named character vector can be used to map taxon names to
specific colors or patterns. Set to |
label.by, order.by
|
What metadata column to use for labeling and/or
sorting the samples across the x-axis. Set |
facet.by |
Dataset field(s) to use for faceting. Must be categorical.
Default: |
dist, clust
|
Distance ( |
other |
Sum all non-itemized taxa into an "Other" taxa. When
|
unc |
How to handle unclassified, uncultured, and similarly ambiguous taxa names. Options are:
Abbreviations are allowed. Default: |
lineage |
Include all ranks in the name of the taxa. For instance,
setting to |
xlab.angle |
Angle of the labels at the bottom of the plot.
Options are |
... |
Parameters for underlying functions. Prefixing parameter names with a layer name ensures that a particular parameter is passed to, and only to, that layer. |
If biom is rarefied, then relative abundance will be shown on the y-axis.
Otherwise, raw abundance will be displayed.
A ggplot2 plot. The computed data points and ggplot
command are available as $data and $code,
respectively.
Other taxa_abundance:
sample_sums(),
taxa_boxplot(),
taxa_clusters(),
taxa_corrplot(),
taxa_heatmap(),
taxa_stats(),
taxa_sums(),
taxa_table()
Other visualization:
adiv_boxplot(),
adiv_corrplot(),
bdiv_boxplot(),
bdiv_corrplot(),
bdiv_heatmap(),
bdiv_ord_plot(),
plot_heatmap(),
rare_corrplot(),
rare_multiplot(),
rare_stacked(),
stats_boxplot(),
stats_corrplot(),
taxa_boxplot(),
taxa_corrplot(),
taxa_heatmap()
library(rbiom) biom <- rarefy(hmp50) taxa_stacked(biom, rank="Phylum") taxa_stacked(biom, rank = "genus", facet.by = "body site")library(rbiom) biom <- rarefy(hmp50) taxa_stacked(biom, rank="Phylum") taxa_stacked(biom, rank = "genus", facet.by = "body site")
A convenience wrapper for taxa_table() + stats_table().
taxa_stats( biom, regr = NULL, stat.by = NULL, rank = -1, taxa = 6, lineage = FALSE, unc = "singly", other = FALSE, split.by = NULL, transform = "none", test = "emmeans", fit = "gam", at = NULL, level = 0.95, alt = "!=", mu = 0, p.adj = "fdr" )taxa_stats( biom, regr = NULL, stat.by = NULL, rank = -1, taxa = 6, lineage = FALSE, unc = "singly", other = FALSE, split.by = NULL, transform = "none", test = "emmeans", fit = "gam", at = NULL, level = 0.95, alt = "!=", mu = 0, p.adj = "fdr" )
biom |
An rbiom object, or any value accepted by |
regr |
Dataset field with the x-axis (independent; predictive)
values. Must be numeric. Default: |
stat.by |
Dataset field with the statistical groups. Must be
categorical. Default: |
rank |
What rank(s) of taxa to display. E.g. |
taxa |
Which taxa to display. An integer value will show the top n
most abundant taxa. A value 0 <= n < 1 will show any taxa with that
mean abundance or greater (e.g. |
lineage |
Include all ranks in the name of the taxa. For instance,
setting to |
unc |
How to handle unclassified, uncultured, and similarly ambiguous taxa names. Options are:
Abbreviations are allowed. Default: |
other |
Sum all non-itemized taxa into an "Other" taxa. When
|
split.by |
Dataset field(s) that the data should be split by prior to
any calculations. Must be categorical. Default: |
transform |
Transformation to apply to calculated values. Options are:
|
test |
Method for computing p-values: |
fit |
How to fit the trendline. |
at |
Position(s) along the x-axis where the means or slopes should be
evaluated. Default: |
level |
The confidence level for calculating a confidence interval.
Default: |
alt |
Alternative hypothesis direction. Options are |
mu |
Reference value to test against. Default: |
p.adj |
Method to use for multiple comparisons adjustment of
p-values. Run |
A tibble data.frame with fields from the table below. This tibble
object provides the $code operator to print the R code used to generate
the statistics.
| Field | Description |
| .mean | Estimated marginal mean. See emmeans::emmeans(). |
| .mean.diff | Difference in means. |
| .slope | Trendline slope. See emmeans::emtrends(). |
| .slope.diff | Difference in slopes. |
| .h1 | Alternate hypothesis. |
| .p.val | Probability that null hypothesis is correct. |
| .adj.p | .p.val after adjusting for multiple comparisons. |
| .effect.size | Effect size. See emmeans::eff_size(). |
| .lower | Confidence interval lower bound. |
| .upper | Confidence interval upper bound. |
| .se | Standard error. |
| .n | Number of samples. |
| .df | Degrees of freedom. |
| .stat | Wilcoxon or Kruskal-Wallis rank sum statistic. |
| .t.ratio | .mean / .se |
| .r.sqr | Percent of variation explained by the model. |
| .adj.r | .r.sqr, taking degrees of freedom into account. |
| .aic | Akaike Information Criterion (predictive models). |
| .bic | Bayesian Information Criterion (descriptive models). |
| .loglik | Log-likelihood goodness-of-fit score. |
| .fit.p | P-value for observing this fit by chance. |
Other taxa_abundance:
sample_sums(),
taxa_boxplot(),
taxa_clusters(),
taxa_corrplot(),
taxa_heatmap(),
taxa_stacked(),
taxa_sums(),
taxa_table()
Other stats_tables:
adiv_stats(),
bdiv_stats(),
distmat_stats(),
stats_table()
library(rbiom) biom <- rarefy(hmp50) taxa_stats(biom, stat.by = "Body Site", rank = "Family")[,1:6]library(rbiom) biom <- rarefy(hmp50) taxa_stats(biom, stat.by = "Body Site", rank = "Family")[,1:6]
Get summary taxa abundances.
taxa_sums( biom, rank = -1, sort = NULL, lineage = FALSE, unc = "singly", transform = "none" ) taxa_means( biom, rank = -1, sort = NULL, lineage = FALSE, unc = "singly", transform = "none" ) taxa_apply( biom, FUN, rank = -1, sort = NULL, lineage = FALSE, unc = "singly", transform = "none", ... )taxa_sums( biom, rank = -1, sort = NULL, lineage = FALSE, unc = "singly", transform = "none" ) taxa_means( biom, rank = -1, sort = NULL, lineage = FALSE, unc = "singly", transform = "none" ) taxa_apply( biom, FUN, rank = -1, sort = NULL, lineage = FALSE, unc = "singly", transform = "none", ... )
biom |
An rbiom object, or any value accepted by |
rank |
What rank(s) of taxa to display. E.g. |
sort |
Sort the result. Options: |
lineage |
Include all ranks in the name of the taxa. For instance,
setting to |
unc |
How to handle unclassified, uncultured, and similarly ambiguous taxa names. Options are:
Abbreviations are allowed. Default: |
transform |
Transformation to apply to calculated values. Options are:
|
FUN |
The function to apply to each row of the |
... |
Optional arguments to |
For taxa_sums and taxa_means, a named numeric vector.
For taxa_apply, a named vector or list with the results of FUN.
The names are the taxa IDs.
Other taxa_abundance:
sample_sums(),
taxa_boxplot(),
taxa_clusters(),
taxa_corrplot(),
taxa_heatmap(),
taxa_stacked(),
taxa_stats(),
taxa_table()
library(rbiom) taxa_sums(hmp50) %>% head(4) taxa_means(hmp50, 'Family') %>% head(5) taxa_apply(hmp50, max) %>% head(5) taxa_apply(hmp50, fivenum) %>% head(5)library(rbiom) taxa_sums(hmp50) %>% head(4) taxa_means(hmp50, 'Family') %>% head(5) taxa_apply(hmp50, max) %>% head(5) taxa_apply(hmp50, fivenum) %>% head(5)
taxa_matrix() - Accepts a single rank and returns a matrix.
taxa_table() - Can accept more than one rank and returns a tibble data.frame.
taxa_table( biom, rank = -1, taxa = 6, lineage = FALSE, md = ".all", unc = "singly", other = FALSE, transform = "none", ties = "random", seed = 0 ) taxa_matrix( biom, rank = -1, taxa = NULL, lineage = FALSE, sparse = FALSE, unc = "singly", other = FALSE, transform = "none", ties = "random", seed = 0 )taxa_table( biom, rank = -1, taxa = 6, lineage = FALSE, md = ".all", unc = "singly", other = FALSE, transform = "none", ties = "random", seed = 0 ) taxa_matrix( biom, rank = -1, taxa = NULL, lineage = FALSE, sparse = FALSE, unc = "singly", other = FALSE, transform = "none", ties = "random", seed = 0 )
biom |
An rbiom object, or any value accepted by |
rank |
What rank(s) of taxa to display. E.g. |
taxa |
Which taxa to display. An integer value will show the top n
most abundant taxa. A value 0 <= n < 1 will show any taxa with that
mean abundance or greater (e.g. |
lineage |
Include all ranks in the name of the taxa. For instance,
setting to |
md |
Dataset field(s) to include in the output data frame, or |
unc |
How to handle unclassified, uncultured, and similarly ambiguous taxa names. Options are:
Abbreviations are allowed. Default: |
other |
Sum all non-itemized taxa into an "Other" taxa. When
|
transform |
Transformation to apply to calculated values. Options are:
|
ties |
When |
seed |
Random seed for permutations. Must be a non-negative integer.
Default: |
sparse |
If |
taxa_matrix() - A numeric matrix with taxa as rows, and samples as columns.
taxa_table() - A tibble data frame with column names .sample, .taxa, .abundance, and any requested by md.
Other taxa_abundance:
sample_sums(),
taxa_boxplot(),
taxa_clusters(),
taxa_corrplot(),
taxa_heatmap(),
taxa_stacked(),
taxa_stats(),
taxa_sums()
library(rbiom) hmp50$ranks taxa_matrix(hmp50, 'Phylum')[1:4,1:6] taxa_table(hmp50, 'Phylum')library(rbiom) hmp50$ranks taxa_matrix(hmp50, 'Phylum')[1:4,1:6] taxa_table(hmp50, 'Phylum')
Create a subtree by specifying tips to keep.
tree_subset(tree, tips, underscores = FALSE)tree_subset(tree, tips, underscores = FALSE)
tree |
A phylo object, as returned from |
tips |
A character, numeric, or logical vector of tips to keep. |
underscores |
When parsing the tree, should underscores be kept as
is? By default they will be converted to spaces (unless the entire ID
is quoted). Default |
A phylo object for the subtree.
Other phylogeny:
read_tree()
library(rbiom) infile <- system.file("extdata", "newick.tre", package = "rbiom") tree <- read_tree(infile) tree subtree <- tree_subset(tree, tips = head(tree$tip.label)) subtreelibrary(rbiom) infile <- system.file("extdata", "newick.tre", package = "rbiom") tree <- read_tree(infile) tree subtree <- tree_subset(tree, tips = head(tree$tip.label)) subtree
with() will return the result of your expression. within() will return
an rbiom object.
## S3 method for class 'rbiom' with(data, expr, ...) ## S3 method for class 'rbiom' within(data, expr, clone = TRUE, ...)## S3 method for class 'rbiom' with(data, expr, ...) ## S3 method for class 'rbiom' within(data, expr, clone = TRUE, ...)
data |
An rbiom object, such as from |
expr |
Passed on to |
... |
Not used. |
clone |
Create a copy of |
See description.
Other transformations:
biom_inflate(),
biom_relativize(),
biom_rescale(),
modify_metadata,
rarefy(),
slice_metadata,
subset()
library(rbiom) with(hmp50, table(`Body Site`, Sex)) biom <- within(hmp50, { age_bin = cut(Age, 5) bmi_bin = cut(BMI, 5) }) biom$metadatalibrary(rbiom) with(hmp50, table(`Body Site`, Sex)) biom <- within(hmp50, { age_bin = cut(Age, 5) bmi_bin = cut(BMI, 5) }) biom$metadata
Automatically creates directories and adds compression based on file name.
write_biom() - According to BIOM format specification.
write_xlsx() - Raw data and summary tables in Excel file format. See details.
write_fasta() - Sequences only in fasta format. biom may also be a named character vector.
write_tree() - Phylogenetic tree only in newick format. biom may also be a phylo object.
write_counts(), write_metadata(), write_taxonomy() - Tab-separated values.
write_biom(biom, file, format = "json") write_metadata(biom, file, quote = FALSE, sep = "\t", ...) write_counts(biom, file, quote = FALSE, sep = "\t", ...) write_taxonomy(biom, file, quote = FALSE, sep = "\t", ...) write_fasta(biom, file = NULL) write_tree(biom, file = NULL) write_xlsx(biom, file, depth = NULL, seed = 0, unc = "singly")write_biom(biom, file, format = "json") write_metadata(biom, file, quote = FALSE, sep = "\t", ...) write_counts(biom, file, quote = FALSE, sep = "\t", ...) write_taxonomy(biom, file, quote = FALSE, sep = "\t", ...) write_fasta(biom, file = NULL) write_tree(biom, file = NULL) write_xlsx(biom, file, depth = NULL, seed = 0, unc = "singly")
biom |
An rbiom object, or any value accepted by |
file |
Path to the output file. File names ending in |
format |
Options are |
quote, sep, ...
|
Parameters passed on to |
depth |
Passed on to |
seed |
Random seed to use in rarefying. See |
unc |
How to handle unclassified, uncultured, and similarly ambiguous taxa names. Options are:
Abbreviations are allowed. Default: |
For write_xlsx(), attributes(biom) are saved as additional worksheets if
the attribute is a data frame, matrix, or dist -class object. An attribute
named 'Reads Per Step' is treated specially and merged with the usual 'Reads
Per Sample' tab.
The normalized filepath that was written to (invisibly), unless
file=NULL (see file argument above).
library(rbiom) write_tree(hmp50) %>% substr(1, 50) if (FALSE) { hmp10 <- hmp50$clone() hmp10$counts <- hmp10$counts[,1:10] %>% rarefy_cols() attr(hmp10, "Weighted UniFrac") <- bdiv_distmat(hmp10, 'wunifrac') attr(hmp10, "Jaccard") <- bdiv_distmat(hmp10, 'jaccard') outfile <- write_xlsx(hmp10, tempfile(fileext = ".xlsx")) }library(rbiom) write_tree(hmp50) %>% substr(1, 50) if (FALSE) { hmp10 <- hmp50$clone() hmp10$counts <- hmp10$counts[,1:10] %>% rarefy_cols() attr(hmp10, "Weighted UniFrac") <- bdiv_distmat(hmp10, 'wunifrac') attr(hmp10, "Jaccard") <- bdiv_distmat(hmp10, 'jaccard') outfile <- write_xlsx(hmp10, tempfile(fileext = ".xlsx")) }