amr_upset: Generate a series of plots for AMR gene and combination analysis
Source: R/amr_upset.R
amr_upset.Rd
This function generates a set of visualizations to analyze AMR gene combinations, MIC values, and gene prevalence from a given binary matrix. It creates several plots, including MIC distributions, a bar plot for the percentage of strains per combination, a dot plot for gene combinations in strains, and a plot for gene prevalence. It also outputs a table summarising the MIC distribution (median, lower, upper) and number resistant, for each marker combination.
Usage
amr_upset(
  binary_matrix,
  min_set_size = 2,
  order = "",
  plot_set_size = FALSE,
  plot_category = TRUE,
  print_category_counts = FALSE,
  print_set_size = FALSE,
  boxplot_colour = "grey",
  assay = "mic"
)
Arguments
- binary_matrix
A data frame containing the original binary matrix output from the get_binary_matrix function. Expected columns are an identifier (column 1, any name); 'pheno' (class sir, with S/I/R categories to colour points); 'mic' (class mic, with MIC values to plot); and other columns representing gene presence/absence (binary coded, i.e. 1=present, 0=absent).
- min_set_size
An integer specifying the minimum size for a gene set to be included in the analysis and plots. Default is 2. Only genes with at least this number of occurrences are included in the plots.
- order
A character string indicating the order of the combinations on the x-axis. Options are:
  - "": Default (decreasing frequency of combinations).
  - "genes": Order by the number of genes in each combination.
  - "value": Order by the median assay value (MIC or disk zone) for each combination.
- plot_set_size
Logical indicating whether to include a bar plot showing the set size (i.e. number of times each combination of markers is observed). Default is FALSE.
- plot_category
Logical indicating whether to include a stacked bar plot showing, for each marker combination, the proportion of samples with each phenotype classification (specified by the 'pheno' column in the input file). Default is TRUE.
- print_category_counts
Logical indicating whether, if plot_category is TRUE, to print the number of strains in each resistance category for each marker combination in the plot. Default is FALSE.
- print_set_size
Logical indicating whether, if plot_set_size is TRUE, to print the number of strains with each marker combination on the plot. Default is FALSE.
- boxplot_colour
Colour for the lines of the box plots summarising the MIC distribution for each marker combination. Default is "grey".
- assay
A character string indicating whether to plot MIC or disk diffusion data. Options are:
  - "mic": Plot MIC data, stored in column 'mic' of class 'mic'.
  - "disk": Plot disk diffusion data, stored in column 'disk' of class 'disk'.
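As a rough illustration of how the binary-coded marker columns and min_set_size interact, the sketch below filters markers by their number of occurrences and labels each sample with its marker combination. It uses base R only and is not the package's internal implementation; the data frame `toy`, its values, and the marker column names are made up for illustration, and plain character/numeric columns stand in for the 'sir' and 'mic' classes.

```r
# Toy input (hypothetical data): identifier, phenotype, MIC, then 0/1 marker columns.
# Plain character/numeric columns stand in for the 'sir' and 'mic' classes.
toy <- data.frame(
  id        = paste0("s", 1:6),
  pheno     = c("R", "R", "S", "R", "S", "I"),
  mic       = c(4, 8, 0.06, 2, 0.03, 0.5),
  gyrA_S83L = c(1, 1, 0, 1, 0, 1),
  parC_S80I = c(1, 1, 0, 0, 0, 0),
  qnrS1     = c(0, 1, 0, 0, 0, 0)
)

min_set_size <- 2
marker_cols <- setdiff(names(toy), c("id", "pheno", "mic"))

# Keep only markers observed in at least min_set_size samples
marker_counts <- colSums(toy[marker_cols])
kept <- names(marker_counts)[marker_counts >= min_set_size]

# Label each sample with the combination of kept markers it carries
combo <- apply(toy[kept], 1, function(x) paste(kept[x == 1], collapse = " + "))
combo[combo == ""] <- "(none)"

# Number of samples per marker combination
table(combo)
```

Here qnrS1 occurs in only one sample, so with min_set_size = 2 it is dropped before the combinations are formed.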
Value
A list containing the following elements:
- plot
A grid of plots displaying: (i) a grid showing the marker combinations observed, (ii) the MIC distribution per marker combination, (iii) the frequency per marker, and (optionally) (iv) the phenotype classification and/or (v) the number of samples for each marker combination.
- summary
Summary of each marker combination observed, including median MIC (and interquartile range) and positive predictive value for resistance (R).
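To make the contents of the summary concrete, the following base R sketch computes the median MIC, interquartile range, number resistant, and positive predictive value for R per marker combination. The data frame `d`, its values, and the output column names are hypothetical and need not match what amr_upset actually returns.

```r
# Hypothetical per-sample data: marker combination label, MIC, and S/I/R call
d <- data.frame(
  combo = c("gyrA", "gyrA", "gyrA", "gyrA + parC", "gyrA + parC", "(none)", "(none)"),
  mic   = c(0.25, 0.5, 1, 4, 8, 0.015, 0.03),
  pheno = c("S", "I", "R", "R", "R", "S", "S")
)

# One row per combination: n, median MIC, quartiles, number resistant, PPV for R
summary_tab <- do.call(rbind, lapply(split(d, d$combo), function(g) {
  data.frame(
    combo      = g$combo[1],
    n          = nrow(g),
    median_mic = median(g$mic),
    q25        = unname(quantile(g$mic, 0.25)),
    q75        = unname(quantile(g$mic, 0.75)),
    n_R        = sum(g$pheno == "R"),
    ppv_R      = mean(g$pheno == "R")  # share of samples with this combination called R
  )
}))
rownames(summary_tab) <- NULL
summary_tab
```

The positive predictive value here is simply the fraction of samples carrying a given combination that are classified R.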
Details
This function processes the provided binary matrix (binary_matrix), which is expected to contain data on gene presence/absence, MIC values, and phenotype calls (S/I/R); it can be generated using get_binary_matrix.
The function also includes an analysis of gene prevalence and an ordering option for visualizing combinations in different ways.
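The three values of order can be pictured with a small base R sketch. The helper `order_combos`, the data frame `combos`, and its column names are hypothetical, and the ascending direction used for "genes" and "value" is an assumption rather than documented behaviour.

```r
# Hypothetical per-combination summary (column names are illustrative only)
combos <- data.frame(
  combo      = c("gyrA", "gyrA + parC", "gyrA + parC + qnrS1", "(none)"),
  n          = c(12, 30, 3, 55),
  n_genes    = c(1, 2, 3, 0),
  median_mic = c(0.5, 8, 4, 0.015)
)

# Return combination labels in x-axis order for a given 'order' setting
order_combos <- function(x, order = "") {
  if (order == "genes") {
    idx <- order(x$n_genes)                # assumed ascending number of genes
  } else if (order == "value") {
    idx <- order(x$median_mic)             # assumed ascending median assay value
  } else {
    idx <- order(x$n, decreasing = TRUE)   # default: decreasing frequency
  }
  x$combo[idx]
}

order_combos(combos)            # by frequency
order_combos(combos, "genes")   # by number of genes
order_combos(combos, "value")   # by median MIC
```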
Examples
if (FALSE) { # \dontrun{
# Example usage
ecoli_geno <- import_amrfp(ecoli_geno_raw, "Name")
binary_matrix <- get_binary_matrix(
  geno_table = ecoli_geno,
  pheno_table = ecoli_ast,
  antibiotic = "Ciprofloxacin",
  drug_class_list = c("Quinolones"),
  sir_col = "pheno",
  keep_assay_values = TRUE,
  keep_assay_values_from = "mic"
)
amr_upset(binary_matrix, min_set_size = 3, order = "value", assay = "mic")
} # }