Commit e84554f

Merge pull request #179 from pachterlab/devel
Version 0.30.0
2 parents 8308dfd + bc0fddc commit e84554f

64 files changed: +2144 -578 lines changed

CHANGELOG.md

+32
@@ -1,3 +1,35 @@
+# version 0.30.0
+
+This version integrates [p-value aggregation](https://github.com/pachterlab/sleuth/pull/148) as described in [Yi et al.](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-018-1419-z).
+The behavior of gene-level differential expression testing now follows this procedure:
+
+1. Isoform-level testing.
+2. P-value aggregation at the gene level (using `target_mapping`) by the Lancaster method.
+
+Thank you to [Lynn Yi](https://github.com/lynnyi) for implementing p-value aggregation.
+Please see [pull request #148](https://github.com/pachterlab/sleuth/pull/148) for details.
+
+The API has also changed slightly. In particular, for `sleuth_prep`, several options have been moved to optional arguments via `...`. See [pull request #168](https://github.com/pachterlab/sleuth/pull/168) for more information or `?sleuth_prep` in R.
+
+A number of speed-ups and bug fixes have also been implemented.
+
+- [Patch: bugs in sleuth_results & other miscellaneous fixes](https://github.com/pachterlab/sleuth/pull/163)
+- [Fix behavior of sleuth_results when gene_mode is TRUE (and error reporting)](https://github.com/pachterlab/sleuth/pull/160)
+- [Shiny and Plot Fixes / Enhancements](https://github.com/pachterlab/sleuth/pull/159)
+- [Quick Patch: UseMethod typo](https://github.com/pachterlab/sleuth/pull/157)
+- [Update `write_kallisto_hdf5` function and add ability to subset kallisto object (address #131)](https://github.com/pachterlab/sleuth/pull/150)
+- [extend sleuth to model TPMs](https://github.com/pachterlab/sleuth/pull/145)
+- [Fixes to various miscellaneous issues (#73, #84, #97, #122, #135, #142)](https://github.com/pachterlab/sleuth/pull/144)
+- [Improvements to shiny and plot functions (solving several open issues)](https://github.com/pachterlab/sleuth/pull/143)
+- [Possible solution to NAs in sleuth_lrt, addressing #68](https://github.com/pachterlab/sleuth/pull/118)
+- [bug fix patches](https://github.com/pachterlab/sleuth/pull/117)
+- [address #113 - patch bug where TPM bootstrap summary target_ids are moved](https://github.com/pachterlab/sleuth/pull/116)
+- [New tests for ".N" target mappings](https://github.com/pachterlab/sleuth/pull/115)
+- [Misc bug fixes + Allow sleuth_prep to process just one sample](https://github.com/pachterlab/sleuth/pull/114)
+
+A major thanks to [Warren McGee](https://github.com/warrenmcg) for doing the majority of the heavy lifting on all of the bug fixes.
+
+
 # version 0.29.0
 
 This version has numerous bug fixes and several performance upgrades.
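
The two-step gene-level procedure in the 0.30.0 changelog entry (isoform-level tests, then Lancaster aggregation per gene) can be sketched in base R. This is a minimal illustration, not the code sleuth runs: `lancaster_sketch`, the `tx` table, and the choice of weights are all hypothetical, and the package itself imports the `aggregation` package for this step.

```r
# Weighted Lancaster aggregation of p-values, written in base R.
# `pvals` and `weights` are hypothetical inputs for the transcripts of one gene.
lancaster_sketch <- function(pvals, weights) {
  keep <- !is.na(pvals) & !is.na(weights) & weights > 0
  pvals <- pvals[keep]
  weights <- weights[keep]
  if (length(pvals) == 0) return(NA_real_)
  if (length(pvals) == 1) return(pvals)
  # A gamma variable with shape = w / 2 and scale = 2 is chi-square with w
  # degrees of freedom, so each p-value maps to a chi-square(w) quantile.
  stat <- sum(qgamma(pvals, shape = weights / 2, scale = 2, lower.tail = FALSE))
  # The summed statistic is compared to a chi-square with sum(w) degrees of freedom.
  pchisq(stat, df = sum(weights), lower.tail = FALSE)
}

# Step 1 (assumed already done): isoform-level p-values, one row per transcript.
tx <- data.frame(
  target_id = c("t1", "t2", "t3", "t4"),
  gene_id   = c("gA", "gA", "gB", "gB"),
  pval      = c(0.01, 0.20, 0.50, 0.70),
  weight    = c(2, 2, 5, 1)  # illustrative weights only
)

# Step 2: aggregate the transcript p-values within each gene.
gene_pvals <- vapply(
  split(tx, tx$gene_id),
  function(d) lancaster_sketch(d$pval, d$weight),
  numeric(1)
)
```

With every weight equal to 2 the statistic reduces to Fisher's method, which is a convenient sanity check for gene `gA` above.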

DESCRIPTION

+12-7
@@ -1,31 +1,36 @@
 Package: sleuth
 Title: Tools for investigating RNA-Seq
-Version: 0.29.0
-Authors@R: c(person("Harold", "Pimentel", , "[email protected]", role = c("aut", "cre")))
+Version: 0.30.0
+Authors@R: c(
+    person("Harold", "Pimentel", , "[email protected]", role = c("aut", "cre")),
+    person("Warren", "McGee", , "[email protected]", role = "aut"))
 Description: Investigate transcript abundance from "kallisto" and differential
     expression analysis from RNA-Seq data.
 License: GPL-3
+Encoding: UTF-8
 LazyData: true
 URL: https://github.com/pachterlab/sleuth
 BugReports: https://github.com/pachterlab/sleuth/issues
 Depends:
     R (>= 3.2.1),
-    methods,
-    ggplot2,
-    dplyr
+    methods
 Imports:
+    ggplot2,
+    dplyr,
     data.table,
     tidyr,
     reshape2,
     rhdf5,
     parallel,
     lazyeval,
     matrixStats,
-    shiny
+    pheatmap,
+    shiny,
+    aggregation
 Suggests:
     MASS,
     lintr,
     testthat,
     knitr
 VignetteBuilder: knitr
-RoxygenNote: 5.0.1
+RoxygenNote: 6.0.1

NAMESPACE

+16-4
@@ -6,27 +6,35 @@ S3method(bias_table,sleuth)
 S3method(get_bootstraps,kallisto)
 S3method(get_bootstraps,sleuth)
 S3method(head,kallisto)
+S3method(is_kallisto_subset,kallisto)
+S3method(is_kallisto_subset,sleuth)
 S3method(models,sleuth)
 S3method(models,sleuth_model)
 S3method(plot_fld,kallisto)
 S3method(plot_fld,sleuth)
 S3method(print,kallisto)
 S3method(print,sleuth)
 S3method(print,sleuth_model)
+S3method(subset_kallisto,kallisto)
+S3method(subset_kallisto,sleuth)
 S3method(summary,sleuth)
 S3method(tests,sleuth)
-export("transform_fun<-")
+S3method(transform_status,sleuth)
+S3method(transform_status,sleuth_model)
+export("transform_fun_counts<-")
+export("transform_fun_tpm<-")
 export(basic_filter)
 export(bias_table)
-export(bs_sigma_summary)
 export(counts_to_fpkm)
 export(counts_to_tpm)
 export(design_matrix)
 export(enclosed_brush)
+export(excluded_ids)
 export(extract_model)
 export(get_bootstrap_summary)
 export(get_bootstraps)
 export(get_quantile)
+export(is_kallisto_subset)
 export(kallisto_table)
 export(log_transform)
 export(melt_bootstrap_sleuth)
@@ -64,13 +72,17 @@ export(sleuth_save)
 export(sleuth_to_matrix)
 export(sleuth_wt)
 export(sliding_window_grouping)
+export(subset_kallisto)
 export(tests)
 export(tpm_to_alpha)
 export(transcripts_from_gene)
 export(transform_status)
-export(transform_status.sleuth)
-export(transform_status.sleuth_model)
+export(write_kallisto_hdf5)
 import(dplyr)
+import(ggplot2)
 importFrom(data.table,fread)
+importFrom(dplyr,"%>%")
 importFrom(lazyeval,interp)
 importFrom(lazyeval,lazy)
+importFrom(rhdf5,h5write)
+importFrom(rhdf5,h5write.default)
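
The NAMESPACE change replaces `export(transform_status.sleuth)` and `export(transform_status.sleuth_model)` with `S3method(transform_status, ...)` registrations, so the methods are reached through the exported generic rather than called by name. The sketch below shows that dispatch pattern outside a package. The generic `status_of` and the classes `toy_fit` and `toy_model` are hypothetical names chosen for illustration and are not part of sleuth.

```r
# Hypothetical generic: callers use this, never the methods directly.
status_of <- function(obj, ...) UseMethod("status_of")

# One method per class, selected by UseMethod() from class(obj).
status_of.toy_fit <- function(obj, ...) obj$transformed
status_of.toy_model <- function(obj, ...) obj$fit_transformed

# Fallback for unsupported classes.
status_of.default <- function(obj, ...) {
  stop("unsupported class: ", class(obj)[1])
}

fit <- structure(list(transformed = TRUE), class = "toy_fit")
model <- structure(list(fit_transformed = FALSE), class = "toy_model")

fit_status <- status_of(fit)      # dispatches to status_of.toy_fit
model_status <- status_of(model)  # dispatches to status_of.toy_model
```

Inside a package, the methods would be registered with `S3method()` directives in NAMESPACE (roxygen2 generates these from `@export` tags on methods), which keeps them out of the exported function list while still making dispatch work.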

R/bootstrap.R

+35-20
@@ -101,10 +101,12 @@ get_bootstraps.kallisto <- function(kal, transcript, max_bs = 30) {
 # @param kal a kallisto object
 # @param column the column to pull out of the kallisto results (default = "tpm")
 # @return a molten data.frame with columns "target_id", "sample" and the selected variable
+# @importFrom dplyr %>%
 # @export
 melt_bootstrap <- function(kal, column = "tpm", transform = identity) {
   stopifnot(is(kal, "kallisto"))
   stopifnot(length(kal$bootstrap) > 0)
+  `%>%` <- dplyr::`%>%`
 
   all_boot <- kal$bootstrap
   boot <- data.frame(lapply(all_boot, select_, .dots = list(column)))
@@ -129,11 +131,13 @@ melt_bootstrap <- function(kal, column = "tpm", transform = identity) {
 # @param aggregate_fun a function to aggregate
 # @return a data.frame nrow(mapping) rows that has been aggregated
 # groupwise using \code{aggregate_fun}
+# @importFrom dplyr %>%
 # @export
 aggregate_bootstrap <- function(kal, mapping, split_by = "gene_id",
   column = "tpm", aggregate_fun = sum) {
 
   stopifnot( is(kal, "kallisto") )
+  `%>%` <- dplyr::`%>%`
 
   if ( !(column %in% c("tpm", "est_counts")) ) {
     stop("Unit must be 'tpm' or 'est_counts'")
@@ -177,9 +181,12 @@ aggregate_bootstrap <- function(kal, mapping, split_by = "gene_id",
 # @param kal a kallisto object with a non-null bootstrap list
 # @param column the column to select (rho, tpm, est_counts
 # @return a summarized data.frame
+# @importFrom dplyr %>%
 # @export
 summarize_bootstrap <- function(kal, column = "tpm", transform = identity) {
   stopifnot(is(kal, "kallisto"))
+  `%>%` <- dplyr::`%>%`
+
   bs <- melt_bootstrap(kal, column, transform)
 
   mean_col <- paste0("bs_mean_", column)
@@ -256,11 +263,12 @@ get_bootstrap_summary <- function(obj, target_id, units = 'est_counts') {
     stop(paste0("'", units, "' is invalid for 'units'. please see documentation"))
   }
 
-  if (is.null(obj$bs_quants)) {
-    if (units == 'est_counts') {
-      stop("bootstrap summary missing. rerun sleuth_prep() with argument 'extra_bootstrap_summary = TRUE'")
+  if (is.null(obj$bs_quants) | is.null(obj$bs_quants[[1]][[units]])) {
+    if (units %in% c('est_counts', 'scaled_reads_per_base')) {
+      stop("bootstrap summary appears to be missing. rerun sleuth_prep() with argument 'extra_bootstrap_summary = TRUE'")
     } else {
-      stop("bootstrap summary missing. rerun sleuth_prep() with argument 'extra_bootstrap_summary = TRUE' and 'read_bootstrap_tpm = TRUE'")
+      stop("bootstrap summary appears to be missing. rerun sleuth_prep() with argument 'extra_bootstrap_summary = TRUE' ",
+        "and 'read_bootstrap_tpm = TRUE'")
     }
   }
 
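
The revised guard in `get_bootstrap_summary` now checks two things: that `bs_quants` exists at all, and that the first sample's summary contains an entry for the requested `units`. A minimal sketch of that check on plain lists follows. The helper name `has_bootstrap_summary` and the toy objects are hypothetical, and the sketch uses the short-circuiting `||` where the diff uses the vectorized `|`.

```r
# Returns TRUE when a summary for `units` is present for the first sample.
# `obj` is assumed to be a list whose `bs_quants` element, when present and
# non-empty, holds one list per sample, keyed by unit name.
has_bootstrap_summary <- function(obj, units) {
  !(is.null(obj$bs_quants) || is.null(obj$bs_quants[[1]][[units]]))
}

# Hypothetical objects standing in for prepared sleuth objects.
no_summary <- list(bs_quants = NULL)
counts_only <- list(
  bs_quants = list(
    sample_1 = list(est_counts = matrix(0, nrow = 1, ncol = 5))
  )
)

missing_all <- has_bootstrap_summary(no_summary, "est_counts")   # no summaries stored
has_counts  <- has_bootstrap_summary(counts_only, "est_counts")  # counts summary present
has_tpm     <- has_bootstrap_summary(counts_only, "tpm")         # TPM summary absent
```

The second condition is what distinguishes "no summaries were computed" from "summaries exist, but not for this unit", which is why the error message in the diff differs between the count-like units and TPM.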
@@ -312,7 +320,7 @@ sample_bootstrap <- function(obj, n_samples = 100L) {
     mat <- matrix(NA_real_, nrow = nrow(obj$kal[[1]]$abundance),
       ncol = nrow(which_samp))
     rownames(mat) <- obj$kal[[1]]$abundance$target_id
-    colnames(mat) <- obj$sample_to_condition$sample
+    colnames(mat) <- obj$sample_to_covariates$sample
     mat
   })
 
@@ -376,13 +384,15 @@ process_bootstrap <- function(i, samp_name, kal_path,
   read_bootstrap_tpm, gene_mode,
   extra_bootstrap_summary,
   target_id, mappings, which_ids,
-  aggregation_column, transform_fun)
+  aggregation_column, transform_fun_counts,
+  transform_fun_tpm, max_bootstrap)
 {
   dot(i)
   bs_quants <- list()
 
   num_bootstrap <- as.integer(rhdf5::h5read(kal_path$path,
     "aux/num_bootstrap"))
+  num_bootstrap <- min(num_bootstrap, max_bootstrap)
   if (num_bootstrap == 0) {
     stop(paste0("File ", kal_path, " has no bootstraps.",
       "Please generate bootstraps using \"kallisto quant -b\"."))
@@ -396,17 +406,16 @@ process_bootstrap <- function(i, samp_name, kal_path,
     est_count_sf = est_count_sf)
 
   if (read_bootstrap_tpm) {
-    bs_quant_tpm <- aperm(apply(bs_mat, 1, counts_to_tpm,
+    bs_tpm <- aperm(apply(bs_mat, 1, counts_to_tpm,
       eff_len))
-    colnames(bs_quant_tpm) <- colnames(bs_mat)
+    colnames(bs_tpm) <- colnames(bs_mat)
 
     # gene level code is analogous here to below code
     if (gene_mode) {
-      colnames(bs_quant_tpm) <- target_id
       # Make bootstrap_num an explicit column; each is treated as a "sample"
       bs_tpm_df <- data.frame(bootstrap_num = c(1:num_bootstrap),
-        bs_quant_tpm, check.names = F)
-      rm(bs_quant_tpm)
+        bs_tpm, check.names = F)
+      rm(bs_tpm)
       # Make long tidy table; this step is much faster
       # using data.table melt rather than tidyr gather
       tidy_tpm <- data.table::melt(bs_tpm_df, id.vars = "bootstrap_num",
@@ -423,13 +432,14 @@ process_bootstrap <- function(i, samp_name, kal_path,
       # see: http://stackoverflow.com/a/31295592
       quant_tpm_formula <- paste("bootstrap_num ~",
         aggregation_column)
-      bs_quant_tpm <- data.table::dcast(tidy_tpm,
+      bs_tpm <- data.table::dcast(tidy_tpm,
         quant_tpm_formula, value.var = "tpm",
         fun.aggregate = sum)
-      bs_quant_tpm <- as.matrix(bs_quant_tpm[, -1])
+      bs_tpm <- as.matrix(bs_tpm[, -1])
       rm(tidy_tpm) # these tables are very large
     }
-    bs_quant_tpm <- aperm(apply(bs_quant_tpm, 2,
+    bs_tpm <- transform_fun_tpm(bs_tpm[, which_ids])
+    bs_quant_tpm <- aperm(apply(bs_tpm, 2,
       quantile))
     colnames(bs_quant_tpm) <- c("min", "lower", "mid",
       "upper", "max")
@@ -483,6 +493,7 @@ process_bootstrap <- function(i, samp_name, kal_path,
     rm(tidy_bs, scaled_bs)
   }
 
+  bs_mat <- transform_fun_counts(bs_mat[, which_ids])
   if (extra_bootstrap_summary) {
     bs_quant_est_counts <- aperm(apply(bs_mat, 2,
       quantile))
@@ -491,20 +502,24 @@ process_bootstrap <- function(i, samp_name, kal_path,
     bs_quants$est_counts <- bs_quant_est_counts
   }
 
-  bs_mat <- transform_fun(bs_mat)
   # If bs_mat was made at gene-level, already has column names
   # If at transcript-level, need to add target_ids
-  if(!gene_mode) {
-    colnames(bs_mat) <- target_id
-  } else {
+  if(gene_mode & extra_bootstrap_summary) {
     # rename est_counts to scaled_reads_per_base
     bs_quants$scaled_reads_per_base <- bs_quants$est_counts
     bs_quants$est_counts <- NULL
   }
   # all_sample_bootstrap[, i] bootstrap point estimate of the inferential
   # variability in sample i
   # NOTE: we are only keeping the ones that pass the filter
-  bootstrap_result <- matrixStats::colVars(bs_mat[, which_ids])
+  bootstrap_result <- matrixStats::colVars(bs_mat)
 
-  list(index = i, bs_quants = bs_quants, bootstrap_result = bootstrap_result)
+  if(read_bootstrap_tpm) {
+    tpm_result <- matrixStats::colVars(bs_tpm)
+    list(index = i, bs_quants = bs_quants, bootstrap_result = bootstrap_result,
+      bootstrap_tpm_result = tpm_result)
+  } else {
+    list(index = i, bs_quants = bs_quants, bootstrap_result = bootstrap_result,
+      bootstrap_tpm_result = NULL)
+  }
 }
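
In the revised `process_bootstrap`, the bootstrap matrix is first restricted to the filtered targets and transformed, and only then summarized by quantiles and per-target variance. The sketch below reproduces that ordering on simulated data. The matrix contents, the `which_ids` selection, and the `log(x + 0.5)` transform are assumptions made for illustration, and `apply(..., var)` stands in for `matrixStats::colVars` so the example needs no extra packages.

```r
set.seed(1)

# Hypothetical bootstrap matrix: rows are bootstrap replicates, columns are targets.
bs_mat <- matrix(
  rpois(40, lambda = 20), nrow = 10,
  dimnames = list(NULL, paste0("t", 1:4))
)

which_ids <- c("t1", "t3")                        # targets assumed to pass the filter
transform_fun_counts <- function(x) log(x + 0.5)  # assumed transformation

# Subset to the filtered targets, then transform (the order used in the diff).
bs_sub <- transform_fun_counts(bs_mat[, which_ids])

# Five-number summary per target: apply() returns quantiles in rows,
# so aperm() transposes to one row per target.
bs_quant <- aperm(apply(bs_sub, 2, quantile))
colnames(bs_quant) <- c("min", "lower", "mid", "upper", "max")

# Per-target variance across bootstrap replicates.
bootstrap_var <- apply(bs_sub, 2, var)
```

Because the subsetting happens before the summaries, the quantile table and the variance vector describe the same set of targets and the same transformed scale.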
