thomsonlab/capblood-seq

Capillary Blood Sequencing (capblood-seq)

This is a companion repository and package for the capillary blood single-cell sequencing project from Caltech.

Installing

pip install capblood-seq

Prerequisites

Usage

This package revolves around the capblood_seq dataset object, which you can load with:

import capblood_seq

dataset = capblood_seq.load_dataset()

This will download all relevant data for the paper (both raw and processed) and save it to a local data directory. You can then proceed with any of the functionality in the example Jupyter notebooks.

There are two intended usage modes for the package:

1. Download raw 10X Cell Ranger data and perform preprocessing yourself

To do this, start with the Preprocessing section below. The raw feature-barcode matrices are downloaded automatically; you can then initialize a dataset and process it, which overwrites any previously downloaded processed data.

2. Download processed data and jump into generating figures

To do this, skip the preprocessing and jump straight into running the figure scripts. These scripts will download the already processed data for you.

Demultiplexing

To generate the subject cell labels, we used freemuxlet to identify which cells belong to which subject across the samples.

First, we use popscle to generate known variants for each sample. The script is located in bin/sample_freemuxlet.sh, and downloads all the necessary software to run the pipeline. This was tested on an Amazon Linux 2 instance, AMI amzn2-ami-hvm-2.0.20200917.0-x86_64-gp2. Run this script for each sample you want to demultiplex.

Next, we merge the outputs of the per-sample demultiplexing runs, retaining sample-identifying information in each cell barcode. The Jupyter notebook to do this is examples/Merge Freemuxlet Output.ipynb.
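The idea of keeping the sample in the cell barcode can be sketched as follows. This is a minimal illustration, not the notebook's code: the `sample_barcode` naming scheme, the function names, and the sample names are all assumptions made for this example.

```python
def tag_barcodes(sample_name, barcodes):
    """Prefix each cell barcode with its sample name so it stays unique after merging."""
    return [f"{sample_name}_{barcode}" for barcode in barcodes]


def merge_samples(barcodes_by_sample):
    """Concatenate barcodes from all samples, keeping the sample in each barcode."""
    merged = []
    for sample_name, barcodes in barcodes_by_sample.items():
        merged.extend(tag_barcodes(sample_name, barcodes))
    return merged


def split_barcode(tagged_barcode):
    """Recover (sample_name, barcode) from a tagged barcode."""
    sample_name, barcode = tagged_barcode.rsplit("_", 1)
    return sample_name, barcode


# Hypothetical sample names and barcodes, for illustration only.
# The same 10X barcode can appear in two samples, so the prefix is what
# keeps the merged barcodes distinct.
barcodes_by_sample = {
    "AM1": ["AAACCTGAGAAACCAT-1", "AAACCTGAGAAACCGC-1"],
    "PM1": ["AAACCTGAGAAACCAT-1"],
}
merged = merge_samples(barcodes_by_sample)
```

Splitting on the last underscore assumes the raw barcode itself contains no underscore, which holds for standard 10X barcodes such as those above.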

Finally, we run freemuxlet to identify 4 distinct samples using the script bin/merged_freemuxlet.sh. Note: this takes a substantial amount of memory (100GB+).

Preprocessing

After receiving the output from 10X Cell Ranger, we convert the data into SCRAP workspaces for downstream manipulation. An example of this is in Initialize Datasets.ipynb.

Then, we preprocess the data through the following steps, an example of which is in Preprocessing.ipynb:

  1. Filter out any remaining debris, empty droplets, and red blood cells. This is done by a function in scrapi, but a step-by-step example is in Debris Removal.ipynb.
  2. Remove low count genes (any genes that have a maximum count < 3)
  3. Convert gene counts into transcripts per cell
  4. (For visualization only) Transform the normalized gene counts via PCA, and then t-SNE

Cell Type Assignment

After a workspace is created and filtered for debris, we use scVI to learn a latent representation of the cells across all samples, and perform agglomerative clustering to identify cell type clusters. An example of this is in examples/Cell Typing.ipynb.
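To show what agglomerative clustering does with a latent representation, here is a small self-contained sketch. It makes several assumptions: the latent coordinates are made-up 2D points standing in for an scVI embedding, average linkage with Euclidean distance is chosen for illustration (the linkage and cluster count used in the notebook are not stated here), and the naive implementation is only suitable for tiny inputs. In practice a library implementation would be used.

```python
import math


def average_linkage_distance(points, cluster_a, cluster_b):
    """Mean Euclidean distance between all pairs of points across two clusters."""
    total = sum(
        math.dist(points[i], points[j]) for i in cluster_a for j in cluster_b
    )
    return total / (len(cluster_a) * len(cluster_b))


def agglomerative_clustering(points, num_clusters):
    """Start with one cluster per point and repeatedly merge the closest pair."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > num_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage_distance(points, clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    # Convert the list of clusters into one label per point
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Hypothetical 2D latent coordinates for five cells
latent = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (0.1, 0.2)]
labels = agglomerative_clustering(latent, num_clusters=2)
```

Cells 0, 1, and 4 end up in one cluster and cells 2 and 3 in the other; assigning a cell type name to each cluster is a separate step based on marker genes.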

Cell type assignments for externally downloaded datasets are performed in their respective cell typing notebooks - e.g. examples/Cell Typing Hashimoto 2012.ipynb

Cell Subject Assignment

After a workspace is created and filtered for debris, and freemuxlet has assigned cells to subjects, we add the subject assignments to our workspace label files. An example of this is in examples/Assign Cells to Subject IDs.ipynb.
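A rough sketch of turning freemuxlet output into per-cell subject labels is shown below. The column names (`BARCODE`, `DROPLET.TYPE`, `SNG.BEST.GUESS`) are assumed to match the popscle freemuxlet samples file and should be checked against your own output; the cluster-to-subject mapping, subject names, and function name are hypothetical, and the workspace label file format is not shown.

```python
import csv
import io


def assign_subjects(samples_tsv, cluster_to_subject):
    """Map each singlet barcode to a subject label.

    Droplets not called as singlets (doublets, ambiguous) are skipped.
    """
    labels = {}
    reader = csv.DictReader(samples_tsv, delimiter="\t")
    for row in reader:
        if row["DROPLET.TYPE"] != "SNG":
            continue
        labels[row["BARCODE"]] = cluster_to_subject[row["SNG.BEST.GUESS"]]
    return labels


# Hypothetical excerpt of a freemuxlet samples file (assumed column names)
example_tsv = io.StringIO(
    "BARCODE\tDROPLET.TYPE\tSNG.BEST.GUESS\n"
    "AM1_AAACCTGAGAAACCAT-1\tSNG\t0\n"
    "AM1_AAACCTGAGAAACCGC-1\tDBL\t2\n"
    "PM1_AAACCTGAGAAACCAT-1\tSNG\t3\n"
)

# Hypothetical mapping from freemuxlet cluster index to subject ID
cluster_to_subject = {"0": "S1", "1": "S2", "2": "S3", "3": "S4"}
subject_labels = assign_subjects(example_tsv, cluster_to_subject)
```

Because freemuxlet clusters are anonymous, the mapping from cluster index to subject has to be established separately.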

Figures

The figures in the paper were generated by the Jupyter notebooks in the examples/ directory, as listed below:

  • Figure 1.c: Combined t-SNE.ipynb
  • Figure 1.d: Cell Type Distribution Suburst.ipynb
  • Figure 1.e: Cell Type Percentages By Subject.ipynb and Cell Type Percentages By Subject Venous.ipynb
  • Figure 2.a: Diurnal Gene Detection.ipynb
  • Figure 2.b: Gene AM vs PM Overlapping Histogram.ipynb
  • Figure 2.c: Gene Diurnality Box Plots.ipynb
  • Figure 2.d: Gene Diurnality Box Plots.ipynb
  • Figure 3.a: Individuality and Cell Type Specificity.ipynb
  • Figure 3.b: Diurnal Individual Pathway Enrichment.ipynb
  • Figure 3.c: Gene Expression by Subject Over Time.ipynb
  • Figure 3.c: Gene Subject vs All Overlapping Histogram.ipynb
  • Supplemental Figure 1: Cell Type Marker Gene Violin Plots.ipynb
  • Supplemental Figure 2: Gene Subject vs Others Box Plot.ipynb
  • Supplemental Figure 3: Debris Filtering Analysis.ipynb
  • Supplemental Figure 4: Individuality and Cell Type Specificity vs Bulk.ipynb
  • Supplemental Figure 5: Combined Venous Capillary t-SNE.ipynb
  • Supplemental Figure 6: Cell Typing.ipynb
  • Supplemental Table 1: Diurnal Gene Detection.ipynb
  • Supplemental Table 5: Individuality By Subject.ipynb
  • Supplemental Table 6: Debris Filtering Analysis.ipynb
  • Supplemental Table 7: Cell Typing.ipynb
  • Supplemental Table 8: Combined Venous Capillary t-SNE.ipynb
