Capillary Blood Sequencing (capblood-seq)

This is a companion repository and package for the capillary blood single-cell sequencing project from Caltech.

Installing

pip install capblood-seq

Prerequisites

Preprocessing

Cell Ranger v3.0.0 (https://support.10xgenomics.com/single-cell-gene-expression/software/downloads/3.0/)

Usage

This package is built around the capblood_seq dataset object, which you can load with:

import capblood_seq

dataset = capblood_seq.load_dataset()

This downloads all relevant data for the paper (both raw and processed) and saves it to a local data directory. You can then run any of the Jupyter notebooks in the examples/ directory.

There are two intended usage modes for the package:

1. Download raw 10X Cell Ranger data and perform preprocessing yourself

To do this, start with the Preprocessing section below. The raw feature-barcode matrices are downloaded automatically; you can then initialize a dataset and process it, which overwrites any processed data that was already downloaded.

2. Download processed data and jump into generating figures

To do this, skip the preprocessing and jump straight into running the figure scripts. These scripts will download the already processed data for you.

Demultiplexing

To generate the subject cell labels, we used freemuxlet to identify which cells belong to which subject across the samples.

First, we use popscle to generate known variants for each sample. The script is located in bin/sample_freemuxlet.sh, and downloads all the necessary software to run the pipeline. This was tested on an Amazon Linux 2 instance, AMI amzn2-ami-hvm-2.0.20200917.0-x86_64-gp2. Run this script for each sample you want to demultiplex.

Next, we merge the per-sample demultiplexing outputs, retaining sample-identifying information in the cell barcode. The Jupyter notebook that does this is examples/Merge Freemuxlet Output.ipynb.
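As a rough illustration of keeping sample-identifying information in the cell barcode, the sketch below prefixes each barcode with its sample name before merging. The function names, the underscore separator, and the sample names are all hypothetical; the notebook may use a different convention.

```python
def tag_barcode(sample_name, barcode):
    """Prefix a cell barcode with its sample name so it stays unique after merging."""
    # Assumption: an underscore separator; 10X barcodes themselves contain none.
    return f"{sample_name}_{barcode}"


def merge_sample_barcodes(barcodes_by_sample):
    """Merge per-sample barcode lists into one list of sample-tagged barcodes."""
    merged = []
    for sample_name, barcodes in barcodes_by_sample.items():
        merged.extend(tag_barcode(sample_name, barcode) for barcode in barcodes)
    return merged


def split_tagged_barcode(tagged_barcode):
    """Recover (sample_name, barcode) from a tagged barcode."""
    # Split on the last underscore so sample names may contain underscores.
    sample_name, _, barcode = tagged_barcode.rpartition("_")
    return sample_name, barcode
```

Tagging matters because the same barcode sequence can appear in more than one sample; without the prefix, those cells would collide in the merged output.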

Finally, we run freemuxlet to identify 4 distinct samples using the script bin/merged_freemuxlet.sh. Note: this takes a substantial amount of memory (100GB+).

Preprocessing

After receiving the output from 10X Cell Ranger, we convert the data into SCRAP workspaces for downstream manipulation. An example of this is in Initialize Datasets.ipynb.

Then, we preprocess the data in the following steps; an example is in Preprocessing.ipynb.

Filter out any remaining debris, empty droplets, and red blood cells. This is done by a function in scrapi, but a step-by-step example is in Debris Removal.ipynb
Remove low-count genes (any gene whose maximum count is < 3)
Convert gene counts into transcripts per cell
(For visualization only) Transform the normalized gene counts via PCA, and then t-SNE
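The gene filtering and normalization steps above can be sketched in plain Python. This is not the scrapi implementation. The function names are hypothetical, and the normalization shown (dividing each cell's counts by that cell's total) is only one assumed reading of "transcripts per cell"; Preprocessing.ipynb is the reference for what the pipeline actually does.

```python
def filter_low_count_genes(counts, gene_names, min_max_count=3):
    """Keep genes whose maximum count in any cell is at least min_max_count.

    counts is a list of rows (one per cell), each a list of per-gene counts.
    """
    keep = [
        g for g in range(len(gene_names))
        if max(row[g] for row in counts) >= min_max_count
    ]
    filtered = [[row[g] for g in keep] for row in counts]
    return filtered, [gene_names[g] for g in keep]


def normalize_per_cell(counts, scale=1.0):
    """Divide each cell's counts by that cell's total, then multiply by scale.

    Assumption: this per-cell scaling stands in for the pipeline's own
    normalization, which is not spelled out here.
    """
    normalized = []
    for row in counts:
        total = sum(row)
        # Cells with no counts are left as zeros to avoid dividing by zero.
        normalized.append([scale * c / total if total else 0.0 for c in row])
    return normalized
```

Nested lists are used only for readability; real count matrices are large and sparse, so a practical version would operate on sparse arrays.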

Cell Type Assignment

After a workspace is created and filtered for debris, we use scVI to learn a latent representation of the cells across all samples, and perform agglomerative clustering to identify cell type clusters. An example of this is in examples/Cell Typing.ipynb

Cell type assignments for externally downloaded datasets are performed in their respective cell typing notebooks - e.g. examples/Cell Typing Hashimoto 2012.ipynb

Cell Subject Assignment

After a workspace is created and filtered for debris, and freemuxlet has assigned cells to subjects, we add the subject assignments to our workspace label files. An example of this is in examples/Assign Cells to Subject IDs.ipynb
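A minimal sketch of turning freemuxlet calls into per-cell subject labels is shown below. It assumes a tab-separated table with BARCODE, DROPLET.TYPE, and BEST.GUESS columns, where singlets are marked SNG and BEST.GUESS holds a comma-separated pair of cluster indices; check these against your own freemuxlet output. The function name and the cluster-to-subject mapping are hypothetical.

```python
import csv
import io


def singlet_assignments(freemuxlet_tsv_text, cluster_to_subject):
    """Map each singlet barcode to a subject label.

    Assumed columns: BARCODE, DROPLET.TYPE ("SNG" for singlets), and
    BEST.GUESS (e.g. "0,0", a pair of cluster indices).
    """
    assignments = {}
    reader = csv.DictReader(io.StringIO(freemuxlet_tsv_text), delimiter="\t")
    for row in reader:
        # Skip doublets and ambiguous droplets; only singlets get a subject.
        if row["DROPLET.TYPE"] != "SNG":
            continue
        cluster = row["BEST.GUESS"].split(",")[0]
        assignments[row["BARCODE"]] = cluster_to_subject[cluster]
    return assignments
```

Because freemuxlet clusters are anonymous, the mapping from cluster index to subject ID has to be supplied separately, which is why it is passed in as an argument here.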

Figures

The figures in the paper were generated by the Jupyter notebooks in the examples/ directory, as listed below:

Figure 1.c: Combined t-SNE.ipynb
Figure 1.d: Cell Type Distribution Suburst.ipynb
Figure 1.e: Cell Type Percentages By Subject.ipynb and Cell Type Percentages By Subject Venous.ipynb
Figure 2.a: Diurnal Gene Detection.ipynb
Figure 2.b: Gene AM vs PM Overlapping Histogram.ipynb
Figure 2.c: Gene Diurnality Box Plots.ipynb
Figure 2.d: Gene Diurnality Box Plots.ipynb
Figure 3.a: Individuality and Cell Type Specificity.ipynb
Figure 3.b: Diurnal Individual Pathway Enrichment.ipynb
Figure 3.c: Gene Expression by Subject Over Time.ipynb
Figure 3.c: Gene Subject vs All Overlapping Histogram.ipynb
Supplemental Figure 1: Cell Type Marker Gene Violin Plots.ipynb
Supplemental Figure 2: Gene Subject vs Others Box Plot.ipynb
Supplemental Figure 3: Debris Filtering Analysis.ipynb
Supplemental Figure 4: Individuality and Cell Type Specificity vs Bulk.ipynb
Supplemental Figure 5: Combined Venous Capillary t-SNE.ipynb
Supplemental Figure 6: Cell Typing.ipynb
Supplemental Table 1: Diurnal Gene Detection.ipynb
Supplemental Table 5: Individuality By Subject.ipynb
Supplemental Table 6: Debris Filtering Analysis.ipynb
Supplemental Table 7: Cell Typing.ipynb
Supplemental Table 8: Combined Venous Capillary t-SNE.ipynb
