This is a companion repository and package for the capillary blood single-cell sequencing project from Caltech.
pip install capblood-seq
- Cell Ranger v3.0.0 (https://support.10xgenomics.com/single-cell-gene-expression/software/downloads/3.0/)
This package revolves around the usage of the capblood_seq DATASET object, which you can quickly access with:
import capblood_seq
dataset = capblood_seq.load_dataset()
This will download all relevant data for the paper (both raw and processed) and save it to a local data directory. You can then use any of the functionality shown in the Jupyter notebooks in the examples/ directory.
There are two intended usage modes for the package:
- Reproduce the analysis from the raw data. To do this, start with the Preprocessing section below. The raw feature-barcode matrices are downloaded automatically; you can then initialize a dataset and process it, which overwrites any processed data that was already downloaded.
- Reproduce the figures from the processed data. To do this, skip the preprocessing and jump straight to running the figure scripts, which download the processed data for you.
To generate the subject cell labels, we used freemuxlet to identify which cells belong to which subject across the samples.
First, we use popscle to generate known variants for each sample. The script is located in bin/sample_freemuxlet.sh and downloads all the software needed to run the pipeline. It was tested on an Amazon Linux 2 instance, AMI amzn2-ami-hvm-2.0.20200917.0-x86_64-gp2. Run this script for each sample you want to demultiplex.
Next, we merge the per-sample demultiplexing outputs, retaining sample-identifying information in the cell barcode. The Jupyter notebook to do this is examples/Merge Freemuxlet Output.ipynb.
Finally, we run freemuxlet to identify 4 distinct samples using the script bin/merged_freemuxlet.sh. Note: this takes a substantial amount of memory (100 GB+).
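The barcode tagging used in the merge step above can be sketched as follows. This is a minimal illustration and not the notebook's actual code: the function names `tag_barcodes` and `split_tagged_barcode`, the `sample_barcode` naming scheme, and the sample names are all assumptions made for the example.

```python
def tag_barcodes(barcodes_by_sample, separator="_"):
    """Merge per-sample barcode lists into one list of sample-tagged barcodes.

    The same 10x barcode can occur in several samples, so each barcode is
    prefixed with its sample name to keep merged entries distinguishable.
    """
    merged = []
    for sample, barcodes in barcodes_by_sample.items():
        for barcode in barcodes:
            merged.append(f"{sample}{separator}{barcode}")
    return merged


def split_tagged_barcode(tagged, separator="_"):
    """Recover (sample, barcode) from a tagged barcode."""
    sample, barcode = tagged.rsplit(separator, 1)
    return sample, barcode


# Hypothetical sample names and barcodes, for illustration only.
example = {
    "AM1": ["AAACCTGAGAAACCAT-1", "AAACCTGAGAAACCGC-1"],
    "PM1": ["AAACCTGAGAAACCAT-1"],
}
merged_barcodes = tag_barcodes(example)
```

Because the sample name travels with the barcode, the cluster assignments from the merged freemuxlet run can later be traced back to the sample each cell came from.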
After receiving the output from 10X Cell Ranger, we convert the data into SCRAP workspaces for downstream manipulation. An example of this is in Initialize Datasets.ipynb.
Then, we preprocess the data in several steps, an example of which is in Preprocessing.ipynb:
- Filter out any remaining debris, empty droplets, and red blood cells. This is done by a function in scrapi, but a step-by-step example is in Debris Removal.ipynb
- Remove low count genes (any genes that have a maximum count < 3)
- Convert gene counts into transcripts per cell
- (For visualization only) Transform the normalized gene counts via PCA, and then t-SNE
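The gene filtering and normalization steps in the list above can be sketched in plain Python. This is an illustration under stated assumptions, not the package's implementation: the function names are hypothetical, the matrix is a small list of lists rather than the sparse matrices a real dataset would need, and dividing each cell's counts by that cell's total is only one possible reading of "transcripts per cell".

```python
def filter_low_count_genes(counts, gene_names, min_max_count=3):
    """Drop genes whose maximum count across all cells is below min_max_count.

    counts is a list of rows, one per cell, each a list of per-gene counts.
    """
    keep = [
        g for g in range(len(gene_names))
        if max(row[g] for row in counts) >= min_max_count
    ]
    filtered = [[row[g] for g in keep] for row in counts]
    return filtered, [gene_names[g] for g in keep]


def normalize_per_cell(counts):
    """Scale each cell's counts by that cell's total (assumed normalization)."""
    normalized = []
    for row in counts:
        total = sum(row)
        # Leave all-zero cells as zeros to avoid dividing by zero.
        normalized.append([c / total if total else 0.0 for c in row])
    return normalized


# Toy matrix: 3 cells x 4 genes (hypothetical values).
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
raw = [
    [5, 1, 0, 2],
    [3, 2, 0, 0],
    [0, 0, 1, 6],
]
filtered_counts, kept_genes = filter_low_count_genes(raw, genes)
normalized_counts = normalize_per_cell(filtered_counts)
```

Filtering before normalizing means the per-cell totals are computed only over the retained genes; whether the actual pipeline normalizes over all genes or only the retained ones should be checked against Preprocessing.ipynb.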
After a workspace is created and filtered for debris, we use scVI to learn a latent representation of the cells across all samples, and perform agglomerative clustering to identify cell type clusters. An example of this is in examples/Cell Typing.ipynb.
Cell type assignments for externally downloaded datasets are performed in their respective cell typing notebooks, e.g. examples/Cell Typing Hashimoto 2012.ipynb.
After a workspace is created and filtered for debris, and freemuxlet has assigned cells to subjects, we add the subject assignments to our workspace label files. An example of this is in examples/Assign Cells to Subject IDs.ipynb.
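As a rough sketch of the subject assignment step, the snippet below maps singlet barcodes to subject labels from a freemuxlet-style table. It assumes tab-separated columns named BARCODE, DROPLET.TYPE, and BEST.GUESS, with singlets marked "SNG" and BEST.GUESS holding a pair of cluster indices; check these against your actual freemuxlet output. The function name, the example rows, and the cluster-to-subject mapping are hypothetical.

```python
import csv
import io


def assign_subjects(samples_tsv, cluster_to_subject):
    """Map singlet barcodes to subject IDs from a freemuxlet-style table.

    Assumes columns BARCODE, DROPLET.TYPE and BEST.GUESS, where
    DROPLET.TYPE is "SNG" for singlets and BEST.GUESS is a pair of
    cluster indices such as "2,2".
    """
    assignments = {}
    reader = csv.DictReader(io.StringIO(samples_tsv), delimiter="\t")
    for row in reader:
        if row["DROPLET.TYPE"] != "SNG":
            continue  # skip doublets and ambiguous droplets
        cluster = int(row["BEST.GUESS"].split(",")[0])
        assignments[row["BARCODE"]] = cluster_to_subject[cluster]
    return assignments


# Hypothetical table contents and subject labels, for illustration only.
table = (
    "BARCODE\tDROPLET.TYPE\tBEST.GUESS\n"
    "AM1_AAACCTGAGAAACCAT-1\tSNG\t0,0\n"
    "AM1_AAACCTGAGAAACCGC-1\tDBL\t1,0\n"
    "PM1_AAACCTGAGAAACCAT-1\tSNG\t2,2\n"
)
subjects = assign_subjects(table, {0: "S1", 1: "S2", 2: "S3", 3: "S4"})
```

Freemuxlet clusters are anonymous, so the mapping from cluster index to subject has to be supplied separately; the dictionary here only stands in for that step.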
The figures in the paper were generated by the Jupyter notebook examples in the examples/ directory, as listed below:
- Figure 1.c: Combined t-SNE.ipynb
- Figure 1.d: Cell Type Distribution Suburst.ipynb
- Figure 1.e: Cell Type Percentages By Subject.ipynb and Cell Type Percentages By Subject Venous.ipynb
- Figure 2.a: Diurnal Gene Detection.ipynb
- Figure 2.b: Gene AM vs PM Overlapping Histogram.ipynb
- Figure 2.c: Gene Diurnality Box Plots.ipynb
- Figure 2.d: Gene Diurnality Box Plots.ipynb
- Figure 3.a: Individuality and Cell Type Specificity.ipynb
- Figure 3.b: Diurnal Individual Pathway Enrichment.ipynb
- Figure 3.c: Gene Expression by Subject Over Time.ipynb
- Figure 3.c: Gene Subject vs All Overlapping Histogram.ipynb
- Supplemental Figure 1: Cell Type Marker Gene Violin Plots.ipynb
- Supplemental Figure 2: Gene Subject vs Others Box Plot.ipynb
- Supplemental Figure 3: Debris Filtering Analysis.ipynb
- Supplemental Figure 4: Individuality and Cell Type Specificity vs Bulk.ipynb
- Supplemental Figure 5: Combined Venous Capillary t-SNE.ipynb
- Supplemental Figure 6: Cell Typing.ipynb
- Supplemental Table 1: Diurnal Gene Detection.ipynb
- Supplemental Table 5: Individuality By Subject.ipynb
- Supplemental Table 6: Debris Filtering Analysis.ipynb
- Supplemental Table 7: Cell Typing.ipynb
- Supplemental Table 8: Combined Venous Capillary t-SNE.ipynb