In the download modals for Datasets and Collections, please include the dataset_id and a code snippet for downloading this dataset via the Census API.
Context
Use case: today I wanted to pre-filter the Tabula Sapiens dataset based on metadata found in .obs before downloading the count matrix. This is useful because I'm working on my local laptop, and the count data is large-ish, whereas I only actually need a small fraction of it.
In theory, this should be easy because Census provides a very nice cellxgene_census.get_obs function, which can be run something like this: cellxgene_census.get_obs(obs_value_filter='dataset_id == foo').
However, this dataset ID is impossible to find unless you query all dataset_id values in the Census and filter based on the collection_name. (H/T to @ebezzi for helping me figure out this workaround!)
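For reference, here is a minimal sketch of that workaround. It assumes the Census exposes its dataset table at `census["census_info"]["datasets"]` with `dataset_id`, `dataset_title`, and `collection_name` columns. The helper names `find_dataset_ids` and `load_dataset_rows` are hypothetical and only for illustration.

```python
from typing import Iterable, List, Mapping, Tuple


def find_dataset_ids(
    rows: Iterable[Mapping[str, object]], collection_substring: str
) -> List[Tuple[str, str]]:
    """Return (dataset_id, dataset_title) pairs whose collection_name
    contains the given substring, ignoring case."""
    needle = collection_substring.casefold()
    return [
        (str(row["dataset_id"]), str(row["dataset_title"]))
        for row in rows
        if needle in str(row["collection_name"]).casefold()
    ]


def load_dataset_rows() -> List[dict]:
    """Read the Census dataset table as a list of row dicts.

    Needs network access and the cellxgene_census package, so the import
    is kept local to this function.
    """
    import cellxgene_census

    with cellxgene_census.open_soma() as census:
        datasets = census["census_info"]["datasets"].read().concat().to_pandas()
    return datasets.to_dict("records")


if __name__ == "__main__":
    # Print every dataset_id in collections whose name mentions Tabula Sapiens.
    for dataset_id, title in find_dataset_ids(load_dataset_rows(), "tabula sapiens"):
        print(dataset_id, title)
```

This works, but it means reading the whole dataset table just to recover an ID that the download modal could simply display.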
Impact
I usually browse datasets online, and then download via notebook so I can be more precise in which slices of the data I actually need. Making this more seamless would save me a lot of headache trying to track down the data I want once I'm ready to download.
Alternatives you've considered
I really don't think we surface this dataset_id anywhere visible online. I even checked the dataset info box in Explorer. Maybe I'm just missing something? :)
Ideal behavior
In the modal, replace:
old:
Individual datasets and their versions may also be downloaded programmatically using the Discover API.
new:
To download this dataset via the Census API, use this Python snippet: cellxgene_census.get_anndata(obs_value_filter='dataset_id == foo')
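A fuller version of the snippet the modal could show might look roughly like the sketch below. It assumes `cellxgene_census.get_anndata` takes an open Census handle, an organism, and an `obs_value_filter` string. The `foo` ID is the same placeholder as above, and `dataset_filter` and `download_dataset` are hypothetical helper names, not part of the Census API.

```python
from typing import Optional


def dataset_filter(dataset_id: str, extra: Optional[str] = None) -> str:
    """Build an obs filter for one dataset, optionally narrowed further."""
    base = f"dataset_id == '{dataset_id}'"
    return f"{base} and ({extra})" if extra else base


def download_dataset(
    dataset_id: str, organism: str = "Homo sapiens", extra: Optional[str] = None
):
    """Fetch only the cells matching the filter as an AnnData object.

    Needs network access and the cellxgene_census package, so the import
    is kept local to this function.
    """
    import cellxgene_census

    with cellxgene_census.open_soma() as census:
        return cellxgene_census.get_anndata(
            census,
            organism=organism,
            obs_value_filter=dataset_filter(dataset_id, extra),
        )


if __name__ == "__main__":
    # "foo" stands in for the dataset_id the modal would display.
    adata = download_dataset("foo", extra="tissue_general == 'lung'")
    print(adata)
```

Passing an extra condition is what makes the pre-filtering use case work: only the matching slice of the count matrix gets pulled down to the laptop.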