Identical values reported for "var.nnz" and "var.n_measured_obs" for different datasets retrieved via get_anndata() #1281

Open
khughitt opened this issue Sep 16, 2024 · 2 comments

@khughitt

Describe the bug

adata.var.nnz and adata.var.n_measured_obs report the same (global?) values regardless of the dataset queried.

Tested with two different dataset IDs, but presumably this applies to all queries, not just those pulling a single dataset.

To Reproduce

import cellxgene_census

d1 = "00ff600e-6e2e-4d76-846f-0eec4f0ae417"
d2 = "0c9a8cfb-6649-4d52-b418-6d8e56bd7afe"

with cellxgene_census.open_soma(census_version="2024-07-01") as census:
    ad1 = cellxgene_census.get_anndata(
        census,
        organism="Homo sapiens",
        obs_value_filter=f"dataset_id == '{d1}'"
    )

    ad2 = cellxgene_census.get_anndata(
        census,
        organism="Homo sapiens",
        obs_value_filter=f"dataset_id == '{d2}'"
    )

    # True
    (ad1.var.nnz == ad2.var.nnz).all()

    # True
    (ad1.var.n_measured_obs == ad2.var.n_measured_obs).all()

Expected behavior

Query/dataset-specific values should be returned.

Environment

Arch Linux (64-bit)

Package                              Version
------------------------------------ --------------
aiobotocore                          2.14.0
aiohappyeyeballs                     2.4.0
aiohttp                              3.10.5
aioitertools                         0.12.0
aiosignal                            1.3.1
amply                                0.1.6
anndata                              0.10.9
anyio                                4.4.0
appdirs                              1.4.4
argon2-cffi                          23.1.0
argon2-cffi-bindings                 21.2.0
argparse-dataclass                   2.0.0
array_api_compat                     1.8
arrow                                1.3.0
asttokens                            2.4.1
async-lru                            2.0.4
attmap                               0.13.2
attrs                                24.2.0
Babel                                2.14.0
beautifulsoup4                       4.12.3
bleach                               6.1.0
bokeh                                3.5.2
botocore                             1.35.7
Brotli                               1.1.0
cached-property                      1.5.2
cellxgene-census                     1.15.0
certifi                              2024.8.30
cffi                                 1.17.1
charset-normalizer                   3.3.2
click                                8.1.7
cloudpickle                          3.0.0
colorama                             0.4.6
colorcet                             3.1.0
comm                                 0.2.2
conda-inject                         1.3.2
ConfigArgParse                       1.7
connection-pool                      0.0.3
contourpy                            1.3.0
cycler                               0.12.1
cytoolz                              0.12.3
dask                                 2024.8.2
dask-expr                            1.1.13
datashader                           0.16.3
datrie                               0.8.2
debugpy                              1.8.5
decorator                            5.1.1
defusedxml                           0.7.1
distributed                          2024.8.2
docutils                             0.21.2
dpath                                2.2.0
eido                                 0.2.2
entrypoints                          0.4
exceptiongroup                       1.2.2
executing                            2.1.0
fastjsonschema                       2.20.0
fonttools                            4.53.1
fqdn                                 1.5.1
frozenlist                           1.4.1
fsspec                               2024.9.0
get-annotations                      0.1.2
gitdb                                4.0.11
GitPython                            3.1.43
h11                                  0.14.0
h2                                   4.1.0
h5py                                 3.11.0
hdf5plugin                           5.0.0
hpack                                4.0.0
httpcore                             1.0.5
httpx                                0.27.2
humanfriendly                        10.0
hyperframe                           6.0.1
idna                                 3.8
igraph                               0.11.6
imagecodecs                          2024.6.1
imageio                              2.35.1
immutables                           0.20
importlib_metadata                   8.4.0
importlib_resources                  6.4.4
iniconfig                            2.0.0
ipykernel                            6.29.5
ipython                              8.27.0
ipywidgets                           8.1.5
isoduration                          20.11.0
jedi                                 0.19.1
Jinja2                               3.1.4
jmespath                             1.0.1
joblib                               1.4.2
json5                                0.9.25
jsonpointer                          3.0.0
jsonschema                           4.23.0
jsonschema-specifications            2023.12.1
jupyter_client                       8.6.2
jupyter_core                         5.7.2
jupyter-events                       0.10.0
jupyter-lsp                          2.2.5
jupyter_server                       2.14.2
jupyter_server_terminals             0.5.3
jupyterlab                           4.2.5
jupyterlab_pygments                  0.3.0
jupyterlab_server                    2.27.3
jupyterlab_widgets                   3.0.13
kiwisolver                           1.4.7
lazy_loader                          0.4
legacy-api-wrap                      1.4
leidenalg                            0.10.2
llvmlite                             0.43.0
locket                               1.0.0
logmuse                              0.2.6
lz4                                  4.3.3
markdown-it-py                       3.0.0
MarkupSafe                           2.1.5
matplotlib                           3.9.2
matplotlib-inline                    0.1.7
mdurl                                0.1.2
mistune                              3.0.2
msgpack                              1.0.8
multidict                            6.0.5
multipledispatch                     0.6.0
munkres                              1.1.4
natsort                              8.4.0
nbclient                             0.10.0
nbconvert                            7.16.4
nbformat                             5.10.4
nest_asyncio                         1.6.0
networkx                             3.3
notebook_shim                        0.2.4
numba                                0.60.0
numpy                                1.26.4
overrides                            7.7.0
packaging                            24.1
pandas                               2.2.2
pandocfilters                        1.5.0
param                                2.1.1
parso                                0.8.4
partd                                1.4.2
patsy                                0.5.6
peppy                                0.40.5
pexpect                              4.9.0
pickleshare                          0.7.5
pillow                               10.4.0
pip                                  24.2
pkgutil_resolve_name                 1.3.10
plac                                 1.4.3
platformdirs                         4.2.2
pluggy                               1.5.0
prometheus_client                    0.20.0
prompt_toolkit                       3.0.47
psutil                               6.0.0
ptyprocess                           0.7.0
PuLP                                 2.8.0
pure_eval                            0.2.3
pyarrow                              17.0.0
pyarrow-hotfix                       0.6
pycparser                            2.22
pyct                                 0.5.0
Pygments                             2.18.0
pynndescent                          0.5.13
pyparsing                            3.1.4
PySocks                              1.7.1
pytest                               8.3.2
python-dateutil                      2.9.0
python-json-logger                   2.0.7
pytz                                 2024.1
PyWavelets                           1.7.0
PyYAML                               6.0.2
pyzmq                                26.2.0
referencing                          0.35.1
requests                             2.32.3
reretry                              0.11.8
rfc3339-validator                    0.1.4
rfc3986-validator                    0.1.1
rich                                 13.7.1
rpds-py                              0.20.0
s3fs                                 2024.9.0
scanpy                               1.10.2
scikit-image                         0.24.0
scikit-learn                         1.5.1
scikit-misc                          0.1.4
scipy                                1.14.1
seaborn                              0.13.2
Send2Trash                           1.8.3
session-info                         1.0.0
setuptools                           73.0.1
six                                  1.16.0
slack_sdk                            3.32.0
smart_open                           7.0.4
smmap                                5.0.0
snakemake                            8.20.1
snakemake-interface-common           1.17.3
snakemake-interface-executor-plugins 9.2.0
snakemake-interface-report-plugins   1.0.0
snakemake-interface-storage-plugins  3.3.0
sniffio                              1.3.1
somacore                             1.0.11
sortedcontainers                     2.4.0
soupsieve                            2.5
stack-data                           0.6.2
statsmodels                          0.14.2
stdlib-list                          0.10.0
tabulate                             0.9.0
tblib                                3.0.0
terminado                            0.18.1
texttable                            1.7.0
threadpoolctl                        3.5.0
throttler                            1.2.2
tifffile                             2024.8.30
tiledb                               0.29.1
tiledbsoma                           1.11.4
tinycss2                             1.3.0
tomli                                2.0.1
toolz                                0.12.1
toposort                             1.10
tornado                              6.4.1
tqdm                                 4.66.5
traitlets                            5.14.3
types-python-dateutil                2.9.0.20240906
typing_extensions                    4.12.2
typing-utils                         0.1.0
tzdata                               2024.1
ubiquerg                             0.8.0
umap-learn                           0.5.6
uri-template                         1.3.0
urllib3                              2.2.2
veracitools                          0.1.3
wcwidth                              0.2.13
webcolors                            24.8.0
webencodings                         0.5.1
websocket-client                     1.8.0
wheel                                0.44.0
widgetsnbextension                   4.0.13
wrapt                                1.16.0
xarray                               2024.7.0
xyzservices                          2024.9.0
yarl                                 1.10.0
yte                                  1.5.4
zict                                 3.0.0
zipp                                 3.20.1
zstandard                            0.23.0

Additional context

I checked the docs to make sure this is not expected behavior, and they also suggest that the values should be relative to the dataset(s) queried:

n_measured_obs — the “measured” cells for this gene, effectively the number of cells for which this gene was measured in their respective dataset.

source: https://chanzuckerberg.github.io/cellxgene-census/articles/2023/20231012-normalized_layer_precalc_stats.html
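To make the quoted definition concrete, here is a minimal sketch on toy data of how a per-selection n_measured_obs could be derived. Everything in it is an assumption for illustration: the helper name n_measured_obs_for_cells, the toy dataset ids "A" and "B", and a small dataset-by-gene presence matrix that is nonzero where a dataset measured a gene. With real Census data, a presence matrix of that general shape might be obtained from cellxgene_census.get_presence_matrix, but mapping its rows and columns to datasets and genes is left out here.

```python
from collections import Counter

import numpy as np
from scipy import sparse


def n_measured_obs_for_cells(cell_dataset_ids, presence, presence_dataset_ids):
    """Per-gene count of selected cells whose dataset measured that gene.

    cell_dataset_ids: dataset id of each selected cell.
    presence: datasets x genes matrix, nonzero where the dataset measured the gene.
    presence_dataset_ids: dataset id for each row of `presence`.
    """
    # Number of selected cells contributed by each dataset.
    cells_per_dataset = Counter(cell_dataset_ids)
    weights = np.array(
        [cells_per_dataset.get(d, 0) for d in presence_dataset_ids],
        dtype=np.int64,
    )
    # Reduce presence to 0/1 so that each cell is counted once per measured gene.
    measured = sparse.csr_matrix(presence).astype(bool).astype(np.int64)
    # For each gene, sum the cell counts of the datasets that measured it.
    return np.asarray(measured.T @ weights).ravel()


# Toy data (hypothetical): dataset "A" measured genes 0 and 1, "B" measured genes 1 and 2.
presence = np.array([[1, 1, 0], [0, 1, 1]])
cells = ["A", "A", "A", "B"]  # 3 cells from A, 1 cell from B

result = n_measured_obs_for_cells(cells, presence, ["A", "B"])
print(result)  # [3 4 1]
```

Restricting the selection to cells from "A" only changes the result, which is the query-specific behavior described under "Expected behavior".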

--

Thanks for all of your work on this!

It's appreciated.

@khughitt khughitt added the bug Something isn't working label Sep 16, 2024
@ivirshup
Collaborator

Thanks for the bug report @khughitt!

We are tracking this, and it looks related to #1284. I think your interpretation is correct, but I'm checking with the schema owners on internal channels to make sure.

@ivirshup
Collaborator

ivirshup commented Feb 3, 2025

@khughitt, apologies, but I completely misread this issue when I first responded.

What you are seeing is actually the expected behavior. The summary statistics in .var are calculated across the whole census object and are stored statically. That means any query within a Measurement will return the same values for var.

It sounds like what you are expecting is for these values to be calculated dynamically for each query. You will need to recalculate the values on your side for this.
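As one possible way to do that recalculation for nnz, the sketch below counts nonzero entries per gene in a cells-by-genes matrix. The helper name nnz_per_gene and the toy matrix are hypothetical, and it assumes the matrix being counted is the X returned for the query.

```python
import numpy as np
from scipy import sparse


def nnz_per_gene(X):
    """Number of nonzero entries in each column (gene) of a cells x genes matrix."""
    # `X != 0` also ignores explicitly stored zeros in a sparse matrix.
    return np.asarray((X != 0).sum(axis=0)).ravel()


# Toy stand-in for a query result's X: 3 cells x 3 genes.
X = sparse.csr_matrix(np.array([[1, 0, 2], [0, 0, 3], [4, 0, 0]]))
counts = nnz_per_gene(X)
print(counts)  # [2 0 2]
```

Applied to a query result, this would look like ad1.var["nnz"] = nnz_per_gene(ad1.X), which overwrites the statically stored column with values specific to that query.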

Sorry for the confusion!
