Commit d578947

properly index
1 parent e6cf752 commit d578947

File tree

8 files changed

+152
-152
lines changed

docs/about.rst

+2-2
@@ -4,5 +4,5 @@ About Nested-Pandas

 .. toctree::

-Internal Representation of Nested Data <about_nested_pandas/internals>
-Performance Impact of Nested-Pandas <about_nested_pandas/performance>
+Internal Representation of Nested Data <about/internals>
+Performance Impact of Nested-Pandas <about/performance>
File renamed without changes.
File renamed without changes.

docs/about/performance.ipynb

+146
@@ -0,0 +1,146 @@
+{
+"cells": [
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"# Performance Impact of `nested-pandas`\n",
+"\n",
+"For use cases involving nested data, `nested-pandas` can offer significant speedups compared to using the native `pandas` API. Below is a brief comparison of the same example workflow in `pandas` and `nested-pandas`: after a few filtering steps, it calculates the amplitude of photometric fluxes."
+]
+},
+{
+"cell_type": "code",
+"execution_count": 1,
+"metadata": {},
+"outputs": [],
+"source": [
+"import nested_pandas as npd\n",
+"import pandas as pd\n",
+"import light_curve as licu\n",
+"import numpy as np"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"## Pandas"
+]
+},
+{
+"cell_type": "code",
+"execution_count": 2,
+"metadata": {},
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"494 ms ± 3.34 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
+]
+}
+],
+"source": [
+"%%timeit\n",
+"\n",
+"# Read data\n",
+"object_df = pd.read_parquet(\"objects.parquet\")\n",
+"source_df = pd.read_parquet(\"ztf_sources.parquet\")\n",
+"\n",
+"# Filter on object\n",
+"filtered_object = object_df.query(\"ra > 10.0\")\n",
+"# Sync object to source: drops source rows whose index is not found in the filtered object table\n",
+"filtered_source = filtered_object[[]].join(source_df, how=\"left\")\n",
+"\n",
+"# Count number of observations per photometric band and add it to the object table\n",
+"band_counts = source_df.groupby(level=0).apply(lambda x: \n",
+" x[[\"band\"]].value_counts().reset_index()).pivot_table(values=\"count\", \n",
+" index=\"index\", \n",
+" columns=\"band\", \n",
+" aggfunc=\"sum\")\n",
+"filtered_object = filtered_object.join(band_counts[[\"g\",\"r\"]])\n",
+"\n",
+"# Filter on our nobs\n",
+"filtered_object = filtered_object.query(\"g > 520\")\n",
+"filtered_source = filtered_object[[]].join(source_df, how=\"left\")\n",
+"\n",
+"# Calculate Amplitude\n",
+"amplitude = licu.Amplitude()\n",
+"filtered_source.groupby(level=0).apply(lambda x: amplitude(np.array(x.mjd), np.array(x.flux)))"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"## Nested-Pandas"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"230 ms ± 2.81 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
+]
+}
+],
+"source": [
+"%%timeit\n",
+"\n",
+"# Read in parquet data,\n",
+"# nesting sources into objects\n",
+"nf = npd.read_parquet(data=\"objects.parquet\",\n",
+" to_pack={\"ztf_sources\": \"ztf_sources.parquet\"})\n",
+"\n",
+"# Filter on object\n",
+"nf = nf.query(\"ra > 10.0\")\n",
+"\n",
+"# Count number of observations per photometric band and add it as a column\n",
+"from nested_pandas.utils import count_nested # utility function of nested_pandas\n",
+"nf = count_nested(nf, \"ztf_sources\", by=\"band\", join=True)\n",
+"\n",
+"# Filter on our nobs\n",
+"nf = nf.query(\"n_ztf_sources_g > 520\")\n",
+"\n",
+"# Calculate Amplitude\n",
+"amplitude = licu.Amplitude()\n",
+"nf.reduce(amplitude, \"ztf_sources.mjd\", \"ztf_sources.flux\")"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"\n",
+"In addition, fewer lines of code are needed!"
+]
+}
+],
+"metadata": {
+"kernelspec": {
+"display_name": "lsdb",
+"language": "python",
+"name": "python3"
+},
+"language_info": {
+"codemirror_mode": {
+"name": "ipython",
+"version": 3
+},
+"file_extension": ".py",
+"mimetype": "text/x-python",
+"name": "python",
+"nbconvert_exporter": "python",
+"pygments_lexer": "ipython3",
+"version": "3.11.11"
+}
+},
+"nbformat": 4,
+"nbformat_minor": 2
+}
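
Both notebook cells run the same four steps: filter objects on `ra`, count observations per band, filter on the `g` count, then reduce each object's fluxes to an amplitude. The sketch below is not part of the commit. It reproduces those steps in plain Python on a made-up toy dataset so the intended result is easy to check by hand. The names `objects`, `sources`, `half_amplitude`, and `run_workflow` are hypothetical, and `half_amplitude` assumes that `licu.Amplitude` returns half the peak-to-peak range of the values, which should be confirmed against the `light_curve` documentation.

```python
from collections import Counter, defaultdict

# Toy stand-ins for objects.parquet and ztf_sources.parquet (all values are made up).
objects = {1: {"ra": 5.0}, 2: {"ra": 12.0}, 3: {"ra": 20.0}}
# Each source row: (object id, band, mjd, flux)
sources = [
    (1, "g", 0.0, 1.0), (1, "g", 1.0, 3.0),
    (2, "g", 0.0, 2.0), (2, "g", 1.0, 6.0), (2, "r", 2.0, 4.0),
    (3, "r", 0.0, 1.0),
]


def half_amplitude(flux):
    """Assumed definition of the amplitude feature: (max - min) / 2."""
    return (max(flux) - min(flux)) / 2


def run_workflow(objects, sources, min_ra, min_g_obs):
    # Step 1: filter on object (the notebook uses "ra > 10.0").
    kept = {oid for oid, row in objects.items() if row["ra"] > min_ra}

    # Step 2: count observations per photometric band for every object.
    band_counts = defaultdict(Counter)
    for oid, band, _mjd, _flux in sources:
        band_counts[oid][band] += 1

    # Step 3: filter on the g-band count (the notebook uses "g > 520").
    kept = {oid for oid in kept if band_counts[oid]["g"] > min_g_obs}

    # Step 4: gather each surviving object's fluxes (all bands) and reduce them.
    fluxes = defaultdict(list)
    for oid, _band, _mjd, flux in sources:
        if oid in kept:
            fluxes[oid].append(flux)
    return {oid: half_amplitude(values) for oid, values in fluxes.items()}


result = run_workflow(objects, sources, min_ra=10.0, min_g_obs=1)
print(result)
```

Only object 2 passes both filters here, so the result has a single entry. In the `pandas` cell, steps 1 and 3 each need an explicit re-join of the source table, whereas the nested representation keeps each object's sources attached to its row, so filtering the objects filters the sources as well.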

docs/about_nested_pandas/performance.ipynb

-150
This file was deleted.

docs/index.rst

+4
@@ -79,6 +79,9 @@ API-level information about nested-pandas is viewable in the
 :doc:`API Reference <reference>`
 section.

+The :doc:`About Nested-Pandas <about>` section provides information on the
+design and performance advantages of nested-pandas.
+
 Learn more about contributing to this repository in our :doc:`Contribution Guide <gettingstarted/contributing>`.

 .. toctree::
@@ -88,3 +91,4 @@ Learn more about contributing to this repository in our :doc:`Contribution Guide
 Getting Started <gettingstarted>
 Tutorials <tutorials>
 API Reference <reference>
+About Nested-Pandas <about>
