Commit 0e61d5a

Merge pull request #148 from sfiligoi/igor_default_fp32
Change default precision to fp32 and add explicit fp64 functions
2 parents 6ebb012 + 6194526 commit 0e61d5a

8 files changed, +884 -166 lines changed

.github/workflows/main.yml

+8 -6
@@ -29,13 +29,14 @@ jobs:
     needs: lint
     strategy:
       matrix:
-        python-version: ['3.7', '3.8', '3.9', '3.10']
-        os: [ubuntu-latest, macos-latest]
+        python-version: ['3.8', '3.9', '3.10']
+        os: [ubuntu-latest, macos-latest, linux-gpu-cuda]
     runs-on: ${{ matrix.os }}
     steps:
     - uses: actions/checkout@v2
     - uses: conda-incubator/setup-miniconda@v2
-      with:
+      with:
+        miniconda-version: "latest"
         auto-update-conda: true
         python-version: ${{ matrix.python-version }}
     - name: Install
@@ -59,9 +60,8 @@ jobs:
         else
           conda install --yes -c conda-forge -c bioconda clangxx_osx-64
         fi
-        conda install --yes -c conda-forge -c bioconda unifrac-binaries
-        # TEMP HACK: Use older version of scipy to work around scikit-bio problem
-        conda install --yes -c conda-forge -c bioconda cython "scipy<1.9" "hdf5<1.12.1" biom-format numpy "h5py<3.0.0 | >3.3.0" "scikit-bio>=0.5.7" nose
+        conda install --yes -c conda-forge -c bioconda "unifrac-binaries>=1.2"
+        conda install --yes -c conda-forge -c bioconda cython scipy hdf5 biom-format numpy "h5py>3.3.0" "scikit-bio>=0.5.8" nose
         echo "$(uname -s)"
         if [[ "$(uname -s)" == "Linux" ]];
         then
@@ -80,13 +80,15 @@ jobs:
       shell: bash -l {0}
       run: |
         conda activate unifrac
+        export UNIFRAC_GPU_INFO=Y
         ls -lrt $CONDA_PREFIX/lib/libhdf5_cpp*
         nosetests

     - name: Sanity checks
       shell: bash -l {0}
       run: |
         conda activate unifrac
+        export UNIFRAC_GPU_INFO=Y
         set -e
         ssu -i unifrac/tests/data/crawford.biom -t unifrac/tests/data/crawford.tre -o ci/test.dm -m unweighted
         python -c "import skbio; dm = skbio.DistanceMatrix.read('ci/test.dm')"

README.md

+60 -44
@@ -135,22 +135,22 @@ To use Stacked Faith through QIIME2, given similar artifacts, you can use:
 The library can be accessed directly from within Python. If operating in this mode, the API methods are expecting a filepath to a BIOM-Format V2.1.0 table, and a filepath to a Newick formatted phylogeny.

 $ python
-Python 3.7.8 | packaged by conda-forge | (default, Nov 27 2020, 19:24:58)
-[GCC 9.3.0] on linux
+Python 3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:23:14) [GCC 10.4.0] on linux
 Type "help", "copyright", "credits" or "license" for more information.
 >>> import unifrac
 >>> dir(unifrac)
-['__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__',
- '__package__', '__path__', '__spec__', '__version__', '_api', '_meta', '_methods',
- 'faith_pd',
- 'generalized', 'generalized_fp32', 'generalized_fp32_to_file', 'generalized_to_file',
- 'h5pcoa', 'h5unifrac', 'meta', 'pkg_resources', 'ssu', 'ssu_to_file',
- 'unweighted', 'unweighted_fp32', 'unweighted_fp32_to_file', 'unweighted_to_file',
- 'weighted_normalized', 'weighted_normalized_fp32', 'weighted_normalized_fp32_to_file', 'weighted_normalized_to_file',
- 'weighted_unnormalized', 'weighted_unnormalized_fp32', 'weighted_unnormalized_fp32_to_file', 'weighted_unnormalized_to_file']
->>> print(unifrac.unweighted_fp32.__doc__)
-Compute Unweighted UniFrac using fp32 math
-
+['__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__',
+ '__path__', '__spec__', '__version__', '_api', '_meta', '_methods', 'faith_pd',
+ 'generalized', 'generalized_fp32', 'generalized_fp32_to_file', 'generalized_fp64', 'generalized_fp64_to_file', 'generalized_to_file',
+ 'h5pcoa', 'h5unifrac', 'meta', 'pkg_resources', 'ssu', 'ssu_fast', 'ssu_inmem', 'ssu_to_file',
+ 'unweighted', 'unweighted_fp32', 'unweighted_fp32_to_file', 'unweighted_fp64', 'unweighted_fp64_to_file', 'unweighted_to_file',
+ 'weighted_normalized', 'weighted_normalized_fp32', 'weighted_normalized_fp32_to_file',
+ 'weighted_normalized_fp64', 'weighted_normalized_fp64_to_file', 'weighted_normalized_to_file',
+ 'weighted_unnormalized', 'weighted_unnormalized_fp32', 'weighted_unnormalized_fp32_to_file',
+ 'weighted_unnormalized_fp64', 'weighted_unnormalized_fp64_to_file', 'weighted_unnormalized_to_file']
+>>> print(unifrac.unweighted.__doc__)
+Compute Unweighted UniFrac
+
 Parameters
 ----------
 table : str
@@ -166,12 +166,12 @@ The library can be accessed directly from within Python. If operating in this mo
     by about 50%, but is an approximation.
 n_substeps : int, optional
     Internally split the problem in substeps for reduced memory footprint.
-
+
 Returns
 -------
 skbio.DistanceMatrix
     The resulting distance matrix.
-
+
 Raises
 ------
 IOError
@@ -180,7 +180,7 @@ The library can be accessed directly from within Python. If operating in this mo
 ValueError
     If the table does not appear to be BIOM-Format v2.1.
     If the phylogeny does not appear to be in Newick format.
-
+
 Environment variables
 ---------------------
 OMP_NUM_THREADS
@@ -189,14 +189,14 @@ The library can be accessed directly from within Python. If operating in this mo
     Enable or disable GPU offload. If not defined, autodetect.
 ACC_DEVICE_NUM
     The GPU to use. If not defined, the first GPU will be used.
-
+
 Notes
 -----
 Unweighted UniFrac was originally described in [1]_. Variance Adjusted
 UniFrac was originally described in [2]_, and while its application to
 Unweighted UniFrac was not described, factoring in the variance adjustment
 is still feasible and so it is exposed.
-
+
 References
 ----------
 .. [1] Lozupone, C. & Knight, R. UniFrac: a new phylogenetic method for
@@ -205,10 +205,10 @@ The library can be accessed directly from within Python. If operating in this mo
 .. [2] Chang, Q., Luan, Y. & Sun, F. Variance adjusted weighted UniFrac: a
    powerful beta diversity measure for comparing communities based on
    phylogeny. BMC Bioinformatics 12:118 (2011).
-
->>> print(unifrac.unweighted_fp32_to_file.__doc__)
-Compute Unweighted UniFrac using fp32 math and write to file
-
+
+>>> print(unifrac.unweighted_to_file.__doc__)
+Compute Unweighted UniFrac and write to file
+
 Parameters
 ----------
 table : str
@@ -235,12 +235,12 @@ The library can be accessed directly from within Python. If operating in this mo
     can be used to reduce the amount of memory needed.
 n_substeps : int, optional
     Internally split the problem in substeps for reduced memory footprint.
-
+
 Returns
 -------
 str
     A filepath to the output file.
-
+
 Raises
 ------
 IOError
@@ -250,7 +250,7 @@ The library can be accessed directly from within Python. If operating in this mo
 ValueError
     If the table does not appear to be BIOM-Format v2.1.
     If the phylogeny does not appear to be in Newick format.
-
+
 Environment variables
 ---------------------
 OMP_NUM_THREADS
@@ -259,14 +259,14 @@ The library can be accessed directly from within Python. If operating in this mo
     Enable or disable GPU offload. If not defined, autodetect.
 ACC_DEVICE_NUM
     The GPU to use. If not defined, the first GPU will be used.
-
+
 Notes
 -----
 Unweighted UniFrac was originally described in [1]_. Variance Adjusted
 UniFrac was originally described in [2]_, and while its application to
 Unweighted UniFrac was not described, factoring in the variance adjustment
 is still feasible and so it is exposed.
-
+
 References
 ----------
 .. [1] Lozupone, C. & Knight, R. UniFrac: a new phylogenetic method for
@@ -275,27 +275,27 @@ The library can be accessed directly from within Python. If operating in this mo
 .. [2] Chang, Q., Luan, Y. & Sun, F. Variance adjusted weighted UniFrac: a
    powerful beta diversity measure for comparing communities based on
    phylogeny. BMC Bioinformatics 12:118 (2011).
-
+
 >>> print(unifrac.h5unifrac.__doc__)
 Read UniFrac from a hdf5 file
-
+
 Parameters
 ----------
 h5file : str
     A filepath to a hdf5 file.
-
+
 Returns
 -------
 skbio.DistanceMatrix
     The distance matrix.
-
+
 Raises
 ------
 OSError
     If the hdf5 file is not found
 KeyError
     If the hdf5 does not have the necessary fields
-
+
 References
 ----------
 .. [1] Lozupone, C. & Knight, R. UniFrac: a new phylogenetic method for
@@ -304,7 +304,7 @@ The library can be accessed directly from within Python. If operating in this mo
 .. [2] Chang, Q., Luan, Y. & Sun, F. Variance adjusted weighted UniFrac: a
    powerful beta diversity measure for comparing communities based on
    phylogeny. BMC Bioinformatics 12:118 (2011).
-
+
 >>> print(unifrac.faith_pd.__doc__)
 Execute a call to the Stacked Faith API in the UniFrac package

@@ -402,14 +402,30 @@ The methods can also be used directly through the command line after install:

 ## Minor test dataset

-A small test `.biom` and `.tre` can be found in `sucpp/`. An example with expected output is below, and should execute in 10s of milliseconds:
-
-$ ssu -i sucpp/test.biom -t sucpp/test.tre -m unweighted -o test.out
-$ cat test.out
-Sample1 Sample2 Sample3 Sample4 Sample5 Sample6
-Sample1 0 0.2 0.5714285714285714 0.6 0.5 0.2
-Sample2 0.2 0 0.4285714285714285 0.6666666666666666 0.6 0.3333333333333333
-Sample3 0.5714285714285714 0.4285714285714285 0 0.7142857142857143 0.8571428571428571 0.4285714285714285
-Sample4 0.6 0.6666666666666666 0.7142857142857143 0 0.3333333333333333 0.4
-Sample5 0.5 0.6 0.8571428571428571 0.3333333333333333 0 0.6
-Sample6 0.2 0.3333333333333333 0.4285714285714285 0.4 0.6 0
+A small test `.biom` and `.tre` can be found in `unifrac/tests/data/`. An example with expected output is below, and should execute in 10s of milliseconds:
+
+$ python
+Python 3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:23:14) [GCC 10.4.0] on linux
+Type "help", "copyright", "credits" or "license" for more information.
+>>> import unifrac
+>>> d=unifrac.unweighted('unifrac/tests/data/crawford.biom','unifrac/tests/data/crawford.tre')
+>>> d.data
+array([[0.        , 0.71836066, 0.7131736 , 0.6974604 , 0.6258721 ,
+        0.7282667 , 0.72065896, 0.7264058 , 0.7360605 ],
+       [0.71836066, 0.        , 0.7030297 , 0.734073  , 0.6548042 ,
+        0.71547383, 0.7839781 , 0.723184  , 0.7613893 ],
+       [0.7131736 , 0.7030297 , 0.        , 0.6104128 , 0.623313  ,
+        0.71848303, 0.7041634 , 0.75258476, 0.7924903 ],
+       [0.6974604 , 0.734073  , 0.6104128 , 0.        , 0.6439278 ,
+        0.7005273 , 0.6983272 , 0.77818936, 0.72959894],
+       [0.6258721 , 0.6548042 , 0.623313  , 0.6439278 , 0.        ,
+        0.75782686, 0.7100514 , 0.75065047, 0.7894437 ],
+       [0.7282667 , 0.71547383, 0.71848303, 0.7005273 , 0.75782686,
+        0.        , 0.63593644, 0.71283615, 0.5831464 ],
+       [0.72065896, 0.7839781 , 0.7041634 , 0.6983272 , 0.7100514 ,
+        0.63593644, 0.        , 0.6920076 , 0.6897206 ],
+       [0.7264058 , 0.723184  , 0.75258476, 0.77818936, 0.75065047,
+        0.71283615, 0.6920076 , 0.        , 0.7151408 ],
+       [0.7360605 , 0.7613893 , 0.7924903 , 0.72959894, 0.7894437 ,
+        0.5831464 , 0.6897206 , 0.7151408 , 0.        ]], dtype=float32)
+
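
Review note: the new README example prints a `dtype=float32` matrix, whereas the removed `ssu` output showed full double-precision digits. The sketch below illustrates the size of the rounding involved, using only the standard library. `to_fp32` is a hypothetical helper written for this note and is not part of `unifrac`; the sample value is one of the distances from the removed fp64 output.

```python
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (fp64) to the nearest fp32 value.

    Hypothetical helper for illustration; not part of unifrac.
    """
    return struct.unpack("f", struct.pack("f", x))[0]


# A distance taken from the removed fp64 README output.
value = 0.5714285714285714
rounded = to_fp32(value)
rel_err = abs(rounded - value) / value

print(f"fp64 value:     {value!r}")
print(f"fp32 rounding:  {rounded!r}")
print(f"relative error: {rel_err:.2e}")
```

Round-to-nearest bounds the relative error by 2**-24 (roughly 6e-8) for a single stored value. Accumulated error over a full computation can be larger, so a comparison of fp32 results against fp64 references probably needs a looser tolerance than that bound.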

ci/linux-64.txt

+1 -1
@@ -3,4 +3,4 @@ flake8
 nose
 scikit-bio
 biom-format
-h5py==2.7.0
+h5py

setup.py

+3 -3
@@ -17,8 +17,8 @@

 PREFIX = os.environ.get('PREFIX', "")

-base = ["cython >= 0.26", "biom-format", "numpy", "h5py >= 2.7.0",
-        "scikit-bio >= 0.5.1", "iow"]
+base = ["cython >= 0.26", "biom-format", "numpy", "h5py >= 3.3.0",
+        "scikit-bio >= 0.5.8", "iow"]

 test = ["nose", "flake8"]

@@ -92,7 +92,7 @@ def run_compile_ssu(self):

 setup(
     name="unifrac",
-    version="1.0.0",
+    version="1.2.0",
     packages=find_packages(),
     author="Daniel McDonald",
     license='BSD-3-Clause',

unifrac/__init__.py

+16 -2
@@ -12,6 +12,10 @@
     weighted_normalized,
     weighted_unnormalized,
     generalized,
+    unweighted_fp64,
+    weighted_normalized_fp64,
+    weighted_unnormalized_fp64,
+    generalized_fp64,
     unweighted_fp32,
     weighted_normalized_fp32,
     weighted_unnormalized_fp32,
@@ -20,6 +24,10 @@
     weighted_normalized_to_file,
     weighted_unnormalized_to_file,
     generalized_to_file,
+    unweighted_fp64_to_file,
+    weighted_normalized_fp64_to_file,
+    weighted_unnormalized_fp64_to_file,
+    generalized_fp64_to_file,
     unweighted_fp32_to_file,
     weighted_normalized_fp32_to_file,
     weighted_unnormalized_fp32_to_file,
@@ -32,12 +40,18 @@

 __version__ = pkg_resources.get_distribution('unifrac').version
 __all__ = ['unweighted', 'weighted_normalized', 'weighted_unnormalized',
-           'generalized', 'unweighted_fp32', 'weighted_normalized_fp32',
+           'generalized', 'unweighted_fp64', 'weighted_normalized_fp64',
+           'weighted_unnormalized_fp64', 'generalized_fp64',
+           'unweighted_fp32', 'weighted_normalized_fp32',
            'weighted_unnormalized_fp32', 'generalized_fp32',
            'meta',
            'unweighted_to_file', 'weighted_normalized_to_file',
            'weighted_unnormalized_to_file',
-           'generalized_to_file', 'unweighted_fp32_to_file',
+           'generalized_to_file', 'unweighted_fp64_to_file',
+           'weighted_normalized_fp64_to_file',
+           'weighted_unnormalized_fp64_to_file',
+           'generalized_fp64_to_file',
+           'unweighted_fp32_to_file',
            'weighted_normalized_fp32_to_file',
            'weighted_unnormalized_fp32_to_file',
            'generalized_fp32_to_file',
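
Review note: the exported names in this file follow a regular pattern of metric, optional precision suffix, and optional `_to_file` suffix. The sketch below enumerates that pattern so the additions are easier to audit. The constants and `variant_names` are hypothetical names used only for this note; the package itself lists the names by hand.

```python
from itertools import product

# The four metrics and the suffixes visible in the diff above.
METRICS = ["unweighted", "weighted_normalized",
           "weighted_unnormalized", "generalized"]
PRECISIONS = ["", "_fp64", "_fp32"]  # "" is the default-precision name
OUTPUTS = ["", "_to_file"]           # in-memory result vs. write to file


def variant_names():
    """Return every metric/precision/output combination as a name."""
    return [metric + precision + output
            for output, precision, metric in product(OUTPUTS, PRECISIONS, METRICS)]


names = variant_names()
print(len(names), "names")
for name in names:
    print(name)
```

Each of the generated names appears in the `dir(unifrac)` listing in the updated README, which is a quick way to confirm that no fp64 variant was left out of `__all__`.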

unifrac/_api.pyx

+10 -2
@@ -41,6 +41,8 @@ def ssu_inmem(object table, object tree,
     unifrac_method : str
         The requested UniFrac method, one of {unweighted,
         weighted_normalized, weighted_unnormalized, generalized,
+        unweighted_fp64, weighted_normalized_fp64,
+        weighted_unnormalized_fp64, generalized_fp64,
         unweighted_fp32, weighted_normalized_fp32,
         weighted_unnormalized_fp32, generalized_fp32}
     variance_adjust : bool
@@ -83,7 +85,7 @@ def ssu_inmem(object table, object tree,
     met_py_bytes = unifrac_method.encode()
     met_c_string = met_py_bytes

-    if '_fp32' in unifrac_method:
+    if '_fp64' not in unifrac_method:
         numpy_arr_fp32 = _ssu_inmem_fp32(inmem_biom, inmem_tree, met_c_string,
                                          variance_adjust, alpha, bypass_tips,
                                          n_substeps)
@@ -196,6 +198,8 @@ def ssu_fast(str biom_filename, str tree_filename, object ids,
     unifrac_method : str
         The requested UniFrac method, one of {unweighted,
         weighted_normalized, weighted_unnormalized, generalized,
+        unweighted_fp64, weighted_normalized_fp64,
+        weighted_unnormalized_fp64, generalized_fp64,
         unweighted_fp32, weighted_normalized_fp32,
         weighted_unnormalized_fp32, generalized_fp32}
     variance_adjust : bool
@@ -241,7 +245,7 @@ def ssu_fast(str biom_filename, str tree_filename, object ids,
     tree_c_string = tree_py_bytes
     met_c_string = met_py_bytes

-    if '_fp32' in unifrac_method:
+    if '_fp64' not in unifrac_method:
         numpy_arr_fp32 = _ssu_fast_fp32(biom_c_string, tree_c_string,
                                         ids.__len__(), met_c_string,
                                         variance_adjust, alpha, bypass_tips,
@@ -365,6 +369,8 @@ def ssu(str biom_filename, str tree_filename,
     unifrac_method : str
         The requested UniFrac method, one of {unweighted,
         weighted_normalized, weighted_unnormalized, generalized,
+        unweighted_fp64, weighted_normalized_fp64,
+        weighted_unnormalized_fp64, generalized_fp64,
         unweighted_fp32, weighted_normalized_fp32,
         weighted_unnormalized_fp32, generalized_fp32}
     variance_adjust : bool
@@ -529,6 +535,8 @@ def ssu_to_file(str biom_filename, str tree_filename, str out_filename,
     unifrac_method : str
         The requested UniFrac method, one of {unweighted,
         weighted_normalized, weighted_unnormalized, generalized,
+        unweighted_fp64, weighted_normalized_fp64,
+        weighted_unnormalized_fp64, generalized_fp64,
         unweighted_fp32, weighted_normalized_fp32,
         weighted_unnormalized_fp32, generalized_fp32}
     variance_adjust : bool
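
Review note: the two one-line changes in `ssu_inmem` and `ssu_fast` invert the default precision. The sketch below restates both checks as plain Python so the effect on each method name is visible. `precision_before` and `precision_after` are hypothetical helpers written for this note, and the sketch assumes the branch not shown in the hunks is the fp64 path.

```python
def precision_before(unifrac_method: str) -> str:
    """Old rule: fp32 only when the name carries an '_fp32' suffix."""
    return "fp32" if "_fp32" in unifrac_method else "fp64"


def precision_after(unifrac_method: str) -> str:
    """New rule: fp32 unless the name carries an '_fp64' suffix."""
    return "fp64" if "_fp64" in unifrac_method else "fp32"


for name in ("unweighted", "unweighted_fp32", "unweighted_fp64"):
    print(f"{name}: before={precision_before(name)}, after={precision_after(name)}")
```

Under this reading only the unsuffixed names change behaviour. Callers that need double precision would switch to the explicit `_fp64` names.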
