Add information on requirements for osm example #61

birdsarah · 2016-02-09T04:45:14Z

The package castra is not mentioned in the README.md.

I tried conda install -c https://conda.anaconda.org/calex castra (the only build of castra for py34 on osx-64 on Anaconda Cloud), but I got this error:

    TypeError                                 Traceback (most recent call last)
----> 1 df = dd.from_castra('data/osm.castra')
      2 df.tail()

/Users/caged/miniconda3/envs/datashader/lib/python3.4/site-packages/dask/dataframe/io.py in from_castra(x, columns)
    567     from castra import Castra
    568     if not isinstance(x, Castra):
--> 569         x = Castra(x, readonly=True)
    570     return x.to_dask(columns)
    571 

TypeError: __init__() got an unexpected keyword argument 'readonly'
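The readonly error suggests the castra build from the calex channel is older than the castra this dask version expects: dask's from_castra passes readonly=True, and the castra __init__ shown later in this thread does accept that argument. A minimal sketch, using only the standard library's inspect module, for checking whether the installed Castra.__init__ accepts that keyword before calling dd.from_castra (accepts_keyword is a hypothetical helper, not part of dask or castra):

```python
import inspect


def accepts_keyword(func, name):
    """Return True if `func` can be called with the keyword argument `name`."""
    try:
        params = inspect.signature(func).parameters
    except (TypeError, ValueError):
        # Some callables expose no introspectable signature; be conservative.
        return False
    if name in params and params[name].kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    ):
        return True
    # A **kwargs catch-all also accepts arbitrary keyword names.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


if __name__ == "__main__":
    try:
        from castra import Castra
    except ImportError:
        print("castra is not installed")
    else:
        if accepts_keyword(Castra.__init__, "readonly"):
            print("Castra accepts readonly=; dd.from_castra should be able to open it")
        else:
            print("Castra has no 'readonly' argument; this castra build is too old for dask")
```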


birdsarah · 2016-02-09T04:46:00Z

psutil also appears to be required (dask's ResourceProfiler imports it):

    ---------------------------------------------------------------------------
    ImportError                               Traceback (most recent call last)
    <ipython-input-3-a2de6cc720e8> in <module>()
          3                 x_range=(-bound, bound), y_range=(-bound, bound))
          4 
    ----> 5 with ProgressBar(), Profiler() as prof, ResourceProfiler(0.5) as rprof:
          6     agg = cvs.points(df, 'x', 'y', ds.count())

    /Users/caged/miniconda3/envs/datashader/lib/python3.4/site-packages/dask/diagnostics/profile.py in __init__(self, dt)
        136     """
        137     def __init__(self, dt=1):
    --> 138         self._tracker = _Tracker(dt)
        139         self._tracker.start()
        140         self.results = []

    /Users/caged/miniconda3/envs/datashader/lib/python3.4/site-packages/dask/diagnostics/profile.py in __init__(self, dt)
        173     """Background process for tracking resource usage"""
        174     def __init__(self, dt=1):
    --> 175         import psutil
        176         Process.__init__(self)
        177         self.daemon = True

    ImportError: No module named 'psutil'
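Missing optional dependencies only show up when the notebook reaches the cell that needs them, so a quick pre-flight check can save a rerun. A minimal sketch using importlib.util.find_spec; the module list is an assumption assembled from the tracebacks in this thread, not an official requirements list, and OSM_EXAMPLE_MODULES and missing_modules are hypothetical names:

```python
import importlib.util

# Assumption: modules the OSM notebook needed according to this thread's tracebacks.
OSM_EXAMPLE_MODULES = ["datashader", "dask", "castra", "psutil"]


def missing_modules(names):
    """Return the subset of top-level module `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(OSM_EXAMPLE_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules are importable")
```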

jcrist · 2016-02-09T04:52:18Z

Yeah, that example isn't meant to be rerun without a bit of work. The user needs to download the data, build the castra, and then run the example. I actually wanted to store that notebook in the repo in pre-run form (or not at all; ping @jbednar for more opinions on this). Setting up the dataset for processing is a non-trivial time commitment.

birdsarah · 2016-02-09T04:56:54Z

Maybe take it out of examples?

birdsarah · 2016-02-09T04:58:25Z

(This repo's probably going to get a lot of attention after the upcoming webinar, so even if you plan to add it back later, it might be better left out for now... or kept with big warnings at the top.)

jbednar · 2016-02-09T06:13:28Z

I'd prefer having a link to instructions for how to build the castra, but with warnings that it's complicated (and explain why that is).

jbednar · 2016-02-09T14:30:44Z

As for committing the pre-run notebook, I'll be updating the repo to point people to an Anaconda Cloud repository where they can see all of the pre-run notebooks. Right now that's under my own account, but it should probably be moved to one set up just for datashader...

jbednar · 2016-02-10T04:47:53Z

@jcrist, will you be able to post a recipe for how to build the castra? I think this would be very useful to people who want to think about setting up their own workflow for processing huge files...

epifanio · 2016-02-10T20:40:53Z

I was wondering about the same thing. I followed the webcast, and I'd be really interested in knowing how to build the castra dataset used in the osm notebook.

I got datashader and castra installed on OS X (Homebrew Python). The census example runs fine, but the osm example requires downloading the data and converting it to castra.

From the castra documentation, it seems to me that a conversion is possible if we:

1. download the osm data into something readable by pandas (csv?)
2. import the csv into python as a pandas dataframe
3. convert the pandas dataframe to a castra dataset

Am I correct?
Could you share how to download the osm data so that it is compatible (same attribute columns) with the datashader notebook example?
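The three steps above amount to streaming a large CSV in bounded batches and appending each batch to the store, so the whole file never has to fit in memory. A minimal, stdlib-only sketch of that pattern, assuming the OSM points have already been exported to a CSV with x and y columns (the names the notebook passes to cvs.points); iter_csv_batches, convert and write_batch are hypothetical names, and the castra write itself is left as a callback:

```python
import csv
from itertools import islice


def iter_csv_batches(lines, batch_size):
    """Yield lists of row dicts, at most `batch_size` rows each, from CSV lines."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    reader = csv.DictReader(lines)
    while True:
        batch = list(islice(reader, batch_size))
        if not batch:
            return
        yield batch


def convert(lines, batch_size, write_batch):
    """Pass each batch to `write_batch` and return the total number of rows.

    In a real conversion, write_batch would append the batch to the castra
    store (assumption); here it is any callable, e.g. list.append.
    """
    total = 0
    for batch in iter_csv_batches(lines, batch_size):
        write_batch(batch)
        total += len(batch)
    return total
```

With pandas, pd.read_csv(path, chunksize=n) yields DataFrames in the same way (with x and y parsed as numbers rather than strings). Based on castra's README, each chunk would then be written by creating Castra(path, template=first_chunk) once and calling extend(chunk) for every chunk, but treat that as an assumption and check it against the castra version you have installed, given the version mismatch reported above.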

epifanio · 2016-04-09T23:25:53Z

Any info on how to run the OSM example?

Running the provided notebook osm.ipynb, I get the following traceback (the same happens for census.ipynb):

    df = dd.from_castra('data/osm.castra')
    df.tail()

    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-2-79137965758d> in <module>()
    ----> 1 df = dd.from_castra('data/osm.castra')
          2 df.tail()

    /home/epifanio/anaconda3/lib/python3.5/site-packages/dask/dataframe/io.py in from_castra(x, columns)
        612     from castra import Castra
        613     if not isinstance(x, Castra):
    --> 614         x = Castra(x, readonly=True)
        615     return x.to_dask(columns)
        616 

    /home/epifanio/dev/castra/castra/core.py in __init__(self, path, template, categories, readonly)
        139         else:
        140             raise ValueError(
    --> 141                 "must specify a 'template' when creating a new Castra")
        142 
        143     def _empty_dataframe(self):

    ValueError: must specify a 'template' when creating a new Castra

jbednar · 2016-04-11T20:29:39Z

That castra error just means that you don't have the file available locally. For the census data, there are links for you to download the census.castra files; just download that, unpack it as the examples/README.md says, and you should be fine for the Census example. The same would be true for the OSM example, except that we haven't made that data available for download because of its size. Instead we were going to post the instructions for how to build the castra files from the original source files, but we haven't yet had a chance to clean those up and test them on a different machine. So, you should be fine for the census example, but not yet for OSM.
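Because castra treats a path that does not exist as a request to create a new store, a missing download surfaces as the template ValueError rather than a file-not-found error. A minimal sketch of a pre-flight check, assuming the castra dataset is a directory at the given path (require_local_castra is a hypothetical helper):

```python
import os


def require_local_castra(path):
    """Return `path` unchanged, or raise FileNotFoundError if it is not a directory.

    Without this check, opening a missing castra path fails with the less
    obvious "must specify a 'template' when creating a new Castra".
    """
    if not os.path.isdir(path):
        raise FileNotFoundError(
            f"{path!r} not found; download or build the dataset first "
            "(see examples/README.md)"
        )
    return path
```

It can wrap the existing call, e.g. dd.from_castra(require_local_castra('data/osm.castra')), so the failure message points at the missing data.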

zhmiao · 2017-01-25T19:25:20Z

Where can I download the census.castra files?

jbednar · 2017-01-25T19:39:00Z

The discussion of the Census castra files above is out of date; the Census data is now provided as census.h5 with the download scripts described in the examples directory. You can convert that to Castra or whatever you like once loaded.

jbednar · 2017-05-24T20:23:54Z

The OSM data is also now available, or at least a 1-billion point subset of it, so this issue can be closed.

jbednar assigned jcrist Feb 10, 2016

jbednar added the ready label Mar 10, 2016

jbednar closed this as completed May 24, 2017

jbednar removed the ready label May 24, 2017
