Add information on requirements for osm example #61

Closed
birdsarah opened this issue Feb 9, 2016 · 13 comments

@birdsarah

The package castra is not mentioned in the README.md.

I tried conda install -c https://conda.anaconda.org/calex castra (the only copy of castra for py34 on osx-64 on Anaconda), but I got this error:

TypeError                                 Traceback (most recent call last)
----> 1 df = dd.from_castra('data/osm.castra')
      2 df.tail()

/Users/caged/miniconda3/envs/datashader/lib/python3.4/site-packages/dask/dataframe/io.py in from_castra(x, columns)
    567     from castra import Castra
    568     if not isinstance(x, Castra):
--> 569         x = Castra(x, readonly=True)
    570     return x.to_dask(columns)
    571 

TypeError: __init__() got an unexpected keyword argument 'readonly'
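
The traceback shows dask passing readonly=True to a Castra constructor that does not accept that keyword, which points to a version mismatch between the two packages. A minimal sketch of how one might check this before running the notebook, using only the standard library. The helper name accepts_keyword is hypothetical and not part of dask or castra:

```python
import inspect


def accepts_keyword(func, name):
    """Return True if `func` can be called with keyword argument `name`."""
    params = inspect.signature(func).parameters
    if name in params and params[name].kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    ):
        return True
    # A **kwargs catch-all accepts any keyword as well.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())
```

With castra installed, calling accepts_keyword(Castra, "readonly") after from castra import Castra should tell you whether the installed build is new enough for the dask call shown above. If it returns False, a newer castra is probably needed.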
@birdsarah
Author

psutil also appears to be required: dask's ResourceProfiler imports it.

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-3-a2de6cc720e8> in <module>()
      3                 x_range=(-bound, bound), y_range=(-bound, bound))
      4 
----> 5 with ProgressBar(), Profiler() as prof, ResourceProfiler(0.5) as rprof:
      6     agg = cvs.points(df, 'x', 'y', ds.count())

/Users/caged/miniconda3/envs/datashader/lib/python3.4/site-packages/dask/diagnostics/profile.py in __init__(self, dt)
    136     """
    137     def __init__(self, dt=1):
--> 138         self._tracker = _Tracker(dt)
    139         self._tracker.start()
    140         self.results = []

/Users/caged/miniconda3/envs/datashader/lib/python3.4/site-packages/dask/diagnostics/profile.py in __init__(self, dt)
    173     """Background process for tracking resource usage"""
    174     def __init__(self, dt=1):
--> 175         import psutil
    176         Process.__init__(self)
    177         self.daemon = True

ImportError: No module named 'psutil'
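
Both failures so far come from packages that are not installed or not listed in the README. A small sketch, using only the standard library, of a check that could go at the top of the notebook. The function name missing_modules and the REQUIRED list are assumptions for illustration; the list is taken from the tracebacks in this thread, not from an official requirements file:

```python
import importlib.util


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Packages the tracebacks in this thread show the osm notebook reaching for.
REQUIRED = ["dask", "castra", "psutil"]
```

Calling missing_modules(REQUIRED) returns the names that still need installing, so the notebook can stop early with one clear message instead of failing midway through a cell.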

@jcrist
Collaborator

jcrist commented Feb 9, 2016

Yeah, that example isn't meant to be rerun without a bit of work. The user needs to download the data, build the castra, and then run the example. I actually wanted to store that notebook in the repo in pre-run form, or not at all (pinging @jbednar for more opinions on this). Setting up the dataset for processing is a non-trivial time commitment.

@birdsarah
Author

Maybe take it out of examples?

@birdsarah
Author

(This repo's probably going to get a lot of attention after the upcoming webinar, so even if you plan to add it back later, it might be better to take it out for now, or to put big warnings at the top.)

@jbednar
Member

jbednar commented Feb 9, 2016

I'd prefer having a link to instructions for how to build the castra, but with warnings that it's complicated (and an explanation of why that is).

@jbednar
Member

jbednar commented Feb 9, 2016

As for committing the pre-run notebook, I'll be updating the repo to point people to an Anaconda cloud repository where they can see all of the pre-run notebooks. Right now that's on my own account, but it should probably be moved to one set up just for datashader.

@jbednar
Member

jbednar commented Feb 10, 2016

@jcrist, will you be able to post a recipe for how to build the castra? I think this would be very useful to people who want to think about setting up their own workflow for processing huge files...

@epifanio

I was wondering about the same thing. I followed the webcast and would be really interested in knowing how to build the castra dataset used in the osm notebook.

I got datashader and castra installed on OS X (Homebrew Python).
The census example runs fine, but the osm example requires downloading the data and converting it to castra.

From the castra documentation, it seems to me that a conversion is possible if we:

  1. download the osm data in a format readable by pandas (csv?)
  2. load the csv into Python as a pandas DataFrame
  3. convert the pandas DataFrame to a castra dataset

Am I correct?
Is it possible to get info on how to download the osm data so that it is compatible (same attribute columns) with the datashader notebook example?
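
The three steps listed above can be sketched roughly as follows. This is not the maintainers' recipe, which was not posted in this thread; it only assumes the basic castra usage shown in its README (Castra(path, template=df) followed by extend(df)) and that the source data has already been exported to a CSV file. The function names csv_to_store and castra_factory are hypothetical, and the columns the notebook expects are not specified here:

```python
import pandas as pd


def csv_to_store(csv_path, make_store, chunksize=1_000_000):
    """Stream a CSV into an append-only store, one pandas chunk at a time.

    `make_store(template)` is called once with the first chunk and must
    return an object with an `extend(df)` method.
    """
    store = None
    # Steps 1-2: read the CSV in pieces so the whole file never sits in memory.
    for chunk in pd.read_csv(csv_path, chunksize=chunksize):
        if store is None:
            # The first chunk doubles as the template (column names and dtypes).
            store = make_store(chunk)
        # Step 3: append the chunk to the dataset.
        store.extend(chunk)
    return store


def castra_factory(path):
    """Return a `make_store` callable that creates a Castra at `path`."""
    def make_store(template):
        from castra import Castra  # imported lazily: optional dependency
        return Castra(path, template=template)
    return make_store
```

A call would look like csv_to_store('osm.csv', castra_factory('data/osm.castra')), where osm.csv is a placeholder file name. Reading in chunks matters because the full dataset is too large to load at once. Column dtypes are inferred per chunk, so passing an explicit dtype mapping to read_csv may be needed for real data.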

@jbednar jbednar added the ready label Mar 10, 2016
@epifanio

epifanio commented Apr 9, 2016

Any info on how to run the OSM example?

Running the provided notebook osm.ipynb, I get the following log (the same happens for census.ipynb):

df = dd.from_castra('data/osm.castra')
df.tail()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-79137965758d> in <module>()
----> 1 df = dd.from_castra('data/osm.castra')
      2 df.tail()

/home/epifanio/anaconda3/lib/python3.5/site-packages/dask/dataframe/io.py in from_castra(x, columns)
    612     from castra import Castra
    613     if not isinstance(x, Castra):
--> 614         x = Castra(x, readonly=True)
    615     return x.to_dask(columns)
    616 

/home/epifanio/dev/castra/castra/core.py in __init__(self, path, template, categories, readonly)
    139         else:
    140             raise ValueError(
--> 141                 "must specify a 'template' when creating a new Castra")
    142 
    143     def _empty_dataframe(self):

ValueError: must specify a 'template' when creating a new Castra

@jbednar
Member

jbednar commented Apr 11, 2016

That castra error just means that you don't have the file available locally. For the census data, there are links for you to download the census.castra files; just download them, unpack them as examples/README.md says, and you should be fine for the Census example. The same would be true for the OSM example, except that we haven't made that data available for download because of its size. Instead we were going to post the instructions for how to build the castra files from the original source files, but we haven't yet had a chance to clean those up and test them on a different machine. So, you should be fine for the census example, but not yet for OSM.
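
Since the "must specify a 'template'" message hides the real cause (a missing path), a guard like the one below could make the failure easier to read. It is only a sketch with a hypothetical helper name, require_dataset, and it is not part of datashader, dask, or castra:

```python
import os


def require_dataset(path, hint=""):
    """Return `path` if it exists, otherwise raise a descriptive error."""
    if not os.path.exists(path):
        message = "dataset not found at {!r}".format(path)
        if hint:
            message += "; " + hint
        raise FileNotFoundError(message)
    return path
```

In the notebook this would wrap the path argument, for example dd.from_castra(require_dataset('data/osm.castra', hint='download or build it first')), so a missing dataset is reported as such instead of as a template error.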

@zhmiao

zhmiao commented Jan 25, 2017

Where can I download the census.castra files?

@jbednar
Member

jbednar commented Jan 25, 2017

The discussion of the Census castra files above is out of date; the Census data is now provided as census.h5 with the download scripts described in the examples directory. You can convert that to Castra or whatever you like once loaded.

@jbednar
Member

jbednar commented May 24, 2017

The OSM data is also now available, or at least a 1-billion point subset of it, so this issue can be closed.

@jbednar jbednar closed this as completed May 24, 2017
@jbednar jbednar removed the ready label May 24, 2017