ENH: first draft of MPL artist #200
Minimal datashader aware matplotlib artist.
attn @mdboom @astrofrog |
Looks great, thanks! I'll try it out and merge if it's all ok. |
This requires the 2.0 beta to work (lost track of when that private class I also have an idea on how to make the connection to the ds pipeline more
|
@tacaswell - would it make sense to expose (as public) the currently private class and private |
Another reasonable option would be to move this artist into MPL and tweak the API so that all of the datashader dependency is injected as a |
@tacaswell, I'm not sure how to get the mpl 2.0 beta from conda-forge. It's only offering me 1.5.2:
|
You need to ask for the rc channel as well:
conda install -c conda-forge/label/rc -c conda-forge matplotlib
Sorry for not being clearer about that.
|
Very nice! I couldn't get scroll zooming to work, but box zooming was very snappy. I had to make some edits for Python 2 compatibility:
0172-jbednar:~/datashader/datashader> diff mpl_ext.py~ mpl_ext.py
14c14
< super().__init__(ax, **kwargs)
---
> super(DSArtist,self).__init__(ax, **kwargs)
48c48
< return (*self.axes.get_xlim(), *self.axes.get_ylim())
---
> return self.axes.get_xlim() + self.axes.get_ylim()
We'd want to include a runnable example with the distribution, so I adapted your snippet above into a new file examples/nyc_taxi_mpl.py:
This pair of files worked well for me, anyway! Note that I reversed the colormap, so that it works better on a white background: |
Scroll zooming is not one of the default interactions, which is why it didn't work 😈 The reversed colormap does look much better. |
Overall, it seems like this approach will work well for simple datashader pipelines, as in the case illustrated above (basically anything supported by datashader.pipeline.Pipeline). But it won't support more complex pipelines, where it's not just the reduction (argument "agg" of DSArtist) that needs to be overridden, but the pipeline itself. E.g. in the census example, there are user-defined operations on the aggregate array before it is displayed:
I'm not sure how a user could inject the Those examples also use It seems like it would be more general if mpl_ext could support a create_image() callback instead of the current approach, as datashader.InteractiveImage does now, so that users could supply any arbitrary pipeline in just a few lines of code. Supporting a create_aggregate() callback would also be useful, so that users could employ mpl's own colormapping, though I'm not sure how that would work for categorical information. |
"It seems like it would be more general if mpl_ext could support a create_image() callback"
"Another reasonable option would be to move this artist into MPL and tweak the API so that all of the datashader dependency is injected as a pipeline argument."
These two suggestions may amount to the same thing; if so then it's clear how to move forward! |
My current best thought is to have the users provide a callback which has a signature like
def ds_cb(canvas, data):
    return float_or_int_img
mpl has support for discrete color maps (if your norm returns integers the values are used as direct lookups in the color table). Given that mpl users already know how to use the mpl colorization code (one hopes), I would greatly prefer that level get delegated to us, but making this class smart enough to check if it got back a I agree that we seem to be in agreement. attn @story645 (who is a GSOC student working on integrating categorical plotting into mpl) |
I agree that we'd want as much of the processing to use mpl's code as is practical, to help integrate it more easily into mpl users' workflows and make it more familiar to them. Let datashader do what datashader is best at, and let mpl handle the rest! mpl's discrete colormap support may work for categorical information, but I don't know enough about it to be sure. datashader.tf.colorize() does use discrete colors, but it then (a) mixes those discrete colors according to the counts in each category for that pixel, and (b) adjusts the alpha value of the color from a continuous range, depending on the total count for that pixel compared to the others. So the result is an arbitrarily large set of colors, starting from the nominally discrete colormap like the 5 base colors used here: Not sure if that's similar to what mpl supports or will support. |
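The two-step blending described in the comment above (mix the base colors by per-category counts, then scale alpha by the pixel's total count) can be illustrated for a single pixel. This is only a rough sketch under stated assumptions: `blend_pixel` is a hypothetical helper, the alpha scaling here is linear, and none of this is taken from datashader.tf.colorize() itself.

```python
def blend_pixel(counts, colors, max_total):
    """Return an (r, g, b, a) tuple for one pixel (hypothetical helper).

    counts    -- per-category counts falling in this pixel
    colors    -- one (r, g, b) base color per category, channels 0-255
    max_total -- largest total count found in any pixel of the image
    """
    total = sum(counts)
    if total == 0:
        # Nothing landed in this pixel: fully transparent.
        return (0, 0, 0, 0)
    # (a) Mix the base colors, weighting each by its category count.
    rgb = tuple(
        round(sum(n * c[i] for n, c in zip(counts, colors)) / total)
        for i in range(3)
    )
    # (b) Alpha grows with the pixel's total count (assumed linear here).
    alpha = round(255 * total / max_total)
    return rgb + (alpha,)
```

Because the mix weights and the alpha both vary continuously, even two base colors can produce many distinct output colors, which is why a plain discrete lookup table does not cover this case.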
There is no support for categorical blending yet, but we have been talking about generalizing the norm/colormap chain for a while now. |
Ok, then it sounds like supporting both NxMx1 and NxMx4 would be good in the meantime. |
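One way the artist could tell the two cases apart is to look only at the shape of whatever the user callback returns. The sketch below is an assumption about how such a check might look; `needs_colormapping` is a hypothetical name and is not part of this PR.

```python
def needs_colormapping(shape):
    """Decide how to treat a callback result from its array shape.

    Returns True for scalar images (N x M or N x M x 1), which would be
    passed through mpl's norm and colormap, and False for N x M x 4
    images, which would be treated as already-colored RGBA.
    """
    if len(shape) == 2 or (len(shape) == 3 and shape[2] == 1):
        return True
    if len(shape) == 3 and shape[2] == 4:
        return False
    raise ValueError("expected shape (N, M), (N, M, 1) or (N, M, 4), got %r" % (shape,))
```

With a check like this, a callback that returns plain counts gets mpl's colormapping, while one that returns a blended categorical image is displayed as-is.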
I'd love to get matplotlib support into datashader. Any progress on addressing some of the issues above? |
Sorry, I have been swamped with other work. |
@tacaswell @jbednar Have there been any updates to this? Looking at the travis output, it seems to work fine with python 3. I will see how far I can get with the example above. |
I realize my question is more related to usage (and probably just demonstrates my unfamiliarity with datashader and matplotlib) and is not specifically related to accepting/updating this PR. This was just the best resource I found when searching how to use datashader with matplotlib. Let me know if you prefer I move this to stack overflow and I will delete this. Following the above example, with Here is what I get using datashader with This is the much fainter version I get using matplotlib: This screenshot shows what I have been trying. Any thoughts @tacaswell? |
For now, this PR's discussion is fine as a place to collect anything about MPL support for datashader. I'm surprised that you aren't seeing comparable results between MPL's shading and the linear shading in datashader. It would be good to post a side by side comparison using the same colormap and ranges with linear mapping; those should be at least very nearly the same regardless of who is doing the colormapping. |
As a side note, I am including a reference to this in mpl's GSoC ideas list. |
Great! I'd be happy to work with a GSoC-er to help make this move forward. We are hoping to have funding for datashader start up again soon, and we'll be putting various functions in place that will help make it simpler to build legends, colorbars, etc. Those functions should help any downstream plotting library to summarize what's in the plot accurately and easily. |
In addition to the general question on how to make the matplotlib version match the datashader version, I have two specific questions.
|
To answer that, can you please post the same size image from both mpl and datashader colormapping, using a grayscale colormap, with linear mapping? You might have already provided enough info above, but I can't find any pair of images that should truly be mathematically identical, which is always the safest place to start. Grayscale should be comparable across all libraries.
I'm not aware of any histogram equalization option in matplotlib or bokeh, or else we probably would have just used those instead of adding our own to datashader. It would be very convenient if plotting libraries would support eq_hist directly, which would make it simpler to have meaningful colorbars, legends, and hover information. MPL is welcome to steal our eq-hist code; it's only 15 lines of Numpy-based Python, adapted from scikit-image.
See: http://matplotlib.org/examples/pylab_examples/custom_cmap.html |
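For readers unfamiliar with histogram equalization, the idea can be sketched in a few lines: each value is replaced by its position in the empirical cumulative distribution, so the output is spread evenly over [0, 1]. This is a simplified, pure-Python illustration and not datashader's NumPy-based eq_hist code; the function name and the rank-based approach are assumptions made for the example.

```python
from bisect import bisect_right


def eq_hist_sketch(values):
    """Map each value to its empirical-CDF position in (0, 1].

    NaN entries (empty bins) are passed through unchanged so they can
    still be masked when the image is drawn.
    """
    finite = sorted(v for v in values if v == v)  # v != v only for NaN
    n = len(finite)
    out = []
    for v in values:
        if v != v:
            out.append(float("nan"))
        else:
            # Fraction of finite values that are <= v.
            out.append(bisect_right(finite, v) / n)
    return out
```

Counts spanning several orders of magnitude, such as 1, 10, 100 and 1000, end up evenly spaced, which is what makes sparse and dense regions visible in the same image.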
See http://matplotlib.org/users/colormapnorms.html for details of how the color mapping process in mpl works. Also, turn the DPI down on the mpl plots; the spatial bins passed to datashader are set by the physical pixels in the axes. |
Thanks for the links. Based on those, I created a black and white palette and colormap that should match
then defined the image size and plot ranges
then generated the datashader plot using
Note that datashader uses pixel units, while matplotlib uses inches and dpi. You can see the code I used to convert between these and eliminate the axes on the matplotlib plot so the image uses the entire space. I verified these are each 600x300 pixels
|
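The pixel-to-inches conversion mentioned above is plain arithmetic: a figure's size in inches times its dpi gives its size in pixels. The commenter's own code was not preserved in this thread, so the following is only an assumed equivalent, with `figsize_for_pixels` as a hypothetical helper name.

```python
def figsize_for_pixels(width_px, height_px, dpi=100):
    """Return the (width, height) in inches that yields the requested
    pixel size at the given dpi."""
    return (width_px / dpi, height_px / dpi)
```

The result would be passed as `figsize` to matplotlib's `plt.figure(figsize=..., dpi=...)`. The axes must also fill the whole figure for the rendered image to match a 600x300 datashader canvas exactly.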
Looks like the mpl linear version isn't respecting the NaN mask in the same way, but it's good to see the logarithmic versions matching. |
Looks to me like this is happening because different renderers are drawing at different resolutions. My hypothesis is that the inline backend is using the hi-dpi option (presumably because you have a macbook) and therefore sampling at a higher resolution than you get when saving the datashader plot directly or using matplotlib to save it. |
Right -- the results will vary a lot at different resolutions, by design, though you can use tf.spread() or tf.dyn_spread() to ensure that individual dots are visible at high resolutions. |
BTW, note that recent dev releases of HoloViews now support datashader, with matplotlib or any other backend. Here's an example: https://anaconda.org/jbednar/census-hv-mpl/notebook |
I noticed this thread on twitter, it reminded me to put in a matplotlib backend for vaex based in ipympl. The code lives here and might be useful for this discussion, since it tries to attack a similar problem. What might be useful is the debounced decorator that I use for instance here that will only execute after 0.5 seconds have passed, to avoid many updates when moving and zooming. It only works when there is an ipykernel, for Qt you need a different debounce method (should have that code somewhere). |
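The debouncing idea can be sketched without any GUI or kernel dependency by using a timer thread. This is not the vaex implementation (which, per the comment above, relies on ipykernel); it is a generic stdlib version, and running the callback on a timer thread is an assumption that may not suit every GUI toolkit.

```python
import functools
import threading


def debounced(wait):
    """Decorator: run the function only once `wait` seconds have passed
    without another call; earlier pending calls are dropped."""
    def decorator(func):
        timer = None
        lock = threading.Lock()

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            nonlocal timer
            with lock:
                if timer is not None:
                    # A call is still pending: cancel it and start over.
                    timer.cancel()
                timer = threading.Timer(wait, func, args, kwargs)
                timer.daemon = True
                timer.start()
        return wrapper
    return decorator
```

Wrapping the re-aggregation step with something like `@debounced(0.5)` would mean a burst of pan or zoom events triggers a single redraw with the final view limits rather than one redraw per event.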
datashader is not made available on PyPI; a solution that could also support "pypi-compatible" alternatives, like "mpl-scatter-density", would be great. |
I couldn't do: |
@ruiyangl, to get this experimental code, you would have to check out the branch of datashader associated with this PR, and use that instead of any released datashader version. We'd be happy to merge support for Matplotlib into Datashader whenever this PR can be completed. In the meantime, you can use HoloViews+Matplotlib to see static Datashader output inside a Matplotlib plot as mentioned above. |
Is this available in the current version? |
The HoloViews support for Matplotlib+Datashader is in any recent version, but is only Agg (image) based (not interactive). No one has ever finished up this PR and made it mergeable, but if anyone wants to take it on, I'd be very happy to help get it merged! Meanwhile, https://github.com/astrofrog/mpl-scatter-density does similar things (though only for points). |
BTW, Datashader is available on PyPI nowadays (since 2018 at least). |
I'm pretty new to Just to be clear, there is no way to use |
This PR allows creating a Matplotlib artist that uses Datashader internally; it doesn't accept any Datashader pipelines you've already created. So there's no conversion involved. I'm not sure what you mean by a Datashader layout or why that would defeat the purpose, but the purpose of this PR is to create an object that will interactively re-draw on zooming; you can already create a matrix that you can plot manually with Matplotlib. If that's not what you're asking, please elaborate! |
Sorry, by datashader layout I meant the hammer_bundle algorithm http://datashader.org/api.html#datashader.bundling.hammer_bundle I noticed it returns many more rows than there are nodes. What are these rows? Is it possible to use this output to plot using Regarding "defeating the purpose" I was referring to the fact that the output of Hope that makes sense! |
Ah, I see what you mean. I don't think this PR will help you. This PR hard-codes a call to cvs.points() to plot a 2D histogram of points, whereas a bundled-edge network graph requires line plotting, not point plotting. Each row of the output of These rows can be visualized using any plotting program capable of plotting line segments, including presumably Matplotlib, if you want to write code for that library. But you are correct that for large networks the number of line segments involved will typically be too large for most plotting programs. With that in mind, you can use Datashader on the resulting segment dataframe to create a rectangular array with the fully rendered output, and then display that in any plotting program capable of plotting rectangular arrays, including Matplotlib. This PR won't be useful for any of those options unless it is heavily generalized, so I'd recommend just sticking with the functions described at Datashader.org, using the output from them with whatever your favorite plotting library is. |
It would be great to have a more configurable version of this PR, especially if it works with ipympl so that it would be interactive in notebooks, but for the moment, I think most or all of what it provides is covered by mpl_scatter_density (zoomable point density plots) and the Matplotlib backend already available in HoloViews (not zoomable, but which can be dynamically updated with widgets). So I'll close this for now, but we would welcome more full featured Matplotlib support for Datashader whenever anyone wants to work on it. |
This should work with ipympl out of the box and tweaking it to take in a pre-configured pipeline should be straightforward. |
Are you interested in making it configurable in that way? If not, do you consider it useful in its current form? If so I'm happy to merge; it just seemed to be sitting here getting more stale while still being labeled "draft". If it works with ipympl I could put an example showing how to do this on examples.pyviz.org for people to run, which would make a cool demo... |
I suspect it would be much easier for someone familiar with datashader to make that change; I don't know what people would expect the API to be. |
Reopening this PR so that we don't forget about it; the remaining work is relatively minor and on the Datashader side, so if we ever get a chance to look at it, we should be able to merge this. |
Closing in favor of #939. |
This is using DS to just do the binning and then re-using mpl's existing normalization and color-mapping tools.