prototypes: data.py, data.r, analysis.py, analysis.r #94

magland · 2024-06-26T21:37:39Z

This is a redo of #74

In addition to being based on the new main branch, the difference is that it uses an unobtrusive method for triggering the prototypes view.

See this comment

// We want to be as unobtrusive as possible, so we're not going to hook into the
// routing system for the application. We just have this test for
// prototypes, which is a query parameter that can be set to 1 to enable the
// prototypes. Then below we just render the Prototypes Window on this
// condition.
const query = new URLSearchParams(window.location.search)
const prototypesMode = query.get('prototypes') === '1'

So you would do http://localhost:3000?prototypes=1

The idea here is to be able to develop the functionality of generating data.json using data.py or data.r (and also create plots using analysis.py) and then later decide how to integrate it into the UI. I think it would be a mistake to try to do both at the same time.

WardBrian

At first look the functionality here looks really good! Have you had a go at plot outputs from webr yet?

A few simplifying suggestions and a question:

gui/src/app/prototypes/DataPyPrototype/DataPyFileEditor.tsx

WardBrian · 2024-06-27T14:01:05Z

gui/src/app/prototypes/AnalysisPyPrototype/AnalysisPyFileEditor.tsx

+    const [status, setStatus] = useState<'idle' | 'loading' | 'running' | 'completed' | 'failed'>('idle')
+
+    const loadPyodideInstance = useMemo(() => {
+        let pyodide: PyodideInterface | null = null


Is there any hope of sharing code between this and the data file? Not just for code organization reasons, but I also noticed that this seems to re-download pyodide if you run data.py and then switch and run analysis.py

We need to make a decision on whether to use the same pyodide instance for data.py and analysis.py. Since they are going to be depending on a different set of packages (analysis.py will potentially be a lot heavier), I suggest that we use two separate instances even if it means we download twice.

Having the variables available from the data script (or at the very least, the data variable) has some value, I think
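
To make the trade-off concrete, here is a minimal sketch of how the re-download could be avoided while still keeping two separate instances: cache one loader promise per script kind. The helper name `createInstanceCache` and the string keys are hypothetical, and the `load` callback stands in for whatever actually loads pyodide; none of this is taken from the PR.

```typescript
// Hypothetical helper (not from the PR): caches one loader promise per key so that
// re-running the same script kind reuses its instance instead of loading again.
function createInstanceCache<T>(load: (key: string) => Promise<T>) {
    const cache = new Map<string, Promise<T>>()
    return (key: string): Promise<T> => {
        let pending = cache.get(key)
        if (!pending) {
            pending = load(key).catch((err) => {
                // Drop the failed entry so a later call can retry the load.
                cache.delete(key)
                throw err
            })
            cache.set(key, pending)
        }
        return pending
    }
}
```

With keys such as 'data.py' and 'analysis.py', each script kind would get its own instance (and its own package set), but switching back and forth would not trigger another load. Sharing variables between the two scripts would still require a single shared instance.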

WardBrian · 2024-06-27T14:02:48Z

gui/src/app/prototypes/AnalysisPyPrototype/AnalysisPyPrototype.tsx

+    height: number
+}
+
+const exampleScript = `import sys


Just to note, the stanio package will probably also be helpful here in terms of preparing a useful output object from the draws. You can see what I'm doing in the normal python tinystan here:

https://github.com/WardBrian/tinystan/blob/main/clients/python/tinystan/output.py

I'm wondering whether we should use a pandas dataframe (long form, all chains in one). We want something that will stand the test of time so that old SP projects won't stop working.

If you want compatibility with existing packages, your best bet will be either something like the tinystan object (which is basically just a dict with ndarray entries), or for maximum interoperability something like ArviZ's InferenceData object

I think simpler is better. And we should also try to make it as consistent as possible between R and Python.

R is a slightly different story, but it has a similar sort of lock-in by a package called posterior. If whatever we provide is not at least easily convertible to something posterior can understand, it won't be used

One approach could be to provide the data to the script in as raw a format as possible. Then have a standard first few lines that load it into posterior, or arviz, or pandas, etc. Those conversion lines would be part of the analysis.py/analysis.r.

That seems fine to me
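
For illustration only, here is a sketch of the two shapes being discussed: a raw payload and the long form (all chains in one table) derived from it. The `RawDraws` layout, with each parameter's draws stored as one flat array and chains concatenated in order, is an assumption and not the format used in this PR. In practice the conversion would live in the first lines of analysis.py/analysis.r; TypeScript is used here only to show the reshaping.

```typescript
// Assumed raw layout (hypothetical): draws[p] holds every draw of parameter p,
// with the chains concatenated one after another.
interface RawDraws {
    paramNames: string[]
    numChains: number
    draws: number[][]
}

// One row per (chain, iteration, parameter), i.e. the long form.
interface LongRow {
    chain: number
    iteration: number
    parameter: string
    value: number
}

function toLongForm(raw: RawDraws): LongRow[] {
    const rows: LongRow[] = []
    raw.paramNames.forEach((parameter, p) => {
        const values = raw.draws[p]
        // Assumes every chain contributed the same number of draws.
        const perChain = values.length / raw.numChains
        values.forEach((value, i) => {
            rows.push({
                chain: Math.floor(i / perChain) + 1, // 1-based chain id
                iteration: (i % perChain) + 1, // 1-based draw index within the chain
                parameter,
                value,
            })
        })
    })
    return rows
}
```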

magland · 2024-06-27T15:39:29Z

At first look the functionality here looks really good! Have you had a go at plot outputs from webr yet?

I did look into it, but it's a lot less straightforward than for pyodide (a lot of ugly message passing). I suggest we hold off on that until after we are able to incorporate the analysis.py into the UI.

WardBrian · 2024-06-27T15:55:56Z

I did look into it, but it's a lot less straightforward than for pyodide (a lot of ugly message passing). I suggest we hold off on that until after we are able to incorporate the analysis.py into the UI.

Sounds good. We may want to look at how the Quarto plugin for webr does things: https://github.com/coatless/quarto-webr/blob/8795c3d75fdc8348734d6c05f1ba6350fb225f8e/_extensions/webr/qwebr-compute-engine.js

WardBrian · 2024-06-27T16:32:36Z

Quarto plugin for webr

Here is a very minimal example showing a plot loaded in the same way they do it: https://stackblitz.com/edit/vitejs-vite-6wuedv?file=src%2FApp.tsx

magland · 2024-06-28T13:57:51Z

@WardBrian I have added support for analysis.r based on your hints.

WardBrian · 2024-06-28T13:59:37Z

Nice! Trying it out now

WardBrian · 2024-07-01T13:39:11Z

A few notes before this can move to the non-prototype stage:

I noticed yarn build is raising a bunch of warnings on this branch, possibly related to the pyodide instructions here: https://pyodide.org/en/stable/usage/working-with-bundlers.html#vite
Similarly, I think it would make sense to tell vite to separately chunk webr and pyodide outside of index.js
Pyodide is currently blocking the main thread. It has web worker support, but it seems to be something you need to do manually: https://pyodide.org/en/stable/usage/webworker.html -- WebR seems to be in a worker already/by default.

magland · 2024-07-01T17:25:20Z

A few notes before this can move to the non-prototype stage:

I noticed yarn build is raising a bunch of warnings on this branch, possibly related to the pyodide instructions here: https://pyodide.org/en/stable/usage/working-with-bundlers.html#vite

If I follow the link in the warning it goes to
https://vitejs.dev/guide/troubleshooting.html#module-externalized-for-browser-compatibility
which explains that the problem is with the pyodide code. I tried following those bundling instructions that you provided, but it didn't help. I could be wrong, but I think we just need to ignore those warnings.

Similarly, I think it would make sense to tell vite to separately chunk webr and pyodide outside of index.js

I tried lazy importing which resulted in code splitting. However, the split off chunks were just around 16 kb for pyodide and 60 kb for webr... so not worth it. The bulk of the source code seems to be dynamically loaded at runtime.

Pyodide is currently blocking the main thread. It has web worker support, but it seems to be something you need to do manually: https://pyodide.org/en/stable/usage/webworker.html -- WebR seems to be in a worker already/by default.

I agree it would be better to do pyodide in a web worker. But I don't think this should "block" us right now since I wouldn't expect user scripts to be very computationally intensive. You can correct me on that.

WardBrian · 2024-07-01T19:02:00Z

Computationally intensive in terms of number-crunching no, but I certainly think a common use case for data.py will be "fetch this existing dataset from the internet and munge it", so freezing during that fetch could be quite annoying

magland · 2024-07-01T19:23:21Z

Computationally intensive in terms of number-crunching no, but I certainly think a common use case for data.py will be "fetch this existing dataset from the internet and munge it", so freezing during that fetch could be quite annoying

If you feel this is important, I can take a crack at it for data.py. But for analysis.py I think this will prove difficult (if not very difficult) because of the need for this magic:

(document as any).pyodideMplTarget = outputDiv;

WardBrian · 2024-07-01T19:31:00Z

I think it is at least worth figuring out if it is possible during this prototyping phase. The fact that jupyterlite doesn't freeze its UI means there must be something there, but it definitely seems possible that if we see what the worker solution requires we may decide that the cure is worse than the disease

magland · 2024-07-01T19:42:08Z

I think it is at least worth figuring out if it is possible during this prototyping phase. The fact that jupyterlite doesn't freeze its UI means there must be something there, but it definitely seems possible that if we see what the worker solution requires we may decide that the cure is worse than the disease

Okay, I'll do it for data.py. (I'm almost positive that jupyterlite uses a much more complex system to produce graphical outputs than simply a document.pyodideMplTarget = div ... so I'll hold off on the analysis.py part)

WardBrian · 2024-07-01T20:03:10Z

It seems like there are a few different solutions out there, see pyodide/matplotlib-pyodide#6 (comment)

Most of them look like something not too dissimilar to the webR code where you can end up with a list of pictures and separately a list of text outputs from a given program, and then you need to draw them back on the main thread.
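
A rough sketch of that pattern, assuming a hypothetical message protocol (the `WorkerMessage` shape and the names below are invented for illustration; only the status values mirror the ones already used in the file editors): the worker posts text, images, and status changes, and the main thread folds them into lists that it can render afterwards.

```typescript
// Hypothetical messages a script worker might post back to the main thread.
type WorkerMessage =
    | { type: 'text'; text: string }
    | { type: 'image'; dataUrl: string }
    | { type: 'status'; status: 'loading' | 'running' | 'completed' | 'failed' }

// Accumulated result of one run, kept on the main thread.
interface RunOutput {
    text: string[]
    images: string[]
    status: 'idle' | 'loading' | 'running' | 'completed' | 'failed'
}

const emptyOutput: RunOutput = { text: [], images: [], status: 'idle' }

// Pure reducer: returns a new RunOutput rather than mutating the previous one,
// so it can be dropped into React state updates.
function collectOutput(state: RunOutput, msg: WorkerMessage): RunOutput {
    switch (msg.type) {
        case 'text':
            return { ...state, text: [...state.text, msg.text] }
        case 'image':
            return { ...state, images: [...state.images, msg.dataUrl] }
        case 'status':
            return { ...state, status: msg.status }
    }
}
```

On the main thread this would be wired up as something like `worker.onmessage = (e) => setOutput((prev) => collectOutput(prev, e.data))`, with the images then drawn as `<img>` elements.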

magland · 2024-07-01T20:29:37Z

@WardBrian I've got data.py running with a worker.

Do you imagine a need for worker on analysis.py at this point?

WardBrian · 2024-07-01T20:52:32Z

Here's a working (but quite messy) example of pyodide + matplotlib in a worker, in a similar style to how we do webR now: https://gist.github.com/WardBrian/d89e939c43134ea405fd68084e702ddf

I think it is possible that analysis scripts could take a long time to run (in particular, big plots take a long time sometimes), and this also lets us add a 'cancel' button which seems valuable. This should also work for any package which is built on top of matplotlib like seaborn, though it will be worth testing for that later.
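
As a sketch of what the 'cancel' button could look like, assuming it simply terminates the worker (the `createCancellableRunner` helper and its `spawn` callback are hypothetical; `terminate()` is the standard Worker method):

```typescript
// Minimal surface needed from a Worker; a real Worker satisfies this interface.
interface TerminableWorker {
    terminate(): void
}

// Hypothetical helper: keeps at most one running worker and lets the UI cancel it.
function createCancellableRunner<W extends TerminableWorker>(spawn: () => W) {
    let current: W | null = null
    return {
        // Start a fresh worker, terminating any run that is still in flight.
        start(): W {
            current?.terminate()
            current = spawn()
            return current
        },
        // Terminate the current worker; returns false if nothing was running.
        cancel(): boolean {
            if (!current) return false
            current.terminate()
            current = null
            return true
        },
    }
}
```

One cost of this approach is that a terminated worker loses its loaded pyodide state, so the next run has to initialize pyodide again in a new worker.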

magland · 2024-07-02T00:22:29Z

@WardBrian @jsoules wonder if you could try to track down the build error here. I'm stuck.

There seems to be a problem importing loadPyodide from the web worker dataPyWorker.ts

[vite:worker] Invalid value "iife" for option "output.format" - UMD and IIFE output formats are not supported for code-splitting builds.

This works in vite dev, but fails on yarn build.

WardBrian · 2024-07-02T00:38:19Z

Google suggests adding

  worker: {
    format: 'es',
  },

to the vite config, but I'd be at least a little nervous about this causing issues with the other workers we already have. It's also possible to have the worker be a non-module worker and use importScripts() for pyodide, but that loses some other niceties

magland · 2024-07-02T00:50:22Z

  worker: {
    format: 'es',
  },

This seems to have worked! You definitely have a knack.

WardBrian · 2024-07-02T01:12:16Z

This seems to have worked! You definitely have a knack.

I don't think I'm that far away from having Tetris Dreams for the MDN web worker documentation at this point

magland · 2024-07-02T01:50:23Z

Okay I've implemented analysis.py in a worker... and I've also left in the non-worker case for comparison, and in case we want to fall back to that.

WardBrian

Some notes on the worker -- overall a much cleaner implementation than my hacky thing, and I think it's definitely the way to go. Even just playing around with the small example program we have loaded, it's a much smoother experience.

With an eye toward this becoming a non-prototype, two questions:

Should data.py and analysis.py be using the same Worker code? (I lean toward yes, it seems like a reasonable thing to do)
Should data.py and analysis.py use the same Worker instance? (I lean toward no for an initial implementation)