Create a roadmap #4

Open
13 of 40 tasks
patcon opened this issue Feb 3, 2025 · 2 comments
Comments

@patcon
Member

patcon commented Feb 3, 2025

Going to start scaffolding out a loose roadmap. For now, it will be a single issue with a task list, while things take shape.

This is a work-in-progress body of text.

  • admin
  • data loading
    • from conversation_id
    • from report_id
    • from local files
    • from remote directory path
    • infer moderation type when importing from API
    • infer moderation type from API when loading local or remote CSVs
      • allow override, in case API mismatches state when archived
  • dimensionality reduction steps
    • add support for a minimal PCA as the dimensionality reduction algorithm
    • properly account for meta-tids in PCA step
    • add support for power iteration method PCA
    • research and/or add support for more advanced dimensionality reduction algorithms (see below)
  • clustering steps
    • implement basic kmeans (with random initialization or kmeans++)
    • find best k via silhouette scores
    • initialize kmeans centers from existing centers output from polismath (these should almost always match, barring k-smoothing effects)
    • figure out how often math ticks occur (number of votes? passing of time? every vote capped to minimum time?)
    • implement k-smoothing functionality
    • reproduce last 4 math_tick pca results from raw votes, and confirm clusters match with k-smoothing
  • implement presentation helpers to simplify common analysis from jupyter notebooks (see below)
    • reproduce all the analysis notebooks using this library
  • add support for bucketizing participants with kmeans (default k=100)
  • implement comment projection on graph
  • Add function to calculate representativeness of statement for groups #22
  • presentation
    • render participants on plot
    • render projected comments on plot
    • render concave hulls around clusters
  • Add function to calculate group-aware consensus (GCA) scores on statements #23
  • calculate comment priority on statements
  • migrate presentation helpers from matplotlib to plot.ly
  • migrate from pandas to polars for dataframes (faster rust implementation)
  • bonus
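
For the "power iteration method PCA" item above, here is a minimal sketch of the technique. It is not this library's API: the function names are hypothetical, it assumes a dense participants-by-statements matrix with agree = 1, disagree = -1, pass = 0 and missing votes already filled in, and it is written in pure Python only to stay dependency-free (a real implementation would more likely use NumPy).

```python
import math
import random


def center_columns(rows):
    """Subtract each column's mean so PCA operates on centered data."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance_times(centered, vec):
    """Compute (X^T X) v / (n - 1) without forming the covariance matrix."""
    n = len(centered)
    projections = [sum(x * v for x, v in zip(row, vec)) for row in centered]
    return [
        sum(row[j] * p for row, p in zip(centered, projections)) / (n - 1)
        for j in range(len(vec))
    ]


def normalize(vec):
    # Raises ZeroDivisionError if vec is all zeros, e.g. when the data has
    # fewer non-trivial axes than the number of components requested.
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def power_iteration_pca(rows, n_components=2, iterations=200, seed=0):
    """Return the leading principal axes as a list of unit vectors."""
    rng = random.Random(seed)
    centered = center_columns(rows)
    dim = len(centered[0])
    components = []
    for _ in range(n_components):
        vec = normalize([rng.uniform(-1.0, 1.0) for _ in range(dim)])
        for _ in range(iterations):
            vec = covariance_times(centered, vec)
            # Remove the axes already found so we converge to the next one.
            for comp in components:
                dot = sum(a * b for a, b in zip(vec, comp))
                vec = [a - dot * b for a, b in zip(vec, comp)]
            vec = normalize(vec)
        components.append(vec)
    return components


def project(rows, components):
    """Project each (centered) participant row onto the principal axes."""
    centered = center_columns(rows)
    return [
        [sum(x * c for x, c in zip(row, comp)) for comp in components]
        for row in centered
    ]


# Illustrative data only: 5 participants voting on 4 statements.
votes = [
    [1, 1, -1, -1],
    [1, 1, -1, 0],
    [-1, -1, 1, 1],
    [-1, 0, 1, 1],
    [1, -1, 1, -1],
]
components = power_iteration_pca(votes, n_components=2)
coords = project(votes, components)  # one (x, y) pair per participant
```

A fixed iteration count keeps the sketch short; a convergence check on the change in `vec` would be the usual refinement.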

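For the "basic kmeans" and "find best k via silhouette scores" items, a rough sketch of how the two fit together. Everything here is an assumption for illustration: the function names are hypothetical, the points stand in for PCA-projected participants, and the initialization is a deterministic farthest-point rule used as a simple stand-in for kmeans++ (not the random or kmeans++ initialization named in the list).

```python
import math


def farthest_point_init(points, k):
    """Pick k starting centers: the first point, then repeatedly the point
    farthest from all centers chosen so far (deterministic)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    return [list(c) for c in centers]


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    centers = farthest_point_init(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = [sum(axis) / len(members) for axis in zip(*members)]
    return labels


def silhouette_score(points, labels):
    """Mean silhouette over all points; singleton clusters score 0."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def best_k(points, candidates=range(2, 6)):
    """Return (k with the highest silhouette score, all scores by k)."""
    scores = {k: silhouette_score(points, kmeans(points, k)) for k in candidates}
    return max(scores, key=scores.get), scores


# Illustrative data only: three well-separated groups of four points.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 0), (10, 1), (11, 0), (11, 1),
    (5, 10), (5, 11), (6, 10), (6, 11),
]
k, scores = best_k(points)
```

On data this cleanly separated the silhouette peak is unambiguous; on real vote data the scores for neighbouring k are usually much closer, which is presumably where k-smoothing comes in.
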
Meta

patcon changed the title from "Roadmap: work-in-progress" to "Roadmap" on Feb 4, 2025
@patcon
Member Author

patcon commented Feb 14, 2025

Got a doc generation site auto-deploying: https://polis-community.github.io/red-dwarf/

@nicobao
Member

nicobao commented Feb 18, 2025

It's great that you wrote all this down. It would be good to prioritize now.
In general, I think the priority should be making a simple library, à la polislite (Eric's work): no UI, no internal state, no complex metadata / math_tick, etc.

Some notes:

properly account for meta-tids in PCA step

=> I think metadata is orthogonal to the clustering, at least for a first version.

figure out how often math ticks occur (number of votes? passing of time? every vote capped to minimum time?)

=> I don't think we should care, since it's an advanced feature that requires holding state, and I don't think the library functions should hold any state. Users of the library can track for themselves when they last updated a conversation's Polis data.

What's interesting is incremental recalculation for performance, but this should come later, and only if it is really needed. We can simply provide a specific API for incremental recalculation, leaving it up to the library user to store the incremental results for each conversation.
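
To make the idea concrete, here is a rough sketch of what a stateless incremental API could look like. All names (`ConversationState`, `update`) are hypothetical and not an existing API; the per-statement agree count is just a stand-in for whatever intermediate result would actually be worth caching.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Tuple

# (participant_id, statement_id, vote) with vote in {-1, 0, 1}
Vote = Tuple[str, int, int]


@dataclass(frozen=True)
class ConversationState:
    """Hypothetical snapshot that the caller stores between updates."""
    votes: Dict[Tuple[str, int], int] = field(default_factory=dict)
    agree_counts: Dict[int, int] = field(default_factory=dict)


def update(state: ConversationState, new_votes: Iterable[Vote]) -> ConversationState:
    """Return a new state with new_votes applied; the input is not mutated."""
    votes = dict(state.votes)
    agree = dict(state.agree_counts)
    for participant_id, statement_id, vote in new_votes:
        previous = votes.get((participant_id, statement_id))
        # Adjust the cached count instead of recounting every vote.
        if previous == 1:
            agree[statement_id] = agree.get(statement_id, 0) - 1
        if vote == 1:
            agree[statement_id] = agree.get(statement_id, 0) + 1
        votes[(participant_id, statement_id)] = vote
    return ConversationState(votes, agree)


# The library keeps nothing; the caller holds on to each returned state.
s0 = ConversationState()
s1 = update(s0, [("alice", 1, 1), ("bob", 1, 1), ("alice", 2, -1)])
s2 = update(s1, [("alice", 1, -1)])  # alice changes her vote on statement 1
```

Because `update` is a pure function, the same call works for a full recalculation (start from an empty state) and for an incremental one (start from a stored state).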

I will open an issue to describe the function API I have in mind for the basic functionality of the library.

patcon changed the title from "Roadmap" to "Create a roadmap" on Feb 22, 2025