Algorithm Inventory

Polis-like pipelines use three main types of algorithm:

- an imputation algorithm, for filling in the missing values that unvoted statements leave in the vote matrix.
- a dimensionality reduction algorithm, for projecting the high-dimensional vote data into a 2D or 3D plot.
- a clustering algorithm, for automatically assigning participants to groups.
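
The three stages can be chained as in the minimal sketch below. It is an illustration under stated assumptions, not the pipeline of any particular implementation: the toy vote matrix, the helper names, column-mean imputation, SVD-based PCA, and the small deterministic k-means are all chosen here for brevity.

```python
import numpy as np

def impute_column_means(votes):
    """Imputation: replace NaN (no vote) with that statement's mean vote."""
    filled = votes.copy()
    col_means = np.nanmean(filled, axis=0)  # assumes every statement has >= 1 vote
    rows, cols = np.where(np.isnan(filled))
    filled[rows, cols] = col_means[cols]
    return filled

def pca_project(X, n_components=2):
    """Dimensionality reduction: project rows onto the top principal components."""
    centered = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def kmeans(points, k, n_iter=20):
    """Clustering: plain k-means with a deterministic farthest-point start."""
    centers = [points[0]]
    for _ in range(1, k):
        # Distance from every point to its nearest already-chosen center.
        nearest = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[nearest.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

# Hypothetical toy data: rows are participants, columns are statements.
# 1 = agree, -1 = disagree, NaN = statement not voted on.
votes = np.array([
    [1, 1, -1, np.nan],
    [1, np.nan, -1, -1],
    [1, 1, np.nan, -1],
    [-1, -1, 1, 1],
    [-1, np.nan, 1, 1],
    [np.nan, -1, 1, 1],
], dtype=float)

filled = impute_column_means(votes)
coords = pca_project(filled, n_components=2)
labels = kmeans(coords, k=2)
print(coords.round(2))
print(labels)
```

Each stage can be swapped independently, which is why the inventory treats them as separate algorithm types.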

Extended dimensionality reduction algorithms of interest
- comparison spreadsheet
- t-SNE
  - 🙁 hyperparameters (notably perplexity) matter a lot but are really hard to set
    - two simple clusters can look like weird objects or random noise if they are mis-set
  - 🙁 apparent cluster sizes and shapes (big, small, rods, donuts) don't mean anything
  - 🙁 "likes to split data up into false clusters"
  - code docs: https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html
- UMAP
  - preserves local structure but tends to lose global structure
  - depends heavily on a non-random initialization such as PCA (umap-learn defaults to a spectral initialization)
  - code: https://github.com/lmcinnes/umap
- LargeVis
- TriMap
  - works with triplets of points
  - depends heavily on PCA initialization (not random)
  - 🙂 beautiful results for 3D mammoth sample set
- weighted EMPCA
  - https://github.com/sbailey/empca/
  - https://github.com/jakevdp/wpca
- PaCMAP
  - non-parametric
  - alternative to t-SNE
  - aims to preserve both local and global structure, and usually succeeds at both
    - global structure preservation comparable with TriMap
    - local structure preservation comparable with UMAP & t-SNE
  - works well with random initialization (no prelim PCA needed)
  - modelled on attractive and repulsive forces. aligns with plurality research.
  - very computationally efficient
  - code: https://github.com/YingfanWang/PaCMAP
  - video: https://www.youtube.com/watch?v=sD-uDZ8zXkc
- LocalMAP
  - non-parametric
  - improvement on PaCMAP
  - code: https://github.com/williamsyy/LocalMAP
- ParamRepulsor
  - parametric variant of PaCMAP
  - code: https://github.com/hyhuang00/ParamRepulsor
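
Several entries above compare methods by how well they preserve "local" and "global" structure. One rough way to put numbers on those terms is sketched below. The two metrics (k-nearest-neighbour overlap for local structure, correlation of pairwise distances for global structure) and all function names are assumptions made for illustration; they are not necessarily the measures used in the papers behind these methods.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance between every pair of rows."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def knn_overlap(X_high, X_low, k=5):
    """Local structure: mean fraction of each point's k nearest
    neighbours in the original space that survive in the embedding."""
    d_high = pairwise_distances(X_high)
    d_low = pairwise_distances(X_low)
    np.fill_diagonal(d_high, np.inf)  # a point is not its own neighbour
    np.fill_diagonal(d_low, np.inf)
    nn_high = np.argsort(d_high, axis=1)[:, :k]
    nn_low = np.argsort(d_low, axis=1)[:, :k]
    overlaps = [len(set(a) & set(b)) / k for a, b in zip(nn_high, nn_low)]
    return float(np.mean(overlaps))

def pairwise_distance_pearson(X_high, X_low):
    """Global structure: Pearson correlation between all pairwise
    distances before and after embedding."""
    upper = np.triu_indices(len(X_high), k=1)
    d_high = pairwise_distances(X_high)[upper]
    d_low = pairwise_distances(X_low)[upper]
    return float(np.corrcoef(d_high, d_low)[0, 1])

# Hypothetical data: 60 points in 10 dimensions, with a deliberately
# crude "embedding" that just keeps the first two coordinates.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 10))
crude = X[:, :2]

local_score = knn_overlap(X, crude, k=5)
global_score = pairwise_distance_pearson(X, crude)
print(f"local (kNN overlap): {local_score:.2f}")
print(f"global (distance correlation): {global_score:.2f}")
```

To compare real methods, replace `crude` with the output of each reducer on the same input. Both scores reach 1.0 when the embedding reproduces the original geometry exactly.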

Note: Most dimensionality reduction methods don't work on sparse data (vote matrices with many missing values), and PCA and UMAP have been used specifically because they can accommodate it. TabPFN, a foundation model for tabular data, could possibly be used to fill in the missing values, which would allow other dimensionality reduction algorithms to be used.
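
To make the "impute first, then reduce" idea concrete, here is a small sketch of a model-based imputer that produces a dense matrix any reducer can accept. It does not use TabPFN: a simple iterative low-rank (SVD) imputer stands in for it, and the function name, the rank, the iteration count, and the toy matrix are all assumptions for illustration.

```python
import numpy as np

def iterative_svd_impute(votes, rank=2, n_iter=50):
    """Fill NaN entries by repeatedly fitting a low-rank approximation
    and copying its values into the missing cells only."""
    missing = np.isnan(votes)
    filled = np.where(missing, 0.0, votes)  # start from a neutral guess
    for _ in range(n_iter):
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
        filled[missing] = low_rank[missing]  # observed votes stay untouched
    return filled

# Hypothetical toy vote matrix (1 = agree, -1 = disagree, NaN = no vote).
votes = np.array([
    [1, 1, -1, np.nan],
    [1, np.nan, -1, -1],
    [1, 1, np.nan, -1],
    [-1, -1, 1, 1],
    [-1, np.nan, 1, 1],
    [np.nan, -1, 1, 1],
], dtype=float)

completed = iterative_svd_impute(votes, rank=2)
print(completed.round(2))
```

The completed matrix has no missing entries, so it can be passed to reducers that require dense input. Whether the imputed values are trustworthy enough for that purpose is a separate question that this sketch does not address.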

See:
