Algorithm Inventory
Patrick Connolly edited this page Aug 21, 2025 · 1 revision
There are 3 main types of algorithms used in Polislike pipelines:
- an imputation algorithm, for "filling in" all missing values in the vote matrix due to non-voted statements.
- a dimensional reduction algorithm, for projecting the high-dimensional structure into a 2D/3D plot.
- a clustering algorithm, for automatically assigning labels to participants, placing them in groups.
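The three stages above can be sketched end to end. This is a minimal illustration, not the pipeline any particular Polis-like implementation ships: it assumes column-mean imputation, PCA via SVD, and a small k-means with farthest-first initialization as stand-ins for the three stages. The vote matrix and all function names are hypothetical.

```python
import numpy as np

# Hypothetical vote matrix: rows are participants, columns are statements.
# 1 = agree, -1 = disagree, NaN = statement not voted on.
votes = np.array([
    [1.0, 1.0, -1.0, np.nan],
    [1.0, np.nan, -1.0, -1.0],
    [1.0, 1.0, np.nan, -1.0],
    [-1.0, -1.0, 1.0, 1.0],
    [np.nan, -1.0, 1.0, 1.0],
    [-1.0, -1.0, 1.0, np.nan],
])

def impute_column_means(matrix):
    """Stage 1 (assumed strategy): fill each NaN with that statement's mean vote."""
    col_means = np.nanmean(matrix, axis=0)
    return np.where(np.isnan(matrix), col_means, matrix)

def pca_project(matrix, n_components=2):
    """Stage 2: project rows onto the top principal components via SVD."""
    centered = matrix - matrix.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def kmeans_labels(points, k=2, n_iter=20):
    """Stage 3: plain k-means, seeded deterministically (farthest-first)."""
    centers = [points[0]]
    while len(centers) < k:
        # Distance from every point to its nearest already-chosen center.
        nearest = np.min(
            [np.linalg.norm(points - c, axis=1) for c in centers], axis=0
        )
        centers.append(points[nearest.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

imputed = impute_column_means(votes)
projected = pca_project(imputed, n_components=2)
labels = kmeans_labels(projected, k=2)
print(labels)
```

Each stage only consumes the previous stage's output, so any one of them can be swapped for an alternative without touching the other two.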
Extended dimensional reduction algorithms of interest:
- comparison spreadsheet
- t-SNE
  - 🙁 hyperparameters matter a lot, but are really hard to set
    - two simple clusters can look like weird objects or random noise if they are mis-set
  - 🙁 cluster sizes don't mean anything (big, small, rods, donuts)
  - 🙁 "likes to split data up into false clusters"
  - code docs: https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html
- UMAP
  - preserves local structure, but tends to lose global structure
  - depends heavily on a non-random initialization (e.g. PCA)
  - code: https://github.com/lmcinnes/umap
- LargeVis
- TriMAP
  - works with triplets of points
  - depends heavily on PCA initialization (not random)
  - 🙂 beautiful results on the 3D mammoth sample dataset
- weighted EMPCA
- PaCMAP
  - non-parameterized
  - alternative to t-SNE
  - preserves both local and global structure, useful when both are important
  - global structure preservation comparable with TriMAP
  - local structure preservation comparable with UMAP & t-SNE
  - works well with random initialization (no preliminary PCA needed)
  - modelled on attractive and repulsive forces; aligns with plurality research
  - very computationally efficient
  - code: https://github.com/YingfanWang/PaCMAP
  - video: https://www.youtube.com/watch?v=sD-uDZ8zXkc
- LocalMAP
  - non-parameterized
  - improvement on PaCMAP
  - code: https://github.com/williamsyy/LocalMAP
- ParamRepulsor
  - parameterized PaCMAP
  - code: https://github.com/hyhuang00/ParamRepulsor
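The t-SNE notes in the list above (hyperparameters that are hard to set, and sensitivity to initialization) can be explored with a small experiment. The sketch below assumes scikit-learn is installed and uses synthetic data: two well-separated Gaussian blobs. It embeds them with two perplexity values and two initializations, then reports a rough separation score. The `separation` helper is a hypothetical diagnostic made up for this example, not a standard metric.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Synthetic data: two clearly separated blobs of 50 points in 10 dimensions.
blob_a = rng.normal(loc=0.0, scale=1.0, size=(50, 10))
blob_b = rng.normal(loc=8.0, scale=1.0, size=(50, 10))
points = np.vstack([blob_a, blob_b])

def separation(embedding, n_first=50):
    """Hypothetical diagnostic: centroid distance divided by mean within-blob spread."""
    a, b = embedding[:n_first], embedding[n_first:]
    between = np.linalg.norm(a.mean(axis=0) - b.mean(axis=0))
    within = (a.std(axis=0).mean() + b.std(axis=0).mean()) / 2
    return between / within

embeddings = {}
for perplexity in (2, 30):
    for init in ("random", "pca"):
        tsne = TSNE(
            n_components=2,
            perplexity=perplexity,  # must stay below the number of samples
            init=init,
            random_state=0,
        )
        embeddings[(perplexity, init)] = tsne.fit_transform(points)

for key, embedding in embeddings.items():
    print(key, round(float(separation(embedding)), 2))
```

Plotting each entry of `embeddings` side by side is the quickest way to see how much the picture changes with these two settings, even though the underlying data is identical.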
Note: Most dimensional reduction methods don't work on sparse data (vote matrices with many missing values), and PCA and UMAP have been used specifically because they can accommodate it. It is possible that TabPFN, a foundation model for tabular data, could be used to fill in the missing values, which would allow other dimensional reduction algorithms to be used.
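A small sketch of the problem described in this note, assuming scikit-learn is installed: its `TSNE` rejects input containing NaN, while the same matrix is accepted once the gaps are filled. `SimpleImputer` with mean imputation is used here purely as a stand-in for whatever imputation model is eventually chosen; it is not TabPFN, and the vote data is randomly generated.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Synthetic vote matrix: 40 participants x 12 statements, about 30% unvoted.
votes = rng.choice([-1.0, 0.0, 1.0], size=(40, 12))
votes[rng.random(votes.shape) < 0.3] = np.nan

# Without imputation, the input validation is expected to raise ValueError.
try:
    TSNE(n_components=2, perplexity=5, random_state=0).fit_transform(votes)
    rejected = False
except ValueError as err:
    rejected = True
    print("t-SNE rejected the sparse matrix:", err)

# Stand-in imputation step (assumption: column means), then the same reducer.
imputed = SimpleImputer(strategy="mean").fit_transform(votes)
embedding = TSNE(n_components=2, perplexity=5, random_state=0).fit_transform(imputed)
print(embedding.shape)
```

The quality of the resulting plot then depends on how faithful the imputed values are, which is the motivation for considering a stronger imputation model.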
See: