Simplified clustering for Agora use-case #7
Conversation
As of right now, I'm thinking that more unit test coverage is the bulk of the remaining work before I'd feel good about merging this (obviously pending your feedback!)
Thank you, I will take a look soon! (Sorry, I clicked "ready for review" by mistake.)
Would also like to add code coverage reports in PRs, and for this to show full coverage of
Oh! One more thing I'm thinking on. I swapped the vote values over to this:

from enum import IntEnum

class VoteValueEnum(IntEnum):
    UP = 1
    NEUTRAL = 0
    DOWN = -1

We can keep the internal language neutral, and people can transform whatever data they like into votes in this format. For example:

from enum import Enum

## DEMDIS votes could map like this before passing in
# See: https://github.com/Demdis/Clustering-types/blob/main/types.py
class DemdisVoteValueEnum(str, Enum):
    AGREE = "agree"
    DISAGREE = "disagree"
    SKIP = "skip"

DEMDIS_VOTE_MAPPING = {
    DemdisVoteValueEnum.AGREE: VoteValueEnum.UP,
    DemdisVoteValueEnum.SKIP: VoteValueEnum.NEUTRAL,
    DemdisVoteValueEnum.DISAGREE: VoteValueEnum.DOWN,
}

votes_from_demdis = [{k: DEMDIS_VOTE_MAPPING[val] if k == "vote" else val for k, val in v.items()} for v in votes]

## AGORA could map like this
class AgoraVoteEnum(Enum):
    AGREE = "agree"
    DISAGREE = "disagree"

AGORA_VOTE_MAPPING = {
    AgoraVoteEnum.AGREE: VoteValueEnum.UP,
    AgoraVoteEnum.DISAGREE: VoteValueEnum.DOWN,
}

votes_from_agora = [{k: AGORA_VOTE_MAPPING[val] if k == "vote" else val for k, val in v.items()} for v in votes]

## AGORA could test Polis data without pass values like this
SIMULATED_AGORA_VOTE_MAPPING = {
    VoteValueEnum.UP: VoteValueEnum.UP,
    VoteValueEnum.NEUTRAL: None,
    VoteValueEnum.DOWN: VoteValueEnum.DOWN,
}

votes_from_polis_agora = [{k: SIMULATED_AGORA_VOTE_MAPPING[val] if k == "vote" else val for k, val in v.items()} for v in votes]

## Someone could map LIKERT scale data to the algos one of these ways
# (like was done for JapanChoice.jp/polis recently)
class LikertScale(IntEnum):
    STRONGLY_AGREE = 1
    AGREE = 2
    NEUTRAL = 3
    DISAGREE = 4
    STRONGLY_DISAGREE = 5

# One option
LIKERT_LIBERAL_MAPPING = {
    LikertScale.STRONGLY_AGREE: VoteValueEnum.UP,
    LikertScale.AGREE: VoteValueEnum.UP,
    LikertScale.NEUTRAL: VoteValueEnum.NEUTRAL,
    LikertScale.DISAGREE: VoteValueEnum.DOWN,
    LikertScale.STRONGLY_DISAGREE: VoteValueEnum.DOWN,
}

# Another option
LIKERT_CONSERVATIVE_MAPPING = {
    LikertScale.STRONGLY_AGREE: VoteValueEnum.UP,
    LikertScale.AGREE: VoteValueEnum.NEUTRAL,
    LikertScale.NEUTRAL: VoteValueEnum.NEUTRAL,
    LikertScale.DISAGREE: VoteValueEnum.NEUTRAL,
    LikertScale.STRONGLY_DISAGREE: VoteValueEnum.DOWN,
}

votes_from_likert = [{k: LIKERT_CONSERVATIVE_MAPPING[val] if k == "vote" else val for k, val in v.items()} for v in votes]
Makes sense!
Makes sense!
Makes sense.
Agreed.
Up to you. Not sure what
Up to you!
👍
👍
👍
👍
Amazing work! Is it already usable?
Yeah, go ahead with merging. I think you can work on a subsequent PR for unit tests later! :)
You're right,
It should be, though reddwarf (and I believe polis) rely on an assumption of [incrementing?] numeric statement_id and participant_id for now. Shouldn't be hard to fix though, so I'll take a look. I agree that strings are ideal, so this makes total sense.
I'll just write unit tests for getting (To reassure anyone watching: I have been way out of character for my normal development style by pushing to mainline so far, but once I go into PR mode, it's easy for me to commit to it.)
Amazing!
Mind if we keep this as a versioned agora provider for a while longer? What if we revisit in a month, before namespacing the true
An Agora provider also allows me to just favour a "yes" response to your needs, without feeling like I'm committing to larger considerations that I may wish for more time to consider. Basically, it helps us move quicker by letting the interface essentially be owned by you for now. If you're alright with that, there would be 4 options (in order of personal pref, top to bottom):
Sounds good to me :).
I get your point.
Whatever you prefer suits me!
Thanks nico! I'll go with (1) |
Ready for review @nicobao |
This does the following:
1. builds a vote matrix (includes any statement with at least 1 participant vote),
2. filters out any participants with fewer than 7 votes,
Can we make this "2. filtering" optional via a config passed in options?
Stock polis provides clustering even with 2 statements and 2 votes on each statement. It's just not displayed in the app. So I think it should be left to the discretion of the user.
I understand you may have other requirements, so at least provide this as an option.
Another question: are "PASS" votes taken into account in the 7 votes? In other words, will someone with 6 passes and one agree be placed in a cluster?
Can we make this "2. filtering" optional via a config passed in options?

Ah, it actually is -- it's just awkwardly documented in the ClusteringOptions section of reddwarf.types.agora below. I've posted a comment there to call it out. It will be more obvious in the built docs site, because it's linked.
Stock polis has a more complicated heuristic where any participant that either (1) hits the min vote threshold or (2) votes on all active statements (even if below that minimum early on) gets "activated" as a user, and is then never taken out.
This leads to some edge-cases in Polis where participants who only voted on 4 things are technically still in the PCA calculation a month later, when there are 50 statements and way more participants.
As I recall, there's no strong reason to keep these participants in after they've been included; Team Polis just felt it more consistent to not remove ppl once they're part of the calculation.
Maybe the simpler stateless solution for now is to include people who either (a) vote on 7 statements or (b) vote on all statements (for when there are fewer than 7 statements).
In the simplest stateless algorithm, that means there's a weird UX edge-case: if run x has 6 statements and y participants, but then participant y+1 arrives (and votes) and then adds a new statement, everyone will disappear from the next run x+1, at least until they each come back to vote on that final 7th statement. I think that strange edge-case is why Team Polis tracks the state of "who was in this before".
This all doesn't matter very much if every conversation starts with 7 seed statements, but it smooths out the user experience when allowing a convo that starts with fewer than 7 statements.
So what's your preference for the Agora implementation? I'm pretty relaxed on what we start with.
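A minimal sketch of that stateless rule, using a hypothetical helper name (not the current reddwarf code):

```python
VOTE_THRESHOLD = 7

def is_participant_included(vote_count: int, total_statement_count: int,
                            threshold: int = VOTE_THRESHOLD) -> bool:
    # (a) hit the vote threshold, or (b) voted on every statement
    # (covers conversations with fewer than `threshold` statements).
    return vote_count >= threshold or vote_count == total_statement_count

assert is_participant_included(5, 5)      # voted on all 5 of 5 statements -> included
assert not is_participant_included(5, 8)  # 5 votes of 8 statements, below threshold -> excluded
```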
For our use-case, we currently focus on being as "inclusive" as possible, even if it means compromising on "purity". That could change down the line if/when we introduce "wikisurvey" a la Polis-the-product.
It must "feel" good and fun. So the more participants are included, and the earlier, the better.
I think this option is very dependent on the library consumer's use-case, so I would leave it entirely at the discretion of the library consumer product.
But I also understand the constraints: if a user hasn't participated much, the "clustering" may not be very precise, or mean a lot.
I'd say "7 votes needed" feels a bit arbitrary. Instead, in theory, we should cluster anyone who voted on "enough" "decisive" statements, which is, I understand, difficult to measure, and it evolves over time.
So for now I'd like to keep it simple and modular, keeping our options open by adding these three options:
- number_vote_threshold: your "7" in "filters out any participants with less than 7 votes"
- percentage_vote_threshold: the "100%" in "the participant voted on 100% of the statements"
- condition: possible values "or" | "and". Corresponds to the "or" in: "any participant that either (1) hits the min vote threshold OR (2) votes on {percentage_vote_threshold} active statements gets 'activated' as a user"
Then I'd tinker with it in unit tests, and in our own product. And adapt over time with user feedback.
(Feel free to rename the options as you see fit.)
EDIT: actually, even better than these arbitrary three options, we should just allow users to pass an optional predicate in the options, which is called by the underlying library to identify whether the participant is activated or not. The predicate could look like so, taking inspiration from filter in Python:

# Should return True if the participant is "activated", else False.
# If the predicate is not passed as a param to `run_clustering`, then all participants are activated.
def filter_participants(participant_id: int | str, vote_count: int, vote_coverage_percentage: int) -> bool: ...

Something like this (the specific API could change, but this is a good start).
If you want, you could later program and expose to library consumers a few pre-made "filter_participants" predicate implementations containing the default config that you think is "relevant" for various use-cases:
- stock_polis_filter_participants: Predicate // default existing Polis behavior
- def only_vote_threshold(threshold: int = 7) -> Predicate // implementation you suggested
- ...etc
Then it's up to the library consumer to use them or not.
But thanks to this, you can easily provide a sensible "batteries-included" starting point, e.g.:

from reddwarf import only_vote_threshold, run_clustering, stock_polis_filter_participants

db = init_db()
votes = select_votes(db)

options_0 = {
    "filter_participants": only_vote_threshold(),  # defaults to 7
}
clusters_0 = run_clustering(votes, options_0)

options_1 = {
    "filter_participants": only_vote_threshold(6),
}
clusters_1 = run_clustering(votes, options_1)

options_2 = {
    "filter_participants": stock_polis_filter_participants,
}
clusters_2 = run_clustering(votes, options_2)

clusters_3 = run_clustering(votes)  # no participants are filtered

def custom_filter_participants(participant_id: int | str, vote_count: int, vote_coverage_percentage: int) -> bool:
    if participant_id == 3:
        return False
    else:
        return True

custom_filter = custom_filter_participants
options_4 = {
    "filter_participants": custom_filter,
}
clusters_4 = run_clustering(votes, options_4)
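The example above assumes pre-made predicate factories exist; a minimal sketch of how one might be written under this hypothetical API (not the current reddwarf code) could look like this:

```python
from typing import Callable

# A predicate takes (participant_id, vote_count, vote_coverage_percentage) and
# returns True when the participant should be "activated".
Predicate = Callable[[int | str, int, int], bool]

def only_vote_threshold(threshold: int = 7) -> Predicate:
    # Factory returning a predicate that activates a participant once they
    # have cast at least `threshold` votes.
    def predicate(participant_id: int | str, vote_count: int,
                  vote_coverage_percentage: int) -> bool:
        return vote_count >= threshold
    return predicate
```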
I like the predicate idea, though I'm on the fence about whether it's the right time to implement it just yet.
I'd say, "7 votes needed" feels a bit arbitrary.
Strongly agree. So many of these choices are arbitrary. The 2021 paper even admits:
Participants who voted on fewer than seven comments are removed from the conversation to avoid the “clumping up” of participants around the center of the conversation. This number is somewhat arbitrary but tuned as a hyperparameter based on experience with the domain. We will leave discussion of better metrics and functions for deciding whether to keep or drop a participant to a future paper dealing with missing data.
https://hyp.is/JbNMus5gEe-cQpfc6eVIlg/gwern.net/doc/sociology/2021-small.pdf
Having said this, I'm just now fully realizing how much you're working in a whole other domain with your reddit-style UI. Shared defaults won't always be possible between these domains. So the above point I perhaps started out trying to make (about Polis encoding domain knowledge) is kinda inappropriate.
For example, "domain knowledge" in polis/wikisurveys might say that if someone only votes on 3 of 6 comments in a new Polis convo, it makes total sense to exclude them, because essentially no reasonable participant would do that. That participant's votes are almost certainly junk data -- someone just cruising through and straight-voting agree 3 times to see how it works and whether the statements are interesting, which will have unreasonable influence on early conversation. But in Agora's list, votes on 3 of 6 statements would be very common, and perhaps even the main way participants will show up in the data, given the affordances of the reddit-style UI :)
I've just now realized that we're prob going to need (not real name suggestions) redditlike.run_clustering() and wikisurvey.run_clustering(), bc the sensible defaults just aren't the same between them. We have a wealth of recommendations/guardrails for the latter via Polis, but it's kinda terra nova for the former.
I would like this library to be highly supportive of someone operating in the wikisurvey scenario (overridable guardrails etc), since that's a very rigorously understood domain, and we already know what holds pretty well.
I wonder if maybe the answer for these two domains is to have two implementations (eventually predicate-like? or now?). We'd have one for wikisurveys/polis, and another for agora (explicitly named for now). agora.run_clustering() would allow tune-able defaults suiting your domain.
For clarity, going to spec out what's needed for wikisurvey.run_clustering(), though it need not be implemented in this PR:

wikisurvey.run_clustering() (⭐ = needs to be added)
1. below the min_user_vote_threshold (default: 7), include any users who've voted on every statement. ⭐
2. beyond the min_user_vote_threshold, only include users who meet the threshold.
3. add a keep_participant_ids field to ClusteringOptions that takes a list of participant_ids to be kept regardless of (1) and (2). ⭐
This allows a sensible default for processing wikisurveys, and if implementers wish to smooth the edge-cases (i.e. not have users jump in and out of the calculation when it's near the threshold), then they can track participants who were included already and pass them into keep_participant_ids to smooth things out. Polis behavior (as I currently understand it) could easily be reproduced with this function definition.
from typing import List, Optional, TypedDict

class ClusteringOptions(TypedDict):
    min_user_vote_threshold: Optional[int]
    keep_participant_ids: Optional[List[int]]  # new
    max_clusters: Optional[int]
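A hypothetical usage sketch of the keep_participant_ids smoothing described above (the wikisurvey function name, option keys, and result shape are assumptions, not the final API):

```python
# Caller-side state, persisted between clustering runs by the implementer.
previously_included: set[int] = set()

options = {
    "min_user_vote_threshold": 7,
    "keep_participant_ids": list(previously_included),
}
# result = wikisurvey.run_clustering(votes, options)  # assumed signature
# previously_included.update(p.participant_id for p in result.participants)  # assumed result shape
```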
We could later (beyond MVP?) move min_user_vote_threshold and keep_participant_ids into a predicate. (I suspect the eventual predicate will need access to not just participants, but statement data, because it's not totally clear to me whether filtering of statements vs participants should happen first.)

Back to Agora

To be clear, I love the predicate idea, but it still feels like more than MVP here. Can we bank that for later, and add it as we see the shared shape of needs between implementations? For now, can we just keep adding options keys to agora.run_clustering() for your specific needs? If you need a predicate, I'm happy to do that now; I just always prefer to wait for the second implementation to appear.

What does Agora specifically want, nevermind the abstraction of predicates? Is it essentially min_user_vote_threshold=0?
- below the min_user_vote_threshold (default: 7), include any users who've voted on every statement.

For now, I'm realizing the current internal utils.filter_matrix() needs the above change at minimum to even make sense, so going to make a PR for that regardless of the needs of the agora implementation.
Isn't that what we have already? agora.run_clustering() is already distinct from the other one.
Despite its package name (which I already mentioned I think we should change to something more neutral in the long term, or unify!), I don't think this function should explicitly optimize for any specific product.
The way I see it:
(1): core reddwarf clustering library, configurable with low-level stuff => used by (2): specialized-wikisurvey-library-that-configures-the-core-library-with-sensible-defaults-such-as-filtering-participants-or-statements => used by (3): visualization library => ...etc
run_clustering is part of (1), so product-specific stuff shouldn't be present. Product-specific stuff should be in (2), in other words, in how (1) is configured before being run.
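A minimal sketch of that layering with hypothetical stand-in names (the real reddwarf functions may differ): layer (2) is just layer (1) pre-configured with opinionated defaults.

```python
# Layer (1): core clustering, no product opinions (hypothetical stand-in).
def core_run_clustering(votes, options=None):
    options = options or {}
    ...  # build matrix, filter participants, run PCA + k-means per low-level options

# Layer (2): wikisurvey-flavoured wrapper that only supplies sensible defaults.
def wikisurvey_run_clustering(votes, options=None):
    defaults = {"min_user_vote_threshold": 7}
    return core_run_clustering(votes, {**defaults, **(options or {})})
```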
What does Agora specifically want, nevermind the abstraction of predicates? Is it essentially min_user_vote_threshold=0?

=> The short answer is: I don't know. I would need to do some testing. Instinctively, with how it works now, I think I wouldn't put any restrictions on participants whatsoever.
If you do add some options, I think it's actually more work to do it with multiple params than to do it via a predicate. And the added benefit of a predicate is allowing library users to experiment on their own and provide you with feedback.
Afaik, filter_statements is unnecessary because we don't have the incremental API yet. So the library consumer will already know what statements they don't want to use for clustering (moderation, etc), and can filter them out before passing them to the library's votes param.
I thought filter_participants was more than just blindly ignoring the said participants. I thought it was meant to decide which participants get "activated" (so more like filter_activated_participants), which doesn't mean they are not used at all? If inactivated participants aren't used at all, then since the library consumer knows all the data already, it's straightforward for the library consumer to simply filter the participants in the votes input of run_clustering however they see fit, and filter_participants is unnecessary.
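For example, the consumer-side filtering described above might look something like this (a sketch only; the vote dict fields are illustrative):

```python
# Drop votes on moderated statements before calling the library, so no
# filter_statements option is needed in run_clustering itself.
moderated_statement_ids = {12, 47}  # illustrative IDs
filtered_votes = [v for v in votes if v["statement_id"] not in moderated_statement_ids]
# clusters = run_clustering(filtered_votes, options)
```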
On a side note, trying to confine product use cases into a box (reddit-like vs wikisurvey-like) at the lowest-level library is a slippery slope. Right now, we're discussing Polis and Agora, but the reality is we can’t predict how library consumers will use this. There may be countless use cases beyond what we can imagine, so it's crucial to keep our options open. This means avoiding arbitrary decisions whenever possible for the lowest-level functions and instead prioritizing maximum flexibility.
Need to digest, but reticketed the naming conversation here (sorry to miss out on responding earlier): #10
For the test coverage, what do you have in mind?
Ah, wasn't the feedback to leave this for another PR? Re: #7 (comment)
Sure 💯. Do you think I can already replace stock polis with this library in Agora? I'd like to transition as soon as possible so:
It sounds like you just want to set
I hope to not draw us into this approach too much in the future, but I'd take you up on just merging and refining! (lengthy PRs are a slog)
If everything in this comment feels correct, then yes! If you approve this PR, I'll quickly draft an issue to cut a release, so you can pin things 🎉
Nice, merged this then. Feel free to merge yourself too!
Addresses #6
cc: @nicobao
Things to note:
- run_clustering takes a single Conversation instead of List[Conversation]. This allows calculations on a more minimal interface that doesn't need anything beyond a list of votes. Running the method for each conversation seemed reasonable to leave to the implementer. (And there is no performance benefit to batching.)
- statement is the term I prefer over opinion, because in theory these algorithms can be run on metadata and factual beliefs.
- Conversation.votes_by_participants and votes_by_opinion are less ideal keys on the Conversation object than simply votes. If users arrive with data of that shape, we can provide helpers to deconstruct it back into vote records.
- projection in Cluster: I prefer the demdis approach of treating the output ClusteredParticipant (with projection coords attached to the returned participant) as different from any input Participant. My reasoning for this is that participants can't even be grouped without projected coords, so projected data are basic and should be returned on the participant, not in a separate key. (The way polis deconstructs all data into separate keys is not something I'd like to emulate, as it's complicated to make sense of when interpreting the data -- counting lengths of arrays etc.)
- a ClusteringResult object instead of Clusters leaves more room for expanded metadata
- participants over members (less disjoint terminology seems better)
- reddwarf.utils.
- reddwarf.utils are fully documented on docs website
- reddwarf.utils methods (see "Todos" section)

Todos

- reddwarf.utils methods used:
  - generate_raw_matrix()
  - get_unvotes_statement_ids()
  - filter_matrix()
  - run_pca()
  - run_kmeans() #wontdo
  - find_optimal_k() #wontdo
  - scale_projected_data() #wontdo
- run code coverage report for any PRs #wontdo
- ensure full coverage for all reddwarf.utils methods #wontdo
- reddwarf.agora.run_clustering_v1() and reddwarf.types.agora (run make docs-serve to view) 0df8409
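For orientation, a rough, hypothetical usage sketch of the interface described in this PR; the shapes of the conversation input, the options, and the result are only inferred from the discussion above and may not match the actual code:

```python
# Hypothetical sketch only -- import paths and signatures are assumptions.
from reddwarf import agora
from reddwarf.types.agora import VoteValueEnum

conversation = {
    "votes": [
        {"participant_id": 1, "statement_id": 1, "vote": VoteValueEnum.UP},
        {"participant_id": 2, "statement_id": 1, "vote": VoteValueEnum.DOWN},
        {"participant_id": 2, "statement_id": 2, "vote": VoteValueEnum.NEUTRAL},
    ],
}
options = {"min_user_vote_threshold": 7, "max_clusters": 2}  # keys from ClusteringOptions above

# result = agora.run_clustering_v1(conversation, options)  # assumed signature
# for participant in result.participants:
#     print(participant)  # ClusteredParticipant with projection coords attached
```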