
Simplified clustering for Agora use-case #7

Merged 14 commits into main on Feb 27, 2025

Conversation

@patcon (Member) commented Feb 22, 2025

Addresses #6

cc: @nicobao

Things to note:

  1. differences from the provided spec in feat: create a minimal clustering library, tested with unit tests based on real conversations #6
    1. run_clustering takes a single Conversation instead of List[Conversation]. This allows calculations on a more minimal interface that doesn't need anything beyond a list of votes. Running the method for each conversation seemed reasonable to leave to the implementer (and there is no performance benefit to batching).
    2. statement is the term I prefer over opinion, because in theory these algorithms can be run on metadata and factual beliefs.
    3. votes_by_participants and votes_by_opinion are less ideal keys on the Conversation object than simply votes. If users arrive with data of that shape, we can provide helpers to deconstruct it back into vote records.
    4. re: projection in Cluster. I prefer the demdis approach of treating the output ClusteredParticipant (with projection coords attached to the returned participant) as distinct from any input Participant. My reasoning is that participants can't even be grouped without projected coords, so projected data are fundamental and should be returned on the participant, not under a separate key. (The way Polis deconstructs all data into separate keys is not something I'd like to emulate, as it's complicated to make sense of when interpreting the data -- counting lengths of arrays, etc.)
    5. A ClusteringResult object instead of Clusters leaves more room for expanded metadata. (A rough sketch of these shapes follows this list.)
    6. for now, opting for participants over members (less disjoint terminology seems better)
  2. misc other details
    1. This method only uses the stateless methods from reddwarf.utils.
    2. All methods in reddwarf.utils are fully documented on the docs website.
    3. Unit test cases are partially written for these reddwarf.utils methods (see "Todos" section).
    4. Unit tests are now run on each commit.
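
To make these shapes concrete, here is a minimal sketch of what the Agora-facing types might look like, written as TypedDicts. The field names are inferred from the notes above (and the int | str IDs from the todos below); they are illustrative, not the library's actual definitions.

from typing import List, TypedDict

class Vote(TypedDict):
    participant_id: int | str
    statement_id: int | str
    vote: int  # -1, 0, or 1; see the VoteValueEnum discussion further down

class Conversation(TypedDict):
    votes: List[Vote]

class ClusteredParticipant(TypedDict):
    id: int | str
    x: float  # projected coordinate
    y: float  # projected coordinate

class Cluster(TypedDict):
    id: int  # see the later discussion renaming label to id
    participants: List[ClusteredParticipant]

class ClusteringResult(TypedDict):
    clusters: List[Cluster]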

Todos

  • generate unit tests for all reddwarf.utils methods used
    • generate_raw_matrix()
    • get_unvotes_statement_ids()
    • filter_matrix()
    • run_pca()
    • run_kmeans() #wontdo
    • find_optimal_k() #wontdo
    • scale_projected_data() #wontdo
  • run code coverage report for any PRs #wontdo
    • ensure full coverage for all reddwarf.utils methods #wontdo
  • allow IDs to be int or str d10771b
    • added unit tests to confirm works 00c0340
  • renamed cluster_label to cluster_id 84f81d6
  • documented reddwarf.agora.run_clustering_v1() and reddwarf.types.agora (run make docs-serve to view) 0df8409

@patcon marked this pull request as draft February 22, 2025 09:13
@patcon (Member Author) commented Feb 22, 2025

As of right now, I'm thinking that more unit test coverage is the bulk of the remaining work before I'd feel good about merging this (obviously pending your feedback!)

@nicobao marked this pull request as ready for review February 22, 2025 20:35
@nicobao (Member) commented Feb 22, 2025

Thank you, I will take a look soon!

(Sorry, I clicked ready for review by mistake.)

@patcon (Member Author) commented Feb 22, 2025

Would also like to add code coverage reports to PRs, and for this to show full coverage of reddwarf.utils :) Will add to the todos above.

@patcon (Member Author) commented Feb 23, 2025

Oh! One more thing I'm thinking on.

I swapped from a str-based Enum for agree/disagree/pass to an IntEnum for now, just because the internal algos expect -1, 0, 1 for votes, with missing values being nan/None. Implementers can use any strings or numbers they want, but they'd need to transform their vote data to pass in data like this:

from enum import Enum, IntEnum  # Enum is used by the mapping examples below

class VoteValueEnum(IntEnum):
    UP = 1
    NEUTRAL = 0
    DOWN = -1
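
The end result of each mapping below is a list of vote records whose "vote" key carries these enum values. As a hypothetical example of one transformed record set (the field names here are assumptions, not a documented contract):

transformed_votes = [
    {"participant_id": "alice", "statement_id": 1, "vote": VoteValueEnum.UP},
    {"participant_id": "bob", "statement_id": 1, "vote": VoteValueEnum.DOWN},
]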

We can keep the internal language neutral, and people can transform whatever data they like into votes in this format. For example:

## DEMDIS votes could map like this before passing in

# See: https://github.com/Demdis/Clustering-types/blob/main/types.py
class DemdisVoteValueEnum(str, Enum):
    AGREE = "agree"
    DISAGREE = "disagree"
    SKIP = "skip"

DEMDIS_VOTE_MAPPING = {
    DemdisVoteValueEnum.AGREE: VoteValueEnum.UP,
    DemdisVoteValueEnum.SKIP: VoteValueEnum.NEUTRAL,
    DemdisVoteValueEnum.DISAGREE: VoteValueEnum.DOWN,
}
votes_from_demdis = [{k: DEMDIS_VOTE_MAPPING[val] if k == "vote" else val for k, val in v.items()} for v in votes]

## AGORA could map like this

class AgoraVoteEnum(Enum):
    AGREE = "agree"
    DISAGREE = "disagree"

AGORA_VOTE_MAPPING = {
    AgoraVoteEnum.AGREE: VoteValueEnum.UP,
    AgoraVoteEnum.DISAGREE: VoteValueEnum.DOWN,
}
votes_from_agora = [{k: AGORA_VOTE_MAPPING[val] if k == "vote" else val for k, val in v.items()} for v in votes]

## AGORA could test Polis data without pass values like this

SIMULATED_AGORA_VOTE_MAPPING = {
    VoteValueEnum.UP: VoteValueEnum.UP,
    VoteValueEnum.NEUTRAL: None,
    VoteValueEnum.DOWN: VoteValueEnum.DOWN,
}
votes_from_polis_agora = [{k: SIMULATED_AGORA_VOTE_MAPPING[val] if k == "vote" else val for k, val in v.items()} for v in votes]

## Someone could map LIKERT scale data to the algos one of these ways
# (like was done for JapanChoice.jp/polis recently)

class LikertScale(IntEnum):
    STRONGLY_AGREE = 1
    AGREE = 2
    NEUTRAL = 3
    DISAGREE = 4
    STRONGLY_DISAGREE = 5

# One option
LIKERT_LIBERAL_MAPPING = {
    LikertScale.STRONGLY_AGREE:    VoteValueEnum.UP,
    LikertScale.AGREE:             VoteValueEnum.UP,
    LikertScale.NEUTRAL:           VoteValueEnum.NEUTRAL,
    LikertScale.DISAGREE:          VoteValueEnum.DOWN,
    LikertScale.STRONGLY_DISAGREE: VoteValueEnum.DOWN,
}

# Another option
LIKERT_CONSERVATIVE_MAPPING = {
    LikertScale.STRONGLY_AGREE:    VoteValueEnum.UP,
    LikertScale.AGREE:             VoteValueEnum.NEUTRAL,
    LikertScale.NEUTRAL:           VoteValueEnum.NEUTRAL,
    LikertScale.DISAGREE:          VoteValueEnum.NEUTRAL,
    LikertScale.STRONGLY_DISAGREE: VoteValueEnum.DOWN,
}
votes_from_likert = [{k: LIKERT_CONSERVATIVE_MAPPING[val] if k == "vote" else val for k, val in v.items()} for v in votes]
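
Once remapped, the votes would be handed to the Agora entry point along these lines. This is only a sketch: the dict-shaped conversation and result, and the keyword name, are assumptions based on the notes above rather than the final signature (the function itself ends up as reddwarf.agora.run_clustering_v1(), per the discussion further down).

from reddwarf.agora import run_clustering_v1

conversation = {"votes": votes_from_agora}
result = run_clustering_v1(conversation=conversation)

for cluster in result["clusters"]:
    print(cluster["id"], [p["id"] for p in cluster["participants"]])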

@nicobao (Member) commented Feb 24, 2025

Addresses #6

cc: @nicobao

Things to note:

  1. differences from the provided spec in feat: create a minimal clustering library, tested with unit tests based on real conversations #6

    1. run_clustering takes a single Conversation instead of List[Conversation]. this allows calculations on a more minimal interface that doesn't need anything beyond a list of votes. running the method for each conversation seemed reasonable to leave to the implementer. (and there is no performance benefit to batching)

Makes sense!

  2. statement is the term I prefer over opinion, because in theory these algorithms can be run on metadata and factual beliefs.

Makes sense!

  3. votes_by_participants and votes_by_opinion are less ideal keys on the Conversation object than simply votes. If users arrive with data of that shape, we can provide helpers to deconstruct it back into vote records.

Makes sense. List[Vote] is sufficient.

  4. re: projection in Cluster. I prefer the demdis approach of treating the output ClusteredParticipant (with projection coords attached to the returned participant) as distinct from any input Participant. My reasoning is that participants can't even be grouped without projected coords, so projected data are fundamental and should be returned on the participant, not under a separate key. (The way Polis deconstructs all data into separate keys is not something I'd like to emulate, as it's complicated to make sense of when interpreting the data -- counting lengths of arrays, etc.)

Agreed.

  5. A ClusteringResult object instead of Clusters leaves more room for expanded metadata.

Up to you. Not sure what ClusteringResult[0].label means though, since it's just a random value, not a human-understandable label, right? I would call it id? Do we need it at all? What plan do you have for it? Following the same cluster id through multiple iterations of a conversation?

  6. for now, opting for participants over members (less disjoint terminology seems better)

Up to you!

  2. misc other details

    1. this method only uses the stateless methods from reddwarf.utils.

👍

  2. All methods in reddwarf.utils are fully documented on the docs website.

👍

  3. Unit test cases are partially written for these reddwarf.utils methods (see "Todos" section).

👍

  4. Unit tests are now run on each commit.

👍

Todos

  • generate unit tests for all reddwarf.utils methods used

    • generate_raw_matrix()
    • get_unvotes_statement_ids()
    • filter_matrix()
    • run_pca()
    • run_kmeans()
    • find_optimal_k()
    • scale_projected_data()
  • run code coverage report for any PRs

    • ensure full coverage for all reddwarf.utils methods

Amazing work! Is it already usable?
I'm leaving a few comments. The most important blocking issue for us is the type of the expected participant_id and statement_id. It is too restrictive. We use strings! So I think it should be string | number at least. That would avoid unnecessary headaches for the library user!

@nicobao (Member) commented Feb 24, 2025

As of right now, I'm thinking that more unit test coverage is the bulk of the remaining work before I'd feel good about merging this (obviously pending your feedback!)

Yeah, go ahead with merging; I think you can work on unit tests in a subsequent PR later! :)
Just the string | number thing, I would like to discuss it!

@patcon (Member Author) commented Feb 24, 2025

ClusteringResult object instead of Clusters leaves more room for expanded metadata

Up to you. Not sure what ClusteringResult[0].label means though, since it's just a random value, not a human-understandable label, right? I would call it id? Do we need it at all? What plan do you have for it? Following the same cluster id through multiple iteration of a conversation?

You're right, ClusteringResult[0].id is better. label was just me leaking the term from the data returned by sklearn's kmeans method, but on second thought label implies too much, so id is better. It's there so that being passed a cluster object (without its list index in the clusters key) still allows orienting and processing.

Amazing work! Is it already usable?

It should be, though reddwarf (and I believe polis) relies on an assumption of [incrementing?] numeric statement_id and participant_id for now. Shouldn't be hard to fix though, so I'll take a look.

I agree that strings are ideal, so this makes total sense.

@patcon (Member Author) commented Feb 24, 2025

Yeah, go ahead with merging; I think you can work on unit tests in a subsequent PR later! :)
Just the string | number thing, I would like to discuss it!

I'll just write unit tests for getting int | str working, and leave the remaining ones for another PR.

(To reassure anyone watching: I have been way out of character for my normal development style by pushing to mainline so far, but once I go into PR mode, it's easy for me to commit to it.)

@nicobao (Member) commented Feb 24, 2025

Amazing!
Oh, something else I forgot: maybe it's worth renaming the module from agora to simplified or lightweight, since even if we'll probably be the first user of the tool, I don't intend us to be the last!!

@patcon (Member Author) commented Feb 24, 2025

Mind if we keep this as a versioned agora provider for a while longer? What if we revisit in a month?

Before namespacing the true run_clustering, I'd like to (1) have another implementation created to match DemDis' needs, and (2) finish implementing the rest of the Polis algorithms.

An Agora provider also allows me to just favour a "yes" response to your needs, without feeling like I'm committing to larger considerations that maybe I wish for time to consider. Basically, helps us move quicker by letting the interface essentially be owned by you for now.

If you're alright with that, there would be 4 options (in order of personal pref, top to bottom):

  1. rename as run_clustering_v1() method in reddwarf.agora
  2. rename as run_clustering() method in reddwarf.agora_v1
  3. keep name as run_clustering() method in reddwarf.agora
  4. start versioning the repository at v0.1.0 tag (I don't prefer this, because I think the rest of the repo is essentially in v0 mode, with culture to match -- I prefer not yet committing to every interface being stable without a major version bump, and instead just commit to agora's being so :) )

@nicobao (Member) commented Feb 24, 2025

mind if we keep this as a versioned agora provider for a while longer? What if we revisit in a month?

Before namespacing the true run_clustering, I'd like to (1) have another implementation created to match DemDis' needs, and (2) finish implementing the rest of the Polis algorithms.

Sounds good to me :).

An Agora provider also allows me to just favour a "yes" response to your needs, without feeling like I'm committing to larger considerations that maybe I wish for time to consider. Basically, helps us move quicker by letting the interface essentially be owned by you for now.

I get your point.

If you're alright with that, there would be 4 options (in order of personal pref, top to bottom):

  1. rename as run_clustering_v1() method in reddwarf.agora
  2. rename as run_clustering() method in reddwarf.agora_v1
  3. keep name as run_clustering() method in reddwarf.agora
  4. start versioning the repository at v0.1.0 tag (I don't prefer this, because I think the rest of the repo is essentially in v0 mode, with culture to match -- I prefer not yet committing to every interface being stable without a major version bump, and instead just commit to agora's being so :) )

Whatever you prefer suits me!

@patcon (Member Author) commented Feb 24, 2025

Thanks nico! I'll go with (1) run_clustering_v1() 🙏

@patcon requested a review from nicobao February 25, 2025 05:33
@patcon (Member Author) commented Feb 25, 2025

Ready for review @nicobao

This does the following:

1. builds a vote matrix (includes any statement with at least 1 participant vote),
2. filters out any participants with fewer than 7 votes,
Member (nicobao)

Can we make this 2. filtering optional via a config passed in options?
Stock polis provides clustering even with 2 statements and 2 votes on each statement. It's just not displayed in the app. So I think it should be left to the discretion of the user.
I understand you may have other requirements, so at least provide this as an option.

Member (nicobao)

Another question: are "PASS" votes taken into account in the 7 votes? In other words, will someone with 6 passes and one agree be placed in a cluster?

Member Author (patcon)

Can we make this 2. filtering optional via a config passed in options?

Ah it actually is -- it's just awkwardly documented in the ClusteringOptions section of reddwarf.types.agora below. I've posted a comment there to call it out. It will be more obvious in the built docs site, because it's linked.

Member Author (patcon)

Stock polis has a more complicated heuristic where any participant that either (1) hits the min vote threshold or (2) votes on all active statements (even if below that minimum early on) gets "activated" as a user, and then is never taken out.

This leads to some edge-cases in Polis where participants who only voted on 4 things are technically in the PCA calculation a month later, when there are 50 statements and way more participants.

As I recall, there's no strong reason to keep these participants in after they've been included; Team Polis just felt it more consistent to not remove people once they're part of the calculation.

Maybe the simpler stateless solution for now is to include people who either (a) vote on 7 statements or (b) vote on all statements (for when there are fewer than 7 statements).

In the simplest stateless algorithm, that means there's a weird UX edge-case: if one run x has 6 statements and y participants, but then participant y+1 arrives (and votes) and then adds a new statement, now everyone will disappear from the next run x+1, at least until they each come back to vote on that final 7th statement. I think that strange edge-case is why Team Polis tracks the state of "who was in this before".

This all doesn't matter very much if every conversation starts with 7 seed statements, but it smooths out the user experience when allowing a convo that starts with fewer than 7 statements.
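
For illustration, that stateless inclusion rule could look roughly like this on a participant-by-statement vote matrix. The use of pandas here is an assumption about the internals, not necessarily how reddwarf.utils implements it:

import pandas as pd

def build_vote_matrix(votes: list[dict]) -> pd.DataFrame:
    # Rows are participants, columns are statements, values are -1/0/1,
    # with NaN where a participant never voted on a statement.
    df = pd.DataFrame(votes)
    return df.pivot_table(index="participant_id", columns="statement_id", values="vote")

def apply_stateless_inclusion(matrix: pd.DataFrame, min_votes: int = 7) -> pd.DataFrame:
    vote_counts = matrix.notna().sum(axis="columns")
    total_statements = matrix.shape[1]
    # Keep participants who (a) hit the vote threshold, or (b) voted on every statement.
    keep = (vote_counts >= min_votes) | (vote_counts == total_statements)
    return matrix.loc[keep]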

So what's your preference for the Agora implementation? I'm pretty relaxed on what we start with.

@nicobao (Member) commented Feb 26, 2025

For our use-case, we currently focus on being as "inclusive" as possible, even if it means compromising on "purity". Could change down the line if/when we introduce "wikisurvey" a la Polis-the-product.

It must "feel" good and fun. So the more participants are included, and the earliest, the best.

I think this option is very dependent on the library consumer's use-case, so I would leave it entirely at the discretion of the library consumer product.

But I also understand the constraints: if a user hasn't participated much, the "clustering" may not be very precise, or mean a lot.

I'd say, "7 votes needed" feels a bit arbitrary. Instead, I'd say in theory, we should cluster anyone who voted on "enough" "decisive" statements, which is, I understand difficult to measure, and it is evolving over time.

So for now I'd like to keep it simple and modular, keeping our options open by... adding these three options:

  • number_vote_threshold: your "7" in "filters out any participants with less than 7 votes"
  • percentage_vote_threshold: "100%" in "the participant voted on 100% of the statements"
  • condition: possible values: "or" | "and". Corresponds to "or" in: "any participant that either (1) hits the min vote threshold OR (2) votes on {percentage_vote_threshold} active statements gets 'activated' as a user"

Then I'd tinker with it in unit tests, and in our own product. And adapt over time with user feedback.

(Feel free to rename the options as you see fit.)

EDIT: actually, even better than these three arbitrary options, we should just allow users to pass an optional predicate in the options, which is called by the underlying library to identify whether the participant is activated or not. The predicate could look like so, taking inspiration from filter in Python:

# Should return True if the participant is "activated", else False.
# If the predicate is not passed as a param to `run_clustering`, then all participants are activated.
def filter_participants(participant_id: int | str, vote_count: int, vote_coverage_percentage: int) -> bool: ...

something like this (specific API could change, but this is a good start).

If you want, you could later program and expose to library consumers a few pre-made "filter_participants" predicate implementation containing the default config that you think is "relevant" for various use-cases:

  • stock_polis_filter_participants: Predicate // default existing Polis behavior
  • def only_vote_threshold(threshold: int = 7) -> Predicate // implementation you suggested
  • ...etc

Then it's up to the library consumer to use them or not.
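
For illustration, the pre-made predicates above might be implemented as simple predicate factories like the following. This is a hypothetical sketch; neither name exists in reddwarf today, and the stock-Polis variant only approximates the stateful behavior described earlier in this thread:

from typing import Callable

Predicate = Callable[[int | str, int, int], bool]

def only_vote_threshold(threshold: int = 7) -> Predicate:
    def predicate(participant_id: int | str, vote_count: int, vote_coverage_percentage: int) -> bool:
        return vote_count >= threshold
    return predicate

def stock_polis_filter_participants(participant_id: int | str, vote_count: int, vote_coverage_percentage: int) -> bool:
    # Activate anyone who hits the 7-vote threshold or has voted on every statement.
    return vote_count >= 7 or vote_coverage_percentage >= 100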

But thanks to this, you can easily provide a sensible "batteries-included" starting point, e.g.:

from reddwarf import only_vote_threshold, run_clustering, stock_polis_filter_participants

db = init_db()            # placeholder: wherever the app's votes come from
votes = select_votes(db)

options_0 = {
    "filter_participants": only_vote_threshold(),  # defaults to 7
}
clusters_0 = run_clustering(votes, options_0)

options_1 = {
    "filter_participants": only_vote_threshold(6),
}
clusters_1 = run_clustering(votes, options_1)

options_2 = {
    "filter_participants": stock_polis_filter_participants,
}
clusters_2 = run_clustering(votes, options_2)

clusters_3 = run_clustering(votes)  # no participants are filtered

def custom_filter_participants(participant_id: int | str, vote_count: int, vote_coverage_percentage: int) -> bool:
    if participant_id == 3:
        return False
    else:
        return True

custom_filter = custom_filter_participants

options_4 = {
    "filter_participants": custom_filter,
}
clusters_4 = run_clustering(votes, options_4)

Member Author (patcon)

I like the predicate idea, though am on-the-fence on whether it's the right time to implement just yet.

I'd say, "7 votes needed" feels a bit arbitrary.

Strongly agree. So many of these choices are arbitrary. The 2021 paper even admits:

Participants who voted on fewer than seven comments are removed from the conversation to avoid the “clumping up” of participants around the center of the conversation. This number is somewhat arbitrary but tuned as a hyperparameter based on experience with the domain. We will leave discussion of better metrics and functions for deciding whether to keep or drop a participant to a future paper dealing with missing data.

https://hyp.is/JbNMus5gEe-cQpfc6eVIlg/gwern.net/doc/sociology/2021-small.pdf

Having said this, I'm just now fully realizing how much you're working in a whole other domain with your reddit-style UI. Shared defaults won't always be possible between these domains. So the above point I perhaps started out trying to make (about Polis encoding domain knowledge) is kinda inappropriate.

For example, "domain knowledge" in polis/wikisurveys might say that if someone only votes on 3 of 6 comments in a new Polis convo, it makes total sense to exclude them, because essentially no reasonable participant would do that. That participant's votes are almost certainly junk data -- someone just cruising through and straight-voting agree 3 times to see how it works and whether the statements are interesting, which will have unreasonable influence on early conversation. But in Agora's list, votes on 3 of 6 statements would be very common, and perhaps even the main way participants will show up in the data, given the affordances of the reddit-style UI :)

I've just now realized that we're probably going to need (not real name suggestions) redditlike.run_clustering() and wikisurvey.run_clustering(), because the sensible defaults just aren't the same between them. We have a wealth of recommendations/guardrails for the latter via Polis, but it's kinda terra nova for the former.

I would like this library to be highly supportive of someone operating in the wikisurvey scenario (overridable guardrails etc), since that's a very rigorously understood domain, and we already know what holds pretty well.

I wonder if maybe the answer for these two domains is to have two implementations (eventually predicate-like? or now?). We'd have one for wikisurveys/polis, and another for agora (explicitly named for now). agora.run_clustering() would allow tune-able defaults suiting your domain.

For clarity, going to spec out what's needed for wikisurvey.run_clustering(), though it need not be implemented in this PR:

wikisurvey.run_clustering()

⭐ needs to be added

  1. below the min_user_vote_threshold (default: 7), include any users who've voted on every statement. ⭐
  2. beyond the min_user_vote_threshold, only include users who meet the threshold.
  3. add a keep_participant_ids field to ClusteringOptions that takes a list of participant_ids to be kept regardless of (1) and (2). ⭐

This allows a sensible default for processing wikisurveys, and if implementers wish to smooth the edge-cases (i.e. not have users jump in and out of the calculation when it's near the threshold), then they can track participants who were included already and pass them into keep_participant_ids to smooth things out. Polis behavior (as I currently understand it) could easily be reproduced with this function definition.

from typing import List, Optional, TypedDict

class ClusteringOptions(TypedDict):
    min_user_vote_threshold: Optional[int]
    keep_participant_ids: Optional[List[int]]  # new
    max_clusters: Optional[int]
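
A minimal sketch of how these options could drive the inclusion decision in (1)-(3) above, assuming per-participant vote counts have already been computed (illustrative only, not the library's actual code):

def should_include_participant(
    participant_id: int,
    vote_count: int,
    total_statements: int,
    options: ClusteringOptions,
) -> bool:
    min_votes = options.get("min_user_vote_threshold") or 7
    keep_ids = options.get("keep_participant_ids") or []
    if participant_id in keep_ids:
        return True  # (3) explicitly kept regardless of the thresholds
    if vote_count >= min_votes:
        return True  # (2) meets the minimum vote threshold
    return vote_count == total_statements  # (1) below the threshold but voted on every statement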

We could later (beyond MVP?) move min_user_vote_threshold and keep_participant_ids into a predicate. (I suspect the eventual predicate will need access to not just participants, but statement data, because it's not totally clear to me whether filtering of statements vs participants should happen first.)

Back to Agora

To be clear, I love the predicate idea, but it still feels like more than MVP here. Can we bank that for later, and add it as we see the shared shape of needs between implementations? For now, just keep agora.run_clustering() with options keys for your specific needs? If you need a predicate, I'm happy to do that now; I just always prefer to wait for the second implementation to appear.

What does Agora specifically want, nevermind the abstraction of predicates? Is it essentially min_user_vote_threshold=0?

@patcon (Member Author) commented Feb 26, 2025

  1. below the min_user_vote_threshold (default: 7), include any users who've voted on every statement.

For now, I'm realizing the current internal utils.filter_matrix() needs the above change at minimum to even make sense, so I'm going to make a PR for that regardless of the needs of the agora implementation.

@nicobao (Member) commented Feb 27, 2025

Isn't that what we have already? agora.run_clustering() is already distinct from the other one.

Despite its package name (which I already mentioned I think we should change to something more neutral in the long term, or unify!), I don't think this function should explicitly optimize for any specific product.
The way I see it:
(1) the core reddwarf clustering library, configurable with low-level stuff => used by (2) a specialized wikisurvey library that configures the core library with sensible defaults, such as filtering participants or statements => used by (3) a visualization library => ...etc

run_clustering is part of (1), so product-specific stuff shouldn't be present. Product-specific stuff should be in (2), in other words, in how (1) is configured before being run.

What does Agora specifically want, nevermind the abstraction of predicates? Is it essentially min_user_vote_threshold=0?

=> The short answer is I don't know. I would need to do some testing. Instinctively, with how it works now, I think I wouldn't put any restrictions on participants whatsoever.

If you do add some options, I think it's actually more work to do it with multiple params than to do it via a predicate. And the added benefit of a predicate is allowing library users to experiment on their own and provide you with feedback.

Afaik, filter_statements is unnecessary because we don't have the incremental API yet. So the library consumer will already know what statements they don't want to use for clustering (moderation, etc.), and can filter them out before passing them to the library's votes param.

I thought filter_participants was more than just blindly ignoring the said participants. I thought it was meant to decide whether participants get "activated" (so more like filter_activated_participants), which doesn't mean they are not used at all? If inactivated participants aren't used at all, then since the library consumer knows all the data already, it's straightforward for them to simply filter the participants in the votes input of run_clustering however they see fit, and filter_participants is unnecessary.

@nicobao (Member) commented Feb 27, 2025

On a side note, trying to confine product use cases into a box (reddit-like vs wikisurvey-like) on the lowest-level library is a slippery slope. Right now, we're discussing Polis and Agora, but the reality is we can’t predict how library consumers will use this. There may be countless use cases beyond what we can imagine, so it's crucial to keep our mantra open. This means avoiding arbitrary decisions whenever possible for the lowest-level functions and instead prioritizing maximum flexibility.

Member Author (patcon)

Need to digest, but I've reticketed the naming conversation here (sorry to miss out on responding earlier): #10

@nicobao (Member) commented Feb 25, 2025

As of right now, I'm thinking that more unit test coverage is the bulk of the remaining work before I'd feel good about merging this (obviously pending your feedback!)

For the test coverage, what do you have in mind?

@patcon (Member Author) commented Feb 27, 2025

For the test coverage, what do you have in mind?

Ah, wasn't the feedback to leave this for another PR? Re: #7 (comment)

@nicobao (Member) commented Feb 27, 2025

For the test coverage, what do you have in mind?

ah, wasn't the feedback to leave this for another PR? Re: #7 (comment)

Sure 💯.
I was asking how you plan to test this in the next PRs. We can merge this as long as participant/statement filtering is absent or remains optional in some form, and continue the conversation (#7 (comment)) about the API details in a follow-up PR.

Do you think I can already replace stock polis with this library in Agora? I'd like to transition as soon as possible so:

  • I can remove inefficient infrastructure.
  • I can enable vote/statement editing and deletion in Agora.
  • I can provide you with real-world user feedback.

@patcon mentioned this pull request Feb 27, 2025
@patcon (Member Author) commented Feb 27, 2025

We can merge this as long as participant/statement filtering is absent or remains optional in some form

It sounds like you just want to set min_user_vote_threshold to 0 or to the conversation's participant count until it hits 7, and if that's the case, then it will work fine

and continue the conversation (#7 (comment)) about the API details in a follow-up PR.

I hope to not draw us into this approach too much in the future, but I'd take you up on just merging and refining! (lengthy PRs are a slog)

Do you think I can already replace stock polis with this library in Agora?

If everything in this comment feels correct, then yes!

If you approve this PR, I'll quickly draft an issue to cut a release, so you can pin things 🎉

@nicobao merged commit 67d9533 into main Feb 27, 2025
2 checks passed
@nicobao (Member) commented Feb 27, 2025

We can merge this as long as participant/statement filtering is absent or remains optional in some form

It sounds like you just want to set min_user_vote_threshold to 0 or to the conversation's participant count until it hits 7, and if that's the case, then it will work fine

and continue the conversation (#7 (comment)) about the API details in a follow-up PR.

I hope to not draw us into this approach too much in the future, but I'd take you up on just merging and refining! (lengthy PRs are a slog)

Do you think I can already replace stock polis with this library in Agora?

If everything in this comment feels correct, then yes!

If you approve this PR, I'll quickly draft an issue to cut a release, so you can pin things 🎉

Nice, merged this then.

Feel free to merge yourself too!
