feat(polis): add AI labels and AI summaries #157

Open
nicobao opened this issue Feb 12, 2025 · 7 comments

Comments

@nicobao
Member

nicobao commented Feb 12, 2025

Expected LLM "prompts":

  1. A first "configuration" prompt that explains the task to the LLM. On receiving this prompt, the LLM should wait for JSON inputs sent as subsequent prompts and systematically respond with JSON in the expected format (the LLM essentially acts as a JSON HTTP API). This prompt should explain the expected input and output formats to the LLM.
  2. The input JSON file format looks like this:
    • conversation_slug_id
    • conversation title
    • conversation body (optional)
    • the total number of participants in the conversation
    • the list of core majority opinions for the whole conversation (>50% of all participants in the conversation agree with the opinion OR >50% of all participants disagree with it)
    • the list of core controversial opinions for the whole conversation (roughly equal numbers of participants agree and disagree, while more than 50% of all participants have voted on the opinion)
    • a "clusters" object keyed by cluster key ("0", "1", "2", "3", "4" or "5"), where each cluster contains:
      • the number of members belonging to this cluster (a subset of the total participants)
      • the list of core majority opinions for the cluster (>50% of the cluster's members agree OR >50% of the cluster's members disagree)
      • the list of core controversial opinions for the cluster (roughly equal numbers of the cluster's members agree and disagree, and more than 50% of the cluster's members have voted on the opinion).
    • Example input:
{
    "conversation_slug_id": "conversation_12345",
    "conversation_title": "The Future of Remote Work",
    "conversation_body": "A discussion about the long-term impact of remote work on productivity, work-life balance, and the economy.",
    "num_participants": 500, // there may be participants who are not members of any cluster
    "majority_opinions": [
        {
            "opinion_slug_id": "opinion_001",
            "opinion_content": "Remote work increases productivity for most employees.",
            "percentage_agree": 5,
            "percentage_disagree": 72 // does not necessarily add up to 100, since not all participants may have voted
        },
        {
            "opinion_slug_id": "opinion_002",
            "opinion_content": "Companies should offer a hybrid work model.",
            "percentage_agree": 65,
            "percentage_disagree": 10
        }
    ],
    "controversial_opinions": [
        {
            "opinion_slug_id": "opinion_003",
            "opinion_content": "Fully remote work reduces team collaboration.",
            "percentage_agree": 52,
            "percentage_disagree": 46
        }
    ],
    "clusters": {
        "0": {
            "num_members": 50, 
            "majority_opinions": [
                {
                    "opinion_slug_id": "opinion_001",
                    "opinion_content": "Remote work increases productivity for most employees.",
                    "percentage_disagree": 2,
                    "percentage_agree": 80
                }
            ],
            "controversial_opinions": [
                {
                    "opinion_slug_id": "opinion_004",
                    "opinion_content": "Office work is outdated and unnecessary.",
                    "percentage_agree": 50,
                    "percentage_disagree": 48
                }
            ]
        },
        "1": {
            "num_members": 150, 
            "majority_opinions": [
                {
                    "opinion_slug_id": "opinion_002",
                    "opinion_content": "Companies should offer a hybrid work model.",
                    "percentage_disagree": 30,
                    "percentage_agree": 70
                }
            ],
            "controversial_opinions": [
                {
                    "opinion_slug_id": "opinion_003",
                    "opinion_content": "Fully remote work reduces team collaboration.",
                    "percentage_agree": 54,
                    "percentage_disagree": 40
                }
            ]
        }
    }
}
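The majority and controversial criteria above can be sketched in Python. This is a minimal sketch under stated assumptions, not Agora's implementation: `OpinionVotes`, `is_majority` and `is_controversial` are hypothetical names, "voted" is assumed to mean an agree or disagree vote (pass votes ignored), and "~equal" is assumed to mean the agree and disagree shares are within 10 percentage points of each other. Percentages are computed over the whole population (all participants, or all members of a cluster), which is why they need not add up to 100.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class OpinionVotes:
    """Vote counts for one opinion within a population (whole conversation or one cluster)."""
    agree: int
    disagree: int
    population: int  # all participants (or cluster members), including those who did not vote


def shares(v: OpinionVotes) -> tuple[float, float]:
    """Agree/disagree percentages of the whole population, like percentage_agree/percentage_disagree."""
    if v.population <= 0:
        return 0.0, 0.0
    return 100 * v.agree / v.population, 100 * v.disagree / v.population


def is_majority(v: OpinionVotes, threshold: float = 50.0) -> bool:
    """Core majority opinion: more than `threshold`% agree, or more than `threshold`% disagree."""
    pct_agree, pct_disagree = shares(v)
    return pct_agree > threshold or pct_disagree > threshold


def is_controversial(v: OpinionVotes, min_turnout: float = 50.0, max_gap: float = 10.0) -> bool:
    """Core controversial opinion: more than `min_turnout`% voted and agree/disagree are roughly equal.

    ASSUMPTION: "voted" = agree or disagree (pass votes ignored);
    "roughly equal" = shares within `max_gap` percentage points.
    """
    pct_agree, pct_disagree = shares(v)
    turnout = pct_agree + pct_disagree
    return turnout > min_turnout and abs(pct_agree - pct_disagree) <= max_gap


# Figures taken from the example input (500 participants; cluster "0" has 50 members).
print(is_majority(OpinionVotes(agree=25, disagree=360, population=500)))     # opinion_001: True
print(is_controversial(OpinionVotes(agree=25, disagree=24, population=50)))  # opinion_004 in cluster "0": True
```

Note that, read literally, the two definitions can overlap: opinion_003 in the example (52% agree, 46% disagree) also passes the >50% majority test.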

The expected output for the input JSON defined in 2. is a JSON object of the following format:

{
    "label": "<AI-generated label for the whole conversation>",
    "summary": "<AI-generated summary for the whole conversation>",
    "clusters": {
        "0": {
            "label": "<AI-generated label based on input>",
            "summary": "<AI-generated summary based on input>"
        },
        "1": {
            "label": "<AI-generated label based on input>",
            "summary": "<AI-generated summary based on input>"
        }
        // etc.: up to 6 clusters in total, with keys "0" to "5"
    } // may be empty if the input contains no clusters
}
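On the receiving side, the LLM's reply should be checked before it is stored. A minimal Python sketch, assuming the LLM returns plain JSON (without the `//` annotations used in the examples above, which `json.loads` rejects) and assuming every cluster in the input should get exactly one label/summary entry; `parse_llm_output` and `OutputFormatError` are hypothetical names.

```python
from __future__ import annotations

import json

# The issue allows up to 6 clusters, keyed "0" to "5".
ALLOWED_CLUSTER_KEYS = {str(i) for i in range(6)}


class OutputFormatError(ValueError):
    """Raised when the LLM reply does not match the expected output format."""


def _require_text(obj: dict, field: str, where: str) -> None:
    value = obj.get(field)
    if not isinstance(value, str) or not value.strip():
        raise OutputFormatError(f"{where}: '{field}' must be a non-empty string")


def parse_llm_output(raw: str, input_cluster_keys: set[str]) -> dict:
    """Parse the LLM reply and check it against the expected output format."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise OutputFormatError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise OutputFormatError("top-level value must be a JSON object")
    _require_text(data, "label", "conversation")
    _require_text(data, "summary", "conversation")

    clusters = data.get("clusters", {})  # may be empty when the input has no clusters
    if not isinstance(clusters, dict):
        raise OutputFormatError("'clusters' must be an object keyed by cluster key")
    for key, cluster in clusters.items():
        if key not in ALLOWED_CLUSTER_KEYS or key not in input_cluster_keys:
            raise OutputFormatError(f"unexpected cluster key {key!r}")
        if not isinstance(cluster, dict):
            raise OutputFormatError(f"cluster {key}: value must be an object")
        _require_text(cluster, "label", f"cluster {key}")
        _require_text(cluster, "summary", f"cluster {key}")

    # ASSUMPTION: every cluster present in the input should be labelled and summarised.
    missing = input_cluster_keys - set(clusters)
    if missing:
        raise OutputFormatError(f"missing clusters: {sorted(missing)}")
    return data
```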

IMPORTANT NOTE: tell the LLM to be succinct in the AI labels and summaries. Prefer a simple, easy-to-understand, non-verbose style. For the AI labels: max 2 words and max ~60 characters in total, including spaces.
For the AI summaries: max 300 characters.
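These limits can also be enforced after generation instead of only being requested in the prompt. A minimal sketch with hypothetical names; "~60 characters" is treated here as a hard limit of 60, words are split on whitespace, and `len()` counts Unicode code points, which can differ from what a reader perceives as characters (e.g. for emoji).

```python
from __future__ import annotations

MAX_LABEL_WORDS = 2
MAX_LABEL_CHARS = 60  # "~60 characters in total including spaces", treated as a hard limit here
MAX_SUMMARY_CHARS = 300


def label_problems(label: str) -> list[str]:
    """Return the constraint violations for an AI label (empty list = OK)."""
    problems = []
    words = label.split()  # splits on any run of whitespace
    if len(words) > MAX_LABEL_WORDS:
        problems.append(f"label has {len(words)} words (max {MAX_LABEL_WORDS})")
    if len(label) > MAX_LABEL_CHARS:
        problems.append(f"label has {len(label)} characters (max {MAX_LABEL_CHARS})")
    return problems


def summary_problems(summary: str) -> list[str]:
    """Return the constraint violations for an AI summary (empty list = OK)."""
    if len(summary) > MAX_SUMMARY_CHARS:
        return [f"summary has {len(summary)} characters (max {MAX_SUMMARY_CHARS})"]
    return []


print(label_problems("Hybrid supporters"))             # []
print(label_problems("Strong remote work advocates"))  # ['label has 4 words (max 2)']
```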

@nicobao
Member Author

nicobao commented Feb 12, 2025

To test the prompt, we can use a chatbot to convert conversation inputs from a polis-style dialogue into the required JSON format. Simply copy and paste the relevant data into the chatbot.

Next, we can train the LLM to automatically convert these unstructured text inputs into the expected JSON format. Once the chatbot has processed numerous examples, we can ask the LLM to generate synthetic input examples on its own. The LLM can also be instructed to search the internet for inspiration and real-world examples to further enrich the test dataset.

Once that works, save the prompts so we can reuse them!
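Once the prompts are saved, they can be replayed against a folder of saved input examples whenever the prompt changes. A minimal sketch of such a regression check, assuming one JSON input per `*.json` file and a hypothetical `call_llm(config_prompt, user_message)` callable that wraps whichever LLM API is used (no specific provider is assumed). It only checks the output shape, not the quality of the labels and summaries.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Callable, Optional

# Hypothetical: (configuration prompt, JSON input as text) -> raw LLM reply. Wrap your LLM client here.
LlmCall = Callable[[str, str], str]


def check_reply(reply: str, expected_keys: set[str]) -> Optional[str]:
    """Return an error message, or None if the reply matches the expected output shape."""
    try:
        out = json.loads(reply)
    except json.JSONDecodeError as exc:
        return f"invalid JSON: {exc}"
    if not isinstance(out, dict):
        return "top-level value is not a JSON object"
    if not all(isinstance(out.get(field), str) for field in ("label", "summary")):
        return "missing conversation label or summary"
    clusters = out.get("clusters", {})
    if not isinstance(clusters, dict) or set(clusters) != expected_keys:
        return "cluster keys do not match the input"
    return None


def run_prompt_regression(config_prompt: str, fixture_dir: Path, call_llm: LlmCall) -> dict[str, str]:
    """Replay every saved input example (*.json) through the LLM; return {file name: error} for failures."""
    failures: dict[str, str] = {}
    for path in sorted(fixture_dir.glob("*.json")):
        payload = path.read_text(encoding="utf-8")
        expected_keys = set(json.loads(payload).get("clusters", {}))
        error = check_reply(call_llm(config_prompt, payload), expected_keys)
        if error is not None:
            failures[path.name] = error
    return failures
```

In tests, `call_llm` can be a small fake that returns well-formed JSON, so the harness itself can be checked without network access.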

@nicobao nicobao changed the title feat: add AI labels, and AI summaries feat: add AI labels and AI summaries Feb 12, 2025
@nicobao nicobao changed the title feat: add AI labels and AI summaries feat(polis): add AI labels and AI summaries Feb 12, 2025
@hanaanr

hanaanr commented Feb 15, 2025

@nicobao hello, could you please provide a link to the Polis conversation used in the sample JSON input? I can't seem to find this specific conversation. Thank you!

@hanaanr

hanaanr commented Feb 15, 2025

Quick question: it seems like pol.is defines majority opinions as >60% agree/disagree, while the guidelines above say >50%. I'm currently using 60% for testing (since I'm testing with a pol.is conversation). Thoughts?

@nicobao
Member Author

nicobao commented Feb 15, 2025

> @nicobao hello, could you pls provide a link to the polis conversation that was used in the sample json input? can't seem to find this specific convo -- thank you!

The sample JSON input was... generated by AI 😂 I don't think it's an interesting example to use.

@nicobao
Member Author

nicobao commented Feb 15, 2025

> Quick question: It seems like pol.is defines majority opinions as >60% agree/disagree, while the above guidelines says >50%. Currently using 60% for testing (since I'm using pol.is convo for testing). Thoughts?

Interesting, could you link to the documentation that explains that pol.is uses >60%?
In Agora, we use a >50% threshold, so I think we should stick with that.

@hanaanr

hanaanr commented Feb 18, 2025

For instance, in their conversation reports (e.g. https://pol.is/report/r326b8eam5cbbadnwrivd, scroll down to "Majority"), it says:

> Majority
>
> Here's what most people agreed with.
>
> 60% or more of all participants voted one way or the other, regardless of whether large amounts of certain minority opinion groups voted the other way.

@nicobao
Member Author

nicobao commented Mar 10, 2025

> https://pol.is/report/r326b8eam5cbbadnwrivd For instance, in their conversation reports (scroll down to Majority), it says
>
> " Majority
>
> Here's what most people agreed with.
>
> 60% or more of all participants voted one way or the other, regardless of whether large amounts of certain minority opinion groups voted the other way."

Yes, I might adjust to the Pol.is interpretation, since it has been battle-tested.
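To see where the two rules diverge, they can be compared directly. A small sketch that reads both rules literally: Agora's ">50%" as a strict inequality and the pol.is report's "60% or more" as inclusive, both measured against all participants. `is_majority`, `AGORA` and `POLIS` are hypothetical names, and pol.is's actual computation may differ in details not stated in the report text. `Fraction` avoids floating-point surprises exactly at the boundary.

```python
from fractions import Fraction


def is_majority(agree: int, disagree: int, participants: int,
                threshold: Fraction, inclusive: bool) -> bool:
    """True if the agree or disagree share of ALL participants passes the threshold."""
    if participants <= 0:
        return False
    shares = (Fraction(agree, participants), Fraction(disagree, participants))
    return any(s >= threshold if inclusive else s > threshold for s in shares)


AGORA = dict(threshold=Fraction(1, 2), inclusive=False)  # ">50%", as stated in this issue
POLIS = dict(threshold=Fraction(3, 5), inclusive=True)   # "60% or more", as quoted from a pol.is report

for agree in (250, 280, 300):
    print(agree, is_majority(agree, 0, 500, **AGORA), is_majority(agree, 0, 500, **POLIS))
# 250 False False
# 280 True False
# 300 True True
```

Opinions whose agree or disagree share falls between 50% and 60% (280 of 500 here) are the ones that would drop out of the majority list if Agora switched to the pol.is rule.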

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants