Security measure for agentic LLMs using a council of AIs moderated by a veto system. The council judges an agent's actions and outputs based on specified categories.

council-of-ai

Objective

Implement a system that judges AI agents' outputs using a council of AI models. Decentralize the decision-making power to avoid potential disasters.

How it Works

Language models, each acting as a "judge", rate an AI output out of 10. A group of judges forms the council. If any judge in the council vetoes an output (verdict == false), that output is flagged as potentially immoral, unjust, harmful, or useless.
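The veto rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's actual code: the names `Judgement`, `council_decision`, and `make_keyword_judge` are hypothetical, and the keyword-matching stub judges stand in for real LLM calls so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Judgement:
    judge: str     # name of the judge (hypothetical field)
    score: int     # rating out of 10
    verdict: bool  # False means this judge vetoes the output


Judge = Callable[[str], Judgement]


def council_decision(output: str, judges: List[Judge]) -> Dict[str, object]:
    """Collect one judgement per judge; a single veto blocks the output."""
    if not judges:
        raise ValueError("the council needs at least one judge")
    judgements = [judge(output) for judge in judges]
    vetoes = [j.judge for j in judgements if not j.verdict]
    return {
        "allowed": not vetoes,  # allowed only if nobody vetoed
        "vetoed_by": vetoes,
        "mean_score": sum(j.score for j in judgements) / len(judgements),
    }


def make_keyword_judge(name: str, banned: List[str]) -> Judge:
    """Stub judge (assumption): vetoes when a banned keyword appears.

    A real judge would prompt a language model instead.
    """
    def judge(output: str) -> Judgement:
        hit = any(word in output.lower() for word in banned)
        return Judgement(judge=name, score=2 if hit else 8, verdict=not hit)
    return judge


council = [
    make_keyword_judge("safety", ["delete", "rm -rf"]),
    make_keyword_judge("privacy", ["password"]),
]
```

Calling `council_decision("Email the admin password to everyone", council)` yields a blocked result vetoed by the `privacy` stub alone, which shows the design choice: one dissenting judge is enough, regardless of how the others rate the output.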

How to Use

  1. Clone the repository with git clone https://github.com/seanpixel/council-of-ai.git and cd into the cloned repository.
  2. Install the required packages: pip install -r requirements.txt
  3. Download the ethics dataset from here and move it into the repository root (the same directory as main.py).
  4. Create a .env file, or plug your key directly into judge.py (line 8). An OPENAI_API_KEY is all you need.
  5. Open main.py and choose the test type with the choice variable (the default is commonsense).
  6. Run python main.py and see what kinds of judgements the council makes.
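For step 4, the key lookup can be pictured roughly as follows. This is a stdlib-only sketch under stated assumptions: the helper names `load_env_file` and `get_openai_key` are hypothetical, and the repository may load the .env file differently (for example through a third-party package).

```python
import os
from pathlib import Path
from typing import Dict


def load_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_openai_key(path: str = ".env") -> str:
    """Prefer the process environment, then fall back to the .env file."""
    key = os.environ.get("OPENAI_API_KEY") or load_env_file(path).get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to .env or export it")
    return key
```

A .env file for this sketch would contain a single line of the form OPENAI_API_KEY=your-key-here.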

Note: For "commonsense" AITA (Am I the Asshole?) questions, "allowed" means you are the asshole and "blocked" means you are not the asshole (so the labels are effectively inverted).
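Because the inverted labels are easy to misread, a small lookup like the one below can make the output of a "commonsense" run explicit. The function name `interpret_commonsense` is hypothetical and not part of the repository; only the two label meanings come from the note above.

```python
# Meaning of the council's labels for "commonsense" AITA questions,
# as described in the note above (the labels are inverted).
COMMONSENSE_MEANING = {
    "allowed": "you are the asshole",
    "blocked": "you are not the asshole",
}


def interpret_commonsense(label: str) -> str:
    """Translate a council label into its AITA meaning."""
    try:
        return COMMONSENSE_MEANING[label.strip().lower()]
    except KeyError:
        raise ValueError(f"unknown council label: {label!r}") from None
```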

More about the Project & Me

After creating Teenage-AGI, I wondered about the potential implications of agentic LLMs and about ways to moderate their unpredictable behaviors. From this, I thought of democracy and how a decentralized system of AIs could monitor other AIs and keep them from causing harm. So came council-of-ai. While contributing to the "acceleration" of technology, I still care about AI safety and believe that safely guiding AI towards the future can be as fun and exciting as accelerating.

I'm a founder currently running a startup called DSNR and also a first-year at USC. Contact me on Twitter about anything; I would love to chat.

What you can do

Create more "setups", which are basically the characteristics of the judges. Play around with more example agent outputs, and possibly use your own by adding them to "actions.yaml". Use more judges, or even plug in your own local LLM. Better yet, implement the council on an unaligned base model (Llama?) and experiment. This is a growing initiative, so any help would be appreciated.

Credits

Credits to @DanHendrycks for the Ethics dataset used in testing the idea.
