
Commit 16a482c

Add pre-commit with black (#36)
* add pre-commit
* format
1 parent d90dc88 · commit 16a482c

29 files changed: +236 −124

.gitignore

+1 −1

@@ -144,4 +144,4 @@ nbs/wandb/

 wandb/

-OUT/
+OUT/

.pre-commit-config.yaml

+17

@@ -0,0 +1,17 @@
+# See https://pre-commit.com for more information
+# See https://pre-commit.com/hooks.html for more hooks
+# This should be the _latest_ version of python supported by us
+default_language_version:
+  python: python3.9
+repos:
+  - repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v3.2.0
+    hooks:
+      - id: trailing-whitespace
+      - id: end-of-file-fixer
+      - id: check-yaml
+  - repo: https://github.com/psf/black
+    rev: 22.10.0
+    hooks:
+      - id: black
+        files: ^(trlx|examples|unittests|setup.py)/
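
For anyone applying this change locally: with `pre-commit` pinned in `requirements.txt` below, running `pre-commit install` once in the repository root registers these hooks so they fire on every `git commit`, and `pre-commit run --all-files` applies them to the whole tree; the whitespace and formatting churn in the rest of this diff is consistent with such a run.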

README.md

+1 −1

@@ -22,7 +22,7 @@ The training pipeline is broken into four pieces:
 - Orchestrator: Handles exploration/rollout collection of online methods. Pushes collected rollouts to the rollout pipeline.
 - Model: Wraps the supplied base model (ex: `gpt2`) and implements the desired training method loss (ex: PPO).

-Adding a task for RLHF training depends on the desired training method and pre-existing data. If we are online and have no reward labeled data this is as simple as writing a new prompt pipeline, which supplies prompts for exploration, and a new reward function to be passed into the `PPOOrchestrator` class.
+Adding a task for RLHF training depends on the desired training method and pre-existing data. If we are online and have no reward labeled data this is as simple as writing a new prompt pipeline, which supplies prompts for exploration, and a new reward function to be passed into the `PPOOrchestrator` class.

 ## Example: How to add a task
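
The reformatted README paragraph is the relevant how-to here: an online task needs only a new prompt pipeline plus a reward function handed to `PPOOrchestrator`. As a rough sketch only (the exact orchestrator signature is not part of this diff), a reward function follows the convention visible in `examples/ilql_randomwalks.py` further down: it maps a batch of generated samples to one score per sample.

```python
# Minimal, hypothetical reward function: scores each generated string.
# The target substring "happy" is an arbitrary placeholder, not part of this commit.
def reward_fn(samples):
    return [float("happy" in sample.lower()) for sample in samples]
```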

configs/ppo_config.yml

+1 −1

@@ -49,4 +49,4 @@ method:
   min_length : 48 # LM min sample gen length
   top_k : 0.0 # top k
   top_p : 1.0 # top p
-  do_sample : True # sample
+  do_sample : True # sample

configs/ppo_gptj.yml

+1 −1

@@ -50,4 +50,4 @@ method:
   top_k : 0.0 # top k
   top_p : 0.7 # top p
   do_sample : True # sample
-  temperature: 0.5
+  temperature: 0.5

configs/test_config.yml

+1 −1

@@ -49,4 +49,4 @@ method:
   min_length : 48 # LM min sample gen length
   top_k : 0.0 # top k
   top_p : 1.0 # top p
-  do_sample : True # sample
+  do_sample : True # sample

docs/source/configs.rst

+4 −4

@@ -6,10 +6,10 @@ Configs
 Training a model in TRL will require you to set several configs:
 ModelConfig, which contains general info on the model being trained. TrainConfig, which contains things like
 training hyperparameters. And finally, MethodConfig, which contains hyperparameters or settings for
-the specific method being used (i.e. ILQL or PPO)
+the specific method being used (i.e. ILQL or PPO)


-**General**
+**General**

 .. autoclass:: trlx.data.configs.TRLConfig
    :members:
@@ -21,9 +21,9 @@ the specific method being used (i.e. ILQL or PPO)
    :members:

 .. autoclass:: trlx.data.method_configs.MethodConfig
-   :members:
+   :members:

-**PPO**
+**PPO**

 .. autoclass:: trlx.data.method_configs.PPOConfig
    :undoc-members:

docs/source/data.rst

+7 −7

@@ -3,15 +3,15 @@
 Data Elements
 ************************

-All of the major Carper projects: trlX, CHEESE, and magiCARP use
+All of the major Carper projects: trlX, CHEESE, and magiCARP use
 dataclasses corresponding to batches of data to communicate data between models and different
 components. trlX is no different, though it has many different dataclasses for
 different components like training or inference. Currently, we support PPO and ILQL, which
 each demand different kinds of data during training.
-

-**Basic Data Elements for Accelerate**
-
+
+**Basic Data Elements for Accelerate**
+

 .. autoclass:: trlx.data.accelerate_base_datatypes.PromptElement
    :members:
@@ -25,9 +25,9 @@ each demand different kinds of data during training.
 .. autoclass:: trlx.data.accelerate_base_datatypes.AccelerateRLBatchElement
    :members:

-
-**Data Elements for PPO**
-
+
+**Data Elements for PPO**
+
 .. autoclass:: trlx.data.ppo_types.PPORLElement
    :members:

docs/source/index.rst

+1 −1

@@ -6,7 +6,7 @@
 Welcome to trlX's documentation!
 ================================
 trlX is a library made for training large language models using reinforcement learning. It
-currently supports training using PPO or ILQL for models up to 20B using Accelerate.
+currently supports training using PPO or ILQL for models up to 20B using Accelerate.

 .. toctree::
    :maxdepth: 2

docs/source/models.rst

+4 −4

@@ -3,18 +3,18 @@
 RL Models
 *******************

-RL Models are what you're training with trlX. Currently, we support PPO and ILQL.
+RL Models are what you're training with trlX. Currently, we support PPO and ILQL.
 Note that new models must be registered with ``trlx.model.register_model``.
-
-**General**
+
+**General**

 .. autoclass:: trlx.model.BaseRLModel
    :members:

 .. autoclass:: trlx.model.accelerate_base_model.AccelerateRLModel
    :members:

-**PPO**
+**PPO**

 .. autoclass:: trlx.model.accelerate_ppo_model.AcceleratePPOModel
    :members:

docs/source/orchestrator.rst

+3 −3

@@ -7,17 +7,17 @@ Orchestrators manage reading data from a pipeline and creating RL data elements
 to push to a models rollout storage. Use the ``trlx.orchestrator.register_orchestrator`` decorator when creating
 new orchestrators.

-**General**
+**General**

 .. autoclass:: trlx.orchestrator.Orchestrator
    :members:

-**PPO**
+**PPO**

 .. autoclass:: trlx.orchestrator.ppo_orchestrator.PPOOrchestrator
    :members:

-**ILQL**
+**ILQL**

 .. autoclass:: trlx.orchestrator.offline_orchestrator.OfflineOrchestrator
    :members:

examples/ilql_randomwalks.py

+1 −3

@@ -99,9 +99,7 @@ def reward_fn(samples):
     n_layer=4, n_embd=144, vocab_size=logit_mask.shape[0]
 )

-model = ILQLModel(
-    config=config, logit_mask=logit_mask
-)
+model = ILQLModel(config=config, logit_mask=logit_mask)

 orch = OfflineOrchestrator(
     model=model,

requirements.txt

+1

@@ -3,6 +3,7 @@ datasets==2.4.0
 deepspeed==0.7.3
 einops==0.4.1
 numpy==1.23.2
+pre-commit==2.20.0
 tqdm==4.64.0
 transformers==4.21.2
 wandb==0.13.2

setup.cfg

+2 −2

@@ -4,7 +4,7 @@ author = Alex Havrilla
 version = 1.0.0

 [options]
-install_requires =
+install_requires =
     accelerate
     datasets
     deepspeed
@@ -13,4 +13,4 @@ install_requires =
     tqdm
     transformers
     wandb
-    torchtyping
+    torchtyping

setup.py

+1 −1

@@ -1,3 +1,3 @@
 from setuptools import setup

-setup()
+setup()

trlx/data/accelerate_base_datatypes.py

+15 −8

@@ -16,8 +16,10 @@ class PromptElement:
     :param tokens: The prompt tokens. Should be a long tensor
     :type tokens: torch.Tensor
     """
-    text : str
-    tokens : TensorType["num_tokens"]
+
+    text: str
+    tokens: TensorType["num_tokens"]
+

 @dataclass
 class PromptBatch:
@@ -30,8 +32,10 @@ class PromptBatch:
     :param tokens: A long tensor batch of prompt tokens.
     :type tokens: torch.Tensor
     """
-    text : Iterable[str]
-    tokens : TensorType["batch_size", "num_tokens"]
+
+    text: Iterable[str]
+    tokens: TensorType["batch_size", "num_tokens"]
+

 @dataclass
 class AccelerateRLElement:
@@ -44,8 +48,10 @@ class AccelerateRLElement:
     :param rewards: The rewards for each token. Should be a float tensor of same size as tokens.
     :type rewards: torch.Tensor
     """
-    output_tokens : TensorType["output_size"]
-    rewards : TensorType["output_size"]
+
+    output_tokens: TensorType["output_size"]
+    rewards: TensorType["output_size"]
+

 @dataclass
 class AccelerateRLBatchElement:
@@ -58,5 +64,6 @@ class AccelerateRLBatchElement:
     :param rewards: Batches of float tensors of rewards for each output token.
     :type rewards: torch.Tensor
     """
-    output_tokens : TensorType["batch_size", "output_size"]
-    rewards : TensorType["batch_size", "output_size"]
+
+    output_tokens: TensorType["batch_size", "output_size"]
+    rewards: TensorType["batch_size", "output_size"]

trlx/data/configs.py

+16 −13

@@ -24,11 +24,12 @@ class ModelConfig:
     :param device: Device to use when doing single GPU training. Not needed in most cases.
     :type device: str
     """
-    model_path : str
-    tokenizer_path : str
-    model_type : str # One of the architectures present in framework.model
-    device : str = ''
-    num_layers_unfrozen : int = -1
+
+    model_path: str
+    tokenizer_path: str
+    model_type: str  # One of the architectures present in framework.model
+    device: str = ""
+    num_layers_unfrozen: int = -1

     @classmethod
     def from_dict(cls, config: Dict[str, Any]):
@@ -91,11 +92,12 @@ class TrainConfig:
     :param project_name: Project name for wandb
     :type project_name: str
     """
-    n_ctx : int
-    epochs : int
-    total_steps : int
-    batch_size : int
-    grad_clip : float # Clip grad norms to this value
+
+    n_ctx: int
+    epochs: int
+    total_steps: int
+    batch_size: int
+    grad_clip: float  # Clip grad norms to this value

     lr_ramp_steps: int
     lr_decay_steps: int
@@ -128,9 +130,10 @@ class TRLConfig:
     """
     Top level config for trlX. Loads configs and can be converted to dictionary.
     """
-    model : ModelConfig
-    train : TrainConfig
-    method : MethodConfig
+
+    model: ModelConfig
+    train: TrainConfig
+    method: MethodConfig

     @classmethod
     def load_yaml(cls, yml_fp: str):
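
The context above shows `TRLConfig.load_yaml` and the three sub-configs it bundles; assuming it accepts a path to one of the YAML files under `configs/` (a guess from the `yml_fp` parameter and the config files earlier in this diff, not confirmed by the commit), loading and inspecting a config would look roughly like:

```python
from trlx.data.configs import TRLConfig

# Path and attribute access assumed from the configs/ directory and the
# dataclass fields shown in this diff; purely illustrative.
config = TRLConfig.load_yaml("configs/ppo_config.yml")
print(config.method.name, config.train.batch_size)
```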

trlx/data/ilql_types.py

+2

@@ -20,6 +20,7 @@ class ILQLElement:
     :param rewards: Rewards for each token. Should be a float tensor of same size as tokens.
     :type rewards: torch.Tensor
     """
+
     input_ids: TensorType["query_size"]
     attention_mask: TensorType["query_size"]
     rewards: TensorType["reward_size"]
@@ -39,6 +40,7 @@ class ILQLBatch:
     :param rewards: Batch of rewards for each token in each token batch.
     :type rewards: torch.Tensor
     """
+
     input_ids: TensorType["batch_size", "query_size"]
     attention_mask: TensorType["batch_size", "query_size"]
     rewards: TensorType["batch_size", "reward_size"]

trlx/data/method_configs.py

+2 −1

@@ -50,7 +50,8 @@ class MethodConfig:
     :param name: Name of the method
     :type name: str
     """
-    name : str
+
+    name: str

     @classmethod
     def from_dict(cls, config: Dict[str, Any]):

trlx/data/ppo_types.py

+13 −10

@@ -25,11 +25,13 @@ class PPORLElement:
     :param rewards: The rewards for each token outputted in response. Should be a float tensor of same size as tokens.
     :type rewards: torch.Tensor
     """
-    query_tensor : TensorType["query_size"]
-    response_tensor : TensorType["response_size"]
-    logprobs : TensorType["response_size", "vocab_size"]
-    values : TensorType["response_size"]
-    rewards : TensorType["response_size"]
+
+    query_tensor: TensorType["query_size"]
+    response_tensor: TensorType["response_size"]
+    logprobs: TensorType["response_size", "vocab_size"]
+    values: TensorType["response_size"]
+    rewards: TensorType["response_size"]
+

 @dataclass
 class PPORLBatch:
@@ -51,8 +53,9 @@ class PPORLBatch:
     :param rewards: A batch of rewards
     :type rewards: torch.Tensor
     """
-    query_tensors : TensorType["batch_size", "query_size"]
-    response_tensors : TensorType["batch_size", "response_size"]
-    logprobs : TensorType["batch_size", "response_size", "vocab_size"]
-    values : TensorType["batch_size", "response_size"]
-    rewards : TensorType["batch_size", "response_size"]
+
+    query_tensors: TensorType["batch_size", "query_size"]
+    response_tensors: TensorType["batch_size", "response_size"]
+    logprobs: TensorType["batch_size", "response_size", "vocab_size"]
+    values: TensorType["batch_size", "response_size"]
+    rewards: TensorType["batch_size", "response_size"]
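
To make the annotated shapes above concrete, here is a hypothetical construction of a `PPORLElement` (field names come from the diff above; the tensor sizes and values are made up):

```python
import torch

from trlx.data.ppo_types import PPORLElement

# Illustrative sizes only: a query of 3 tokens, a response of 2 tokens, vocab of 4.
element = PPORLElement(
    query_tensor=torch.tensor([11, 42, 7]),
    response_tensor=torch.tensor([3, 9]),
    logprobs=torch.randn(2, 4),        # per response token, per vocab entry
    values=torch.randn(2),             # value estimate per response token
    rewards=torch.tensor([0.0, 1.0]),  # reward per response token
)
```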
