# FinRL Environment

A wrapper around [FinRL](https://github.com/AI4Finance-Foundation/FinRL) stock trading environments that conforms to the OpenEnv specification.

## Overview

This environment enables reinforcement learning for stock trading tasks using FinRL's `StockTradingEnv`, exposed through OpenEnv's simple HTTP API. It supports:

- **Stock Trading**: Buy/sell actions across multiple stocks
- **Portfolio Management**: Track balance, holdings, and portfolio value
- **Technical Indicators**: MACD, RSI, CCI, DX, and more
- **Flexible Configuration**: Custom data sources and trading parameters

## Quick Start

### 1. Build the Docker Image

First, build the base image (from the OpenEnv root):

```bash
cd OpenEnv
docker build -t envtorch-base:latest -f src/core/containers/images/Dockerfile .
```

Then build the FinRL environment image:

```bash
docker build -t finrl-env:latest -f src/envs/finrl_env/server/Dockerfile .
```

### 2. Run the Server

#### Option A: With Default Sample Data

```bash
docker run -p 8000:8000 finrl-env:latest
```

This starts the server with synthetic sample data for testing.

#### Option B: With Custom Configuration

Create a configuration file `config.json`:

```json
{
  "data_path": "/data/stock_data.csv",
  "stock_dim": 3,
  "hmax": 100,
  "initial_amount": 100000,
  "num_stock_shares": [0, 0, 0],
  "buy_cost_pct": [0.001, 0.001, 0.001],
  "sell_cost_pct": [0.001, 0.001, 0.001],
  "reward_scaling": 0.0001,
  "state_space": 19,
  "action_space": 3,
  "tech_indicator_list": ["macd", "rsi_30", "cci_30", "dx_30"]
}
```

Run with the configuration:

```bash
docker run -p 8000:8000 \
  -v $(pwd)/config.json:/config/config.json \
  -v $(pwd)/data:/data \
  -e FINRL_CONFIG_PATH=/config/config.json \
  finrl-env:latest
```

### 3. Use the Client

```python
from envs.finrl_env import FinRLEnv, FinRLAction
import numpy as np

# Connect to server
client = FinRLEnv(base_url="http://localhost:8000")

# Get configuration
config = client.get_config()
print(f"Trading {config['stock_dim']} stocks")
print(f"Initial capital: ${config['initial_amount']:,.0f}")

# Reset environment
result = client.reset()
print(f"Initial portfolio value: ${result.observation.portfolio_value:,.2f}")

# Trading loop
for step in range(100):
    # Get current state
    state = result.observation.state

    # Your RL policy here (example: random actions)
    num_stocks = config['stock_dim']
    actions = np.random.uniform(-1, 1, size=num_stocks).tolist()

    # Execute action
    result = client.step(FinRLAction(actions=actions))

    print(f"Step {step}: Portfolio=${result.observation.portfolio_value:,.2f}, "
          f"Reward={result.reward:.2f}")

    if result.done:
        print("Episode finished!")
        break

client.close()
```

## Architecture

```
┌─────────────────────────────────────────────────────────┐
│                  RL Training Framework                  │
│  ┌────────────┐   ┌────────────┐   ┌────────────┐       │
│  │ Policy Net │   │ Value Net  │   │   Replay   │       │
│  │ (PyTorch)  │   │ (PyTorch)  │   │   Buffer   │       │
│  └──────┬─────┘   └──────┬─────┘   └──────┬─────┘       │
│         └────────────────┴────────────────┘             │
│                          │                              │
│                 ┌────────▼────────┐                     │
│                 │    FinRLEnv     │  ← HTTP Client      │
│                 │ (HTTPEnvClient) │                     │
│                 └────────┬────────┘                     │
└──────────────────────────┼──────────────────────────────┘
                           │  HTTP (JSON)
                  ┌────────▼────────┐
                  │ Docker Container│
                  │   Port: 8000    │
                  │                 │
                  │ ┌─────────────┐ │
                  │ │  FastAPI    │ │
                  │ │  Server     │ │
                  │ └──────┬──────┘ │
                  │        │        │
                  │ ┌──────▼──────┐ │
                  │ │   FinRL     │ │
                  │ │ Environment │ │
                  │ └──────┬──────┘ │
                  │        │        │
                  │ ┌──────▼──────┐ │
                  │ │   FinRL     │ │
                  │ │ StockTrading│ │
                  │ │     Env     │ │
                  │ └─────────────┘ │
                  └─────────────────┘
```

## API Reference

### FinRLAction

Trading action for the environment.

**Attributes:**
- `actions: list[float]` - Array of normalized action values (-1 to 1) for each stock
  - Positive values: buy
  - Negative values: sell
  - Magnitude: relative trade size

**Example:**
```python
# Buy stock 0, sell stock 1, hold stock 2
action = FinRLAction(actions=[0.5, -0.3, 0.0])
```
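
FinRL's own `StockTradingEnv` typically turns these normalized values into share counts by scaling with `hmax`; the sketch below assumes that convention (it is based on FinRL's implementation, not a documented part of this wrapper):

```python
# Sketch of how FinRL-style environments typically map normalized
# actions to share counts: scale by hmax and truncate toward zero.
# hmax=100 matches the sample config; the scaling rule itself is an
# assumption based on FinRL's StockTradingEnv.
hmax = 100
actions = [0.5, -0.3, 0.0]
shares = [int(a * hmax) for a in actions]
print(shares)  # [50, -30, 0] -> buy 50, sell 30, hold
```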

### FinRLObservation

Observation returned by the environment.

**Attributes:**
- `state: list[float]` - Flattened state vector
  - Structure: `[balance, prices..., holdings..., indicators...]`
- `portfolio_value: float` - Total portfolio value (cash + holdings)
- `date: str` - Current trading date
- `done: bool` - Whether the episode has ended
- `reward: float` - Reward for the last action
- `metadata: dict` - Additional information

**Example:**
```python
obs = result.observation
print(f"Portfolio: ${obs.portfolio_value:,.2f}")
print(f"Date: {obs.date}")
print(f"State dimension: {len(obs.state)}")
```
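
Given the `[balance, prices..., holdings..., indicators...]` layout above, the flattened vector can be sliced back into named pieces. This helper is a sketch that assumes exactly that ordering; it is not part of the client API:

```python
# Unpack a flattened FinRL state vector, assuming the layout
# [balance, prices (stock_dim), holdings (stock_dim), indicators (stock_dim * n_ind)].
def unpack_state(state, stock_dim, n_indicators):
    balance = state[0]
    prices = state[1:1 + stock_dim]
    holdings = state[1 + stock_dim:1 + 2 * stock_dim]
    indicators = state[1 + 2 * stock_dim:1 + 2 * stock_dim + stock_dim * n_indicators]
    return balance, prices, holdings, indicators

# Illustrative example with 2 stocks and 1 technical indicator
state = [100000.0, 10.0, 20.0, 5, 3, 0.1, -0.2]
balance, prices, holdings, inds = unpack_state(state, stock_dim=2, n_indicators=1)
```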

### Client Methods

#### `reset() -> StepResult[FinRLObservation]`

Reset the environment to start a new episode.

```python
result = client.reset()
```

#### `step(action: FinRLAction) -> StepResult[FinRLObservation]`

Execute a trading action.

```python
action = FinRLAction(actions=[0.5, -0.3])
result = client.step(action)
```

#### `state() -> State`

Get episode metadata (`episode_id`, `step_count`).

```python
state = client.state()
print(f"Episode: {state.episode_id}, Step: {state.step_count}")
```

#### `get_config() -> dict`

Get the environment configuration.

```python
config = client.get_config()
print(config['stock_dim'])
print(config['initial_amount'])
```

## Data Format

The environment expects stock data in the following CSV format:

| date       | tic   | close  | high   | low    | open   | volume  | macd | rsi_30 | cci_30 | dx_30 |
|------------|-------|--------|--------|--------|--------|---------|------|--------|--------|-------|
| 2020-01-01 | AAPL  | 100.0  | 102.0  | 98.0   | 99.0   | 1000000 | 0.5  | 55.0   | 10.0   | 15.0  |
| 2020-01-01 | GOOGL | 1500.0 | 1520.0 | 1480.0 | 1490.0 | 500000  | -0.3 | 48.0   | -5.0   | 20.0  |

**Required columns:**
- `date`: Trading date
- `tic`: Stock ticker symbol
- `close`, `high`, `low`, `open`: Price data
- `volume`: Trading volume
- Technical indicators (as specified in `tech_indicator_list`)

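A file in this shape can be produced with the standard library alone; the tickers and values below mirror the sample rows and are purely illustrative:

```python
import csv

# Write a minimal dataset in the expected long format: one row per
# (date, tic) pair, with OHLCV columns plus one column per indicator.
fieldnames = ["date", "tic", "close", "high", "low", "open", "volume",
              "macd", "rsi_30", "cci_30", "dx_30"]
rows = [
    {"date": "2020-01-01", "tic": "AAPL", "close": 100.0, "high": 102.0,
     "low": 98.0, "open": 99.0, "volume": 1000000,
     "macd": 0.5, "rsi_30": 55.0, "cci_30": 10.0, "dx_30": 15.0},
    {"date": "2020-01-01", "tic": "GOOGL", "close": 1500.0, "high": 1520.0,
     "low": 1480.0, "open": 1490.0, "volume": 500000,
     "macd": -0.3, "rsi_30": 48.0, "cci_30": -5.0, "dx_30": 20.0},
]
with open("stock_data.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
```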
## Configuration Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| `data_path` | str | Path to CSV file with stock data |
| `stock_dim` | int | Number of stocks to trade |
| `hmax` | int | Maximum shares per trade |
| `initial_amount` | int | Starting cash balance |
| `num_stock_shares` | list[int] | Initial holdings for each stock |
| `buy_cost_pct` | list[float] | Transaction cost for buying (per stock) |
| `sell_cost_pct` | list[float] | Transaction cost for selling (per stock) |
| `reward_scaling` | float | Scaling factor for rewards |
| `state_space` | int | Dimension of the state vector |
| `action_space` | int | Dimension of the action space |
| `tech_indicator_list` | list[str] | Technical indicators to include |

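In FinRL's tutorials, the state size is derived from the other parameters as cash + prices + holdings + per-stock indicator values. Assuming this wrapper follows the same layout (verify against your FinRL version), `state_space` can be computed rather than hand-counted:

```python
# Derive state_space from the other config values, assuming FinRL's usual
# layout: 1 cash balance + stock_dim prices + stock_dim holdings
# + stock_dim values per technical indicator.
def derive_state_space(stock_dim, tech_indicator_list):
    return 1 + 2 * stock_dim + stock_dim * len(tech_indicator_list)

print(derive_state_space(3, ["macd", "rsi_30", "cci_30", "dx_30"]))  # 19
```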
## Integration with RL Frameworks

### Stable Baselines 3

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO

from envs.finrl_env import FinRLEnv, FinRLAction

# Gymnasium-compatible wrapper so SB3 can drive the remote environment
class SB3FinRLWrapper(gym.Env):
    def __init__(self, base_url):
        super().__init__()
        self.env = FinRLEnv(base_url=base_url)
        config = self.env.get_config()
        self.action_space = spaces.Box(
            low=-1, high=1,
            shape=(config['action_space'],),
            dtype=np.float32
        )
        self.observation_space = spaces.Box(
            low=-np.inf, high=np.inf,
            shape=(config['state_space'],),
            dtype=np.float32
        )

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        result = self.env.reset()
        obs = np.array(result.observation.state, dtype=np.float32)
        return obs, {}

    def step(self, action):
        result = self.env.step(FinRLAction(actions=action.tolist()))
        obs = np.array(result.observation.state, dtype=np.float32)
        reward = result.reward or 0.0
        # The HTTP API only reports `done`, so report every end as termination
        return obs, reward, result.done, False, result.observation.metadata

# Train
env = SB3FinRLWrapper("http://localhost:8000")
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10000)
```

## Troubleshooting

### Server won't start

1. Check whether the base image exists:
   ```bash
   docker images | grep envtorch-base
   ```

2. Build the base image if it is missing:
   ```bash
   docker build -t envtorch-base:latest -f src/core/containers/images/Dockerfile .
   ```

### Import errors

Make sure you're in the `src` directory:
```bash
cd OpenEnv/src
python -c "from envs.finrl_env import FinRLEnv"
```

### Configuration errors

Verify that your data file has all required columns:
```python
import pandas as pd

df = pd.read_csv('your_data.csv')
required = {'date', 'tic', 'close', 'high', 'low', 'open', 'volume'}
missing = required - set(df.columns)
print(f"Missing columns: {sorted(missing)}" if missing else "All required columns present")
```

## Examples

See the `examples/` directory for complete examples:
- `examples/finrl_simple.py` - Basic usage
- `examples/finrl_training.py` - Full training loop with PPO
- `examples/finrl_backtesting.py` - Backtesting a trained agent

## License

BSD 3-Clause License (see the LICENSE file in the repository root)

## References

- [FinRL Paper](https://arxiv.org/abs/2011.09607)
- [FinRL GitHub](https://github.com/AI4Finance-Foundation/FinRL)
- [OpenEnv Documentation](../../README.md)