MEVerse is an RL-style OpenEnv environment for evaluating whether an agent can trade and manage liquidity in a Uniswap V3-like market while avoiding MEV attacks.
Reinforcement Learning, or RL, is a setup where an agent:
- observes the current state of an environment
- chooses an action
- receives a reward or penalty
- improves by learning which actions lead to better long-term outcomes
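The loop above can be sketched in a few lines of plain Python; `ToyEnv`, `run_episode`, and the method names are illustrative stand-ins, not the MEVerse API:

```python
import random

def run_episode(env, policy, steps=10):
    """Generic RL interaction loop: observe, act, receive reward, repeat."""
    obs = env.reset()
    total_reward = 0.0
    for _ in range(steps):
        action = policy(obs)                 # choose an action from the current state
        obs, reward, done = env.step(action)
        total_reward += reward               # accumulate long-term outcome
        if done:
            break
    return total_reward

class ToyEnv:
    """Toy environment: reward is +1 when the action matches the observed bit."""
    def reset(self):
        self.state = random.randint(0, 1)
        return self.state

    def step(self, action):
        reward = 1.0 if action == self.state else 0.0
        self.state = random.randint(0, 1)    # state changes after each step
        return self.state, reward, False

env = ToyEnv()
score = run_episode(env, policy=lambda obs: obs)  # policy: copy the observed bit
print(score)  # → 10.0
```

The point of the sketch is the interface shape, not the policy: MEVerse exposes the same observe/act/reward cycle, only with a richer observation and action space.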
In our project, the environment is a simulated DeFi market. The agent must decide when to trade, when to provide liquidity, when to wait, and how to avoid being exploited by adversarial bots.
MEV stands for Maximal Extractable Value. In simple terms, it is the profit that bots or validators can extract by seeing a transaction before it is finalized and then reordering, inserting, or copying transactions around it.
For a normal user this means:
- you submit a trade
- a bot sees it in the public mempool
- the bot acts before or around your trade
- you get a worse price, lose fees, or both
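The front-running pattern above can be shown numerically with a minimal constant-product (x·y = k) pool, a simplification of the concentrated-liquidity math MEVerse simulates; all numbers are made up for illustration:

```python
def swap_out(x, y, dx):
    """Constant-product swap: tokens of Y out for dx of X in (fees ignored)."""
    k = x * y
    return y - k / (x + dx)

# Pool with 1000 X and 1000 Y.
x, y = 1000.0, 1000.0

# Victim alone: swap 10 X for Y.
alone = swap_out(x, y, 10.0)

# Front-run: a bot swaps 50 X first, shifting the price against the victim.
bot_out = swap_out(x, y, 50.0)
x2, y2 = x + 50.0, y - bot_out
front_run = swap_out(x2, y2, 10.0)

print(round(alone, 4), round(front_run, 4))  # → 9.901 8.9847
assert front_run < alone  # the victim receives less after the front-run
```

The bot can then sell back after the victim's trade to complete the sandwich; even this fee-free toy pool shows the victim's execution getting strictly worse once the bot moves first.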
This is not a theoretical problem.
MEV bots extracted $686M from Ethereum users in 2023.
We are building an environment where an agent learns and is evaluated on a real task:
- executing swaps in a concentrated-liquidity market
- managing liquidity positions
- recognizing mempool danger signals
- adapting to adversarial bot behavior
Instead of scoring a model on a static question-answer benchmark, MEVerse tests whether the agent can make a sequence of decisions under changing market conditions.
Most trading benchmarks are too simple:
- only BUY / SELL / HOLD actions
- no adversary
- no market microstructure
- no mempool visibility
MEVerse is built to be closer to the real operational problem in DeFi:
- liquidity is concentrated by price range, like Uniswap V3
- transactions can be observed and exploited before execution
- the same action may be safe in one state and costly in another
- the agent is rewarded for both profit and defensive behavior
MEVerse is an OpenEnv environment with three difficulty levels:
- easy: passive market, lower volatility, no aggressive adversary
- medium: JIT-liquidity behavior appears
- hard: adaptive adversary and higher volatility
The environment tracks:
- pool price and active liquidity
- local tick distribution around the current price
- agent balances and LP positions
- visible mempool transactions
- recent MEV loss
- episode progress and task type
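As a sketch, the tracked state could map onto a typed observation along these lines; the field names are illustrative, not the actual MEVerse models:

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class MarketObservation:
    """Illustrative observation shape for a concentrated-liquidity market."""
    pool_price: float                 # current pool price
    active_liquidity: float           # liquidity at the current tick
    tick_liquidity: Dict[int, float]  # local tick distribution around the price
    balances: Dict[str, float]        # agent token balances
    lp_positions: List[dict]          # open LP positions (range, size)
    mempool: List[dict]               # visible pending transactions
    recent_mev_loss: float            # MEV damage over a recent window
    step: int                         # episode progress
    task: str                         # task type, e.g. "easy"

obs = MarketObservation(
    pool_price=1.0005, active_liquidity=5_000.0,
    tick_liquidity={-10: 800.0, 0: 5_000.0, 10: 1_200.0},
    balances={"X": 100.0, "Y": 100.0}, lp_positions=[], mempool=[],
    recent_mev_loss=0.0, step=0, task="easy",
)
print(obs.task, obs.pool_price)  # → easy 1.0005
```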
The agent can:
- `swap_exact_in`
- `split_swap`
- `add_liquidity`
- `remove_liquidity`
- `range_order`
- `jit_liquidity`
- `hold`
- `close_episode`
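For reference, the eight action names can be collected in a small enum; the enum wrapper itself is illustrative, only the names come from the environment:

```python
from enum import Enum

class Action(str, Enum):
    """The eight MEVerse action names (wrapper is illustrative)."""
    SWAP_EXACT_IN = "swap_exact_in"
    SPLIT_SWAP = "split_swap"
    ADD_LIQUIDITY = "add_liquidity"
    REMOVE_LIQUIDITY = "remove_liquidity"
    RANGE_ORDER = "range_order"
    JIT_LIQUIDITY = "jit_liquidity"
    HOLD = "hold"
    CLOSE_EPISODE = "close_episode"

print(len(Action))  # → 8
```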
At each step:
- the agent receives the current market and portfolio state
- it chooses an action
- the environment simulates trade execution, LP updates, and adversarial behavior
- the agent receives a reward and the next observation
The reward is dense, not just pass/fail. It reflects:
- execution quality
- portfolio improvement
- LP fee capture
- MEV damage avoided or suffered
- end-of-episode performance
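A toy composition of these components, with made-up weights, illustrates what a dense step reward can look like; the actual MEVerse weighting is not shown here:

```python
def step_reward(exec_quality, pnl_delta, lp_fees, mev_loss):
    """Illustrative dense reward: the weights are assumptions, not MEVerse's."""
    return (0.3 * exec_quality   # execution quality vs. a fair-price baseline
            + 0.4 * pnl_delta    # portfolio improvement this step
            + 0.2 * lp_fees      # fees captured by LP positions
            - 0.1 * mev_loss)    # penalty for MEV damage suffered

r = step_reward(exec_quality=0.8, pnl_delta=0.5, lp_fees=0.2, mev_loss=1.0)
print(round(r, 2))  # → 0.38
```

Because every step emits a signal like this, the agent gets feedback on defensive behavior (the `mev_loss` term) even in episodes it would otherwise "pass".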
At the end of an episode, the environment also returns a normalized deterministic grade in [0, 1].
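As an illustration only, a deterministic grade in [0, 1] can be produced by clamping a linear map of the episode's relative return; the real grading formula may differ:

```python
def episode_grade(final_value: float, initial_value: float) -> float:
    """Sketch of a deterministic [0, 1] grade (not MEVerse's actual formula):
    maps a relative return in [-50%, +50%] linearly onto [0, 1], clamped."""
    ret = (final_value - initial_value) / initial_value
    return max(0.0, min(1.0, 0.5 + ret))

print(episode_grade(110.0, 100.0))  # a +10% return
```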
The current implementation already includes the base project structure and working environment logic:
- typed OpenEnv models for action and observation
- a MEVerse environment server
- a Python client for interacting with the environment
- task switching across `easy`, `medium`, and `hard`
- local validation through `openenv validate`
- a baseline inference runner using the OpenAI client
- placeholder `.env` configuration for model access
The current base logic supports:
- swap execution
- liquidity add/remove flows
- range-style LP positioning
- JIT-liquidity simulation
- MEV-aware step rewards
- bounded invalid-action penalties with surfaced error metadata
- deterministic grading output
There is now one canonical README for the whole project at the repo root.
Main files and folders:
```
.
├── README.md
├── .env
├── app.py
├── client.py
├── inference.py
└── meverse/
    ├── __init__.py
    ├── client.py
    ├── models.py
    ├── openenv.yaml
    ├── pyproject.toml
    ├── Dockerfile
    └── server/
        ├── app.py
        └── meverse_environment.py
```
Validate the environment:

```shell
cd meverse
openenv validate
```

Run the baseline inference from the repo root:

```shell
python inference.py
```

Current inference configuration is read from:
- `API_BASE_URL`
- `MODEL_NAME`
- `HF_TOKEN`
- `LOCAL_IMAGE_NAME` or `MEVERSE_BASE_URL`
- `MEVERSE_TASK`
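A placeholder `.env` illustrating those variables; every value below is a dummy:

```shell
# .env — placeholder values only; variable names match the current configuration
API_BASE_URL=https://api.example.com/v1
MODEL_NAME=your-model-name
HF_TOKEN=hf_your_token_here
LOCAL_IMAGE_NAME=meverse:latest
# or, to target an already-running environment server instead of a local image:
# MEVERSE_BASE_URL=http://localhost:8000
MEVERSE_TASK=easy
```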
This project is best understood as an RL-style evaluation environment for decision-making under adversarial market conditions.
The model is not being judged on memorizing DeFi trivia. It is being judged on whether it can:
- interpret structured market state
- choose sensible multi-step actions
- avoid predictable exploitation
- perform better as task difficulty increases
This README intentionally focuses on the base implementation now in the repo. It explains the problem, the environment, and the current working structure without overloading the reader with deeper protocol math.
More detailed technical documentation can be added once the next implementation layer is complete.