Building an Open MEV Simulation Engine for DeFi Arbitrage

I'm Alessio Giannini, a Blockchain & MEV Engineer with a background in enterprise software development, now focused on DeFi, MEV, protocol economics, and execution security across EVM and Substrate-based chains. I build open source tooling for MEV analysis and cross-chain data collection under xchain-mev-research, and I work with protocols and operators on understanding and mitigating MEV exposure in production systems. Graduate of the Polkadot Blockchain Academy (Hong Kong, 2024). I write about what I build and what I find: MEV mechanics, DeFi infrastructure, cross-chain architecture, and the occasional deep dive into protocol internals.

I built this because I needed it. As a MEV searcher, you eventually reach a point where reasoning seriously about arbitrage opportunities requires a simulation engine: something that runs exact AMM math, handles price impact across sequential hops, and tells you the actual PnL of a route before you commit capital.

Tools of that kind exist, but the people who build them tend to keep them private. An arbitrage engine is a direct competitive edge, and open-sourcing it means giving that edge away. As a result, the open ecosystem is left with simplified simulators, single-AMM implementations, and tools that paper over the hard parts.

xchain-mev-simulator is not a complete production system: it is an open simulation pipeline covering route discovery, exact AMM math, cumulative price impact, flash loan planning, and the decision layer design that sits above all of it.

This article walks through the technical core. The probabilistic decision model, which handles EV estimation, capital allocation, and cross-chain execution risk, has enough depth to deserve its own article and I will cover it next.

Cross-Chain by Design

The simulator is architected for multi-chain operation. Each chain gets its own isolated set of components: an IMarketRegistry for static pool topology, an IMarketState for the dynamic snapshot at a given block, and a FlashLoanManager for loan planning. A FlashLoanManagerRegistry maps each chain to its manager by identity, so any part of the pipeline that needs to plan loans for a specific chain looks it up without coupling to a concrete implementation.
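The lookup-by-chain idea can be sketched in a few lines of Python. This is an illustration under assumptions, not the project's code: the `plan_loans` signature, the string chain identifiers, and the `MoonbeamFlashLoanManager` stand-in are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Protocol


class FlashLoanManager(Protocol):
    """Hypothetical interface: anything that can plan loans for one chain."""

    def plan_loans(self, token: str, amount: int) -> list[str]: ...


@dataclass
class FlashLoanManagerRegistry:
    """Maps a chain identifier to that chain's manager."""

    _managers: dict[str, FlashLoanManager] = field(default_factory=dict)

    def register(self, chain_id: str, manager: FlashLoanManager) -> None:
        if chain_id in self._managers:
            raise ValueError(f"manager already registered for {chain_id}")
        self._managers[chain_id] = manager

    def get(self, chain_id: str) -> FlashLoanManager:
        try:
            return self._managers[chain_id]
        except KeyError:
            raise LookupError(f"no flash loan manager for {chain_id}") from None


class MoonbeamFlashLoanManager:
    """Stand-in implementation; real planning logic is out of scope here."""

    def plan_loans(self, token: str, amount: int) -> list[str]:
        return [f"borrow {amount} {token} on moonbeam"]


registry = FlashLoanManagerRegistry()
registry.register("moonbeam", MoonbeamFlashLoanManager())

# Callers depend only on the interface, never on the concrete class.
plan = registry.get("moonbeam").plan_loans("DOT", 1_000)
```

Adding a second chain then means registering one more manager; nothing that consumes the registry has to change.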

The current implementation covers Moonbeam: StellaSwap V2, V3, and V4, Beamswap V2 and V3, and the Nimbus stDOT vault. Hydration integration is planned.

Market State and Sub-Block Granularity

The simulator consumes data from xchain-dex-indexer, which indexes pool snapshots at sub-block granularity via the afterTxIndex field. This means the market state at transaction 12 of block N is distinct from the state at transaction 35 of the same block.

For MEV analysis this is the relevant unit of measurement. It allows simulating backrunning precisely, comparing what a route returns when positioned at the beginning, middle, or end of a block, and reasoning about the value of transaction ordering within a block rather than just across blocks.
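To make the sub-block lookup concrete, here is a minimal Python sketch. It assumes snapshots are kept sorted by (block, afterTxIndex) and that a snapshot describes the pool just after the transaction at that index; the `PoolSnapshot` fields and the `state_after` helper are hypothetical and not the indexer's schema.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PoolSnapshot:
    block: int
    after_tx_index: int  # assumed: state just after this transaction
    reserve0: int
    reserve1: int


def state_after(
    snapshots: list[PoolSnapshot], block: int, tx_index: int
) -> Optional[PoolSnapshot]:
    """Latest snapshot at or before (block, tx_index).

    `snapshots` must already be sorted by (block, after_tx_index).
    """
    keys = [(s.block, s.after_tx_index) for s in snapshots]
    pos = bisect_right(keys, (block, tx_index))
    return snapshots[pos - 1] if pos > 0 else None


history = [
    PoolSnapshot(100, 3, 1_000, 5_000),
    PoolSnapshot(100, 12, 1_050, 4_770),
    PoolSnapshot(100, 35, 990, 5_060),
    PoolSnapshot(101, 0, 1_000, 5_010),
]

# A backrun placed right after transaction 20 of block 100 sees the state
# left by transaction 12, not the end-of-block state.
visible = state_after(history, block=100, tx_index=20)
```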

Route Discovery

Route discovery runs a DFS across the graph of all available pool pairs. Each directed edge in the graph represents a swap: token A to token B through a specific pool. The generator composes these edges into circular routes, sequences that begin and end on the same token, which is the structural requirement for arbitrage without directional inventory exposure.
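A minimal Python sketch of such a search, with a depth limit and a repeated-token switch. The `Edge` type and `find_circular_routes` are hypothetical names. Two rules here are assumptions of the sketch rather than documented behaviour of the generator: a pool is used at most once per route, and a route ends the first time it returns to the start token.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    pool: str
    token_in: str
    token_out: str


def find_circular_routes(edges, start, max_depth, allow_repeated_tokens=False):
    """Enumerate routes start -> ... -> start of at most `max_depth` hops."""
    outgoing = {}
    for edge in edges:
        outgoing.setdefault(edge.token_in, []).append(edge)

    routes = []

    def dfs(token, path, seen_tokens):
        if len(path) == max_depth:
            return
        for edge in outgoing.get(token, []):
            if any(edge.pool == used.pool for used in path):
                continue  # assumption: each pool at most once per route
            if edge.token_out == start:
                routes.append(tuple(path + [edge]))  # cycle closed
                continue
            if not allow_repeated_tokens and edge.token_out in seen_tokens:
                continue
            dfs(edge.token_out, path + [edge], seen_tokens | {edge.token_out})

    dfs(start, [], {start})
    return routes


def both_directions(pool, a, b):
    return [Edge(pool, a, b), Edge(pool, b, a)]


edges = (
    both_directions("p1", "DOT", "USDC")
    + both_directions("p2", "USDC", "GLMR")
    + both_directions("p3", "GLMR", "DOT")
)
triangles = find_circular_routes(edges, "DOT", max_depth=3)
```

On this three-pool graph the search finds the triangle in both directions and nothing at depth 2, since a two-hop cycle would need two distinct pools on the same pair.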

The search is configurable on two axes: maximum depth (number of hops) and whether the same token can appear more than once in a route. Allowing repeated tokens increases the candidate set substantially. On the full Moonbeam pool set, across all supported protocols, the numbers look like this:

| Max depth | Total routes |
|-----------|--------------|
| 2         | 108          |
| 3         | 348          |
| 4         | 4,036        |
| 5         | 18,936       |
| 6         | 157,224      |

Restricting to a single starting token reduces the space significantly. Starting from DOT with depth 4 and no repeated tokens, the generator produces 968 routes from 123 pairs. Allowing repeated tokens brings this to 1,212. The size of the candidate set is what makes a fast screening pass necessary before running full tick math.

The two-stage approach is straightforward: first pass with exchange-rate-only mode (fast, no tick crossing), discard unprofitable routes, then run full simulation on the survivors. Exchange-rate mode computes output using the current pool price without simulating tick transitions, which is accurate enough to filter obvious losers and fast enough to run across tens of thousands of candidates.
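The two-stage filter is easy to express generically. In this Python sketch `quick_pnl` stands for the exchange-rate-only estimate and `full_pnl` for the full simulation; both are caller-supplied placeholders, and the function name and the `min_profit` threshold are assumptions made for illustration.

```python
from typing import Callable, Iterable, TypeVar

Route = TypeVar("Route")


def screen_then_simulate(
    routes: Iterable[Route],
    quick_pnl: Callable[[Route], float],
    full_pnl: Callable[[Route], float],
    min_profit: float = 0.0,
) -> list[tuple[Route, float]]:
    """Cheap estimate first; expensive simulation only on the survivors."""
    survivors = [r for r in routes if quick_pnl(r) > min_profit]
    results = [(r, full_pnl(r)) for r in survivors]
    # The fast pass is approximate, so profit is re-checked after the full run.
    profitable = [(r, pnl) for r, pnl in results if pnl > min_profit]
    return sorted(profitable, key=lambda item: item[1], reverse=True)


# Toy numbers: route "c" looks profitable at the spot rate but not in full.
quick = {"a": 5.0, "b": -1.0, "c": 2.0}
full = {"a": 3.0, "c": -0.5}
full_calls = []


def run_full(route):
    full_calls.append(route)
    return full[route]


ranked = screen_then_simulate(["a", "b", "c"], quick.get, run_full)
```

Route "b" never reaches the expensive pass, which is the whole point of the screen.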

AMM Math Engines

The simulator implements each supported AMM type as a separate engine:

Uniswap V2 uses the standard constant-product formula with the exact fee model. Output is deterministic given reserves and fee tier, so simulation is a direct calculation with no iteration.
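For reference, the constant-product output with a fee on the input can be written in integer arithmetic as below. The function name and the basis-point fee parameter are choices made for this sketch; with the default of 30 bps it reduces to the familiar 997/1000 form.

```python
def v2_amount_out(
    amount_in: int, reserve_in: int, reserve_out: int, fee_bps: int = 30
) -> int:
    """Constant-product output, fee charged on the input, rounded down."""
    if amount_in <= 0 or reserve_in <= 0 or reserve_out <= 0:
        raise ValueError("amount and reserves must be positive")
    amount_in_after_fee = amount_in * (10_000 - fee_bps)
    numerator = amount_in_after_fee * reserve_out
    denominator = reserve_in * 10_000 + amount_in_after_fee
    return numerator // denominator


out = v2_amount_out(1_000, 100_000, 100_000)  # 987
```

Integer division rounds in the pool's favour, so the product of the reserves never decreases after a swap.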

Uniswap V3 and Algebra V3 implement full tick-crossing logic. The engine steps through initialized ticks one by one, consuming liquidity in each active range and crossing into the next when a tick boundary is reached. Gas is estimated per tick crossed, which matters when comparing routes: a route that crosses eight ticks on one hop may be less profitable than a shallower alternative even if the raw AMM output is higher.
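The stepping loop can be illustrated with a deliberately simplified Python sketch. It uses floats instead of the fixed-point integer math a real engine needs, handles only the token0-to-token1 direction, takes the fee up front, and receives the liquidity ranges as a ready-made list rather than reading a tick bitmap. `LiquidityRange` and `swap_token0_for_token1` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LiquidityRange:
    sqrt_price_lower: float  # sqrt price at the range's lower tick boundary
    liquidity: float         # active liquidity L while the price is in range


def swap_token0_for_token1(amount_in, sqrt_price, ranges, fee=0.003):
    """Sell token0, walking the price down through `ranges`.

    `ranges[0]` must contain the current price; later entries lie below it.
    Returns (amount_out, final_sqrt_price, ticks_crossed, unfilled_input),
    where unfilled_input is net of the fee.
    """
    remaining = amount_in * (1.0 - fee)  # simplification: fee taken up front
    amount_out = 0.0
    ticks_crossed = 0
    for rng in ranges:
        if remaining <= 0.0:
            break
        liq = rng.liquidity
        # token0 needed to push the price all the way to the lower boundary
        to_boundary = liq * (1.0 / rng.sqrt_price_lower - 1.0 / sqrt_price)
        if remaining < to_boundary:
            # The swap finishes inside this range.
            next_sqrt = liq * sqrt_price / (liq + remaining * sqrt_price)
            amount_out += liq * (sqrt_price - next_sqrt)
            sqrt_price = next_sqrt
            remaining = 0.0
        else:
            # Consume the whole range, then cross into the next one.
            amount_out += liq * (sqrt_price - rng.sqrt_price_lower)
            sqrt_price = rng.sqrt_price_lower
            remaining -= to_boundary
            ticks_crossed += 1
    return amount_out, sqrt_price, ticks_crossed, remaining


ranges = [LiquidityRange(0.99, 1_000.0), LiquidityRange(0.90, 500.0)]
result = swap_token0_for_token1(20.0, 1.0, ranges, fee=0.0)
```

The crossing count returned alongside the output is what a per-tick gas estimate would be built on.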

LST vault (Nimbus stDOT) uses an exchange-rate model. Staking and unstaking both execute as swaps against the vault at the current stDOT/DOT rate. The vault is included as a node in the route graph, which means routes like DOT → stDOT → xcDOT → DOT are discoverable and simulatable alongside pure AMM routes.
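An exchange-rate hop is the simplest engine to sketch. The fixed-point scale, the rounding direction, and the convention that the rate is quoted as DOT per stDOT are all assumptions of this Python example and are not taken from the Nimbus contracts.

```python
RATE_SCALE = 10**18  # assumed fixed-point scale for the exchange rate


def stake(dot_in: int, dot_per_stdot: int) -> int:
    """DOT -> stDOT at the current rate, rounding down."""
    return dot_in * RATE_SCALE // dot_per_stdot


def unstake(stdot_in: int, dot_per_stdot: int) -> int:
    """stDOT -> DOT at the current rate, rounding down."""
    return stdot_in * dot_per_stdot // RATE_SCALE


rate = 125 * 10**16               # 1 stDOT = 1.25 DOT (illustrative value)
minted = stake(1_000, rate)       # 800
redeemed = unstake(minted, rate)  # 1000
```

Because the output is linear in the input, a vault hop adds no price impact of its own; any curvature in a route's profit comes from the AMM hops around it.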

Cumulative Price Impact and State Mutation

When simulating a large set of candidate routes, running full tick math on every hop is expensive. RouteSimulator therefore supports two modes. With state mutation disabled, pool reserves are not updated after each hop, which makes simulation cheap enough to screen thousands of routes quickly. Routes that show no profit in this mode are discarded without ever running the heavier calculation.

For routes that survive screening, mutation is enabled: pool state is updated after each hop, and the HopsOutputCache invalidates cached entries for any pool whose state has changed. This ensures that sequential swaps on the same pool correctly reflect cumulative price impact, and that PnL figures used for ranking and bundle construction are accurate.
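The interaction between mutation and the cache can be shown with V2 pools alone. This Python sketch is illustrative: `V2Pool`, `simulate`, and the dictionary cache are stand-ins for this article and do not reproduce the API of RouteSimulator or HopsOutputCache.

```python
from dataclasses import dataclass


@dataclass
class V2Pool:
    pool_id: str
    reserves: dict  # token symbol -> reserve
    fee_bps: int = 30

    def quote(self, token_in, token_out, amount_in):
        after_fee = amount_in * (10_000 - self.fee_bps)
        return (after_fee * self.reserves[token_out]) // (
            self.reserves[token_in] * 10_000 + after_fee
        )


def simulate(hops, amount_in, mutate, cache):
    """hops: list of (pool, token_in, token_out). Returns the final output."""
    amount = amount_in
    for pool, token_in, token_out in hops:
        key = (pool.pool_id, token_in, token_out, amount)
        if key not in cache:
            cache[key] = pool.quote(token_in, token_out, amount)
        out = cache[key]
        if mutate:
            pool.reserves[token_in] += amount
            pool.reserves[token_out] -= out
            # Quotes cached for this pool were computed on stale reserves.
            for stale in [k for k in cache if k[0] == pool.pool_id]:
                del cache[stale]
        amount = out
    return amount


pool = V2Pool("A", {"DOT": 1_000_000, "USDC": 5_000_000})
hop = [(pool, "DOT", "USDC")]
cache = {}
first = simulate(hop, 10_000, mutate=True, cache=cache)
second = simulate(hop, 10_000, mutate=True, cache=cache)  # sees the moved price
```

With mutation on, the second identical swap returns less than the first. With mutation off, both calls return the same figure and the second is served from the cache.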

Optimal Input Size

Given a profitable route, the probe amount used during screening is arbitrary. It tells you the route direction is correct but not the actual maximum profit or the capital required to capture it.

Finding the optimal input, sStar, is the next step after screening. The approach depends on the route composition:

For routes composed exclusively of V2 pools, the profit function is a rational function of the input. The closed-form derivative can be computed and solved analytically. The two-pool case derivation is documented by Flashbots; the simulator extends it to multi-hop V2 chains using the same symbolic approach.
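One way to see why the V2 case has a closed form: a single hop maps input x to a·x / (b + c·x) with a = γ·R_out, b = R_in, c = γ, where γ = 1 − fee, and composing two maps of that shape yields another of the same shape. Profit a·x / (b + c·x) − x then peaks at x* = (√(a·b) − b) / c, and a route is profitable only when a > b. The Python sketch below follows that derivation in floats; it illustrates the math and is not the simulator's code, and the function names are hypothetical.

```python
import math


def hop_coefficients(reserve_in, reserve_out, fee=0.003):
    """One V2 hop written as out = a*x / (b + c*x)."""
    gamma = 1.0 - fee
    return gamma * reserve_out, reserve_in, gamma


def compose(first, second):
    """Coefficients of second(first(x)); the a*x / (b + c*x) shape is kept."""
    a1, b1, c1 = first
    a2, b2, c2 = second
    return a1 * a2, b1 * b2, b2 * c1 + a1 * c2


def optimal_input(hops):
    """hops: list of (reserve_in, reserve_out, fee). Returns (s_star, profit)."""
    a, b, c = hop_coefficients(*hops[0])
    for hop in hops[1:]:
        a, b, c = compose((a, b, c), hop_coefficients(*hop))
    if a <= b:
        return 0.0, 0.0  # marginal rate at zero input is a/b <= 1
    s_star = (math.sqrt(a * b) - b) / c
    return s_star, a * s_star / (b + c * s_star) - s_star


# DOT -> USDC where DOT is priced at 5, then USDC -> DOT where it is priced at 4.
hops = [(1_000.0, 5_000.0, 0.003), (4_000.0, 1_000.0, 0.003)]
s_star, profit = optimal_input(hops)
```

Products of reserves grow quickly with the number of hops, so a production version would want integer or rational arithmetic rather than floats.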

For routes containing V3, V4, or LST vault hops, the V2 closed form no longer applies. Tick-crossing logic makes the profit function non-smooth: each tick boundary introduces a discontinuity in the derivative. The planned implementation uses Brent's method on a bracket [minInput, maxInput], with the route simulator as the objective function. Brent's method combines parabolic interpolation with golden-section steps: it converges super-linearly where the profit function is locally smooth and falls back to golden-section steps where it is not, so it typically needs fewer simulator calls than plain golden-section search.
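As a runnable illustration of bracketed, derivative-free search, the Python sketch below uses plain golden-section search, a simpler stand-in for Brent's method and not the planned implementation. It assumes the profit function has a single peak inside the bracket. `toy_profit` is an invented objective with a kink exactly at its maximum, mimicking a tick boundary; in practice the objective would be a call into the route simulator.

```python
import math

INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0  # about 0.618


def maximize_profit(profit, min_input, max_input, tol=1e-6, max_iter=200):
    """Golden-section search for the maximum of a single-peaked function."""
    lo, hi = min_input, max_input
    x1 = hi - INV_PHI * (hi - lo)
    x2 = lo + INV_PHI * (hi - lo)
    f1, f2 = profit(x1), profit(x2)
    for _ in range(max_iter):
        if hi - lo <= tol:
            break
        if f1 < f2:  # maximum lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + INV_PHI * (hi - lo)
            f2 = profit(x2)
        else:        # maximum lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - INV_PHI * (hi - lo)
            f1 = profit(x1)
    best = (lo + hi) / 2.0
    return best, profit(best)


def toy_profit(x):
    # Invented objective: marginal return drops from 0.10 to 0.02 at x = 62.5,
    # and a quadratic term models growing price impact.
    return min(0.10 * x, 5.0 + 0.02 * x) - 0.0005 * x * x


s_star, best_profit = maximize_profit(toy_profit, 0.0, 200.0)
```

The search lands on the kink at 62.5 even though the derivative is undefined there, which is the property that matters for tick-based pools. Each iteration costs one objective evaluation, so with a full route simulation behind it the iteration count is the number to watch.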

sStar feeds directly into flash loan planning: once the optimal input is known, FlashLoanManager.planLoans() can size the borrow precisely. Without it, either you underborrow and leave profit on the table, or you overborrow and pay unnecessary fees.

Flash Loan Planning

Given a route and an input size, the flash loan planner selects the set of sources that covers the required capital at minimum cost.

The planner aggregates sources across all supported pool types: V2, V3, V4, and stable pools. For V3 and V4 sources, capacity is estimated from the current tick state via V3FlashCapacityEstimator, which derives available liquidity from the tick range around the current price. Source selection follows LowestFeeFirstStrategy: sources are ranked by fee rate and the cheapest set that covers the full required amount is selected. Pools already used as hops in the arbitrage route are excluded from flash loan sources, since borrowing from a pool that the route also swaps through invalidates the price assumptions the route was built on.

One practical constraint is that the optimal input size for a route may exceed what is available across all flash loan sources for that token. In that case the executable size is capped by source capacity, and the actual profit is lower than the theoretical maximum at sStar.
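A lowest-fee-first selection with route-pool exclusion and a capacity cap might look like the following Python sketch. `LoanSource`, the tuple-based plan, and rounding fees up are assumptions made for illustration; this is not the FlashLoanManager interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoanSource:
    pool_id: str
    capacity: int  # maximum borrowable amount of the token
    fee_bps: int


def plan_loans(sources, required, route_pool_ids):
    """Return (legs, shortfall); each leg is (pool_id, amount, fee)."""
    eligible = [
        s for s in sources
        if s.pool_id not in route_pool_ids and s.capacity > 0
    ]
    legs, remaining = [], required
    for src in sorted(eligible, key=lambda s: s.fee_bps):
        if remaining <= 0:
            break
        amount = min(src.capacity, remaining)
        fee = -(-amount * src.fee_bps // 10_000)  # round the fee up
        legs.append((src.pool_id, amount, fee))
        remaining -= amount
    return legs, max(remaining, 0)


sources = [
    LoanSource("A", capacity=500, fee_bps=5),    # cheapest, but on the route
    LoanSource("B", capacity=1_000, fee_bps=30),
    LoanSource("C", capacity=800, fee_bps=9),
]
legs, shortfall = plan_loans(sources, required=1_200, route_pool_ids={"A"})
```

A non-zero shortfall is the signal that the executable size has to be capped below sStar.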

The Decision Layer

Above the simulation pipeline sits a layer that handles capital allocation across profitable routes under real execution uncertainty: inclusion probability, price survival between submission and execution, MEV tax from competing searchers, and tail risk via CVaR95.

This layer is written as structured pseudocode, readable as a specification rather than runnable code. It is useful on its own to understand what a professional cross-chain MEV bot actually has to reason about, and how much complexity sits above the simulation math. I will cover it in detail in the next article.

Status

The simulation layer is complete and tested: AMM math for V2, V3, V4, and the stDOT vault; route discovery and screening; and flash loan planning with fee minimization. The integrity test suite validates simulation output against official DEX subgraph data.

What remains on the roadmap: sStar computation for mixed routes via Brent's method, non-conflicting bundle construction, and the full historical simulation pipeline for backtesting and model calibration.

The project is open source under MIT. If you work on MEV, DeFi infrastructure, or AMM math, contributions are welcome.

Repository: xchain-mev-simulator
Indexer: xchain-dex-indexer


This article is part of the series Building a Cross-Chain MEV Bot. Next up: simulation tells you the PnL of a route. It does not tell you whether to take the trade. The next article covers the decision layer — how a searcher reasons under uncertainty, sizes positions, and decides when not to act.