Strategy 1 is a simple trading strategy in which the model is tasked with buying and selling MSFT stock at predetermined times. We use Databento's market-by-price L10 (ten-level depth) data for MSFT. We then randomize the volume available at each price level so that the model must correctly reserve liquidity against the synthetic book. If a trade is scheduled for 10 shares and only 8 are available in the book, the model must take only 8. The volume available in the book therefore caps the size of every fill.
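The capping rule can be illustrated with a minimal sketch. It assumes the order walks the relevant side of the synthetic book best price first and that each level is a `(price, size)` pair; the function name `fill_against_book` and this data layout are hypothetical and are not taken from the benchmark harness.

```python
from typing import List, Tuple

Level = Tuple[float, int]  # (price, size) for one synthetic-book level


def fill_against_book(levels: List[Level], requested: int) -> Tuple[List[Level], int]:
    """Fill up to `requested` shares, never taking more than each level offers.

    `levels` is assumed to be sorted best price first on the side being hit
    (asks for a buy, bids for a sell). Returns the per-level fills and the
    total number of shares actually filled.
    """
    fills: List[Level] = []
    remaining = requested
    for price, size in levels:
        if remaining <= 0:
            break
        take = min(remaining, size)  # cap at the size available on this level
        if take > 0:
            fills.append((price, take))
            remaining -= take
    return fills, requested - remaining


# A 10-share buy against a book that only holds 8 shares fills 8.
asks = [(100.00, 5), (100.01, 3)]
fills, filled = fill_against_book(asks, 10)
print(fills, filled)
```

Under these assumptions the unfilled remainder is simply dropped; whether the real task expects it to be dropped or carried forward is not stated here.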
The strategy tracks cash and MSFT position; realized P&L using FIFO accounting; unrealized P&L based on raw-book mid-prices; an equity curve and maximum drawdown; and synthetic-book statistics such as total size available and bid/ask VWAP after the model's trades.
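As a rough illustration of the bookkeeping listed above, the sketch below tracks cash, position, FIFO realized P&L, mid-price unrealized P&L, and maximum drawdown. It assumes a long-only position (no short selling) and measures drawdown in absolute currency terms; the names `FifoLedger`, `mid_price`, and `max_drawdown` are hypothetical and not part of the benchmark.

```python
from collections import deque
from typing import Deque, List, Sequence


def mid_price(best_bid: float, best_ask: float) -> float:
    """Mid-price of the raw book's top of book."""
    return (best_bid + best_ask) / 2.0


class FifoLedger:
    """Long-only ledger that realizes P&L against the oldest open lots first."""

    def __init__(self, cash: float) -> None:
        self.cash = cash
        self.realized = 0.0
        self.lots: Deque[List[float]] = deque()  # [entry_price, qty], oldest first

    @property
    def position(self) -> float:
        return sum(qty for _, qty in self.lots)

    def buy(self, price: float, qty: int) -> None:
        self.cash -= price * qty
        self.lots.append([price, qty])

    def sell(self, price: float, qty: int) -> None:
        if qty > self.position:
            raise ValueError("sell exceeds position (short selling not modeled)")
        self.cash += price * qty
        remaining = qty
        while remaining > 0:
            lot = self.lots[0]
            take = min(remaining, lot[1])
            self.realized += (price - lot[0]) * take  # P&L against the oldest lot
            lot[1] -= take
            remaining -= take
            if lot[1] == 0:
                self.lots.popleft()

    def unrealized(self, mid: float) -> float:
        return sum((mid - entry) * qty for entry, qty in self.lots)

    def equity(self, mid: float) -> float:
        return self.cash + self.position * mid


def max_drawdown(equity_curve: Sequence[float]) -> float:
    """Largest peak-to-trough drop along the equity curve."""
    peak, worst = equity_curve[0], 0.0
    for value in equity_curve:
        peak = max(peak, value)
        worst = max(worst, peak - value)
    return worst


ledger = FifoLedger(cash=10_000.0)
ledger.buy(100.0, 10)
ledger.buy(102.0, 10)
ledger.sell(105.0, 15)  # closes the 100.0 lot fully, then 5 shares of the 102.0 lot
mid = mid_price(103.0, 105.0)
print(ledger.realized, ledger.unrealized(mid), ledger.equity(mid))
```

A useful consistency check in a ledger like this is that equity equals starting cash plus realized and unrealized P&L at every step.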
| Model | pass@3 | pass@1 | Mean MAE (solved) | Best run MAE | Avg. attempts |
|---|---|---|---|---|---|
| gemini-3-pro-preview | 1.00 | 1.00 | 14.83 | 14.83 | 1.00 |
| claude-sonnet-4.5 | 1.00 | 1.00 | 16.36 | 16.35 | 1.00 |
| mistral-large-2512 | 1.00 | 1.00 | 361.87 | 23.01 | 1.00 |
| gpt-5.1-codex-max | 1.00 | 1.00 | 844.74 | 0.002 | 1.00 |
| llama-4-maverick | 1.00 | 1.00 | 4,137.62 | 170.06 | 1.00 |
| deepseek-v3.2 | 1.00 | 0.80 | 133.63 | 7.26 | 1.40 |
| qwen3-max | 1.00 | 0.80 | 2,388.18 | 16.39 | 1.20 |
| grok-4 | 0.80 | 0.00 | 59.35 | 7.22 | 2.25 |
| claude-opus-4.5 | 0.60 | 0.40 | 14.10 | 7.18 | 1.33 |
| llama-3.1-nemotron-ultra | 0.60 | 0.40 | 8,799.23 | 111.63 | 1.67 |
| nova-premier-v1 | 0.40 | 0.20 | 688.57 | 171.13 | 1.50 |
| command-a | 0.20 | 0.00 | 630.21 | 630.21 | 3.00 |