A simulation benchmark that tests causal reasoning under uncertainty. Humans and AI models face identical decisions and are scored against the same objective standard.
You play the role of a marketing manager with a $300,000 monthly budget and three channels to allocate it across. Each month you commit a budget split, the simulation runs, and you observe the revenue outcome.
Do this for 12 months. Maximize total annual revenue. The catch: you don't know the true model. You only learn from the noisy feedback your own decisions produce.
- A brand awareness channel targeting new potential customers.
- A direct-response channel designed to convert interest into purchases.
- An influencer and social media channel targeting broad audiences.
The channels behave differently. Figuring out how, and adapting your strategy accordingly, is the challenge.
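The commit-observe loop above can be sketched as follows. Everything here is an illustrative stand-in: the channel weights, response curves, and noise model are invented for the sketch, since the true market model is hidden from the player by design.

```python
import random

BUDGET = 300_000
CHANNELS = ["awareness", "direct_response", "influencer"]

def toy_revenue(allocation, rng):
    # Illustrative stand-in for the hidden market model: square-root
    # response curves give diminishing returns, and multiplicative noise
    # obscures the causal signal. The real benchmark's model differs.
    weights = {"awareness": 1.1, "direct_response": 1.6, "influencer": 0.9}
    expected = sum(weights[c] * allocation[c] ** 0.5 for c in CHANNELS)
    return expected * rng.uniform(0.9, 1.1)

def run_year(policy, seed=0):
    # One episode: commit a budget split, observe noisy revenue, repeat
    # for 12 months. The policy sees only its own past feedback.
    rng = random.Random(seed)
    history = []
    for _ in range(12):
        allocation = policy(history)
        assert sum(allocation.values()) == BUDGET
        history.append((allocation, toy_revenue(allocation, rng)))
    return sum(revenue for _, revenue in history)

def uniform_policy(history):
    # The naive baseline: split the budget evenly every month.
    return {c: BUDGET // len(CHANNELS) for c in CHANNELS}
```

A smarter policy would inspect `history` and shift budget toward channels with better apparent returns, which is exactly the inference problem the benchmark measures.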
Marketing attribution is one of the most economically consequential causal inference problems in business. Companies collectively misallocate hundreds of billions of dollars per year because they cannot reliably identify which channels are actually driving revenue versus which merely correlate with it.
This is not a solved problem. Traditional approaches such as last-click attribution, multi-touch models, and even many media mix modeling (MMM) implementations systematically fail in ways that are well documented in the academic literature but poorly understood in practice.
AttributionBench creates a controlled environment where the ground truth is known, making it possible to measure performance objectively. The same task, the same rules, the same score formula, for every human and every model.
Every run is scored against an oracle: the theoretically optimal allocation under the true model. Your score is how much of the available "improvement over baseline" you captured:
score = (your revenue − baseline) / (oracle revenue − baseline)
Baseline is the revenue from a naive uniform allocation ($100,000 per channel each month). A score of 100% means you matched the oracle. Scores are averaged across 10 fixed seeds.
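In code, the scoring rule is just a normalized improvement, averaged per seed. A minimal sketch; the function names are mine, not the benchmark's:

```python
def score(run_revenue, baseline_revenue, oracle_revenue):
    # Fraction of the oracle's improvement over the uniform baseline
    # that this run captured; 1.0 means the run matched the oracle.
    return (run_revenue - baseline_revenue) / (oracle_revenue - baseline_revenue)

def final_score(per_seed_scores):
    # Published score: the mean across the fixed seeds.
    return sum(per_seed_scores) / len(per_seed_scores)
```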
| Grade | Score |
|---|---|
| A+ | ≥ 90% |
| A | ≥ 80% |
| B+ | ≥ 70% |
| B | ≥ 60% |
| C+ | ≥ 50% |
| C | ≥ 40% |
| D | ≥ 25% |
| F | < 25% |
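The grade table maps onto a simple threshold lookup: the highest cutoff a score clears determines the letter. A sketch with illustrative names:

```python
# Cutoffs from the grade table, highest first.
GRADE_CUTOFFS = [
    (0.90, "A+"), (0.80, "A"), (0.70, "B+"), (0.60, "B"),
    (0.50, "C+"), (0.40, "C"), (0.25, "D"),
]

def grade(score):
    # Return the first (highest) grade whose cutoff the score meets.
    for cutoff, letter in GRADE_CUTOFFS:
        if score >= cutoff:
            return letter
    return "F"  # below 25% of the oracle's improvement
```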
Ten fixed seeds control the stochastic variation in the market: weather events, competitor behavior, and demand noise. Channel parameters are held constant across all seeds.
The simulation is deterministic: the same seed and the same allocation sequence produce identical revenue every time. The sim engine is open-source, so anyone can re-run a published model's allocations and verify the score independently.
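Independent verification could look like the sketch below: replay a published allocation sequence through the engine under the same seed and require the revenues to match exactly. `ToyEngine` here is a deterministic stand-in I have invented for illustration, not the real open-source engine.

```python
import random

class ToyEngine:
    # Deterministic stand-in for the open-source sim engine: the same
    # seed and the same allocation sequence always reproduce the same
    # revenues, which is what makes third-party verification possible.
    def __init__(self, seed):
        self.rng = random.Random(seed)

    def step(self, allocation):
        return self.rng.uniform(0.9, 1.1) * sum(v ** 0.5 for v in allocation.values())

def verify(engine_cls, seed, allocations, claimed_revenues):
    # Replay the published allocations and require exact revenue matches.
    env = engine_cls(seed)
    return all(env.step(a) == r
               for a, r in zip(allocations, claimed_revenues))
```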