What is AttributionBench?

A simulation benchmark that tests causal reasoning under uncertainty. Humans and AI models face identical decisions and are scored against the same objective standard.

The Task

You play the role of a marketing manager with a $300,000 monthly budget and three channels to allocate it across. Each month you commit a budget split, the simulation runs, and you observe the revenue outcome.

Do this for 12 months. Maximize total annual revenue. The catch: you don't know the true model. You only learn from the noisy feedback your own decisions produce.
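The monthly loop described above can be sketched in a few lines of Python. Everything except the budget, the horizon, and the channel names is illustrative: `run_month` is a made-up stand-in for the real simulator, whose actual dynamics are hidden from the player.

```python
import random

BUDGET = 300_000          # monthly budget, from the task description
CHANNELS = ("discovery", "conversion", "social")

def run_month(allocation: dict[str, float], rng: random.Random) -> float:
    """Stand-in revenue model: noisy, channel-dependent returns (invented here).

    The real simulator's behavior is unknown to the player by design.
    """
    weights = {"discovery": 1.1, "conversion": 1.4, "social": 0.9}
    base = sum(weights[c] * spend for c, spend in allocation.items())
    return base * rng.uniform(0.9, 1.1)  # demand noise

rng = random.Random(0)
total = 0.0
allocation = {c: BUDGET / 3 for c in CHANNELS}  # start from a uniform split
for month in range(12):
    revenue = run_month(allocation, rng)
    total += revenue
    # A real strategy would update `allocation` here based on observed revenue.
```

The point of the sketch is the interface, not the model: each month you commit a split that sums to the budget, observe one noisy revenue number, and decide how to reallocate.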

Discovery

A brand awareness channel targeting new potential customers.

Conversion

A direct-response channel designed to convert interest into purchases.

Social

An influencer and social media channel targeting broad audiences.

The channels behave differently. Figuring out how, and adapting your strategy accordingly, is the challenge.

Why It Matters

Marketing attribution is one of the most economically consequential causal inference problems in business. Companies collectively misallocate hundreds of billions of dollars per year because they cannot reliably identify which channels are actually driving revenue versus which merely correlate with it.

This is not a solved problem. Traditional approaches (last-click attribution, multi-touch models, even many marketing-mix-modeling (MMM) implementations) systematically fail in ways that are well-documented in the academic literature but poorly understood in practice.

AttributionBench creates a controlled environment where the ground truth is known, making it possible to measure performance objectively. The same task, the same rules, the same score formula, for every human and every model.

Scoring

Every run is scored against an oracle: the theoretically optimal allocation under the true model. Your score is how much of the available "improvement over baseline" you captured:

score = (your revenue − baseline) / (oracle revenue − baseline)

Baseline is the revenue from a naive uniform allocation. A score of 100% means you matched the oracle. Scores are averaged across 10 fixed seeds.
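The scoring rule above is simple enough to restate as code. This is a sketch of the formula as described, not the benchmark's actual implementation; the function names and the revenue figures in the example are invented for illustration.

```python
def attribution_score(run_revenue: float, baseline: float, oracle: float) -> float:
    """Fraction of the oracle's improvement over baseline that a run captured."""
    return (run_revenue - baseline) / (oracle - baseline)

def final_score(per_seed: list[tuple[float, float, float]]) -> float:
    """Average the normalized score across seeds (10 fixed seeds in the benchmark)."""
    scores = [attribution_score(r, b, o) for r, b, o in per_seed]
    return sum(scores) / len(scores)

# Example with two seeds (illustrative numbers: run, baseline, oracle revenue).
runs = [(4.2e6, 3.6e6, 4.4e6), (4.0e6, 3.5e6, 4.5e6)]
print(round(final_score(runs), 3))  # → 0.625
```

Note that the normalization makes scores comparable across seeds even when the absolute revenue scale differs: a score of 1.0 always means "matched the oracle," and 0.0 means "no better than the uniform baseline."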

Grade   Score
A+      ≥ 90%
A       ≥ 80%
B+      ≥ 70%
B       ≥ 60%
C+      ≥ 50%
C       ≥ 40%
D       ≥ 25%
F       < 25%

Reproducibility

Ten fixed seeds control the stochastic variation in the market: weather events, competitor behavior, and demand noise. Channel parameters are held constant across all seeds.

The simulation is deterministic: the same seed and the same allocation sequence produce identical revenue every time. The sim engine is open-source, so anyone can re-run a published model's allocations and verify the score independently.
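Determinism is what makes independent verification possible: replaying a published allocation sequence under the same seed must reproduce the revenue bit-for-bit. The sketch below illustrates the property with a toy seeded simulator; `simulate` is a hypothetical stand-in, not the open-source engine's API.

```python
import random

def simulate(seed: int, allocations: list[dict[str, float]]) -> float:
    """Toy deterministic simulator: same seed + same allocations -> same revenue."""
    rng = random.Random(seed)  # all noise is drawn from a seeded generator
    total = 0.0
    for alloc in allocations:
        noise = rng.uniform(0.9, 1.1)  # seeded, hence reproducible
        total += sum(alloc.values()) * noise
    return total

# A published 12-month plan (illustrative numbers).
plan = [{"discovery": 100_000.0, "conversion": 150_000.0, "social": 50_000.0}] * 12

# Verification: two independent replays agree exactly.
assert simulate(seed=7, allocations=plan) == simulate(seed=7, allocations=plan)
```

Because every source of randomness flows from the seed, a verifier needs only the seed and the allocation sequence to check a claimed score.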

Get Started


Benchmark Yourself

Run the task as a human and get on the leaderboard alongside AI models.


AI Replays

Step through recorded model runs and read their reasoning in real time.