migrate budget allocation plots #2531
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2531 +/- ##
=======================================
Coverage 93.85% 93.86%
=======================================
Files 90 91 +1
Lines 13964 14011 +47
=======================================
+ Hits 13106 13151 +45
- Misses 858 860 +2 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
|
@juanitorduz ready for review |
PR SummaryMedium Risk Overview Refactors repeated time-series rendering into a shared Updates Reviewed by Cursor Bugbot for commit c520d9e. Bugbot is set up for automated code reviews on this repo. Configure here. |
2feba1b to
e2a44b0
Compare
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Autofix Details
Bugbot Autofix prepared fixes for both issues found in the latest run.
- ✅ Fixed: ROAS denominator wrong when carryover is enabled
- Changed ROAS calculation to use actual channel spend from the dataset instead of allocation * total_dates, which correctly accounts for carryover periods with zero spend.
- ✅ Fixed: Inconsistent sample dimension validation across methods
- Removed the unnecessary 'sample' dimension requirement from contribution_over_time to match allocation_roas behavior, allowing both (chain, draw) and sample dimension formats.
Or push these changes by commenting:
@cursor push ec507be1e3
Preview (ec507be1e3)
diff --git a/pymc_marketing/mmm/plotting/budget.py b/pymc_marketing/mmm/plotting/budget.py
--- a/pymc_marketing/mmm/plotting/budget.py
+++ b/pymc_marketing/mmm/plotting/budget.py
@@ -107,9 +107,16 @@
**pc_kwargs,
)
- n_periods = len(samples.date)
- roas_da = samples["channel_contribution_original_scale"].sum("date") / (
- samples["allocation"] * n_periods
+ # Compute total spend per channel from actual channel data in the dataset
+ # This correctly handles carryover periods (which have zero spend)
+ channels = list(samples["allocation"].channel.values)
+ channel_spend = xr.concat([samples[ch] for ch in channels], dim="channel")
+ channel_spend = channel_spend.assign_coords(channel=channels)
+ total_spend_per_channel = channel_spend.sum("date")
+
+ roas_da = (
+ samples["channel_contribution_original_scale"].sum("date")
+ / total_spend_per_channel
)
roas_da.name = "roas"
@@ -175,7 +182,7 @@
-------
tuple[Figure, NDArray[Axes]] or PlotCollection
"""
- for dim in ("channel", "date", "sample"):
+ for dim in ("channel", "date"):
if dim not in samples.dims:
raise ValueError(
f"Expected '{dim}' dimension in samples, but none found."You can send follow-ups to the cloud agent here.
…ution Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…_distribution Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
e2a44b0 to
f9d6548
Compare
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Bugbot Autofix prepared fixes for both issues found in the latest run.
- ✅ Fixed: AI agent planning documents accidentally committed to repo
- Deleted the docs/superpowers/ directory containing AI agent development workflow artifacts that were not user-facing documentation.
- ✅ Fixed: Redundant variable lookup after exact-name validation
- Replaced redundant substring-based variable search with direct access to the validated variable name.
Or push these changes by commenting:
@cursor push 04b5e3901d
Preview (04b5e3901d)
diff --git a/docs/superpowers/plans/2026-05-07-total-allocation-roas.md b/docs/superpowers/plans/2026-05-07-total-allocation-roas.md
deleted file mode 100644
--- a/docs/superpowers/plans/2026-05-07-total-allocation-roas.md
+++ /dev/null
@@ -1,288 +1,0 @@
-# total_allocation in sample_response_distribution Implementation Plan
-
-> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
-
-**Goal:** Add a `total_allocation` data variable to `sample_response_distribution` output, and update `allocation_roas` to use it as the ROAS denominator instead of `allocation * len(samples.date)`.
-
-**Architecture:** Three focused changes — update test fixtures first so they include `total_allocation`, then update `allocation_roas` validation + denominator, then add `total_allocation` to `sample_response_distribution` output. TDD throughout.
-
-**Tech Stack:** Python, xarray, PyMC-Marketing `MultiDimensionalBudgetOptimizerWrapper`, `BudgetPlots`
-
----
-
-## File Map
-
-| File | Change |
-|------|--------|
-| `tests/mmm/plotting/test_budget.py` | Add `total_allocation` to two fixtures; add `test_missing_total_allocation_raises` |
-| `pymc_marketing/mmm/plotting/budget.py` | Add `total_allocation` validation; replace denominator in `allocation_roas` |
-| `tests/mmm/test_budget_optimizer_multidimensional.py` | Add `test_sample_response_distribution_includes_total_allocation` |
-| `pymc_marketing/mmm/multidimensional.py` | Add `total_allocation` to `constant_data` in `sample_response_distribution` |
-
----
-
-## Task 1: Update `allocation_roas` (test first)
-
-**Files:**
-- Test: `tests/mmm/plotting/test_budget.py`
-- Modify: `pymc_marketing/mmm/plotting/budget.py`
-
-### Step 1a: Update `simple_allocation_samples` fixture to include `total_allocation`
-
-In `tests/mmm/plotting/test_budget.py`, replace the `simple_allocation_samples` fixture (lines 43–65) with:
-
-```python
-@pytest.fixture(scope="module")
-def simple_allocation_samples(channels) -> xr.Dataset:
- """xr.Dataset with channel_contribution_original_scale + allocation, no extra dims."""
- rng = np.random.default_rng(SEED)
- n_sample, n_date = 80, 20
- dates = np.arange(n_date)
- contrib = rng.uniform(100, 500, (n_sample, n_date, len(channels)))
- alloc = rng.uniform(1000, 5000, len(channels))
- return xr.Dataset(
- {
- "channel_contribution_original_scale": xr.DataArray(
- contrib,
- dims=("sample", "date", "channel"),
- coords={
- "sample": np.arange(n_sample),
- "date": dates,
- "channel": channels,
- },
- ),
- "allocation": xr.DataArray(
- alloc,
- dims=("channel",),
- coords={"channel": channels},
- ),
- "total_allocation": xr.DataArray(
- alloc * n_date,
- dims=("channel",),
- coords={"channel": channels},
- ),
- }
- )
-```
-
-### Step 1b: Update `panel_allocation_samples` fixture to include `total_allocation`
-
-Replace the `panel_allocation_samples` fixture (lines 68–93) with:
-
-```python
-@pytest.fixture(scope="module")
-def panel_allocation_samples(channels) -> xr.Dataset:
- """xr.Dataset with geo extra dim for panel tests."""
- rng = np.random.default_rng(SEED + 1)
- n_sample, n_date = 80, 20
- geos = ["CA", "NY"]
- dates = np.arange(n_date)
- contrib = rng.uniform(100, 500, (n_sample, n_date, len(geos), len(channels)))
- alloc = rng.uniform(1000, 5000, (len(geos), len(channels)))
- return xr.Dataset(
- {
- "channel_contribution_original_scale": xr.DataArray(
- contrib,
- dims=("sample", "date", "geo", "channel"),
- coords={
- "sample": np.arange(n_sample),
- "date": dates,
- "geo": geos,
- "channel": channels,
- },
- ),
- "allocation": xr.DataArray(
- alloc,
- dims=("geo", "channel"),
- coords={"geo": geos, "channel": channels},
- ),
- "total_allocation": xr.DataArray(
- alloc * n_date,
- dims=("geo", "channel"),
- coords={"geo": geos, "channel": channels},
- ),
- }
- )
-```
-
-- [ ] **Step 1c: Write the failing test for missing `total_allocation`**
-
-Add this test inside `class TestAllocationRoas` in `tests/mmm/plotting/test_budget.py`, after the existing `test_missing_channel_dim_raises` test:
-
-```python
-def test_missing_total_allocation_raises(self):
- from pymc_marketing.mmm.plotting.budget import BudgetPlots
-
- rng = np.random.default_rng(SEED)
- channels = ["tv", "radio"]
- bad_samples = xr.Dataset(
- {
- "channel_contribution_original_scale": xr.DataArray(
- rng.uniform(0, 1, (10, 5, 2)),
- dims=("sample", "date", "channel"),
- coords={"channel": channels},
- ),
- "allocation": xr.DataArray(
- [1000.0, 2000.0],
- dims=("channel",),
- coords={"channel": channels},
- ),
- }
- )
- with pytest.raises(ValueError, match="total_allocation"):
- BudgetPlots().allocation_roas(bad_samples)
-```
-
-- [ ] **Step 1d: Run the new test to confirm it fails**
-
-```bash
-conda run -n pymc-marketing-dev pytest tests/mmm/plotting/test_budget.py::TestAllocationRoas::test_missing_total_allocation_raises -v
-```
-
-Expected: FAIL — `ValueError` is not raised because the validation doesn't exist yet.
-
-- [ ] **Step 1e: Implement the `total_allocation` validation and new denominator in `allocation_roas`**
-
-In `pymc_marketing/mmm/plotting/budget.py`, inside `allocation_roas`:
-
-1. After the `"channel" not in samples.dims` check (line 100–101), add:
-
-```python
-if "total_allocation" not in samples:
- raise ValueError(
- "Expected 'total_allocation' variable in samples, but none found."
- )
-```
-
-2. Replace lines 110–113:
-
-```python
-n_periods = len(samples.date)
-roas_da = samples["channel_contribution_original_scale"].sum("date") / (
- samples["allocation"] * n_periods
-)
-```
-
-With:
-
-```python
-roas_da = samples["channel_contribution_original_scale"].sum("date") / samples["total_allocation"]
-```
-
-- [ ] **Step 1f: Run the full `TestAllocationRoas` test class**
-
-```bash
-conda run -n pymc-marketing-dev pytest tests/mmm/plotting/test_budget.py::TestAllocationRoas -v
-```
-
-Expected: ALL PASS (fixtures now include `total_allocation`, new validation test passes).
-
-- [ ] **Step 1g: Run pre-commit and commit**
-
-```bash
-conda run -n pymc-marketing-dev pre-commit run --files pymc_marketing/mmm/plotting/budget.py tests/mmm/plotting/test_budget.py
-git add pymc_marketing/mmm/plotting/budget.py tests/mmm/plotting/test_budget.py
-git commit -m "feat(budget): use total_allocation as ROAS denominator"
-```
-
----
-
-## Task 2: Add `total_allocation` to `sample_response_distribution` (test first)
-
-**Files:**
-- Test: `tests/mmm/test_budget_optimizer_multidimensional.py`
-- Modify: `pymc_marketing/mmm/multidimensional.py`
-
-- [ ] **Step 2a: Write the failing test**
-
-Add this standalone test function to `tests/mmm/test_budget_optimizer_multidimensional.py` (after the existing `test_budget_distribution_carryover_interaction_issue` function):
-
-```python
-@compile_kwargs
-def test_sample_response_distribution_includes_total_allocation(
- dummy_df, fitted_mmm, compile_kwargs
-):
- """total_allocation == allocation * num_periods, regardless of include_carryover."""
- _df_kwargs, X_dummy, _y_dummy = dummy_df
-
- fitted_mmm.add_original_scale_contribution_variable(["channel_contribution"])
-
- num_periods = 4
- optimizable_model = MultiDimensionalBudgetOptimizerWrapper(
- model=fitted_mmm,
- start_date=X_dummy["date_week"].max() + pd.Timedelta(weeks=1),
- end_date=X_dummy["date_week"].max() + pd.Timedelta(weeks=num_periods),
- compile_kwargs=compile_kwargs,
- )
-
- allocation_strategy = xr.DataArray(
- np.full((2, 2), 10.0),
- dims=["channel", "geo"],
- coords={
- "channel": ["channel_1", "channel_2"],
- "geo": ["A", "B"],
- },
- )
-
- for include_carryover in [False, True]:
- result = optimizable_model.sample_response_distribution(
- allocation_strategy=allocation_strategy,
- include_carryover=include_carryover,
- )
- assert "total_allocation" in result, (
- f"total_allocation missing from output with include_carryover={include_carryover}"
- )
- expected = allocation_strategy * optimizable_model.num_periods
- xr.testing.assert_allclose(result["total_allocation"], expected)
-```
-
-- [ ] **Step 2b: Run the test to confirm it fails**
-
-```bash
-conda run -n pymc-marketing-dev pytest tests/mmm/test_budget_optimizer_multidimensional.py::test_sample_response_distribution_includes_total_allocation -v
-```
-
-Expected: FAIL — `AssertionError: total_allocation missing from output`.
-
-- [ ] **Step 2c: Implement `total_allocation` in `sample_response_distribution`**
-
-In `pymc_marketing/mmm/multidimensional.py`, replace line 3987:
-
-```python
-constant_data = allocation_strategy.to_dataset(name="allocation")
-```
-
-With:
-
-```python
-constant_data = xr.merge(
- [
- allocation_strategy.to_dataset(name="allocation"),
- (allocation_strategy * self.num_periods).to_dataset(name="total_allocation"),
- ]
-)
-```
-
-- [ ] **Step 2d: Run the new test to confirm it passes**
-
-```bash
-conda run -n pymc-marketing-dev pytest tests/mmm/test_budget_optimizer_multidimensional.py::test_sample_response_distribution_includes_total_allocation -v
-```
-
-Expected: PASS.
-
-- [ ] **Step 2e: Run the broader test suite to check for regressions**
-
-```bash
-conda run -n pymc-marketing-dev pytest tests/mmm/plotting/test_budget.py tests/mmm/test_budget_optimizer_multidimensional.py -v --tb=short 2>&1 | tail -30
-```
-
-Expected: ALL PASS.
-
-- [ ] **Step 2f: Run pre-commit and commit**
-
-```bash
-conda run -n pymc-marketing-dev pre-commit run --files pymc_marketing/mmm/multidimensional.py tests/mmm/test_budget_optimizer_multidimensional.py
-git add pymc_marketing/mmm/multidimensional.py tests/mmm/test_budget_optimizer_multidimensional.py
-git commit -m "feat(mmm): add total_allocation to sample_response_distribution output"
-```
\ No newline at end of file
diff --git a/docs/superpowers/specs/2026-05-07-total-allocation-roas-design.md b/docs/superpowers/specs/2026-05-07-total-allocation-roas-design.md
deleted file mode 100644
--- a/docs/superpowers/specs/2026-05-07-total-allocation-roas-design.md
+++ /dev/null
@@ -1,93 +1,0 @@
-# Design: Add `total_allocation` to `sample_response_distribution` output
-
-**Date:** 2026-05-07
-**Branch:** `isofer/2525-migrate-budget-allocation-plots`
-
-## Problem
-
-`allocation_roas` in `BudgetPlots` computes the ROAS denominator as:
-
-```python
-n_periods = len(samples.date)
-roas_da = samples["channel_contribution_original_scale"].sum("date") / (
- samples["allocation"] * n_periods
-)
-```
-
-`allocation` is the per-period (weekly) spend per channel. When `include_carryover=True`,
-`sample_response_distribution` appends `l_max` extra zeroed-out dates to the dataset for
-adstock tail computation. This means `len(samples.date)` equals `num_budget_periods + l_max`,
-overcounting the denominator and producing an artificially low ROAS.
-
-## Solution
-
-Add a `total_allocation` data variable to the output of `sample_response_distribution`.
-This is the ground-truth total spend per channel (and any extra dims like geo) over the
-budget period, independent of carryover.
-
-`total_allocation = allocation_strategy * self.num_periods`
-
-`self.num_periods` is already correctly computed in `MultiDimensionalBudgetOptimizerWrapper.__init__`
-as `total_dataset_dates - adstock.l_max`, so it represents the actual budget period count
-regardless of `include_carryover`.
-
-Update `allocation_roas` to use `total_allocation` directly as the denominator.
-
-## Files Changed
-
-| File | Change |
-|------|--------|
-| `pymc_marketing/mmm/multidimensional.py` | Add `total_allocation` to `constant_data` in `sample_response_distribution` |
-| `pymc_marketing/mmm/plotting/budget.py` | Update `allocation_roas` to validate and use `total_allocation` |
-| `tests/mmm/plotting/test_budget.py` | Add `total_allocation` to fixtures; add missing-var test |
-
-## Detailed Changes
-
-### 1. `sample_response_distribution` — multidimensional.py:3987
-
-Replace:
-```python
-constant_data = allocation_strategy.to_dataset(name="allocation")
-```
-
-With:
-```python
-constant_data = xr.merge([
- allocation_strategy.to_dataset(name="allocation"),
- (allocation_strategy * self.num_periods).to_dataset(name="total_allocation"),
-])
-```
-
-`total_allocation` inherits the same dims as `allocation` — `(channel,)` or `(channel, geo, ...)`.
-It is unaffected by `include_carryover` and `budget_distribution_over_period` because
-`allocation * num_periods` equals the total budget regardless of how it is distributed
-across periods (distribution sums to 1).
-
-### 2. `allocation_roas` — budget.py
-
-**Validation:** Add a check for `total_allocation` alongside the existing checks.
-
-**ROAS computation:** Replace:
-```python
-n_periods = len(samples.date)
-roas_da = samples["channel_contribution_original_scale"].sum("date") / (
- samples["allocation"] * n_periods
-)
-```
-
-With:
-```python
-roas_da = samples["channel_contribution_original_scale"].sum("date") / samples["total_allocation"]
-```
-
-### 3. Tests — test_budget.py
-
-- Add `total_allocation` to `simple_allocation_samples` and `panel_allocation_samples` fixtures
- (same value as `allocation * n_date` to preserve existing test semantics).
-- Add a test `test_missing_total_allocation_raises` for the new validation.
-
-## Invariants
-
-- `include_carryover=True/False` does not change `total_allocation`.
-- `budget_distribution_over_period` does not change `total_allocation`.
-- `total_allocation.dims == allocation.dims` (no `date` dim).
\ No newline at end of file
diff --git a/pymc_marketing/mmm/plotting/budget.py b/pymc_marketing/mmm/plotting/budget.py
--- a/pymc_marketing/mmm/plotting/budget.py
+++ b/pymc_marketing/mmm/plotting/budget.py
@@ -186,7 +186,7 @@
)
if "channel_contribution_original_scale" not in samples.data_vars:
raise ValueError(
- "Expected a variable containing 'channel_contribution' in samples, "
+ "Expected 'channel_contribution_original_scale' variable in samples, "
"but none found."
)
@@ -197,10 +197,7 @@
**pc_kwargs,
)
- contrib_var = next(
- v for v in samples.data_vars if "channel_contribution_original_scale" in v
- )
- da = _select_dims(samples[contrib_var], dims)
+ da = _select_dims(samples["channel_contribution_original_scale"], dims)
extra_dims = [
dYou can send follow-ups to the cloud agent here.
Reviewed by Cursor Bugbot for commit f9d6548. Configure here.
- Remove accidentally committed AI agent planning documents - Fix redundant variable lookup in contribution_over_time method - Use direct variable access after validation instead of substring search Applied via @cursor push command
|
@juanitorduz @carlosagostini ready for review |
juanitorduz
left a comment
There was a problem hiding this comment.
Looks very nice! I left two comments :)
| roas_da = ( | ||
| samples["channel_contribution_original_scale"].sum("date") | ||
| / samples["total_allocation"] |
There was a problem hiding this comment.
Can we use the incrementality module to compute roas?
There was a problem hiding this comment.
It's not clear to me if it's the right thing to do. It depends on what type of plots or analysis we want to provide for the user.
The current workflow and plots supports answering the following questions:
- if using a given allocation, what would be the future total ROAs for each channel?
- if using a given allocation, what would be the contribution of each channel at each point in time?
with the current plan, after we finish the work, the user will use the following workflow to answer the questions:
allocation_xarray, res_scipy = optimizable_model.optimize_budget(...)
sample_response_given_allocation = optimizable_model.sample_response_distribution(allocation_xarray, ...)
optimizable_model.plot..allocation_roas(sample_response_given_allocation, ...)
This is working well now.
If we want to use Incrementality here, we would need to update the Incrementatly module to support arbitraty allocation. Then we can replace the call of optimizable_model.sample_response_distribution, but we won't gain anything since it would give the same answer.
So there's no point in doing that.
I believe, that the only reason to bring in Incrementality, is if we want to answer the question: if using a given allocation, what would be the ROAs of each channel at different points in time in the future (like each month)?
Then we would need to use incrementally, since it can correctly compute the effect of carryover on each point in time.
The workflow then will look like:
allocation_xarray, res_scipy = optimizable_model.optimize_budget(...)
roas = mmm.incrementality.contribution_over_spend(
(allocation_xarray, ...)
optimizable_model.plot.roas_over_time(roas)
This would require to change incrementality to support arbitrary allocation, and adding plot.roas_over_time. This is not a lot of work, but the question is if this is useful for our users.
So thee main questions that need to be answered is will this be useful for our users. i.e. When they allocate budgets, which is usually at the quarter level, do they want to see what is the predicted ROAs for each month?
Another related question is that the budget allocation
- AFAIK, The ROAs computation now and the budget optimizer overall doesn't take into account the carryover affect of the budgets that were used before the allocation period. So if you optimize for Q2, it ignores any effect from Q1. Do we want to address that (both at the budget optimization code, and the ROAs calculation code)?
There was a problem hiding this comment.
Thanks @isofer !
-
Regarding this point. I think it would be nice to have the incrementality module so that we are consistent. I understand the extra work, so we can tackle this in another PR (please create an issue 🙏 )
-
I am not sure if it does it. It would be important to do it, but again, we can tackle this in another PR then :) (please create and issue 🙏 )
…com:pymc-labs/pymc-marketing into isofer/2525-migrate-budget-allocation-plots


Description
Adds a
BudgetPlotsnamespace class to the MMM plotting package with two methods for visualising budget allocation optimisation results. The class follows the same namespace pattern already used byDecompositionPlots,DiagnosticsPlots,SensitivityPlots, andTransformationPlots.allocation_roasreplacesMMMPlotSuite.budget_allocation. The old method was a side-by-side bar chart showing allocated spend vs channel contribution per channel. Two problems: (1) it plotted the posterior mean of a scaled contribution, discarding all uncertainty; (2) the reason users called it was to judge whether each channel's return justified its spend — which is exactly ROAS. The new method makes that intent explicit by computing ROAS directly (sum(channel_contribution_original_scale) / allocation) and showing the full posterior distribution as a forest plot with HDI bars, so under- and over-performing channels are immediately visible with their uncertainty.contribution_over_timereplacesMMMPlotSuite.allocated_contribution_by_channel_over_time.The time-series rendering logic previously inlined in
DecompositionPlots.components_contributionshas been extracted into a shared_plot_timeseries_channelhelper in_helpers.py; bothDecompositionPlotsandBudgetPlotsnow delegate to it.channel_contribution_original_scalehas been added to the defaultvar_namesinMultiDimensionalBudgetOptimizerWrapperso it is available in the returned dataset without callers having to request it explicitly.see demo here
Related Issue
Changes Made
pymc_marketing/mmm/plotting/budget.py— new file:BudgetPlotswithallocation_roasandcontribution_over_timepymc_marketing/mmm/plotting/_helpers.py— extracted_plot_timeseries_channelshared helperpymc_marketing/mmm/plotting/decomposition.py— replaced inline time-series rendering with_plot_timeseries_channelpymc_marketing/mmm/plotting/__init__.py— exportBudgetPlotspymc_marketing/mmm/multidimensional.py— addchannel_contribution_original_scaleto default sampled var namestests/mmm/plotting/test_budget.py— new: 343-line test file covering both methods (happy path, panel/multi-geo, dim subsetting, error cases, visual structure assertions)📚 Documentation preview 📚: https://pymc-marketing--2531.org.readthedocs.build/en/2531/