# ShopStream Premium — Data Dictionary

One business world ("ShopStream", an online retailer) for studying the causal
effect of **Premium membership** on **monthly spend ($)**, delivered as two
tables linked by `market`. Three designs (A/B test, RDD, DiD) share the same
markets, the same program, and the same true effect; each design just consumes
a different *shape* of the data.
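
As a hedged illustration of how the two tables link, the sketch below joins market-level rollout status from the panel onto customers by `market`. The miniature frames and the market id `WE-1` are made up for the example (only `WE-2` appears in this dictionary); real use would load the two CSV files with `pd.read_csv`.

```python
import pandas as pd

# Made-up stand-ins for shopstream_customers.csv and shopstream_panel.csv.
# Values and the "WE-1" id are hypothetical, for illustration only.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "market": ["WE-1", "WE-1", "WE-2"],
    "income_k": [50.0, 70.0, 90.0],
})
panel = pd.DataFrame({
    "market": ["WE-1", "WE-1", "WE-2", "WE-2"],
    "month": [1, 2, 1, 2],
    "treated_market": [1, 1, 0, 0],
})

# Collapse the panel to one row per market before joining, so the
# market x month rows do not duplicate customers.
market_status = panel[["market", "treated_market"]].drop_duplicates()

# Many customers map to one market; validate= raises if that is violated.
linked = customers.merge(
    market_status, on="market", how="left", validate="many_to_one"
)
print(linked)
```

Collapsing the panel first keeps the result at one row per customer.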

---

## `shopstream_customers.csv` — one row per customer (N = 8,000)
Used for the **A/B test** and the **RDD**.

| Column | Meaning | Used by |
|---|---|---|
| `customer_id` | unique id | — |
| `market` | one of 16 markets (e.g. `WE-2`); nested in region | links to panel |
| `region` | Northeast / Midwest / South / West | control |
| `age` | years (18–80) | control |
| `gender` | Female / Male / Nonbinary | control |
| `female` | 1 if Female | control |
| `income_k` | annual income, $000 | control / effect modifier |
| `tenure_months` | months as a customer | control |
| `urbanicity` | urban / suburban / rural | control |
| `channel` | app / web / store / referral | control |
| `app_user` | 1 if primarily uses the app | **effect modifier** |
| `in_pilot` | 1 if customer was part of the Phase-1 A/B pilot | **A/B filter** |
| `pilot_treat` | 1 = randomly given Premium, 0 = control (blank if not in pilot) | **A/B treatment** |
| `spend_ab` | monthly spend during the pilot ($) (blank if not in pilot) | **A/B outcome** |
| `loyalty_points` | running variable for the RDD | **RDD running var** |
| `premium_by_points` | 1 if `loyalty_points` >= 1000 (Premium auto-granted) | **RDD treatment** |
| `spend_rdd` | monthly spend in the threshold era ($) | **RDD outcome** |

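**A/B recipe:** filter `in_pilot == 1`, then regress `spend_ab ~ pilot_treat`; the coefficient on `pilot_treat` is the estimate.

**RDD recipe:** keep only customers within a chosen bandwidth of the 1000-point cutoff, then regress `spend_rdd ~ premium_by_points * (loyalty_points - 1000)`; the coefficient on `premium_by_points` is the estimated jump at the cutoff.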
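
The two recipes can be sketched roughly as follows. This is a minimal illustration rather than a reference implementation: plain NumPy least squares stands in for a formula interface, the helper names (`ols_coefs`, `ab_estimate`, `rdd_estimate`) and the default bandwidth of 200 points are assumptions, and the demo frame is made-up, noise-free data with a built-in jump of 5, not the real file.

```python
import numpy as np
import pandas as pd

def ols_coefs(df, y, xs):
    """OLS with an intercept via least squares; returns {name: coefficient}."""
    cols = [np.ones(len(df))] + [df[x].to_numpy(dtype=float) for x in xs]
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, df[y].to_numpy(dtype=float), rcond=None)
    return dict(zip(["intercept"] + list(xs), beta))

def ab_estimate(customers):
    """A/B recipe: pilot customers only, spend_ab ~ pilot_treat."""
    pilot = customers[customers["in_pilot"] == 1]
    return ols_coefs(pilot, "spend_ab", ["pilot_treat"])["pilot_treat"]

def rdd_estimate(customers, cutoff=1000, bandwidth=200):
    """RDD recipe: local linear fit with separate slopes on each side.

    bandwidth=200 is a hypothetical default; the dictionary does not fix one.
    """
    d = customers.assign(centered=customers["loyalty_points"] - cutoff)
    d = d[d["centered"].abs() <= bandwidth].copy()
    d["interaction"] = d["premium_by_points"] * d["centered"]
    fit = ols_coefs(d, "spend_rdd", ["premium_by_points", "centered", "interaction"])
    return fit["premium_by_points"]  # jump in spend at the cutoff

# Made-up, noise-free demo data (NOT the real file): both effects are set to 5.
rng = np.random.default_rng(0)
n = 400
idx = np.arange(n)
points = rng.integers(600, 1400, size=n)
in_pilot = (idx % 2 == 0).astype(int)
treat = np.where(in_pilot == 1, (idx % 4 == 0).astype(float), np.nan)
premium = (points >= 1000).astype(int)
demo = pd.DataFrame({
    "in_pilot": in_pilot,
    "pilot_treat": treat,               # blank (NaN) outside the pilot
    "spend_ab": 100 + 5 * treat,        # blank (NaN) outside the pilot
    "loyalty_points": points,
    "premium_by_points": premium,
    "spend_rdd": 80 + 0.01 * (points - 1000) + 5 * premium,
})

print(ab_estimate(demo))   # should be close to the made-up effect of 5
print(rdd_estimate(demo))  # should be close to the made-up jump of 5
```

On the real customer file the same functions would take the loaded DataFrame directly. The bandwidth is worth varying, since the RDD estimate can be sensitive to it.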

---

## `shopstream_panel.csv` — one row per market × month (16 × 24 = 384)
Used for the **DiD**.

| Column | Meaning |
|---|---|
| `market` | market id (links to customer file) |
| `region` | region |
| `month` | 1–24 (program launched in treated markets at month 13) |
| `treated_market` | 1 if this market eventually rolled out Premium |
| `post` | 1 if `month >= 13` |
| `avg_spend` | average monthly spend per customer in that market-month ($) |
| `avg_income_k` | market average income (rolled up from customers) |
| `pct_app` | market share of app users (rolled up from customers) |

**DiD recipe:** regress `avg_spend ~ treated_market * post`; the
`treated_market:post` interaction is the estimate.
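
As a rough sketch of what that interaction term measures: in this two-group, two-period model with no other covariates, the OLS coefficient on `treated_market:post` equals the difference of differences in the four cell means, so it can be computed without a regression library. The function name `did_estimate` and the mini panel below (2 markets, 4 months, launch at month 3, built-in effect of 5) are made up for illustration; the real panel is 16 × 24 with launch at month 13.

```python
import pandas as pd

def did_estimate(panel):
    """(treated post - treated pre) - (control post - control pre)."""
    m = panel.groupby(["treated_market", "post"])["avg_spend"].mean()
    treated_change = m.loc[(1, 1)] - m.loc[(1, 0)]
    control_change = m.loc[(0, 1)] - m.loc[(0, 0)]
    return treated_change - control_change

# Made-up mini panel (NOT the real file): markets "A" and "B" are hypothetical.
rows = []
for market, treated in [("A", 1), ("B", 0)]:
    for month in range(1, 5):
        post = int(month >= 3)  # hypothetical launch month for the demo
        spend = 100 + 10 * treated + 2 * post + 5 * treated * post
        rows.append({
            "market": market,
            "month": month,
            "treated_market": treated,
            "post": post,
            "avg_spend": spend,
        })
demo_panel = pd.DataFrame(rows)

print(did_estimate(demo_panel))  # recovers the built-in effect of 5
```

The control group's change (2 in the demo) is the common time trend that gets subtracted out. The estimate is only credible if treated and control markets would have moved in parallel without the program.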

---

## The single true number
The **true causal effect of Premium is +$18/month** (slightly larger for app
users, smaller for others, averaging to $18). Every design is trying to
recover this number. Because treatment was randomized, the A/B test is the
only design that is unbiased by construction; use it as the benchmark for
judging the other two.
