Meeting Notes

1. Target Definitions

1a. STPM quit probability targets

Source: smktrans package (technical report). File: quit_probability_calibration_targets_20260216_v2.csv.

The STPM estimates annual transition probabilities from current smoker to former smoker using a closed-form demographic accounting formula. The formula adjusts for differential mortality by smoking status, immigration/emigration survivorship, initiation, and relapse:

p_quit(a, y, s, d) = f(π_current, π_former, l(x), μ_s(x), p_init, p_relapse)

where a = age, y = year, s = sex, d = IMD quintile, π = smoothed HSE proportions (nnet::multinom), l(x) = birth cohort survivorship (HMD + ONS), μ_s = smoking-status-specific mortality (52 disease RRs from tobalcepi), p_relapse = Hawkins 2010 + Jackson 2019 estimates by years since quit.

Calibration targets average these single-year-of-age probabilities into strata:

Table | Stratification | Years | Purpose
3 | Sex × Age (25–44, 45–64, 65–74) | 2011–2016 | Calibration
4 | IMD quintile × 3-year period | 2011–2016 | Calibration
5 | Sex × Age | 2017–2019 | Validation
6 | IMD quintile | 2017–2019 | Validation

Typical range: 6–12% annual quit probability. Uncertainty: Beta(n=20) assumption with 0.9 correlation between samples (100 draws).

In STPM, a "quit" means transitioning from current smoker to former smoker within one year. There is no quit tunnel in STPM. In the ABM, the verification metric matches this by counting anyone who was a SMOKER in January and is in any quit state at the December snapshot:
p_quit,ABM = |{i : state_Jan(i) = SMOKER ∧ state_Dec(i) ∈ QUIT_STATES}| / |{i : state_Jan(i) = SMOKER}|
where QUIT_STATES = {NEWQUITTER, ONGOINGQUITTER1–11, EXSMOKER}.
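
The verification metric above can be sketched in a few lines of Python. This is only an illustration: the function name, the dict-based snapshot format, and the treatment of agents missing from the December snapshot (kept in the denominator, counted as not quit) are assumptions, not the ABM's actual code.

```python
# State labels taken from the notes: NEWQUITTER, ONGOINGQUITTER1-11, EXSMOKER.
QUIT_STATES = {"NEWQUITTER", "EXSMOKER"} | {
    f"ONGOINGQUITTER{k}" for k in range(1, 12)
}


def stpm_style_quit_rate(state_jan, state_dec):
    """Share of January smokers found in any quit state in December.

    state_jan, state_dec: dicts mapping agent id -> state label
    (a hypothetical snapshot format chosen for this sketch).
    """
    january_smokers = [i for i, s in state_jan.items() if s == "SMOKER"]
    if not january_smokers:
        return float("nan")
    # Agents absent from the December snapshot count as not quit (assumption).
    quitters = sum(1 for i in january_smokers if state_dec.get(i) in QUIT_STATES)
    return quitters / len(january_smokers)
```

Note that an agent sitting in NEWQUITTER in December counts exactly as much as one in EXSMOKER, which is why this metric alone cannot separate tunnel entry from tunnel survival.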

1b. STS quit attempt rate targets

Source: Smoking Toolkit Study (STS), waves 2007–2019. Pipeline: analysis/sts-quit-targets/.

Outcome variable: trylyc — whether a past-year smoker made ≥1 quit attempt in the past 12 months.

p_attempt = P(trylyc = 1 | bState ∈ {2, 3})

where bState 2 = stopped smoking in past year, bState 3 = current smoker. Denominator: past-year smokers only.
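
As a rough sketch, the attempt rate is a weighted proportion over past-year smokers. The tuple layout and function name below are hypothetical, and imputation and stratification from the real pipeline are not shown.

```python
def attempt_rate(records):
    """Weighted P(trylyc = 1 | bState in {2, 3}).

    records: iterable of (b_state, trylyc, weight) tuples; this layout is an
    assumption for illustration, not the STS data format.
    """
    numerator = 0.0
    denominator = 0.0
    for b_state, trylyc, weight in records:
        if b_state in (2, 3):  # past-year smokers only
            denominator += weight
            if trylyc == 1:
                numerator += weight
    return numerator / denominator if denominator > 0 else float("nan")
```

Respondents with any other bState value never enter the denominator, matching the "past-year smokers only" restriction.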

Stratified by: Age × Sex × 3-year period, and IMD × 3-year period. Ages 25–74.

Typical range: 20–45%. Multi-stage MICE imputation (SynthSmoke-compliant). IMD imputed via multinomial logit from HSE 2011. Weights post-stratified to ONS 2011 population.

2. Baseline Population (tick 0, 2011)

Synthetic population: 9,842 agents. 2,000 smokers (20.3%), 2,133 ex-smokers, 258 in quit tunnel, 5,451 never-smokers.

2a. State distribution: ABM vs STPM lens

The ABM has a 12-month quit tunnel (NEWQUITTER → ONGOINGQUITTER1–11) before reaching EXSMOKER. STPM has no tunnel: anyone who quits is immediately a former smoker. This means STPM's "former smoker" maps to ABM's tunnel occupants + EXSMOKER.

2b. Smoker demographics

2c. IMD × COM-B variable correlations at baseline

These plots show how the COM-B input variables correlate with IMD quintile at baseline. Q1 = least deprived, Q5 = most deprived.

Maintenance COM-B variables by IMD

Attempt COM-B variables by IMD

3. ABM Output vs Targets (fake-beta-2)

Run config: attempt bias = −4.0 (recalibrated from SEM −2.169), maintenance bias = −0.767 (original SEM intercept, already ~monthly per Harry). cCigConsumptionPrequit frozen at quit start. mNonSmokerSelfIdentity fake beta (0.001) for maintenance. Exemplar agents stratified by IMD × sex (10 agents across 5 IMD quintiles × 2 sexes).

STPM quit rate — IMD facets

STPM quit rate — Age×Sex facets

STS attempt rate — Age×Sex

STS attempt rate — IMD

4. Quit Tunnel Survival — Cohort Analysis

Track all agents who were SMOKER in the 2011 December snapshot (n=2,119) across 29 years.

Result: 0 agents reach EXSMOKER in 29 years of simulation. 1,622 of 2,119 entered the tunnel at least once. Deepest state observed: OQ9 (month 9).

Cohort state over time

Maximum tunnel depth ever reached (December snapshots)

Note: the cohort is defined as SMOKER at the 2011 December snapshot (tick 12), not at baseline (tick 0). The Dec 2011 snapshot has 2,119 smokers vs 2,000 at tick 0. The difference (+119 agents) reflects new 16-year-olds who entered during 2011 and baseline quitters who relapsed back to SMOKER by December.

4b. Agent Journeys

Exemplar agent journeys

Tick-by-tick variable tracking (maintenance ticks)

Each panel tracks one COM-B variable across all maintenance ticks for each exemplar agent. Only variables that change over time are shown — this reveals whether endogenous variables (addiction decay, cessation aids, non-smoker identity) and exogenous variables (regional prevalence, age transitions) are working as expected. The first panel shows the resulting P(maintenance).

Exemplar Agent Selection Bias

The 10 exemplar agents were selected to be stratified by IMD × sex (one per IMD quintile per sex). Their month-0 P(maintenance) values are compared to the full population below.

Statistic | Population (n=2000) | Exemplar (n=6)
Median monthly P(maint) | 0.213 | 0.234
Mean monthly P(maint) | 0.224 | 0.218
Logged mean (simulation): 0.254
The logged simulation mean (0.254) differs from the static calculation (0.218) because: (1) regional prevalence is set dynamically at runtime (~0.17–0.24 depending on region and year), (2) cessation aids (NRT, varenicline, etc.) are allocated stochastically when an agent enters NEWQUITTER, and (3) addiction decays during the tunnel, increasing P(maint) in later months. The static calculation uses month-0 values only.

5. Population-Level Maintenance Probability

5a. The COM-B maintenance formula

Implemented in comb_theory.py, class QuitMaintenanceTheory.

P(maint) = σ(η) = 1 / (1 + e^{−η})
η = C + O + M + b₀

where:

C = Σ_j β_j · x_j for j ∈ {capability predictors}
O = Σ_j β_j · x_j for j ∈ {opportunity predictors}
M = 0.001 · mNonSmokerSelfIdentity

Current intercept: b₀ = -0.7672137. Source: data/intermediate_data/com_b/maintenance_SEM_coefficients_202509019_v2.csv.

Per Harry's email, this intercept represents the log-odds of still being abstinent roughly one month after a quit attempt (reference category: "more than a week and up to a month" since quit). It is already approximately a monthly probability — no 6-month conversion needed.
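
A minimal sketch of the formula, for orientation only. It does not reproduce QuitMaintenanceTheory in comb_theory.py; the function names and the pre-summed capability and opportunity arguments are assumptions.

```python
import math

B0_MAINTENANCE = -0.7672137  # current SEM intercept from the notes
BETA_IDENTITY = 0.001        # fake beta for mNonSmokerSelfIdentity


def sigmoid(eta):
    """Logistic function: maps a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def p_maintenance(capability, opportunity, identity, b0=B0_MAINTENANCE):
    """Monthly maintenance probability from the summed COM-B terms.

    capability and opportunity are the already-summed beta * x terms
    (C and O); identity is the raw mNonSmokerSelfIdentity value.
    """
    eta = capability + opportunity + BETA_IDENTITY * identity + b0
    return sigmoid(eta)
```

With every predictor at zero, p_maintenance(0, 0, 0) returns roughly 0.317, which is the monthly probability implied by the intercept alone.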

5b. Addiction decay during the tunnel

a_{k+1} = a_k · e^{−λΔt}, where λ = 0.0368, Δt = 4.33 weeks

Monthly decay factor: e^{−0.0368 × 4.33} = 0.8526. After 12 months: a_12 = a_0 · 0.8526^12 = a_0 · 0.1475.

On relapse, cCigAddictStrength resets to its pre-quit value.
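
The decay arithmetic can be checked with a short sketch. It assumes Δt is exactly 52/12 weeks (about 4.33), which reproduces the 0.8526 monthly factor quoted above; the function name is illustrative.

```python
import math

LAMBDA_PER_WEEK = 0.0368
WEEKS_PER_MONTH = 52 / 12  # about 4.33; assumed exact value behind the notes


def addiction_after(a0, months):
    """Addiction strength after a number of months of continuous abstinence."""
    return a0 * math.exp(-LAMBDA_PER_WEEK * WEEKS_PER_MONTH * months)


monthly_factor = addiction_after(1.0, 1)
twelve_month_factor = addiction_after(1.0, 12)
print(f"monthly factor: {monthly_factor:.4f}")
print(f"12-month factor: {twelve_month_factor:.4f}")
```

The relapse reset is not modelled here; in a fuller sketch the caller would simply restore a0 whenever the agent returns to SMOKER.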

5c. Distribution across 2,000 baseline smokers

Each agent's maintenance probability is computed from their syn pop attributes. Two scenarios:

Setting | Baseline scenario | Best-case scenario
Cessation aids | None (all OFF) | All ON (e-cig, NRT, varenicline, behavioural support, cytisine)
Regional prevalence | 0.20 (static) | 0.20 (static)
Smoking neighbours | 1 (static) | 1 (static)
Combined logit boost from aids | 0 | +2.43

What these numbers mean for the 12-month tunnel

An agent must survive 12 consecutive monthly maintenance checks to reach EXSMOKER. The 12-month survival probability is the product of 12 monthly probabilities (with addiction decay applied each month):

P(survive 12 months) = ∏_{k=0}^{11} P(maint at month k)
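
A sketch of how the product interacts with addiction decay, assuming the maintenance check at month k uses the addiction level before that month's decay is applied (the ordering is an assumption). eta_other stands for every logit term except addiction and is a hypothetical argument for this illustration.

```python
import math


def sigmoid(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def tunnel_survival(eta_other, a0, beta_addiction=-0.195,
                    monthly_decay=0.8526, months=12):
    """Probability of passing every monthly maintenance check in the tunnel.

    eta_other: logit contribution of all terms except addiction (assumed
    constant over the tunnel). a0: addiction strength at quit start.
    """
    survival = 1.0
    addiction = a0
    for _ in range(months):
        survival *= sigmoid(eta_other + beta_addiction * addiction)
        addiction *= monthly_decay  # decay applied after each check
    return survival
```

Because the addiction beta is negative, decay raises P(maint) in later months, so survival with decay is always at least as high as survival with addiction frozen at a0.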

Quick conversion guide — if monthly P(maint) were constant:

Monthly P(maint) | Annual survival P^12 | Interpretation
0.30 | 0.0000005 (0.00005%) | Virtually impossible
0.50 | 0.000244 (0.02%) | ~1 in 4,000
0.70 | 0.0138 (1.4%) | ~1 in 72
0.80 | 0.0687 (6.9%) | ~1 in 15
0.90 | 0.282 (28%) | ~1 in 4
0.95 | 0.540 (54%) | Majority survive

Statistic | Baseline (no aids): per month | Baseline (no aids): over 12 months | Best case (all aids ON): per month | Best case (all aids ON): over 12 months
Median | 0.213 | 5.98e-08 | 0.739 | 4.73e-02
Mean | 0.224 | 8.66e-06 | 0.729 | 6.46e-02
Max | 0.615 | 0.004545 | 0.914 | 0.341137

Is this an intercept problem or a beta problem?

The predictor sum (C + O + M, excluding intercept) averages -0.537 across all baseline smokers. Even with b₀ = 0, the mean logit would be -0.537, giving P(maint) ≈ 0.369 and P(12mo) ≈ 6.34e-06.
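
The arithmetic in the paragraph above can be reproduced directly. The sketch evaluates the sigmoid at the mean predictor sum, which is not the same as the population mean of P(maint), so only the b₀ = 0 figures are expected to match the notes.

```python
import math

MEAN_PREDICTOR_SUM = -0.537  # mean of C + O + M from the notes
B0_CURRENT = -0.7672137      # current SEM intercept from the notes


def sigmoid(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def monthly_and_annual(b0, predictor_sum=MEAN_PREDICTOR_SUM, months=12):
    """Monthly P(maint) at the mean logit, and its constant-rate 12-month product."""
    p_month = sigmoid(predictor_sum + b0)
    return p_month, p_month ** months


p_zero, s_zero = monthly_and_annual(0.0)
p_cur, s_cur = monthly_and_annual(B0_CURRENT)
print(f"b0 = 0:       P(maint) = {p_zero:.3f}, P(12mo) = {s_zero:.2e}")
print(f"b0 = current: P(maint) = {p_cur:.3f}, P(12mo) = {s_cur:.2e}")
```

Both cases leave 12-month survival at the mean logit far below 1%, which is the point of the paragraph: removing the intercept alone does not rescue the tunnel.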

Largest negative contributors to the logit (mean β·x across baseline smokers):

Variable | β | SE | mean(x) | mean(β·x) | Prevalence
cCigAddictStrength | −0.195 | 0.051 | 2.03 | −0.396 | 91% have urge > 0
cCigConsumptionPrequit | −0.029 | 0.008 | 12.3 | −0.357 | All smokers
oSocialHousing | −0.391 | 0.169 | 0.48 | −0.188 | 48% in social housing
oEducationalLevelBelowDegree | −0.175 | 0.160 | 0.58 | −0.102 | 58% below degree
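
Since each term is linear, mean(β·x) equals β · mean(x), so the contribution column can be recomputed from the other two. The sketch below uses the rounded values in the table, so its last digit can differ slightly from the reported figures; the dict layout is an illustration.

```python
# (beta, mean_x) pairs copied from the table above.
PREDICTORS = {
    "cCigAddictStrength": (-0.195, 2.03),
    "cCigConsumptionPrequit": (-0.029, 12.3),
    "oSocialHousing": (-0.391, 0.48),
    "oEducationalLevelBelowDegree": (-0.175, 0.58),
}


def mean_contributions(predictors):
    """Mean logit contribution per variable: beta * mean(x)."""
    return {name: beta * mean_x for name, (beta, mean_x) in predictors.items()}


contributions = mean_contributions(PREDICTORS)
for name, value in sorted(contributions.items(), key=lambda kv: kv[1]):
    print(f"{name:32s} {value:+.3f}")
```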

Meanwhile, the largest positive contributors (cessation aids: e-cigarette β=+0.45, varenicline β=+0.72, cytisine β=+0.79) contribute almost nothing at baseline because <10% of smokers use them. They only activate when an agent enters the quit tunnel and is stochastically allocated aids.

Both the intercept (b₀ = -0.7672137, SE = 0.700) and the predictor betas contribute to the low P(maintenance). The predictor sum alone averages -0.537 — already negative before the intercept is added. Is the problem the intercept, the betas, or both?

5c-ii. Sensitivity analysis: which betas and bias matter most?

First, the conversion from monthly P(maintenance) to 12-month tunnel survival. The curve is P^12; it drops off steeply below ~0.85/month.

Combined variable range tornado: each variable is varied from its minimum to maximum plausible runtime value (e.g. cessation aids 0→1, addiction 0→5, regional prevalence 0.10→0.30). The intercept is varied by ±1 SE (0.700). All other terms held at population means.
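
A one-at-a-time sweep of this kind could look roughly like the sketch below. It covers only terms whose betas appear in these notes, and it assumes a baseline mean of 0 for each cessation aid (the notes only say fewer than 10% of smokers use them), so the numbers are illustrative rather than a reproduction of the plot.

```python
import math


def sigmoid(eta):
    return 1.0 / (1.0 + math.exp(-eta))


# Mean predictor sum plus current intercept, both from the notes.
BASE_LOGIT = -0.537 + -0.7672137
INTERCEPT_SE = 0.700

# (label, beta, minimum, maximum, assumed population mean)
TERMS = [
    ("cCigAddictStrength", -0.195, 0.0, 5.0, 2.03),
    ("e-cigarette", 0.45, 0.0, 1.0, 0.0),   # mean of 0 is an assumption
    ("varenicline", 0.72, 0.0, 1.0, 0.0),   # mean of 0 is an assumption
    ("cytisine", 0.79, 0.0, 1.0, 0.0),      # mean of 0 is an assumption
]


def tornado_rows():
    """Return (label, P at minimum, P at maximum), widest swing first."""
    rows = []
    for label, beta, lo, hi, mean in TERMS:
        p_at_min = sigmoid(BASE_LOGIT + beta * (lo - mean))
        p_at_max = sigmoid(BASE_LOGIT + beta * (hi - mean))
        rows.append((label, p_at_min, p_at_max))
    rows.append(("intercept +/- 1 SE",
                 sigmoid(BASE_LOGIT - INTERCEPT_SE),
                 sigmoid(BASE_LOGIT + INTERCEPT_SE)))
    rows.sort(key=lambda r: abs(r[2] - r[1]), reverse=True)
    return rows
```

Sorting by absolute swing gives the usual tornado ordering, with the widest bar first.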

5d. Survival curves by cCigAddictStrength

cCigAddictStrength = STS sturge variable: self-reported strength of urge to smoke, integer 0–5 (0 = none, 5 = extreme). β = −0.195.

Three bias scenarios: current (−0.767), moderate uplift (0.0), and strong uplift (+1.0). At the current intercept, addiction strength barely differentiates agents — all curves collapse near zero. The intercept dominates. At b₀ = +1.0, the curves separate and addiction strength starts to matter as intended.

6. Maintenance Bias: Temporal Scale & Sweep Analysis

6a. Attempt bias conversion (already applied)

The attempt SEM intercept (−2.169) was estimated on a 6-month window (per Harry). Converting to monthly:

P_6mo = σ(−2.169) = 0.1026
P_1mo = 1 − (1 − P_6mo)^{1/6} = 0.0179
b₀,monthly = logit(0.0179) = −4.006 ≈ −4.0

This uses the complement rule: P(≥1 event in 6 months) is a union of monthly events, so, assuming a constant and independent monthly probability, P(no event in 6 months) = (1 − P_1mo)^6 and therefore P(no event in 1 month) = (1 − P_6mo)^{1/6}.
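
The conversion can be reproduced with a short sketch, assuming a constant and independent monthly attempt probability over the 6-month window. The function name is illustrative.

```python
import math


def sigmoid(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def logit(p):
    return math.log(p / (1.0 - p))


def window_to_monthly_intercept(b0_window, months=6):
    """Convert an intercept estimated on a multi-month window to a monthly one."""
    p_window = sigmoid(b0_window)
    # P(no event in one month) is the months-th root of P(no event in window).
    p_month = 1.0 - (1.0 - p_window) ** (1.0 / months)
    return logit(p_month)


b0_monthly = window_to_monthly_intercept(-2.169)
print(f"monthly attempt intercept: {b0_monthly:.3f}")
```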

6b. Maintenance bias — already monthly

Per Harry's email: the maintenance SEM intercept represents the log-odds of still being abstinent roughly one month after a quit attempt (reference category: "more than a week and up to a month" since quit). This is already approximately a monthly probability.

Harry's caveats:

Harry's suggestion: treat the intercept as-is for monthly maintenance, or slightly downscale it and add uncertainty.

6c. Maintenance bias sweep (9 values × 10 runs × 120 ticks)

Sweep from b₀ = −1.0 to +1.0. For each bias value, 10 replicate runs of 120 ticks (2011–2020). The plot shows: (1) number of baseline smokers who reached EXSMOKER by 2020, (2) number still in the tunnel at the 2020 snapshot, and (3) the STPM verification quit rate (right axis, 0–1 scale). The dashed blue line is the mean STPM target.

The STPM quit rate (blue diamonds) is insensitive to the maintenance bias — it stays near the target across the full sweep because it counts tunnel occupants as quitters. The actual EXSMOKER count (green bars) is highly sensitive: ~0 at b₀ = −1.0 vs ~151 at b₀ = +1.0. The STPM verification metric cannot distinguish between agents who survive the tunnel and agents who merely enter it.

6d. Scenarios for the maintenance bias

Scenario | b₀ | Rationale | Expected effect
Current (SEM as-is) | −0.767 | Harry says intercept is already monthly | ~0 agents reach EXSMOKER. Tunnel is a revolving door.
Moderate uplift | 0.0 | Remove negative intercept; let predictors drive variation | ~8 EXSMOKER per 10 years (sweep data). Still very low.
Strong uplift | +0.5 | Compensate for SEM underestimation of transition probability | ~40 EXSMOKER per 10 years. Some agents survive.
Calibrate to real-world | TBD | Target ~3–5% of attempters surviving 12 months | Requires b₀ ≈ +0.8 to +1.2 (rough estimate from sweep).
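
For orientation, the 3–5% target can be inverted under the simplifying assumption of a constant monthly P(maint). The result is the required total logit η (predictors plus intercept), not b₀ itself, and it ignores addiction decay and stochastic aid allocation, so it is not directly comparable with the sweep-based b₀ estimate in the table.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def required_monthly_p(target_survival, months=12):
    """Constant monthly P(maint) whose months-fold product hits the target."""
    return target_survival ** (1.0 / months)


for target in (0.03, 0.05):
    p = required_monthly_p(target)
    print(f"target {target:.0%}: monthly P = {p:.3f}, total logit = {logit(p):.2f}")
```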

7. Summary & Open Questions

What we know

Questions for discussion

  1. How can we further improve the verification here before we move on to calibration, and what range of values for the betas (including b₀) should we treat as reasonable?
  2. How should we verify that the ABM produces a realistic number of long-term ex-smokers, given that the current STPM metric cannot distinguish tunnel entry from tunnel survival?