# Attribution Tracking Scoping Document

**Date:** 2026-04-12
**Status:** Research & scoping only — do not build
**Goal:** Connect content → leads → revenue across the EN marketing engine

---

## Executive Summary

The EN system currently has **zero attribution infrastructure**. Every link published — bio pages, blog CTAs, nurture emails, social captions, landing pages — is a raw URL with no tracking parameters. GHL natively supports UTM capture via its `attributions` array on contacts, but nothing in the EN pipeline populates it. Revenue tracking is manual (3-column Google Sheet tab). The OSC investor tracker (`tools/osc_tracker.py`) is a mature click-tracking system that could serve as architectural inspiration, but it's deal-specific and not transferable as-is.

Building attribution requires changes to **3 skills**, **1 workflow**, and **2 tools** (`tools/post_blog.py` and `tools/ghl_import_contacts.py`), plus new custom fields in GHL and a new aggregation tool and workflow. Estimated total effort: 3-4 sessions.

---

## 1. UTM Strategy

### 1.1 Proposed UTM Convention

Every outbound link published by the EN system should include these 4 parameters:

| Parameter | Convention | Examples |
|---|---|---|
| `utm_source` | Platform or channel where the link appears | `bio`, `instagram`, `linkedin`, `email_nurture`, `email_welcome`, `blog`, `landing_page`, `tiktok`, `facebook` |
| `utm_medium` | Content format | `cta_button`, `post_caption`, `carousel`, `email`, `blog_cta`, `nav_cta` |
| `utm_campaign` | Active campaign slug (from Campaigns tab) | `spring-intake-2026`, `free-audit-launch`, `challenge-may-2026` |
| `utm_content` | Specific content piece identifier | `nurture-w1-2026-04-14`, `blog-myofunctional-sleep`, `script-hook-pattern-3`, `bio-primary-cta` |

**Rules:**
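- All lowercase. `utm_campaign` and `utm_content` values use hyphens for spaces; `utm_source` and `utm_medium` use the fixed underscore-separated tokens listed in the table above (e.g. `email_nurture`, `cta_button`)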
- `utm_campaign` defaults to `organic` when no active campaign exists
- `utm_content` should be human-readable and traceable back to the Scripts/Blog/Nurture tab row
- Never include PII in UTM values
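
The convention above can be sketched as a small helper. This is a minimal illustration, not an existing tool: `slugify` and `build_utm_url` are hypothetical names, and the sketch assumes `utm_source` and `utm_medium` are passed in already chosen from the table, while `utm_campaign` and `utm_content` are slugified.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def slugify(text):
    """Lowercase and collapse every run of non-alphanumerics into one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_utm_url(base_url, source, medium, content, campaign=None):
    """Return base_url with the four UTM parameters appended.

    Existing non-UTM query parameters are kept; existing utm_* values
    for these four keys are replaced.
    """
    utm = {
        "utm_source": source,
        "utm_medium": medium,
        # Falls back to "organic" when no active campaign exists.
        "utm_campaign": slugify(campaign or "") or "organic",
        "utm_content": slugify(content),
    }
    parts = urlsplit(base_url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in utm
    ]
    return urlunsplit(parts._replace(query=urlencode(kept + list(utm.items()))))


print(build_utm_url("https://example.com/book", "bio", "cta_button", "Bio Primary CTA"))
```

Merging into the existing query string, rather than concatenating `?utm_source=...`, avoids producing a second `?` when the base URL already carries parameters.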

### 1.2 Where Links Get Published

| Surface | Link Types | Current State | File to Modify |
|---|---|---|---|
| **Bio link page** | Primary CTA, secondary CTAs, latest content links | Raw URLs, no JS allowed | `~/.claude/skills/bio/SKILL.md` |
| **Social post captions** | Occasional "link in bio" reference, LinkedIn post links | Most captions don't include URLs; some LinkedIn posts do | `~/.claude/skills/content/SKILL.md` (Steps 4, 5) |
| **Nurture emails** | CTA button hrefs | Raw URLs from `active_landing_page` | `~/.claude/skills/content/SKILL.md` (Step 8) |
| **Welcome sequence emails** | CTA button hrefs per email | Raw URLs | `~/.claude/skills/content/SKILL.md` (Step 9) |
| **Blog post CTAs** | Bottom-of-post CTA, nav CTA | Raw relative URLs (`../contact.html`) | `tools/post_blog.py` |
| **Blog LinkedIn share text** | URL in share caption | Raw blog URL | `~/.claude/skills/content/SKILL.md` (Step 7, Caption column) |
| **Landing pages** | Form action, booking link CTA | Raw URLs | `~/.claude/skills/positioning/SKILL.md` |
| **Campaign landing pages** | Email capture forms, direct CTAs | Raw URLs | `workflows/biweekly_campaign.md` |

### 1.3 Workflow Changes Required

#### `~/.claude/skills/content/SKILL.md`

**Effort: Medium (largest change)**

- **Step 7 (Blog):** When generating the LinkedIn share caption (stored in Caption column), append UTMs to the blog URL: `?utm_source=linkedin&utm_medium=post_caption&utm_campaign={campaign_slug}&utm_content=blog-{slug}`
- **Step 8 (Nurture emails):** Every CTA button `href` needs UTMs appended: `?utm_source=email_nurture&utm_medium=email&utm_campaign={campaign_slug}&utm_content=nurture-w{week}-{date}`
- **Step 9 (Welcome sequence):** Same pattern: `?utm_source=email_welcome&utm_medium=email&utm_campaign=welcome-sequence&utm_content=welcome-email-{N}`
- **Step 12 (Bio update):** The `/bio` call already regenerates the page — bio skill handles its own UTMs (see below)
- **Step 15 (GHL scheduling):** If LinkedIn post captions include a link, UTMs should already be in the caption text from Step 5/7

**What to add to the skill:** A "UTM Construction" subsection in the Content Standards area defining the helper pattern. The agent constructs UTMs inline when generating each content piece — no Python tool needed. The campaign slug comes from the Campaigns tab `Campaign` column (slugified).

#### `~/.claude/skills/bio/SKILL.md`

**Effort: Small**

- The bio page is static HTML with no JS (3G performance requirement). UTMs must be **baked into the HTML at build time**.
- Primary CTA href: append `?utm_source=bio&utm_medium=cta_button&utm_campaign={campaign_slug}&utm_content=bio-primary-cta`
- Secondary CTA hrefs: same pattern with `bio-secondary-{product_name}`
- Latest content links: `?utm_source=bio&utm_medium=cta_button&utm_campaign={campaign_slug}&utm_content=bio-latest-{N}`
- Social profile links: No UTMs (these go to external platforms, not conversion pages)

#### `~/.claude/skills/positioning/SKILL.md`

**Effort: Small**

- Landing page CTA buttons: append `?utm_source=landing_page&utm_medium=cta_button&utm_campaign={campaign_slug}&utm_content=lp-{product_slug}`
- Form action URLs: if pointing to GHL form/funnel, GHL will auto-capture UTMs from the page URL — so the **inbound link to the landing page** matters more than the form action itself

#### `tools/post_blog.py`

**Effort: Small**

- The `cta_link` in `CLIENT_CONFIGS` is currently a relative path (`../contact.html`). To support UTMs, this needs to become an absolute URL pointing to the client's booking/contact page.
- Nav CTA and post CTA: append `?utm_source=blog&utm_medium={nav_cta|blog_cta}&utm_campaign={campaign_slug}&utm_content=blog-{slug}`
- The `campaign_slug` would need to be passed as a parameter or read from the client's Google Sheet Campaigns tab at publish time.
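
One possible way to derive the absolute CTA URL is to resolve the existing relative `cta_link` against the published post URL and then append the UTMs. The sketch below is an assumption about how this could look; `absolute_cta_url` and its parameters are hypothetical and do not exist in `tools/post_blog.py` today.

```python
from urllib.parse import urlencode, urljoin


def absolute_cta_url(post_url, cta_link, medium, slug, campaign_slug="organic"):
    """Resolve a relative cta_link against the post URL, then append blog UTMs."""
    if medium not in ("nav_cta", "blog_cta"):
        raise ValueError(f"unexpected medium: {medium}")
    # urljoin resolves "../contact.html" relative to the post's directory.
    target = urljoin(post_url, cta_link)
    params = urlencode({
        "utm_source": "blog",
        "utm_medium": medium,
        "utm_campaign": campaign_slug,
        "utm_content": f"blog-{slug}",
    })
    separator = "&" if "?" in target else "?"
    return f"{target}{separator}{params}"


print(absolute_cta_url(
    "https://example.com/blog/my-post.html", "../contact.html", "blog_cta", "my-post"
))
```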

#### `workflows/biweekly_campaign.md`

**Effort: Small**

- Campaign landing page brief (Step 2E): Add instruction to include UTM parameters on all inbound links that point to the landing page
- Content Integration Plan (Step 2F): Where it says `{link}` for newsletter P.S. and social CTAs, specify that `{link}` includes UTMs

---

## 2. Source Capture in GHL

### 2.1 Native GHL Attribution

**GHL automatically captures UTM parameters** when a contact is created through a GHL-hosted form or funnel. The data lands in the `attributions` array on the contact object:

```json
{
  "attributions": [
    {
      "utmSessionSource": "email_nurture",
      "medium": "email",
      "isFirst": true
    }
  ]
}
```

The `isFirst: true` flag marks first-touch attribution. Subsequent visits from the same contact append to the array (multi-touch).
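
As a minimal sketch of reading this structure, the helper below picks the first-touch entry from a contact's `attributions` array. It assumes only the fields shown in the JSON above; `first_touch` is a hypothetical helper, and the fallback to the first list element when no entry is flagged is an assumption about ordering.

```python
def first_touch(attributions):
    """Return the entry flagged isFirst, else the first list element, else None."""
    for entry in attributions or []:
        if entry.get("isFirst"):
            return entry
    return attributions[0] if attributions else None


contact = {
    "attributions": [
        {"utmSessionSource": "email_nurture", "medium": "email", "isFirst": True},
        {"utmSessionSource": "bio", "medium": "cta_button"},
    ]
}
touch = first_touch(contact.get("attributions"))
print(touch["utmSessionSource"], touch["medium"])
```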

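**This means:** If the client's landing page or booking form is a GHL funnel/form, capture of the native attribution fields is automatic, with no additional setup needed. The UTMs travel through the URL, and GHL reads them from the page URL when the form is submitted. Capturing `utm_campaign` and `utm_content` still requires the custom fields in Section 2.3.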

### 2.2 What Needs to Be True

For GHL auto-capture to work:

1. The **landing page must be a GHL-hosted form/funnel** OR use a GHL form embed with the `?utm_source=...` params in the parent page URL
2. The **inbound link** (from bio, email, blog, social) must include UTM params in the URL
3. GHL reads UTMs from the **page URL** at form submission time, not from hidden fields

**If the landing page is NOT GHL-hosted** (e.g., a custom HTML page from the positioning skill deployed to Vercel):
- Option A: Embed a GHL form iframe. By default the iframe cannot read the parent page's query string, so the UTMs are lost unless they are forwarded into the iframe `src`
- Option B: Use hidden form fields that read UTMs from the URL via JavaScript, then map to GHL custom fields
- Option C: Redirect to a GHL funnel URL with UTMs forwarded as query params

**EN uses custom-built forms, not GHL forms.** GHL native UTM auto-capture does not apply. Every landing page and form needs:
1. A JS snippet that reads UTM params from the page URL on load
2. Hidden form fields (`utm_source`, `utm_medium`, `utm_campaign`, `utm_content`) populated by that JS
3. The form submission handler (backend or webhook) must pass those UTM values to GHL when creating/updating the contact via API
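
The server-side half of this (items 2 and 3) can be sketched as follows. Python is used here only to illustrate the parsing logic; `extract_utms`, `contact_payload`, and the `first_touch` payload key are hypothetical, and translating the result into the exact custom-field format the GHL API expects is deliberately left out.

```python
from urllib.parse import parse_qs, urlsplit

UTM_KEYS = ("utm_source", "utm_medium", "utm_campaign", "utm_content")


def extract_utms(page_url):
    """Return the four UTM values from a page URL; missing ones become ''."""
    query = parse_qs(urlsplit(page_url).query)
    return {key: query.get(key, [""])[0].strip().lower() for key in UTM_KEYS}


def contact_payload(email, utms):
    """Bundle an email with its non-empty UTMs under first_touch_* names.

    The "first_touch" key is illustrative only, not the GHL request format.
    """
    first_touch = {
        key.replace("utm_", "first_touch_", 1): value
        for key, value in utms.items()
        if value
    }
    return {"email": email, "first_touch": first_touch}


utms = extract_utms("https://example.com/lp?utm_source=bio&utm_medium=cta_button")
print(contact_payload("lead@example.com", utms))
```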

### 2.3 GHL Custom Fields to Create

The native `attributions` array captures `utmSessionSource` and `medium` but does **not** capture `utm_campaign` or `utm_content`. To get full attribution, create these custom fields on each EN client's GHL location:

| Field Name | Field Key | Type | Purpose |
|---|---|---|---|
| First Touch Source | `contact.first_touch_source` | TEXT | First `utm_source` value |
| First Touch Medium | `contact.first_touch_medium` | TEXT | First `utm_medium` value |
| First Touch Campaign | `contact.first_touch_campaign` | TEXT | First `utm_campaign` value |
| First Touch Content | `contact.first_touch_content` | TEXT | First `utm_content` value — traces to specific content piece |
| Attribution Summary | `contact.attribution_summary` | LARGE_TEXT | JSON or semicolon-delimited multi-touch history |

These fields would be populated by:
- A GHL workflow trigger on "Contact Created" that reads the UTM values from the contact's first attribution entry
- Or a webhook/automation that fires on form submission and maps URL params to custom fields

**Current state of custom fields (OSC location — only one accessible via MCP):** 24 custom fields exist, all OSC-specific (investor_tier, deals_participated, etc.). No attribution fields. Other locations (EN, LL, NC) may differ but couldn't be queried.

### 2.4 The `source` Field

GHL contacts have a standard `source` field (currently `null` on sampled contacts). This can be set during contact creation via API:

```python
payload = {
    "firstName": "...",
    "email": "...",
    "source": "bio_page",  # or "nurture_email", "blog_cta", etc.
}
```

The `ghl_import_contacts.py` tool currently does NOT set this field. For new contacts created through forms, GHL may auto-populate it based on the form/funnel name.

---

## 3. Data Aggregation

### 3.1 Weekly Attribution Report Workflow

A new workflow (`workflows/attribution_report.md`) would:

1. **Read new contacts** from the past 7 days via GHL API (`GET /contacts` with `startAfter` date filter)
2. **For each contact**, extract:
   - `attributions` array (native UTM data)
   - Custom fields (`first_touch_source`, `first_touch_campaign`, `first_touch_content`)
   - `tags` (for campaign/segment info)
   - `dateAdded`
3. **Read opportunities** linked to those contacts:
   - `monetaryValue` — the revenue figure
   - `status` — won/open/lost
   - `source` — opportunity-level source
   - `pipelineStageId` — current stage
4. **Group by** source → medium → campaign → content piece
5. **Write to** a new Google Sheet tab or update existing Financials tab
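
The grouping in step 4 can be sketched as below. This assumes each contact has already been flattened into a dict with a `dateAdded` ISO timestamp and `first_touch_*` keys; that flattened shape and the names `week_start` and `count_leads` are assumptions for illustration, not the GHL response format.

```python
from collections import Counter
from datetime import date, timedelta

DIMENSIONS = (
    "first_touch_source",
    "first_touch_medium",
    "first_touch_campaign",
    "first_touch_content",
)


def week_start(day):
    """Return the Monday of the ISO week containing `day`."""
    return day - timedelta(days=day.weekday())


def count_leads(contacts):
    """Count contacts per (week, source, medium, campaign, content) key."""
    counts = Counter()
    for contact in contacts:
        added = date.fromisoformat(contact["dateAdded"][:10])
        key = (week_start(added).isoformat(),) + tuple(
            contact.get(name, "") for name in DIMENSIONS
        )
        counts[key] += 1
    return counts
```

Each key in the returned counter corresponds to one row of the proposed Attribution tab, with the count as "Leads Generated".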

### 3.2 Proposed Sheet Schema: "Attribution" Tab

| Column | Description | Example |
|---|---|---|
| Week | ISO week start date | 2026-04-13 |
| Source | `utm_source` value | email_nurture |
| Medium | `utm_medium` value | email |
| Campaign | `utm_campaign` slug | spring-intake-2026 |
| Content Piece | `utm_content` value | nurture-w1-2026-04-14 |
| Leads Generated | Count of new contacts with this attribution | 3 |
| Pipeline Value | Sum of `monetaryValue` for open opportunities | $4,500 |
| Revenue Won | Sum of `monetaryValue` for won opportunities | $1,200 |
| Revenue Attributed | Revenue from contacts with this first-touch source | $1,200 |
| ROAS | Revenue / estimated content production cost | 4.2:1 |
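
The ROAS column is a simple ratio. A minimal sketch, assuming a per-row cost estimate is available (how to estimate it is still open) and using the hypothetical helper name `roas_ratio`:

```python
def roas_ratio(revenue, cost):
    """Format revenue / cost as an 'N.N:1' string; None when cost is not positive."""
    if cost <= 0:
        return None
    return f"{revenue / cost:.1f}:1"


print(roas_ratio(4200, 1000))
```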

### 3.3 Revenue Attribution Logic

```
Contact (first_touch_source = "email_nurture", first_touch_content = "nurture-w1-2026-04-14")
  └── Opportunity (monetaryValue = $1,200, status = "won")
       └── Attribution: $1,200 revenue attributed to "nurture-w1-2026-04-14"
```

**Attribution model:** First-touch by default (simplest, most actionable for a small team). The `first_touch_content` field traces revenue back to a specific content piece in the Scripts/Blog/Nurture tab.
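
A minimal sketch of this first-touch logic, assuming contacts are dicts with `id` and `first_touch_content`, and opportunities are dicts with `contactId`, `monetaryValue`, and `status` (the function name and the flattened contact shape are hypothetical):

```python
from collections import defaultdict


def attribute_revenue(contacts, opportunities):
    """Sum open pipeline value and won revenue per first-touch content piece."""
    content_by_contact = {c["id"]: c.get("first_touch_content") for c in contacts}
    totals = defaultdict(lambda: {"pipeline": 0.0, "won": 0.0})
    for opp in opportunities:
        content = content_by_contact.get(opp.get("contactId"))
        if not content:
            continue  # no tracked first touch, so the value stays unattributed
        value = float(opp.get("monetaryValue") or 0)
        if opp.get("status") == "won":
            totals[content]["won"] += value
        elif opp.get("status") == "open":
            totals[content]["pipeline"] += value
    return dict(totals)
```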

**Multi-touch (future):** The `attributions` array on GHL contacts stores all touches. A more sophisticated model could weight first-touch (40%), last-touch (40%), and middle touches (20% split). Not worth building until there's enough data volume.
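
The 40/40/20 weighting can be sketched as below. The handling of one-touch and two-touch journeys is not specified in this document; full credit for a single touch and an even split for two touches are assumptions, as is the helper name `position_weights`.

```python
def position_weights(n_touches):
    """Return per-touch weights: 40% first, 40% last, 20% split across the middle."""
    if n_touches <= 0:
        return []
    if n_touches == 1:
        return [1.0]  # assumption: a single touch gets full credit
    if n_touches == 2:
        return [0.5, 0.5]  # assumption: no middle touches, so split evenly
    middle = 0.2 / (n_touches - 2)
    return [0.4] + [middle] * (n_touches - 2) + [0.4]
```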

### 3.4 Implementation: New Tool

A new Python tool (`tools/attribution_report.py`) would:
1. Query GHL contacts API with date filter
2. Query GHL opportunities API, joining on `contactId`
3. Aggregate by UTM dimensions
4. Write to Google Sheet Attribution tab via `google-workspace` MCP
5. Optionally send a Telegram summary

The OSC tracker (`tools/osc_tracker.py`) provides architectural patterns for token-based tracking and GHL tagging that could inform this tool's design, but it's deal-slug-specific and would need significant refactoring to generalize.

---

## 4. Portal Display

### 4.1 Dashboard Attribution Card

Once the Attribution tab has data, the client portal dashboard could show:

```
TOP PERFORMING CONTENT THIS MONTH
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1. "5 Signs Your Intake System Is Losing Cases"
   Blog post → 12 leads → $3,600 revenue

2. Week 3 Nurture Email
   Email → 8 leads → $2,400 revenue

3. Bio Page Primary CTA
   Bio link → 5 leads → $1,500 revenue

Total attributed revenue this month: $7,500
Total leads from EN content: 25
```

### 4.2 Minimum Viable Data

Before attribution cards are useful:
- **Minimum 4 weeks of UTM-tagged links** in circulation (leads take time to convert)
- **Minimum 10 attributed leads** to show meaningful patterns
- **At least 1 opportunity with `status=won`** tied to a tracked contact

For most EN clients, this means attribution display wouldn't be useful until ~6-8 weeks after UTM implementation launches.
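
These thresholds could be checked before rendering the card. A minimal sketch, with `unmet_thresholds` as a hypothetical helper and the three counts assumed to come from the Attribution tab:

```python
def unmet_thresholds(weeks_live, attributed_leads, won_opportunities):
    """Return the unmet minimums; an empty list means the card is worth showing."""
    checks = [
        (weeks_live >= 4, "fewer than 4 weeks of UTM-tagged links"),
        (attributed_leads >= 10, "fewer than 10 attributed leads"),
        (won_opportunities >= 1, "no won opportunity tied to a tracked contact"),
    ]
    return [message for passed, message in checks if not passed]
```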

### 4.3 Portal Implementation Note

The portal templates are **FROZEN** (April 2026 freeze). The attribution card would need to wait until the freeze is lifted, or be added to the Dashboard tab in Google Sheets as a data section that the existing dashboard template reads from. The latter approach (Sheet-driven) avoids modifying frozen templates.

---

## 5. Gaps and Prerequisites

### 5.1 What Must Be True Before Attribution Works

| Prerequisite | Current State | Effort to Fix |
|---|---|---|
| UTMs on all outbound links | Zero UTMs anywhere | Medium — 5 files to update |
| GHL forms/funnels as conversion points | Most clients use GHL forms | Verify per client |
| GHL custom fields for full UTM capture | No attribution fields exist | Small — create 5 fields per location |
| GHL workflow to map UTMs to custom fields | Does not exist | Small — GHL workflow builder |
| Contacts linked to opportunities | Exists (contactId on opportunities) | Already working |
| Opportunities have monetary values | Exists on OSC; needs verification for EN/LL/NC | Check per client |
| Weekly aggregation workflow | Does not exist | Medium — new tool + workflow |
| Attribution tab in client sheets | Does not exist (Revenue tab is 3 columns) | Small — add tab to template |

### 5.2 Client Readiness Assessment

| Client | GHL Location | Forms/Funnels | Opportunities Pipeline | Ready? |
|---|---|---|---|---|
| **Lasting Language** | `8HGLSPECfIaQfJaFO7Ef` | Yes (patient intake) | Unknown — needs pipeline setup | Almost |
| **Nurse Charles** | `VMK8SyHTd8uCMgcJ7I2A` | Yes (challenge signup) | Unknown — needs pipeline setup | Almost |
| **Oak Street Capital** | `5Ch1Ppe3kx5W7cFmQOlD` | Yes (investor forms) | Yes (Investor Pipeline, 7 stages) | Yes — but uses deal-specific tracker, not UTMs |
| **New EN clients** | Created at onboarding | Depends on setup | Created during onboarding | Depends |

### 5.3 Estimated Effort by Workflow

| File | Change | Effort | Dependencies |
|---|---|---|---|
| `~/.claude/skills/content/SKILL.md` | Add UTM construction rules to Steps 7, 8, 9; update content standards | 1 session | UTM convention defined (this doc) |
| `~/.claude/skills/bio/SKILL.md` | Bake UTMs into CTA hrefs at build time | 30 min | UTM convention defined |
| `~/.claude/skills/positioning/SKILL.md` | Add UTMs to landing page CTAs | 30 min | UTM convention defined |
| `tools/post_blog.py` | Change `cta_link` to absolute URLs + UTMs | 30 min | Campaign slug passed as param |
| `workflows/biweekly_campaign.md` | Add UTM instructions to link placeholders | 15 min | UTM convention defined |
| GHL custom fields | Create 5 fields on each client location | 30 min per location | GHL admin access |
| GHL workflow (UTM → fields) | Build "on contact created" workflow in GHL | 1 hour per location | Custom fields created |
| `tools/attribution_report.py` | New tool: query GHL, aggregate, write to Sheet | 1 session | Custom fields + UTMs live |
| `workflows/attribution_report.md` | New workflow: weekly attribution report | 30 min | Tool built |
| Sheet template update | Add Attribution tab to template sheet | 15 min | Schema defined (this doc) |

---

## 6. Architecture Decision (Confirmed 2026-04-12)

**Chosen path: Per-client API keys + Vercel serverless function.**

Tested: GHL Agency API key CANNOT create contacts across sub-accounts (returns 401 "not authorized for this scope"). Per-location PIT tokens are required.

**Form capture flow:**
```
Landing page form → POST /api/capture (Vercel serverless) → reads {email, client_id, UTMs}
  → looks up client's GHL API key + locationId from config → creates GHL contact with UTM data
  → returns success → form shows "You're in. Check your inbox."
```

**Per-client setup (manual, done during onboarding):**
1. In GHL: Settings → Integrations → API Keys → Create PIT token
2. Add to .env: `{PREFIX}_GHL_API_KEY=pit-...` and `{PREFIX}_GHL_LOCATION_ID=...`
3. Add client to `CLIENT_GHL_MAP` in `tools/attribution_report.py`
4. Add client to the Vercel capture function config
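
The per-client lookup implied by step 2 can be sketched in Python, for example as it might appear in `tools/attribution_report.py`. This is an illustration of the naming scheme only: `load_client_config` is a hypothetical helper, and the capture function itself is planned as JavaScript.

```python
import os


def load_client_config(prefix, env=None):
    """Look up a client's GHL credentials from {PREFIX}_GHL_* variables."""
    env = os.environ if env is None else env
    key_name = f"{prefix.upper()}_GHL_API_KEY"
    location_name = f"{prefix.upper()}_GHL_LOCATION_ID"
    missing = [name for name in (key_name, location_name) if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {"api_key": env[key_name], "location_id": env[location_name]}
```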

**Task added to Workflow 0 Step 11d** with full instructions.

**Still to build:** The Vercel serverless function at `tools/landing-pages/api/capture.js` that receives form POSTs and creates GHL contacts. This is blocked until the first landing page is generated by the positioning skill.

---

## 7. Prioritized Implementation Plan (Updated)

### Phase 1: Foundation (1 session)
**Goal:** Get UTMs flowing on all links

1. Define UTM convention (done — Section 1.1 of this doc)
2. Update `~/.claude/skills/content/SKILL.md` — add UTM rules to Steps 7, 8, 9
3. Update `~/.claude/skills/bio/SKILL.md` — bake UTMs into CTA hrefs
4. Update `~/.claude/skills/positioning/SKILL.md` — UTMs on landing page CTAs
5. Update `tools/post_blog.py` — absolute CTA URLs with UTMs
6. Update `workflows/biweekly_campaign.md` — UTM instructions

**After Phase 1:** Every new content cycle produces UTM-tagged links. No data flows into GHL yet, but the links are ready.

### Phase 2: GHL Capture (1 session)
**Goal:** GHL stores attribution data on every new contact

1. Create custom fields on each client GHL location (5 fields x N locations)
2. Build GHL "Contact Created" workflow that reads UTMs → writes to custom fields
3. Verify: submit a test form with UTM-tagged URL, confirm fields populate
4. Update `tools/ghl_import_contacts.py` to set `source` field on bulk imports

**After Phase 2:** New leads have attribution data. Historical contacts do not (can't retroactively attribute).

### Phase 3: Reporting (1 session)
**Goal:** Weekly attribution reports in Google Sheets

1. Add "Attribution" tab to client sheet template (columns from Section 3.2)
2. Build `tools/attribution_report.py` — queries GHL, aggregates, writes to Sheet
3. Write `workflows/attribution_report.md` — SOP for running the report
4. Test with real data from Phase 2

**After Phase 3:** Weekly visibility into which content drives leads and revenue.

### Phase 4: Dashboard Display (when portal freeze lifts)
**Goal:** Attribution data visible on client portal dashboard

1. Add attribution card to dashboard template (reads from Attribution tab)
2. Show: top content by leads, top content by revenue, monthly totals
3. Requires ~6-8 weeks of data from Phases 1-3 before it's meaningful

---

## 8. Open Questions

1. **GHL form vs. custom landing page split:** What percentage of EN clients use GHL-hosted forms vs. custom landing pages from the positioning skill? This determines how much JS-based UTM capture work is needed.

2. **Opportunity pipeline for non-OSC clients:** Do LL, NC, and new EN clients have opportunities with monetary values in GHL? If not, revenue attribution can't work until pipelines are set up.

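3. **GA4 expansion:** Should GA4 be added to bio pages and landing pages as an attribution backup? It is currently only on LL blog pages (`G-2BZKEBSS68`). Note that the standard GA4 tag (gtag.js) is client-side JavaScript, which conflicts with the bio page's no-JS constraint (Section 1.3).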

4. **Multi-touch vs. first-touch:** First-touch is recommended for simplicity. Is there a use case where Bryce needs to know "this contact saw 3 blog posts, then a nurture email, then booked"? If so, the `attributions` array supports it but the reporting tool needs to be more complex.

5. **OSC tracker reuse:** The `osc_tracker.py` click-tracking system (token-based redirects, bot filtering, GHL tagging) is battle-tested. Should it be generalized into an EN-wide link tracker, or is UTM-based passive tracking sufficient?

6. **Content production cost tracking:** The ROAS column in the Attribution tab needs a cost denominator. How should content production cost be estimated? Per-piece flat rate? Monthly retainer / pieces produced?
