A modular, cloud-native Customer Data Platform built on Google Cloud — combining BigQuery as the unified data warehouse, Pub/Sub for real-time event streaming, Dataflow for transformation, and Cloud Functions for activation — with full consent, identity, and governance layers.

| Metric | Target |
|---|---|
| Stream event ingestion | < 200ms |
| Profile unification | < 2s |
| Audience export | < 4hrs |
| DSR deletion | < 72hrs |
| API egress P99 | < 500ms |

Transform in-store, call-centre, and direct mail interactions into addressable digital signals — using deterministic ID matching, BigQuery computed events, and real-time profile enrichment via Firestore.
🛒 POS Transaction — Raw Offline Event
✅ Enriched Profile — Post Identity Match
match_confidence=PROBABILISTIC and excluded from PII-sensitive activations.

| Use Case | Offline Signal | Online Channel | Logic | Platform |
|---|---|---|---|---|
| In-store Win-Back | Last purchase > 90 days | Paid Social | Exclude recent buyers, target lapsed | Meta CAPI |
| Cross-sell Footwear → Apparel | footwear_buyer=true | Display / Search | Bid +40% for apparel queries | SA360 + GAds |
| Offline LTV → Digital ROAS | ltv_percentile > 80 | Customer Match | High-value seed audience for lookalike | Google Ads |
| Call Centre Recovery | complaint_logged=true | In-App Message | Send apology + voucher within 2hrs | MoEngage |
| Post-Purchase NPS | purchase within 24hrs | Email / Push | Trigger satisfaction survey | MoEngage |
| Direct Mail Responders | ref_code scanned | Search + Display | Suppress DM audience; shift budget online | SA360 |
| Loyalty Tier Upgrade | cumulative_ltv > £500 | Push Notification | Platinum tier promotion | MoEngage + Amplitude |
Activate BigQuery audiences and Firestore computed events into the full stack of paid media, analytics, and engagement platforms — with consent gating, rate limiting, and schema validation on every outbound pipe.
Deterministic and probabilistic matching across web, mobile, offline, and CRM identity spaces — with a full collision resolution framework covering merge, split, override, and quarantine strategies.
A walkthrough of how Marcus gets identified and cross-device stitched in real time — starting from a tablet session, then progressing through a desktop login. Every step maps to a live Firestore write, sGTM decision, and CDP profile update.
| Collision Type | Trigger | Resolution |
|---|---|---|
| Shared Device | Same GAID, 2+ email logins within 2hrs | SPLIT into separate profiles; GAID marked shared |
| Merged Households | Same postcode + loyalty_id collision | PARENT/CHILD household graph created |
| Data Conflict | BQ and CRM disagree on email | PRIORITY RULE: CRM > offline > web (configurable) |
| Ghost Profile | Cookie profile never matched deterministically | QUARANTINE after 90 days inactivity |
| ID Theft Signal | Same email, 5+ different devices in 1hr | FREEZE + alert + manual review queue |
| Consent Mismatch | Profile A consents, Profile B (same person) opts out | OPT-OUT WINS: always; merged profile inherits lowest consent |
| Late Arriving Offline | Offline event arrives 7 days after online session | RETROACTIVE MERGE: re-process historical audiences |
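
Two of the resolutions in the table above are simple enough to sketch directly: the source priority rule for data conflicts and the "opt-out wins" rule for consent mismatches. The following is a minimal illustration only; the function names, the source labels (`crm`, `offline`, `web`), and the choice to treat a missing consent flag as not granted are assumptions, not part of the platform's actual code.

```python
# Assumed source ranking, taken from the Data Conflict row: CRM > offline > web.
SOURCE_PRIORITY = ["crm", "offline", "web"]


def resolve_conflict(values_by_source, priority=SOURCE_PRIORITY):
    """Return the value from the highest-priority source that supplies one."""
    for source in priority:
        value = values_by_source.get(source)
        if value is not None:
            return value
    return None


def merge_consent(*consent_maps):
    """Opt-out wins: a purpose stays granted only if every profile grants it.

    A purpose missing from any profile is treated as not granted
    (an assumption made here to model "lowest consent").
    """
    purposes = set().union(*consent_maps)
    return {p: all(c.get(p, False) for c in consent_maps) for p in sorted(purposes)}


profile_a = {"marketing": True, "analytics": True}
profile_b = {"marketing": False, "analytics": True}
merged = merge_consent(profile_a, profile_b)
email = resolve_conflict({"web": "old@example.com", "crm": "new@example.com"})
```

Here `merged` keeps analytics consent but drops marketing consent, and `email` resolves to the CRM value because CRM ranks first in the priority list.
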
cdp.merge_audit_log. The rollback system reads this log to reconstruct any prior profile state, with point-in-time recovery to within 1 second of any event.

cdp.profile_snapshots (BQ + GCS backup). This is the rollback source of truth: a full copy, not a diff.

rollback_identity Cloud Function.

delete_user + re-add signal with the corrected profile. Audiences are refreshed within the next scheduled export window (max 4hrs).

🔁 Rollback Cloud Function

| Scenario | Trigger | Rollback Method | Scope | SLA | Auto? |
|---|---|---|---|---|---|
| Wrong Merge — False Positive | match_confidence < 0.65 on review | Restore last snapshot; split UIDs back | Both merged profiles | 1 hr | Semi-auto |
| Fraud Freeze Triggered | 5+ devices in 1hr anomaly | Auto-revert last 3 merge operations | Target profile only | 5 min | Auto |
| Incorrect Shared Device Split | Manual admin review | Re-merge child UIDs from parent snapshot | Child profiles + events | 4 hrs | Manual |
| Consent Override Error | Opt-out cascade applied to wrong UID | Restore consent flags from snapshot; re-add to audiences | Consent fields only | 30 min | Auto |
| DSR Merge Pre-Deletion | DSR received for a merged profile | Rollback merge to identify original UID scope; then delete only that scope | Pre-merge UIDs | 72 hrs | Semi-auto |
| Retroactive Merge Wrong Offline Event | Source system corrects offline transaction | Remove enrichment; undo BQ computed event; re-run match | Enriched events only | 24 hrs | Manual |
| Schema Version Mismatch | New schema breaks profile shape | Revert identity_map schema; restore from BQ snapshot | Entire profile table | 2 hrs | Semi-auto |
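
Several rollback methods above rely on restoring the last full snapshot taken before a bad operation. A minimal sketch of that lookup follows, assuming snapshots are available as a timestamp-sorted list of `(timestamp, profile)` pairs; the function name `snapshot_at` and this in-memory representation are hypothetical, and a real implementation would read from cdp.profile_snapshots.

```python
from bisect import bisect_right


def snapshot_at(snapshots, target_ts):
    """Return a copy of the latest full snapshot taken at or before target_ts.

    `snapshots` is a list of (timestamp, profile_dict) tuples sorted by
    timestamp. Because each snapshot is a full copy rather than a diff,
    restoring means picking one entry, with no replay of earlier entries.
    """
    timestamps = [ts for ts, _ in snapshots]
    idx = bisect_right(timestamps, target_ts)
    if idx == 0:
        raise LookupError("no snapshot exists at or before the requested time")
    return dict(snapshots[idx - 1][1])


history = [
    (100, {"uid": "u1", "linked_ids": ["cookie_a"]}),
    (200, {"uid": "u1", "linked_ids": ["cookie_a", "gaid_b"]}),
]
# Roll back to the state just before the merge recorded at t=200.
restored = snapshot_at(history, 199)
```

Returning a copy keeps the stored snapshot untouched, which matches the append-only intent of the audit design.
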
merge_audit_log with require_partition_filter=true. Never update or delete: this is a WORM pattern for regulatory compliance.

The complete event contract, from JSON schema validation and deduplication logic to blocking rules, access control, and egress throttle configuration.
📋 Master CDP Event JSON Schema
✅ Valid Event Example
cdp-schema-violations topic, never dropped silently. Dead-letter queue retains for 7 days.

message_id. Protects against SDK double-sends on network retry.

GroupByKey(event_id). Collapses duplicates arriving within the same processing window.

event_id as the unique key. Ensures idempotency even if Dataflow delivers twice.

event_id field. Amplitude: insert_id. Google Ads: upload job idempotency key. Each destination handles its own window.

| Rule Name | Trigger Condition | Action | Scope | Reversible |
|---|---|---|---|---|
| opt_out_block | consent.marketing = false | Block all marketing egress | Profile + all linked devices | Yes |
| dsr_suppression | DSR request received | Immediately suppress from all audiences; queue deletion | All destinations + BQ | No |
| fraud_freeze | 5+ device IDs in 1hr OR velocity anomaly | Freeze profile; alert security team | Profile-level | Manual review |
| minor_block | age_verified = false OR age < 18 | Block all personalised advertising egress | All marketing platforms | Yes (on verification) |
| pii_in_event_block | DLP API detects raw PII in event body | Quarantine event; alert data team | Single event | After remediation |
| schema_violation_block | JSON schema validation fails | Route to DLQ; never ingest to main stream | Single event | Fix + resubmit |
| geo_restriction | user_country = sanctioned list | Block all processing; alert compliance | Profile + events | Compliance only |
| egress_rate_exceeded | Connector exceeds configured RPS | Queue event; apply exponential backoff | Connector-level | Auto-resolves |
| consent_mode_downgrade | Consent signal degrades (e.g., CMP update) | Retroactively remove from active audiences | Profile + downstream | On re-consent |
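
A subset of the blocking rules above can be expressed as a single pre-egress check. The sketch below is illustrative only: the profile shape, the function names, and the decision to treat a missing marketing consent flag as an opt-out are assumptions. It covers four rules and does not model the "velocity anomaly" branch of fraud_freeze.

```python
def blocking_rules(profile, device_ids_last_hour=()):
    """Return the names of blocking rules that fire, in check order."""
    fired = []
    # opt_out_block: anything other than an explicit True is treated as
    # opted out (an assumption; the table only names consent.marketing = false).
    if profile.get("consent", {}).get("marketing") is not True:
        fired.append("opt_out_block")
    # dsr_suppression: an active data subject request suppresses everything.
    if profile.get("dsr_active"):
        fired.append("dsr_suppression")
    # fraud_freeze: 5+ distinct device IDs seen within the last hour.
    if len(set(device_ids_last_hour)) >= 5:
        fired.append("fraud_freeze")
    # minor_block: age_verified = false OR age < 18.
    age = profile.get("age")
    if not profile.get("age_verified", False) or (age is not None and age < 18):
        fired.append("minor_block")
    return fired


def marketing_egress_allowed(profile, device_ids_last_hour=()):
    """Marketing egress proceeds only when no blocking rule fires."""
    return not blocking_rules(profile, device_ids_last_hour)


adult = {"consent": {"marketing": True}, "age_verified": True, "age": 30}
```

Returning the list of fired rule names, rather than a bare boolean, makes it straightforward to log why a profile was blocked.
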
| Role | BQ Access | Firestore | Pub/Sub | Secrets |
|---|---|---|---|---|
| cdp-ingestion-sa | Writer (raw only) | Writer | Publisher | None |
| cdp-dataflow-sa | Reader + Writer (clean) | Reader | Sub | Viewer |
| cdp-connector-sa | Reader (audiences) | Reader | None | Accessor |
| cdp-analyst-sa | Reader (no PII tables) | None | None | None |
| cdp-admin-sa | Full (audit only) | Full | Full | Full |
cdp-egress-dlq-{destination} Pub/Sub topic. Replayed after human review or automated retry rules.

Everything you need in place before day 1: GCP configuration, team capabilities, data quality thresholds, and regulatory prerequisites.

| Requirement | Minimum Spec | Status Check |
|---|---|---|
| GCP Project | Org-level Billing Account + VPC | Required |
| BigQuery Dataset | Multi-region EU, CMEK enabled | Required |
| Cloud Pub/Sub | 3 topics minimum (raw, clean, DLQ) | Required |
| Firestore | Native mode, europe-west2 region | Required |
| Cloud KMS | Key ring for CMEK + PII column encryption | Required |
| VPC Service Controls | Perimeter around BQ + Firestore | Strongly Rec |
| Secret Manager | All API keys (never env vars) | Required |
| Cloud Composer (Airflow) | v2.x for BQ job orchestration | Required |
| Data Catalog | Schema registry + PII tagging | Strongly Rec |
| GCP Budget Alerts | Set at 80% + 100% of monthly limit | Best Practice |
| Requirement | Detail | Owner |
|---|---|---|
| GDPR Lawful Basis | Consent (Art. 6(1)(a)) or Legitimate Interest (Art. 6(1)(f)) documented per purpose | DPO |
| PECR Compliance | UK: soft opt-in for email; hard opt-in for cookies | Legal |
| TCF 2.2 CMP | IAB-registered CMP integrated with consent signal flow | MarTech |
| Data Processing Agreement | DPA with GCP, each connector vendor, each data supplier | Legal |
| ROPA (Records of Processing) | Every data flow documented in Article 30 register | DPO |
| DSR Process | 72hr response SLA; tested quarterly | Engineering |
| EEA Data Transfer Mechanism | SCCs or Binding Corporate Rules for non-EU destinations | Legal |
| Retention Policy | Raw events: 90 days; Profiles: active+24mo; Audit: 7yr | Data Governance |
Concrete, end-to-end activation playbooks showing exactly which data, which BigQuery queries, which connectors, and which consent requirements apply — per use case.
SELECT email_sha256 FROM cdp.unified_profiles WHERE ltv_percentile >= 90 AND offline_signals.purchase_count >= 3 AND consent_marketing = true

predicted_90d_ltv per uid. This score is passed as conversion value at the time of the conversion event (not a flat £1).

SELECT uid, gclid, ROUND(predicted_90d_ltv * ltv_weight_factor, 2) AS conv_value FROM cdp.scored_profiles WHERE predicted_90d_ltv > 0 AND consent_marketing = true

ad_user_data=GRANTED. Modelled conversions kick in for non-consented users; do not suppress the conversion ping, just omit PII.

SELECT email_sha256 FROM cdp.unified_profiles WHERE (last_purchase_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 14 DAY) OR consent_marketing = false OR is_frozen = true OR dsr_active = true) AND consent_analytics = true

SELECT uid, email_sha256, push_token, phone_sha256, segment_name, personalisation_payload FROM cdp.audience_export WHERE segment_name IN ('browse_abandoner_24h','winback_90d') AND consent_marketing = true AND moEngage_opted_in = true

personalisation_payload JSON: last browsed product, price, stock level, recommended alternatives (Vertex AI), loyalty points balance, first name. Injected into the MoEngage template via Liquid tags.

consent_email=true, consent_sms=true, push_opted_in=true per channel individually. One consent does NOT cover all channels. Double opt-in for SMS is required under PECR.

profiles/{uid}/msg_frequency. Cap enforced by Cloud Function before any MoEngage API call.

| Use Case | KPI | Typical Uplift | Time to Value | Risk |
|---|---|---|---|---|
| Lookalike Expansion | CPM efficiency, ROAS | +20–40% ROAS | 4–8 weeks | Low |
| Abandoned Cart Recovery | Recovery rate, incremental revenue | 12–18% cart recovery | 2–4 weeks | Low |
| Offline Attribution | True ROAS, measured conversions | +15–35% measured ROAS | 6–12 weeks | Medium |
| Churn Prevention | Retention rate, LTV | 8–15% churn reduction | 8–16 weeks | Medium |
| Mobile Attribution Clean Room | CPI accuracy, incremental installs | +25% attribution accuracy | 10–20 weeks | High |
| Value-Based Bidding | ROAS, CPA efficiency vs LTV | +25–45% ROAS vs flat-bid | 6–10 weeks | Medium |
| Suppression — Paid | Wasted spend reduction | 8–18% budget saving | 1–2 weeks | Low |
| Owned Channel Remarketing | Revenue/send, unsubscribe rate | 2.4–4.1x revenue vs broadcast | 3–6 weeks | Low |
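
For the owned-channel playbook, the per-channel consent rule and the frequency cap enforced before any send can be combined into one gate. The sketch below is a rough illustration: the field names consent_email, consent_sms, and push_opted_in come from the playbook text, while the function name `can_send`, the cap of 3 messages, and the 7-day window are hypothetical values chosen for the example.

```python
# Maps each channel to the consent flag named in the playbook text.
CHANNEL_CONSENT_FIELD = {
    "email": "consent_email",
    "sms": "consent_sms",
    "push": "push_opted_in",
}

WEEK_SECONDS = 7 * 24 * 3600


def can_send(profile, channel, sent_timestamps, now, cap=3, window_seconds=WEEK_SECONDS):
    """Allow a send only with channel-specific consent and spare cap headroom.

    `sent_timestamps` are epoch seconds of earlier sends on this channel;
    `cap` and `window_seconds` are illustrative defaults, not source values.
    """
    field = CHANNEL_CONSENT_FIELD.get(channel)
    # Unknown channels and anything but an explicit True consent are refused:
    # consent on one channel never covers another.
    if field is None or profile.get(field) is not True:
        return False
    recent = [ts for ts in sent_timestamps if now - ts < window_seconds]
    return len(recent) < cap


profile = {"consent_email": True, "consent_sms": False, "push_opted_in": True}
now = 1_000_000
email_ok = can_send(profile, "email", [now - 3600], now)
sms_ok = can_send(profile, "sms", [], now)
```

In this example `email_ok` is true because only one email falls inside the window, while `sms_ok` is false because SMS consent was never granted.
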
Real-time error rates, pipeline trend lines, egress health, and profile lookup, all driven from BigQuery monitoring tables and Firestore live reads. The view refreshes every 1.2 seconds.

| Connector | Status | Success Rate | Avg Latency | DLQ Depth | Last Send | Rate Limit |
|---|---|---|---|---|---|---|

Real-time audience health across Google Ads, Meta, Amazon DSP, and SA360 — match rates, active usage, staleness flags, and ROI signal pulled programmatically from each platform's API. Data refreshes every 4 hours. Min 1,000 matched users required for platform disclosure (privacy threshold).
Google Ads API returns user_list.size_for_display/search plus match rate per offline_user_data_job. Meta Marketing API returns approximate_count on every Custom Audience. Amazon DSP API returns audience size with privacy-threshold approximation. SA360 inherits from Google Ads user_list and surfaces bid modifier performance per audience. All four APIs require OAuth service account tokens with ads_management / userlist.read scopes stored in Secret Manager. Note: Google Ads API Customer Match upload capability migrates to Data Manager API, with an April 2026 deadline for active tokens.

| Platform | Match Rate API | List Size API | Active Use API | Staleness Flag | Constraint |
|---|---|---|---|---|---|
| 🔵 Google Ads | ✓ offline_user_data_job | ✓ user_list resource | ✓ ad_group_audience_view | ✓ REFRESH recommendation | Data Manager API migration Apr 2026 |
| 🔷 Meta | ✓ approximate_count | ✓ Custom Audience API | ✓ Insights API | Partial — no native flag | Min 1,000 users for disclosure |
| 🟠 Amazon DSP | ✓ DSP Audiences API | ✓ Approximated count | ✓ AMC + DSP Reporting | Partial — manual check | Privacy threshold; batch-only match |
| 🔴 SA360 | Via Google Ads API | ✓ Inherits from GAds | ✓ Bid modifier + Floodlight | ✓ Via GAds recommendation | No raw match rate in SA360 API directly |
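
Once list size, match counts, and last refresh time have been pulled from a platform API, the health signals described above reduce to a small calculation. The sketch below assumes those values are already fetched. The 1,000-user disclosure minimum comes from the text, while `STALE_AFTER_DAYS`, the function name, and the output shape are hypothetical.

```python
from datetime import date

MIN_DISCLOSABLE_USERS = 1000  # privacy threshold stated in the text
STALE_AFTER_DAYS = 7          # hypothetical staleness cut-off for illustration


def audience_health(uploaded, matched, last_refresh, today):
    """Summarise match rate, disclosure eligibility, and staleness."""
    days_stale = (today - last_refresh).days
    disclosable = matched >= MIN_DISCLOSABLE_USERS
    return {
        # Match rate is withheld below the privacy threshold.
        "match_rate": round(matched / uploaded, 3) if disclosable and uploaded else None,
        "disclosable": disclosable,
        "days_stale": days_stale,
        "stale": days_stale > STALE_AFTER_DAYS,
    }


report = audience_health(
    uploaded=10_000,
    matched=6_200,
    last_refresh=date(2025, 1, 1),
    today=date(2025, 1, 10),
)
```

Withholding the match rate for small audiences mirrors how the platforms themselves approximate or hide sizes below their privacy thresholds.
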
ad_group_audience_view

delivery_status = active

| Platform | Audience | Last Refresh | Days Stale | Impact | Recommended Action |
|---|---|---|---|---|---|