{"id":51439772,"url":"https://github.com/guiarpi/funnel-analysis","last_synced_at":"2026-07-05T10:30:22.208Z","repository":{"id":357630537,"uuid":"1237823639","full_name":"guiarpi/funnel-analysis","owner":"guiarpi","description":"End-to-end product funnel analysis on GA4 BigQuery data — session-scoped SQL, cohort retention, device segmentation, interactive dashboard, and AI-generated product brief using Claude/Gemini.","archived":false,"fork":false,"pushed_at":"2026-05-28T13:58:36.000Z","size":136,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-28T15:21:35.613Z","etag":null,"topics":["bigquery","dashboard","funnel-analysis","google-cloud","plotly","python","sql"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/guiarpi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-13T14:42:17.000Z","updated_at":"2026-05-28T13:59:28.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/guiarpi/funnel-analysis","commit_stats":null,"previous_names":["guiarpi/funnel-analysis"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/guiarpi/funnel-analysis","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/guiarpi%2Ffunnel-analysis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/guiarpi%2Ffunnel
-analysis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/guiarpi%2Ffunnel-analysis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/guiarpi%2Ffunnel-analysis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/guiarpi","download_url":"https://codeload.github.com/guiarpi/funnel-analysis/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/guiarpi%2Ffunnel-analysis/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":35151638,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-07-05T02:00:06.290Z","response_time":100,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bigquery","dashboard","funnel-analysis","google-cloud","plotly","python","sql"],"created_at":"2026-07-05T10:30:21.043Z","updated_at":"2026-07-05T10:30:22.192Z","avatar_url":"https://github.com/guiarpi.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# GA4 Product Funnel Analysis\n\nEnd-to-end product analytics project built on Google Analytics 4 (GA4) data in BigQuery. 
Covers session-scoped funnel analysis, revenue metrics, device segmentation, time-to-convert, shoppers funnel, weekly cohort retention, and an AI-generated product brief — all in Python.\n\n**Dataset:** [`bigquery-public-data.ga4_obfuscated_sample_ecommerce`](https://console.cloud.google.com/bigquery?p=bigquery-public-data\u0026d=ga4_obfuscated_sample_ecommerce) · Nov 2020 – Jan 2021\n\n---\n\n## What's in this project\n\n| File | Description |\n|---|---|\n| `ga4_funnel_analysis.ipynb` | Main analysis notebook — all SQL, charts, and commentary |\n| `generate_dashboard.py` | Runs all queries and produces a standalone `dashboard.html` |\n| `dashboard.html` | Self-contained interactive dashboard — open in any browser, no Python needed |\n| `product_brief_ga4_funnel.md` | Pre-generated AI product brief (included so the dashboard works without an API key) |\n\n---\n\n## Analysis overview\n\n### 1 · Session-scoped funnel\nConverts the GA4 event stream into a sequential 5-step funnel: **Session Start → View Item → Add to Cart → Begin Checkout → Purchase**. Each session is counted at a step only if it completed all prior steps in order within the same session — matching how the GA4 UI calculates funnel conversion and avoiding the inflation caused by carrying past-purchase behaviour into new sessions.\n\nThe grain is `(user_pseudo_id, ga_session_id)`. Session IDs are extracted from the nested `event_params` array using `UNNEST`.\n\n### 2 · Revenue metrics\nTotal revenue, average order value, and median order value for converting sessions. The mean/median gap surfaces revenue concentration from high-value orders.\n\n### 3 · Device breakdown\nThe same sequential funnel split by `device.category` (desktop, mobile, tablet). Used to assess whether device type is a meaningful driver of conversion loss.\n\n### 4 · Time-to-convert\nMedian time (in hours) between each funnel step, using `APPROX_QUANTILES` on microsecond-precision event timestamps. 
Restricted to sessions that completed the full funnel, so this measures deliberation time for buyers, not abandonment.\n\n### 5 · Shoppers funnel\nAn alternative funnel starting at `view_item` — isolating users who demonstrated product intent. Separating this from the full-session funnel reveals the true add-to-cart rate among engaged users and sizes the product discovery gap.\n\n### 6 · Weekly cohort retention\nUsers grouped by the week of their first session, tracked week-over-week. The heatmap shows what percentage of each cohort returned in subsequent weeks. Analogous to D7/D14 retention in a SaaS context.\n\n### 7 · AI product brief (Section 10 of the notebook)\nStructured analysis context is passed to an LLM to generate a product brief with: executive summary, key findings, hypotheses, recommended actions, and metrics to track next. Demonstrates AI-augmented analysis beyond chart generation.\n\nA pre-generated brief (`product_brief_ga4_funnel.md`) is committed to the repo so the dashboard renders fully without an API key. 
To regenerate it yourself, see the API key setup options below.\n\n---\n\n## Key findings\n\n- **~79% of sessions never reach a product page** — the dominant loss in the funnel, upstream of everything else\n- Once users do view a product, add-to-cart rates are healthy — the problem is discovery, not the checkout flow\n- Median time-to-purchase is under an hour for converting sessions — low deliberation friction\n- Week-1 cohort retention averages in the single digits — most acquired users do not return\n- Device type explains very little of the conversion gap — mobile checkout UX is not the lever\n\n---\n\n## How to run\n\n### Prerequisites\n\n- Python 3.9+\n- A Google Cloud project with BigQuery API enabled\n- Access to `bigquery-public-data` (available to all GCP accounts)\n\n### Install dependencies\n\n```bash\npip install google-cloud-bigquery pandas plotly db-dtypes anthropic\n```\n\n### Authenticate with Google Cloud\n\nThis project uses [Application Default Credentials](https://cloud.google.com/docs/authentication/application-default-credentials) — no service account key file needed.\n\n```bash\ngcloud auth application-default login\n```\n\n### Set your project ID\n\nIn both `ga4_funnel_analysis.ipynb` and `generate_dashboard.py`, update:\n\n```python\nPROJECT_ID = \"your-gcp-project-id\"\n```\n\n### Run the notebook\n\n```bash\njupyter notebook ga4_funnel_analysis.ipynb\n```\n\nRun all cells in order. Sections 1–9 have no API key dependency. Section 10 (AI brief) requires an LLM API key — choose one:\n\n**Option A — Anthropic Claude (paid, ~$0.01 per run)**\n```bash\nexport ANTHROPIC_API_KEY=\"sk-ant-...\"\n```\nGet a key at [console.anthropic.com](https://console.anthropic.com) → API Keys.\n\n**Option B — Google Gemini (free, no credit card)**\n```bash\nexport GEMINI_API_KEY=\"AIza...\"\npip install google-generativeai\n```\nGet a free key at [aistudio.google.com](https://aistudio.google.com) → \"Get API key\". 
Then run **Section 10f** in the notebook instead of 10d.\n\n**Option C — Skip (dashboard still works)**\nA pre-generated brief is already in `product_brief_ga4_funnel.md`. The dashboard embeds it automatically — no API key needed.\n\n### Generate the dashboard\n\n```bash\npython generate_dashboard.py\n```\n\nThis writes `dashboard.html` — a fully self-contained file with no runtime dependencies. If `product_brief_ga4_funnel.md` exists (generated in Section 10 of the notebook), it will be embedded in the dashboard automatically.\n\n---\n\n## SQL patterns used\n\n| Pattern | Where |\n|---|---|\n| `UNNEST(event_params)` scalar subquery | Extracting `ga_session_id` from the nested array |\n| Sequential `COUNTIF` chain | Enforcing step order within a session |\n| `APPROX_QUANTILES` | Efficient percentile computation for time-to-convert |\n| `DATE_TRUNC(..., WEEK(MONDAY))` | Weekly cohort bucketing |\n| `_TABLE_SUFFIX BETWEEN` | Shard pruning (limits which daily `events_*` tables are scanned) to control BigQuery costs |\n\nAll SQL is written for **BigQuery Standard SQL** and runs against the public GA4 sample dataset with no modifications.\n\n---\n\n## B2B SaaS translation\n\nThe funnel events map directly to SaaS product analytics concepts:\n\n| GA4 event | SaaS equivalent |\n|---|---|\n| `session_start` | User login / app open |\n| `view_item` | Feature discovery — navigating to a core feature |\n| `add_to_cart` | Feature first use — attempting the core action |\n| `begin_checkout` | Workflow initiated — e.g. 
creating a first record |\n| `purchase` | Activation — completing the key value moment |\n| Week-1 retention | D7 retention after first login |\n\nThe SQL patterns, funnel methodology, and retention framework carry over to any event-based product analytics dataset once event and column names are mapped.\n\n---\n\n## Tech stack\n\n- **Python** — pandas, Plotly, google-cloud-bigquery\n- **BigQuery** — Standard SQL, date-sharded `events_*` tables, GA4 export schema\n- **Plotly** — Interactive charts rendered to standalone HTML\n- **Anthropic API / Google Gemini** — LLM for AI-generated product briefs (Gemini free tier supported)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fguiarpi%2Ffunnel-analysis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fguiarpi%2Ffunnel-analysis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fguiarpi%2Ffunnel-analysis/lists"}