{"id":50806011,"url":"https://github.com/jeffgreendesign/lead-enrichment-api","last_synced_at":"2026-06-13T01:05:20.512Z","repository":{"id":363403417,"uuid":"1175655842","full_name":"jeffgreendesign/lead-enrichment-api","owner":"jeffgreendesign","description":"Pattern exploration: what happens when you treat Pydantic schemas as an AI governance contract? Applied here to a webhook-driven lead enrichment pipeline.","archived":false,"fork":false,"pushed_at":"2026-06-08T18:25:21.000Z","size":117,"stargazers_count":0,"open_issues_count":4,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-08T20:15:09.891Z","etag":null,"topics":["ai-governance","lead-enrichment","pydantic","schema-validation","synthetic-data","typescript","vercel","webhook"],"latest_commit_sha":null,"homepage":"https://lead-enrichment-api.vercel.app","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jeffgreendesign.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-03-08T01:47:36.000Z","updated_at":"2026-06-08T18:22:05.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/jeffgreendesign/lead-enrichment-api","commit_stats":null,"previous_names":["jeffgreendesign/lead-enrichment-api"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/jeffgreendesign/lead-enrichment-api","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposi
tories/jeffgreendesign%2Flead-enrichment-api","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jeffgreendesign%2Flead-enrichment-api/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jeffgreendesign%2Flead-enrichment-api/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jeffgreendesign%2Flead-enrichment-api/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jeffgreendesign","download_url":"https://codeload.github.com/jeffgreendesign/lead-enrichment-api/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jeffgreendesign%2Flead-enrichment-api/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34268215,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-12T02:00:06.859Z","response_time":109,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-governance","lead-enrichment","pydantic","schema-validation","synthetic-data","typescript","vercel","webhook"],"created_at":"2026-06-13T01:05:19.874Z","updated_at":"2026-06-13T01:05:20.504Z","avatar_url":"https://github.com/jeffgreendesign.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# lead-enrichment-api\n\nA serverless lead enrichment pipeline for lending and financial 
services, demonstrating how LLM inference integrates into marketing automation workflows. Built as a pattern exploration — webhook payload in, structured enrichment event out, with Pydantic schema validation acting as an AI governance layer throughout.\n\n**Stack:** Python · FastAPI · Anthropic Claude · Pydantic v2 · Docker · Google Cloud Run · GCS · Snowflake\n\n---\n\n## What it does\n\nA `POST /enrich` endpoint accepts a raw lead webhook payload (the kind you'd receive from a form submission, CRM, or ad platform) and returns a structured enrichment event ready for downstream ingestion by a CDP like Segment or a marketing automation platform like SendGrid.\n\nThe pipeline runs three steps inside the request lifecycle:\n\n1. **Input validation** — Pydantic parses and validates the incoming webhook payload before any processing occurs\n2. **LLM classification + personalization** — The lead data is sent to Claude with a structured prompt requesting:\n   - Loan type intent (`bridge_rtl` / `rental` / `unknown`)\n   - Investor experience level (`first_time` / `experienced` / `unknown`)\n   - Urgency score (1–5)\n   - A one-line personalized outreach message (e.g., referencing speed-to-close for a flip, DSCR flexibility for a rental portfolio)\n   - Classification rationale for auditability\n3. **Output validation** — The LLM's JSON response is validated against a strict Pydantic schema before anything is returned. If the model drifts from the contract — wrong types, missing fields, placeholder text in the outreach message — the request fails with a `422` and an explicit `ai_output_validation_failed` error. 
This is intentional: the schema is the governance layer.\n\nThe final response is shaped as an enrichment event: original lead fields plus AI-derived attributes, formatted as if it would be forwarded to a downstream platform.\n\n---\n\n## Architecture\n\n```\nWebhook Payload\n      │\n      ▼\n┌─────────────────────┐\n│  Input Validation   │  Pydantic: LeadWebhookPayload\n│  (FastAPI / Pydantic)│\n└─────────┬───────────┘\n          │\n          ▼\n┌─────────────────────┐\n│   LLM Enrichment    │  Claude claude-sonnet-4-6\n│   (Anthropic SDK)   │  System prompt + structured JSON request\n└─────────┬───────────┘\n          │\n          ▼\n┌─────────────────────┐\n│  Output Validation  │  Pydantic: LLMClassification\n│  (AI Governance)    │  422 on schema mismatch — hard fail\n└─────────┬───────────┘\n          │\n          ▼\n┌─────────────────────┐\n│  Structured Event   │  EnrichedLeadResponse\n│  (CDP / Automation) │  Ready for Segment Track / SendGrid\n└─────────┬───────────┘\n          │\n          ├──▶ API Response (JSON)\n          │\n          └──▶ GCS ──▶ Snowpipe ──▶ Snowflake\n               (enrichment archive)  (analytics / reporting)\n```\n\nThe governance approach — using Pydantic as a contract between the LLM and the rest of the system — is the core pattern this project explores. 
Structured output validation is one of the more underrated techniques for making LLM-integrated services production-ready: it forces the model to earn its place in the pipeline on every request.\n\n---\n\n## Project structure\n\n```\nlead-enrichment-api/\n├── src/\n│   └── lead_enrichment/\n│       ├── main.py          # FastAPI app, routes, lifecycle, error handlers\n│       ├── models.py        # Pydantic models: payload, LLM output, response\n│       ├── enrichment.py    # LLM call + parse + validation logic\n│       └── prompts.py       # System prompt + user prompt builder\n├── tests/\n│   ├── conftest.py              # Shared fixtures and test client setup\n│   ├── test_health.py           # Health endpoint tests\n│   ├── test_models.py           # Pydantic model validation tests\n│   ├── helpers.py                 # Test helper utilities\n│   ├── test_enrichment.py       # GCS write and dead-letter tests\n│   └── test_enrich_endpoint.py  # End-to-end /enrich integration tests\n├── fixtures/                # Sample lead payloads for testing\n├── snowflake/\n│   └── setup.sql            # Snowflake storage integration, stage, table, Snowpipe\n├── postman/\n│   └── lead-enrichment-api.postman_collection.json\n├── scripts/\n│   ├── security-check.sh    # Pre-commit secret and safety scanner\n│   ├── sync-postman.py      # Regenerate Postman collection from OpenAPI + fixtures\n│   ├── verify-snowpipe.py   # Snowpipe connectivity and ingestion verification\n│   └── verify-snowpipe.sh   # Shell wrapper for Snowpipe verification\n├── .github/\n│   └── workflows/\n│       └── ci.yml           # Lint, type-check, security scan, test\n├── Dockerfile               # Cloud Run-optimized, python:3.12-slim, non-root\n├── pyproject.toml           # Dependencies, build config, ruff/pytest settings\n├── .pre-commit-config.yaml  # Ruff, security, and standard hooks\n├── dashboard/                # Next.js dashboard (separate deployable → Vercel)\n│   ├── app/                  
# App Router pages and API routes\n│   ├── components/           # React components\n│   ├── lib/                  # API client, types, fixture data\n│   └── README.md\n├── .env.example\n├── CHANGELOG.md\n└── README.md\n```\n\n---\n\n## Running locally\n\n### Prerequisites\n\n- Python 3.12+\n- An Anthropic API key (get one at [console.anthropic.com](https://console.anthropic.com))\n\n### Setup\n\n```bash\ngit clone https://github.com/jeffgreendesign/lead-enrichment-api\ncd lead-enrichment-api\n\npython -m venv .venv\nsource .venv/bin/activate  # Windows: .venv\\Scripts\\activate\n\npip install -e \".[dev]\"\n\ncp .env.example .env\n# Edit .env and add your ANTHROPIC_API_KEY\n```\n\n### Start the server\n\n```bash\n# Development (auto-reload, loads .env automatically)\nuvicorn src.lead_enrichment.main:app --reload --port 8080 --env-file .env\n\n# Production-equivalent\nuvicorn src.lead_enrichment.main:app --host 0.0.0.0 --port 8080\n```\n\n### Run with Docker\n\n```bash\ndocker build -t lead-enrichment-api .\n\ndocker run --rm \\\n  -p 8080:8080 \\\n  -e ANTHROPIC_API_KEY=sk-ant-... 
\\\n  lead-enrichment-api\n```\n\n---\n\n## Testing with curl\n\n**Fix-and-flip bridge loan lead:**\n\n```bash\ncurl -X POST http://localhost:8080/enrich \\\n  -H \"Content-Type: application/json\" \\\n  -d @fixtures/lead_bridge_fix_flip.json | jq\n```\n\n**Rental portfolio lead:**\n\n```bash\ncurl -X POST http://localhost:8080/enrich \\\n  -H \"Content-Type: application/json\" \\\n  -d @fixtures/lead_rental_portfolio.json | jq\n```\n\n**Sparse data (tests graceful handling):**\n\n```bash\ncurl -X POST http://localhost:8080/enrich \\\n  -H \"Content-Type: application/json\" \\\n  -d @fixtures/lead_sparse.json | jq\n```\n\n**All fixtures in sequence:**\n\n```bash\nfor f in fixtures/*.json; do\n  # printf handles the newline escape portably; bash's builtin echo prints \\n literally\n  printf '\\n── %s ──\\n' \"$f\"\n  curl -s -X POST http://localhost:8080/enrich \\\n    -H \"Content-Type: application/json\" \\\n    -d \"@$f\" | jq '.loan_type, .investor_experience, .urgency_score, .outreach_message'\ndone\n```\n\n**Interactive API docs:** [http://localhost:8080/docs](http://localhost:8080/docs)\n\n---\n\n## Postman\n\nA Postman collection is included at `postman/lead-enrichment-api.postman_collection.json` with all endpoints and fixture payloads pre-loaded. Import it into Postman; the `{{base_url}}` variable defaults to `http://localhost:8080`.\n\nTo regenerate the collection after changing endpoints, models, or fixtures:\n\n```bash\npython scripts/sync-postman.py\n```\n\nThis pulls the OpenAPI schema from the FastAPI app and combines it with every fixture in `fixtures/` to produce an up-to-date collection.\n\n---\n\n## Deploying to Cloud Run\n\n```bash\n# Build and push to Artifact Registry\ngcloud builds submit \\\n  --tag us-central1-docker.pkg.dev/YOUR_PROJECT/YOUR_REPO/lead-enrichment-api:latest\n\n# Deploy\ngcloud run deploy lead-enrichment-api \\\n  --image us-central1-docker.pkg.dev/YOUR_PROJECT/YOUR_REPO/lead-enrichment-api:latest \\\n  --region us-central1 \\\n  --platform managed \\\n  --set-env-vars ANTHROPIC_API_KEY=sk-ant-... 
\\\n  --allow-unauthenticated \\\n  --memory 512Mi \\\n  --cpu 1 \\\n  --min-instances 0 \\\n  --max-instances 10\n```\n\nFor production, inject `ANTHROPIC_API_KEY` from Secret Manager rather than `--set-env-vars`:\n\n```bash\ngcloud run deploy lead-enrichment-api \\\n  --image ... \\\n  --set-secrets ANTHROPIC_API_KEY=anthropic-api-key:latest\n```\n\n---\n\n## Snowflake Setup (Snowpipe)\n\nEnriched leads are written to GCS on every successful `/enrich` call. Snowpipe auto-ingests these files into Snowflake for analytics.\n\n### Prerequisites\n\n- `GCS_ENRICHMENT_BUCKET` environment variable set on the Cloud Run service\n- Snowflake account with `ACCOUNTADMIN` role\n\n### Steps\n\n1. **Run the setup SQL** — paste and run everything from [`snowflake/setup.sql`](snowflake/setup.sql) in a Snowflake worksheet, through the pipe creation. Stop before `MANUAL STEP 1`.\n\n2. **Get the Snowflake service account**\n\n   ```sql\n   DESC INTEGRATION gcs_lead_enrichment;\n   ```\n\n   Copy the `STORAGE_GCP_SERVICE_ACCOUNT` value (looks like `xxxx@gcpuscentral1-xxxx.iam.gserviceaccount.com`).\n\n3. **Grant it access in GCP** (run in your terminal)\n\n   ```bash\n   gcloud storage buckets add-iam-policy-binding gs://lead-enrichment-output \\\n     --member=\"serviceAccount:\u003cSTORAGE_GCP_SERVICE_ACCOUNT_FROM_STEP_2\u003e\" \\\n     --role=\"roles/storage.objectViewer\"\n   ```\n\n4. **Test the stage can read your files**\n\n   ```sql\n   LIST @martech.gcs_leads_stage;\n   ```\n\n   You should see your enriched lead JSON files listed.\n\n5. **Get the Pub/Sub notification channel**\n\n   ```sql\n   SHOW PIPES LIKE 'lead_enrichment_pipe' IN SCHEMA martech;\n   ```\n\n   Copy the `notification_channel` value from the output.\n\n6. **Create the GCS notification** (run in your terminal)\n\n   ```bash\n   gsutil notification create \\\n     -t \u003cnotification_channel_from_step_5\u003e \\\n     -f json \\\n     -e OBJECT_FINALIZE \\\n     gs://lead-enrichment-output\n   ```\n\n7. 
**Manually load existing files** — Snowpipe only auto-ingests new files, so refresh to pick up any that already exist:\n\n   ```sql\n   ALTER PIPE martech.lead_enrichment_pipe REFRESH;\n   ```\n\n8. **Verify**\n\n   ```sql\n   -- Check pipe status\n   SELECT SYSTEM$PIPE_STATUS('martech.lead_enrichment_pipe');\n\n   -- Check data (wait ~60s after refresh)\n   SELECT lead_id, loan_type, urgency_score, ingested_at\n   FROM martech.raw_webhook_events\n   ORDER BY ingested_at DESC;\n   ```\n\n---\n\n## Sample enriched response\n\n```json\n{\n  \"lead_id\": \"lead_001\",\n  \"email\": \"marcus.bellamy@example-investors.com\",\n  \"first_name\": \"Marcus\",\n  \"last_name\": \"Bellamy\",\n  \"loan_type\": \"bridge_rtl\",\n  \"investor_experience\": \"experienced\",\n  \"urgency_score\": 5,\n  \"outreach_message\": \"Marcus, with 21 days to close we can move fast — our bridge product closes in as little as 10 business days with same-day term sheets. Let's talk today.\",\n  \"classification_rationale\": \"Lead explicitly mentions a 21-day closing requirement and references prior flips, indicating high urgency and experienced investor status. ARV provided confirms fix-and-flip intent.\",\n  \"raw\": { \"...\": \"original payload\" },\n  \"metadata\": {\n    \"enriched_at\": \"2026-03-07T09:15:42Z\",\n    \"model\": \"claude-sonnet-4-6\",\n    \"schema_version\": \"1.0\",\n    \"input_tokens\": 847,\n    \"output_tokens\": 215\n  }\n}\n```\n\n---\n\n## AI governance pattern\n\nThe validation flow is worth calling out explicitly because it's the part most teams skip. The `LLMClassification` model isn't just parsing — it's enforcing a contract:\n\n- `urgency_score` must be an integer between 1 and 5. The LLM can't return `\"high\"` or `null`.\n- `outreach_message` has a character limit and a validator that rejects unfilled template placeholders. If the model returns `\"Hi [Name], ...\"` it's a hard failure.\n- `loan_type` and `investor_experience` are enums. 
Hallucinated values fail immediately.\n\nWhen validation fails, the API returns a `422` with `error: \"ai_output_validation_failed\"` rather than silently forwarding bad data downstream. That failure mode is part of the design — it surfaces model drift at the API boundary rather than in the CDP or email platform.\n\n---\n\n## Fixtures\n\nThe `fixtures/` directory contains seven sample payloads covering common and edge-case scenarios in real estate investment lending:\n\n| File | Scenario |\n|------|----------|\n| `lead_bridge_fix_flip.json` | Experienced fix-and-flip investor, hard deadline, single-family in Atlanta |\n| `lead_rental_portfolio.json` | Seasoned landlord, DSCR refi, 11-unit portfolio in Cleveland |\n| `lead_first_time_vague.json` | First-time investor, unclear strategy, Phoenix |\n| `lead_commercial_bridge.json` | Value-add multi-family bridge, Tampa, $2.8M |\n| `lead_sparse.json` | Minimal data — tests graceful degradation |\n| `lead_contradictory.json` | Mixed flip + hold signals — tests classifier edge case handling |\n| `lead_experienced_rental_sfr.json` | Clean DSCR rental, tenant in place, Phoenix |\n\n---\n\n## Related projects\n\nThis sits alongside other AI tooling and developer automation work I'm building:\n\n- **[textrawl](https://github.com/jeffgreendesign/textrawl)** — Web-to-markdown conversion optimized for LLM workflows and Obsidian\n- **[logpare](https://github.com/jeffgreendesign/logpare)** — Log parsing and analysis tooling\n- **[guardrail-sim](https://github.com/jeffgreendesign/guardrail-sim)** — Simulation tooling for AI safety and output governance patterns\n- **[hirejeffgreen.com](https://hirejeffgreen.com)** — Portfolio and API-first developer presence\n\n---\n\n## 
License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjeffgreendesign%2Flead-enrichment-api","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjeffgreendesign%2Flead-enrichment-api","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjeffgreendesign%2Flead-enrichment-api/lists"}