{"id":50969342,"url":"https://github.com/r0han01/snowflake-cortex-freight-agent","last_synced_at":"2026-06-19T00:30:39.131Z","repository":{"id":357059715,"uuid":"1235184257","full_name":"r0han01/snowflake-cortex-freight-agent","owner":"r0han01","description":"A Snowflake Cortex Agent for natural-language analytics on the U.S. DOT FAF5 freight dataset (2018-2024). End-to-end implementation of Cortex Analyst (NL → SQL), Cortex Search (RAG), and Cortex Agents (orchestration) over a 7.8M-row fact table.","archived":false,"fork":false,"pushed_at":"2026-05-11T05:04:59.000Z","size":58,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-11T07:09:18.405Z","etag":null,"topics":["ai-agents","cortex-agents","cortex-analyst","cortex-search","data-analytics","llm","rag","snowflake","snowflake-cortex","text-to-sql"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/r0han01.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-11T04:58:56.000Z","updated_at":"2026-05-11T05:05:03.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/r0han01/snowflake-cortex-freight-agent","commit_stats":null,"previous_names":["r0han01/snowflake-cortex-freight-agent"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/r0han01/snowflake-cortex-freight-agent","repositor
y_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/r0han01%2Fsnowflake-cortex-freight-agent","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/r0han01%2Fsnowflake-cortex-freight-agent/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/r0han01%2Fsnowflake-cortex-freight-agent/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/r0han01%2Fsnowflake-cortex-freight-agent/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/r0han01","download_url":"https://codeload.github.com/r0han01/snowflake-cortex-freight-agent/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/r0han01%2Fsnowflake-cortex-freight-agent/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34513020,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-18T02:00:06.871Z","response_time":128,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-agents","cortex-agents","cortex-analyst","cortex-search","data-analytics","llm","rag","snowflake","snowflake-cortex","text-to-sql"],"created_at":"2026-06-19T00:30:38.555Z","updated_at":"2026-06-19T00:30:39.097Z","avatar_url":"https://github.com/r0han01.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cimg 
width=\"1584\" height=\"684\" alt=\"image\" src=\"https://github.com/user-attachments/assets/869bf372-c6ff-46fa-9325-fb4ecd16e1ad\" /\u003e\n\n# Snowflake Cortex AI on FAF5 Freight Data\n\nA Snowflake Cortex Agent that answers natural-language questions about U.S. freight movements (2018–2024). End-to-end implementation using Snowflake's Cortex AI stack: **Cortex Analyst** (NL → SQL), **Cortex Search** (hybrid semantic + keyword retrieval), and **Cortex Agents** (orchestration).\n\nThe agent runs over a 7.8M-row fact table of state-level freight flows from the U.S. DOT Freight Analysis Framework, with retrieval over 50 freight analyst reports for cross-modal questions that need both structured numbers and narrative context.\n\n---\n\n## Data source\n\nThe freight dataset is the **Freight Analysis Framework version 5.7.1 (FAF5)**, published by:\n\n\u003e **U.S. Department of Transportation**\n\u003e Federal Highway Administration (FHWA)\n\u003e Bureau of Transportation Statistics (BTS)\n\u003e [bts.gov/faf](https://www.bts.gov/faf) · [faf.ornl.gov/faf5](https://faf.ornl.gov/faf5/)\n\nLicense: U.S. federal government public domain.\nFile used: state-level OD totals for 2018–2024 (~309 MB CSV, 1.1M wide rows pivoted to 7.8M long rows).\n\nThe 50 freight analyst reports in this project are **synthetic** — written to exercise Cortex Search retrieval against realistic prose. 
They are not real internal documents.\n\n---\n\n## Tech stack\n\n- **Snowflake** — database, compute, and Cortex AI services\n- **Snowflake Cortex Analyst** — NL → SQL via a YAML semantic model\n- **Snowflake Cortex Search** — managed hybrid (semantic + keyword) retrieval, default Arctic Embed model\n- **Snowflake Cortex Agents** — orchestrates Analyst + Search behind one chat interface\n- **Snowflake Intelligence** — Snowsight chat UI for end-user interaction\n- **Python** (`pandas`, `pyarrow`, `openpyxl`) — one-time data-prep utilities\n\n---\n\n## What's built\n\n| Layer | What | Status |\n|---|---|---|\n| Data prep | Wide CSV → long Parquet, with row/measure-sum validation | ✅ |\n| Snowflake infra | Database, warehouse, role, schemas, stages | ✅ |\n| Data load | 7.8M-row fact + 50 reports + 6 lookups | ✅ |\n| Enriched view | Fact joined to all 6 lookups (human-readable labels) | ✅ |\n| Cortex Search | Hybrid search over the 50 reports | ✅ |\n| Cortex Analyst | YAML semantic model — 8 dimensions, 6 measures, 2 named filters | ✅ |\n| Cortex Agent | Both tools wired into one Snowflake Intelligence agent | ✅ |\n\nRoadmap (not yet built): custom tools (forecast / anomaly detection / brief generation), a graded eval set, demo polish.\n\n---\n\n## Folder layout\n\n```\nProject/\n├── lookups/             6 reference CSVs (state codes, commodity codes, modes, ...)\n├── scripts/             Python scripts for one-time data prep\n├── snowflake/           SQL files for Snowflake setup, load, view, search\n├── semantic_models/     Cortex Analyst YAML semantic model\n└── notes/               Detailed project plan + progress log\n```\n\nEach folder has its own `README.md`.\n\n---\n\n## How to run it\n\n**Prerequisites**\n- A Snowflake account with Cortex AI enabled (trial works)\n- Python 3.9+ for the data-prep scripts\n- The raw FAF5 CSV + the FAF5 metadata Excel file from BTS / ORNL\n\n**Order**\n1. 
`python3 scripts/build_lookups.py` — extract 6 lookup CSVs from the FAF5 metadata workbook\n2. `python3 scripts/pivot_to_long.py` — pivot wide CSV to long Parquet\n3. Upload the Parquet + reports CSV + 6 lookup CSVs to the `DATA_STAGE` in Snowsight\n4. In Snowsight, run `snowflake/01_setup_infra.sql` through `04_create_search.sql` in order, as `ACCOUNTADMIN`\n5. Upload `semantic_models/freight_semantic_model.yaml` to the `YAML_STAGE`\n6. In Snowsight, create an Agent and wire both tools (`cortex_analyst_freight` + `freight_reports_search`)\n7. Open the agent in Snowflake Intelligence and ask freight questions\n\nThe full step-by-step walkthrough, with every validation check, gotcha, and copy-paste field value, is in [`notes/README.md`](notes/README.md).\n\n---\n\n## Credits\n\n- **U.S. Bureau of Transportation Statistics** — for the FAF5 dataset\n- **Snowflake** — for the Cortex AI platform and the [getting-started-with-cortex-agents](https://github.com/Snowflake-Labs/sfguide-getting-started-with-cortex-agents) quickstart that this project's architecture follows\n\n## License\n\n[MIT](LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fr0han01%2Fsnowflake-cortex-freight-agent","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fr0han01%2Fsnowflake-cortex-freight-agent","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fr0han01%2Fsnowflake-cortex-freight-agent/lists"}