https://github.com/vladkol/crm-data-agent

CRM Data Q&A Agent - Advanced RAG with NL2SQL over Salesforce Data

adk agents bigquery nl2sql salesforce vertex-ai

Last synced: about 1 month ago

Host: GitHub
URL: https://github.com/vladkol/crm-data-agent
Owner: vladkol
License: apache-2.0
Created: 2025-05-03T00:56:49.000Z (about 1 month ago)
Default Branch: main
Last Pushed: 2025-05-03T01:13:35.000Z (about 1 month ago)
Last Synced: 2025-05-03T02:34:31.307Z (about 1 month ago)
Topics: adk, agents, bigquery, nl2sql, salesforce, vertex-ai
Language: Python
Homepage:
Size: 979 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# CRM Data Q&A Agent - Advanced RAG with NL2SQL over Salesforce Data

This is a 📊 Data Analytics Agent that grounds its conversation in Salesforce data replicated to a Data Warehouse in [BigQuery](https://cloud.google.com/bigquery).

The agent demonstrates an advanced [Retrieval-Augmented Generation](https://cloud.google.com/use-cases/retrieval-augmented-generation) workflow
in a multi-agentic system with contextualized Natural-Language-to-SQL
components powered by Long Context and In-Context Learning capabilities of [Gemini 2.5 Pro](https://deepmind.google/technologies/gemini).

The agent is built with [Google Agent Development Kit](https://google.github.io/adk-docs/).

* The agent interprets questions about the state of the business as it is reflected in the CRM, rather than questions that refer directly to Salesforce data entities.
* It generates a SQL query to gather the data needed to answer the question.
* It creates interactive [Vega Lite 4](https://vega.github.io/vega-lite-v4/) diagrams.
* It analyzes the results and provides key insights and recommended actions.
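As an illustration of the charting step, here is a minimal sketch of how rows returned by a SQL query could be wrapped in a Vega-Lite 4 bar chart specification. The helper name `bar_chart_spec`, the column names, and the sample rows are hypothetical and are not taken from this repository; only the Vega-Lite v4 schema URL and the general spec layout are standard.

```python
import json

VEGA_LITE_V4_SCHEMA = "https://vega.github.io/schema/vega-lite/v4.json"


def bar_chart_spec(rows, category, value, title):
    """Build a Vega-Lite 4 bar chart spec from a list of row dicts.

    Hypothetical helper for illustration; not part of this repository.
    """
    return {
        "$schema": VEGA_LITE_V4_SCHEMA,
        "title": title,
        "data": {"values": rows},
        "mark": "bar",
        "encoding": {
            # Categories on the x axis, sorted by descending y value.
            "x": {"field": category, "type": "nominal", "sort": "-y"},
            "y": {"field": value, "type": "quantitative"},
            # Tooltips make the rendered chart interactive on hover.
            "tooltip": [
                {"field": category, "type": "nominal"},
                {"field": value, "type": "quantitative"},
            ],
        },
    }


# Made-up rows standing in for a SQL query result.
sample_rows = [
    {"lead_source": "Web", "total_value": 120000},
    {"lead_source": "Partner", "total_value": 95000},
]
spec = bar_chart_spec(sample_rows, "lead_source", "total_value", "Lead sources by value")
print(json.dumps(spec, indent=2))
```

The printed JSON can be pasted into any Vega-Lite 4 compatible renderer to view the chart.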

## 🕵🏻‍♀️ Simple questions are complex

*(Image: "Top 5 customers by impact in the US this year")*

### Examples of questions the agent can answer

* "Top 5 customers in every country"
* "What are our best lead sources?"
* Or, more specifically: "What are our best lead sources by value?"
* "Lead conversion trends in the US"

### Screenshot

*(Image: "What are our best lead sources in every country?")*

### High-Level Design

*(Image: "What are our best lead sources in every country?")*

## 🚀 Deploy and Run

* Clone this repository:

```bash
git clone https://github.com/vladkol/crm-data-agent && cd crm-data-agent
```

* Create a Python virtual environment

> [uv](https://docs.astral.sh/uv/) makes it easy: `uv venv .venv --python 3.11 && source .venv/bin/activate`

* Install dependencies

`pip install -r src/requirements.txt`

or, with `uv`:

`uv pip install -r src/requirements.txt`

* Create a `.env` file in the `src` directory. Set configuration values as described below.

> [src/.env-template](src/.env-template) is a template to use for your `.env` file.

### Configuration variables

> `src/.env` must be created and variables specified before taking further steps in deployment, local or cloud.

`GOOGLE_CLOUD_PROJECT` - [REQUIRED] Project Id of a Google Cloud Project that will be used with Vertex AI (and Cloud Run if deployed).

`GOOGLE_CLOUD_LOCATION` - [REQUIRED] Google Cloud Region to use with Vertex AI (and Cloud Run if deployed).

`AI_STORAGE_BUCKET` - [REQUIRED] Cloud Storage Bucket for ADK Asset Service and for staging Vertex AI assets.

`BQ_PROJECT_ID` - *[OPTIONAL]* Project Id of a Google Cloud Project that will be used for running BigQuery query jobs. If empty, `GOOGLE_CLOUD_PROJECT` value will be used.

`BQ_LOCATION` - [REQUIRED] BigQuery location of the Salesforce datasets.

`SFDC_DATA_PROJECT_ID` - *[OPTIONAL]* Project Id of a Google Cloud Project of the Salesforce dataset. If empty, `BQ_PROJECT_ID` value will be used.

`SFDC_BQ_DATASET` - [REQUIRED] Name of the Salesforce dataset (in project *SFDC_DATA_PROJECT_ID*).

`SFDC_METADATA_FILE` - *[OPTIONAL]* Salesforce Metadata file (do not change this value if using the demo data).

> If you are deploying a demo, leave `BQ_PROJECT_ID` and `SFDC_DATA_PROJECT_ID` empty.
> All resources will be created in the `GOOGLE_CLOUD_PROJECT` project.
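The following is a minimal sketch of how the variables above could be validated and how the optional project IDs could be resolved. It is not the repository's own loader: the names `parse_env` and `resolve_config` and the sample values are hypothetical, and the fallback chain `SFDC_DATA_PROJECT_ID` → `BQ_PROJECT_ID` → `GOOGLE_CLOUD_PROJECT` is an assumption inferred from the demo note above.

```python
REQUIRED = (
    "GOOGLE_CLOUD_PROJECT",
    "GOOGLE_CLOUD_LOCATION",
    "AI_STORAGE_BUCKET",
    "BQ_LOCATION",
    "SFDC_BQ_DATASET",
)


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_config(values):
    """Check required variables and fill in the optional project IDs."""
    missing = [name for name in REQUIRED if not values.get(name)]
    if missing:
        raise ValueError("Missing required variables: " + ", ".join(missing))
    config = dict(values)
    # Assumed fallback chain: SFDC_DATA_PROJECT_ID -> BQ_PROJECT_ID -> GOOGLE_CLOUD_PROJECT.
    config["BQ_PROJECT_ID"] = values.get("BQ_PROJECT_ID") or values["GOOGLE_CLOUD_PROJECT"]
    config["SFDC_DATA_PROJECT_ID"] = values.get("SFDC_DATA_PROJECT_ID") or config["BQ_PROJECT_ID"]
    return config


# Hypothetical demo-style values; the optional project IDs are left empty.
example_env = """
GOOGLE_CLOUD_PROJECT=my-demo-project
GOOGLE_CLOUD_LOCATION=us-central1
AI_STORAGE_BUCKET=my-demo-bucket
BQ_PROJECT_ID=
BQ_LOCATION=US
SFDC_DATA_PROJECT_ID=
SFDC_BQ_DATASET=sfdc_demo
"""

config = resolve_config(parse_env(example_env))
print(config["BQ_PROJECT_ID"], config["SFDC_DATA_PROJECT_ID"])
```

With the optional variables left empty, both resolved project IDs end up equal to `GOOGLE_CLOUD_PROJECT`, which matches the demo setup described above.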

**If you deploy the agent to Cloud Run**, its service account must have the following roles:

* BigQuery Job User (`roles/bigquery.jobUser`) in the `BQ_PROJECT_ID` project (or `GOOGLE_CLOUD_PROJECT`, if `BQ_PROJECT_ID` is empty).
* BigQuery Data Viewer (`roles/bigquery.dataViewer`) for the `SFDC_BQ_DATASET` dataset.
* Storage Object User (`roles/storage.objectUser`) for the `AI_STORAGE_BUCKET` bucket.
* Vertex AI User (`roles/aiplatform.user`) in the `GOOGLE_CLOUD_PROJECT` project.
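One possible way to prepare these grants is to generate the corresponding `gcloud` commands and review them before running. The sketch below is an assumption about workflow, not something this repository provides: the helper name `grant_commands` and the sample values are hypothetical. It covers only the project-level and bucket-level roles; the dataset-level BigQuery Data Viewer role is left to be granted separately, for example through the dataset's sharing settings.

```python
import shlex


def grant_commands(service_account, project, bq_project, bucket):
    """Return gcloud commands that grant the project and bucket roles.

    Hypothetical helper for illustration; it only builds command strings.
    """
    member = f"serviceAccount:{service_account}"
    # BigQuery jobs run in BQ_PROJECT_ID, or GOOGLE_CLOUD_PROJECT if it is empty.
    project_roles = [
        (bq_project or project, "roles/bigquery.jobUser"),
        (project, "roles/aiplatform.user"),
    ]
    commands = [
        ["gcloud", "projects", "add-iam-policy-binding", target,
         f"--member={member}", f"--role={role}"]
        for target, role in project_roles
    ]
    commands.append(
        ["gcloud", "storage", "buckets", "add-iam-policy-binding", f"gs://{bucket}",
         f"--member={member}", "--role=roles/storage.objectUser"]
    )
    return [shlex.join(cmd) for cmd in commands]


# Made-up identifiers; replace with your own values from src/.env.
for command in grant_commands(
    "agent-sa@my-demo-project.iam.gserviceaccount.com",
    "my-demo-project",
    "",
    "my-demo-bucket",
):
    print(command)
```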

### Enable APIs in your project

```bash
gcloud services enable \
aiplatform.googleapis.com \
cloudbuild.googleapis.com \
run.googleapis.com \
--project=[GOOGLE_CLOUD_PROJECT]
```

> Replace `[GOOGLE_CLOUD_PROJECT]` with the `GOOGLE_CLOUD_PROJECT` value you put in the `src/.env` file.

### Create Vertex AI Agent Engine Instance

Run the `utils/get_agent_engine.py` script.

> The script will modify the `src/.env` file by adding the `AGENT_ENGINE_ID` variable.

### Deploy Salesforce Data

#### Demo data

Run the `utils/deploy_demo_data.py` script.

> **Note**: Demo data contains records dated 2020-2022. If you ask questions with "last year" or "6 months ago", they will likely return no data.
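To see why relative time phrases miss the demo data, the sketch below resolves "last year" against today's date and checks whether that range overlaps 2020-2022. The function names are hypothetical, and treating the demo data as covering the full calendar years 2020 through 2022 is an assumption based on the note above.

```python
from datetime import date

# Demo data coverage from the note above (assumed to span full calendar years).
DEMO_START = date(2020, 1, 1)
DEMO_END = date(2022, 12, 31)


def last_year_range(today):
    """Return the first and last day of the calendar year before `today`."""
    year = today.year - 1
    return date(year, 1, 1), date(year, 12, 31)


def overlaps_demo_data(start, end):
    """Return True if the range [start, end] intersects the demo data range."""
    return start <= DEMO_END and end >= DEMO_START


start, end = last_year_range(date.today())
print(f"'last year' = {start}..{end}, overlaps demo data: {overlaps_demo_data(start, end)}")
```

Asking about an explicit period, such as "in 2021", keeps the question inside the range the demo data covers.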

#### Real Salesforce Data

Create a [BigQuery Data Transfer for Salesforce](https://cloud.google.com/bigquery/docs/salesforce-transfer).

Make sure you transfer the following objects:

* Account
* Case
* CaseHistory
* Contact
* CurrencyType
* DatedConversionRate
* Event
* Lead
* Opportunity
* OpportunityHistory
* RecordType
* Task
* User
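A quick way to confirm that the transfer covers everything in the list above is to compare it with the table names in your dataset. This is a minimal sketch that assumes the transfer creates one table per Salesforce object, named after the object; the helper name `missing_objects` is hypothetical. How you obtain the table names is up to you (for example, by listing the dataset's tables with the BigQuery client library or in the console).

```python
REQUIRED_OBJECTS = [
    "Account",
    "Case",
    "CaseHistory",
    "Contact",
    "CurrencyType",
    "DatedConversionRate",
    "Event",
    "Lead",
    "Opportunity",
    "OpportunityHistory",
    "RecordType",
    "Task",
    "User",
]


def missing_objects(table_names):
    """Return required objects that have no matching table (case-insensitive)."""
    present = {name.lower() for name in table_names}
    return [obj for obj in REQUIRED_OBJECTS if obj.lower() not in present]


# Made-up list of tables found in a partially configured dataset.
found_tables = ["Account", "Contact", "Lead", "Opportunity", "User"]
print("Missing objects:", missing_objects(found_tables))
```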

#### Deployment with your custom Salesforce.com metadata

*COMING SOON!*

This will allow you to use your customized metadata in addition to analyzing your real data replicated to BigQuery.

### Run Locally

* Run `./run_local.sh`
* Open `http://localhost:8080` in your browser.

### Deploy and Run in Cloud Run

* Run `./deploy_to_cloud_run.sh`

> This deployment uses the default Compute Engine service account for Cloud Run.
> To change how the deployment is done, adjust the `gcloud` command in [deploy_to_cloud_run.py](utils/deploy_to_cloud_run.py).

**Cloud Run Authentication Note**:

By default, this script deploys a Cloud Run service that requires authentication.
You can switch to unauthenticated mode in [Cloud Run](https://console.cloud.google.com/run) or configure a [Load Balancer and Identity-Aware Proxy](https://cloud.google.com/iap/docs/enabling-cloud-run) (recommended).

## 📃 License

This repository is licensed under the Apache 2.0 License - see the [LICENSE](LICENSE) file for details.

## 🗒️ Disclaimers

This is not an officially supported Google product. This project is not eligible for the [Google Open Source Software Vulnerability Rewards Program](https://bughunters.google.com/open-source-security).

Code and data from this repository are intended for demonstration purposes only. They are not intended for use in a production environment.
