https://github.com/isfarbaset/dsan-assistant

This repository hosts an AI-powered Georgetown Course Finder, developed using LangChain and deployed as an AWS Lambda function.
- Host: GitHub
- URL: https://github.com/isfarbaset/dsan-assistant
- Owner: isfarbaset
- License: MIT
- Created: 2025-03-26T23:05:24.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2025-03-31T21:33:55.000Z (7 months ago)
- Last Synced: 2025-03-31T22:28:27.114Z (7 months ago)
- Topics: ai-assistant, course-information-retrieval, fastapi, georgetown-university, langchain, python-development, streamlit, unified-search
- Language: Python
- Homepage:
- Size: 76.1 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Metadata Files:
  - Readme: README.md
  - License: LICENSE
# DSAN Assistant
A LangGraph-based RAG (Retrieval-Augmented Generation) system for the Georgetown University Data Science and Analytics (DSAN) program. This repository showcases a design pattern for building and deploying LangGraph agents, progressing from local development to serverless deployment. We crawl the [DSAN](https://analytics.georgetown.edu/) website and use its content to answer general questions about the program's courses, such as _Is DSAN 6000 a prereq for 6725?_, _Is there a course on bioinformatics?_, and _How many core courses are there?_
We use the following tools and technologies:
1. [Firecrawl.dev](https://www.firecrawl.dev/) to crawl the website; the crawled [data](data/documents_1.json) is ingested into a local [`FAISS`](https://python.langchain.com/docs/integrations/vectorstores/faiss/) index.
1. [Amazon Bedrock](https://aws.amazon.com/bedrock/) for LLMs, with Amazon API Gateway and AWS Lambda for hosting.
1. [LangGraph](https://www.langchain.com/langgraph) for agents and LangChain for RAG.
## Demo video

## Architecture Overview
```mermaid
graph LR
    %% Agent Building section
    A1["LangGraph"] --> D["FastAPI"]
    A2["Streamlit"] --> D
    A3["Amazon Bedrock"] --> D
    
    %% API & Packaging section
    D --> E["Docker Container"]
    
    %% AWS Deployment section
    E --> F["AWS Services"]
    F --> G1["AWS Lambda & Amazon API Gateway"]
    
    %% Subgraph definitions
    subgraph "Agent Building"
        A1
        A2
        A3
    end
    
    subgraph "API & Packaging"
        D
        E
    end
    
    subgraph "AWS Deployment"
        F
        G1
    end
    
    %% Styling
    classDef dev fill:#d1f0ff,stroke:#0077b6
    classDef mid fill:#ffe8d1,stroke:#b66300
    classDef aws fill:#ffd1e8,stroke:#b6007a
    
    class A1,A2,A3 dev
    class D,E mid
    class F,G1 aws
```
This project demonstrates a complete workflow for developing and deploying AI agents:
1. **Local Development**: Build and test the agent locally
2. **FastAPI Server**: Convert the agent to a FastAPI application
3. **Docker Containerization**: Package the application in a Docker container
4. **AWS Lambda Deployment**: Deploy the containerized application to AWS Lambda with API Gateway
## Components
- **RAG System**: Uses LangChain, FAISS, and AWS Bedrock to provide information about Georgetown's DSAN program
- **LangGraph Agent**: ReAct agent pattern with tools for retrieving program information
- **Streamlit Frontend**: User-friendly chat interface for interacting with the agent
- **FastAPI Backend**: Serves the agent via HTTP endpoints
- **AWS Lambda Integration**: Serverless deployment with API Gateway
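
The ReAct agent pattern alternates between the model proposing a tool call and the runtime executing that tool and feeding the observation back. The repo's actual agent is built with LangGraph and Amazon Bedrock; purely to illustrate the control flow, here is a stdlib-only sketch with a stubbed model and a stubbed retrieval tool (all names here are hypothetical):

```python
from typing import Callable

def fake_retriever(query: str) -> str:
    """Stub for the 'retrieve program information' tool (FAISS in the real system)."""
    return f"[retrieved docs for: {query}]"

def fake_model(history: list[str]) -> str:
    """Stub LLM: asks for the tool until it sees an observation, then answers."""
    if not any(line.startswith("OBSERVATION:") for line in history):
        return "ACTION: retrieve " + history[0]
    return "FINAL: answer based on " + history[-1]

def react_loop(question: str, model: Callable, tools: dict, max_steps: int = 5) -> str:
    """Minimal ReAct loop: think -> act -> observe, until a final answer."""
    history = [question]
    for _ in range(max_steps):
        step = model(history)
        if step.startswith("FINAL:"):
            return step.removeprefix("FINAL: ").strip()
        # Parse the tool call, run the tool, append the observation.
        _, _, tool_input = step.partition("ACTION: retrieve ")
        history.append("OBSERVATION: " + tools["retrieve"](tool_input))
    return "gave up"

print(react_loop("Is there a course on bioinformatics?",
                 fake_model, {"retrieve": fake_retriever}))
```

In the real project this loop is what LangGraph's prebuilt ReAct agent manages; the sketch only shows why a retriever is registered as a *tool* rather than called unconditionally.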
### Dev workflow
1. **Data Collection**:
   - Crawl the site using firecrawl.dev and save the results as JSON.
   - Place the crawled data in the `data` folder as [`documents_1.json`](data/documents_1.json).
2. **Index Building**:
   - Run `build_index.py` to create the FAISS vector index
3. **Local Testing**:
   - Run the FastAPI server with `langchain serve`
   - Test the API endpoints with the local webserver
   - Test the user interface with Streamlit
4. **Deployment**:
   - Run `python deploy.py` to deploy to AWS Lambda and API Gateway
   - Test the deployed application by running Streamlit with the API Gateway endpoint
```mermaid
graph TD
    A[Crawl data with firecrawl.dev] --> B[Place data in data folder]
    B --> C[Run build_index.py]
    C --> D[Run langchain server]
    D --> E[Test with FastAPI/uvicorn local webserver]
    E --> F[Test with Streamlit]
    F --> G[Run python deploy.py]
    G --> H[Deploy to API Gateway and Lambda]
    H --> I[Run Streamlit with API Gateway endpoint]
    
    classDef data fill:#d1f0ff,stroke:#0077b6
    classDef local fill:#ffe8d1,stroke:#b66300
    classDef deploy fill:#ffd1e8,stroke:#b6007a
    
    class A,B,C data
    class D,E,F local
    class G,H,I deploy
```
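
The workflow above starts from the crawled JSON. The exact schema of `documents_1.json` is not documented in this README, so the field names below (`url`, `markdown`) are assumptions mirroring typical Firecrawl output; a loader might look like:

```python
import json

def load_documents(path: str) -> list[dict]:
    """Load crawled pages, keeping only records that have text to index.
    The 'markdown'/'content' field names are assumptions about the crawl output."""
    with open(path, encoding="utf-8") as f:
        docs = json.load(f)
    return [d for d in docs if d.get("markdown") or d.get("content")]

# Inline sample of what a crawled record plausibly looks like:
sample = [
    {"url": "https://analytics.georgetown.edu/", "markdown": "DSAN program overview ..."},
    {"url": "https://analytics.georgetown.edu/empty", "markdown": ""},
]
```

Check `build_index.py` in the repo for the fields it actually reads.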
## Prerequisites
- Python 3.11+
- [uv](https://github.com/astral-sh/uv) for Python package management
- Docker (for containerization)
- AWS CLI configured with appropriate permissions
- AWS account with access to Bedrock and Lambda services
## Setup Instructions
### 1. Clone the Repository
```bash
git clone https://github.com/isfarbaset/dsan-assistant
cd dsan-assistant
```
### 2. Environment Setup
Create a `.env` file in the project root with your AWS credentials and configuration:
```
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
AWS_REGION=us-east-1
```
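
The application picks these values up from the environment; many projects use `python-dotenv` for this, and `boto3` reads the AWS variables from the environment automatically. Whether this repo uses `python-dotenv` is an assumption; a minimal stdlib equivalent of its loading behavior is:

```python
import os

def load_env(path: str = ".env") -> None:
    """Minimal .env loader: KEY=VALUE lines, '#' comments, existing vars win."""
    try:
        with open(path, encoding="utf-8") as f:
            lines = f.readlines()
    except FileNotFoundError:
        return  # no .env file is fine; fall back to the ambient environment
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip())
```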
### 3. Install Dependencies with uv
This project uses `uv` for Python package management:
```bash
# Install uv if you don't have it
curl -LsSf https://astral.sh/uv/install.sh | sh
export PATH="$HOME/.local/bin:$PATH"
# Create a virtual environment and install dependencies
uv venv --python 3.11 && source .venv/bin/activate && uv pip install --requirement pyproject.toml
```
### 4. Build the Vector Index
Before running the application, you need to build the vector index from the source documents:
```bash
python build_index.py
```
This will create a FAISS index in the `indexes/dsan_index` directory.
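
Index building typically means splitting documents into overlapping chunks, embedding them, and persisting the vector store. The chunk sizes below are guesses, not the repo's actual settings, and the FAISS wiring in the comment assumes the `langchain-community`/`langchain-aws` APIs:

```python
def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Fixed-size character chunking with overlap, so facts spanning a
    boundary still appear whole in at least one chunk."""
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

# The vector-store side might then be (assuming langchain-community and
# a Bedrock embedding model; see build_index.py for what the repo really does):
#   from langchain_community.vectorstores import FAISS
#   from langchain_aws import BedrockEmbeddings
#   index = FAISS.from_texts(chunks, BedrockEmbeddings())
#   index.save_local("indexes/dsan_index")
```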
## Running Locally
### Run the FastAPI Server
```bash
langchain serve
```
### Run the Streamlit Frontend
```bash
streamlit run chatbot.py -- --api-server-url http://localhost:8000/generate
```
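
The Streamlit frontend talks to the server over the `/generate` endpoint shown above. The request/response body shape below is an assumption (check `app/server.py` and `chatbot.py` for the real schema); a stdlib-only client sketch:

```python
import json
from urllib import request

def build_payload(question: str, session_id: str = "local") -> dict:
    """Hypothetical request body; the server's actual schema may differ."""
    return {"input": question, "session_id": session_id}

def ask(api_url: str, question: str) -> str:
    req = request.Request(
        api_url,
        data=json.dumps(build_payload(question)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:  # requires the server to be running
        return json.loads(resp.read())["output"]

# ask("http://localhost:8000/generate", "How many core courses are there?")
```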
## Setup LangSmith (Optional)
LangSmith helps trace, monitor, and debug LangChain applications.
You can sign up for LangSmith [here](https://smith.langchain.com/).
If you don't have access, you can skip this section.
```shell
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY=
export LANGCHAIN_PROJECT=  # if not specified, defaults to "default"
```
## Deployment Process
### 1. Build and Push Docker Image
The repository includes a script to build and push the Docker image to Amazon ECR:
```bash
chmod +x build_and_push.sh
./build_and_push.sh
```
### 2. Deploy to AWS Lambda
Use the deployment script to create or update the Lambda function and API Gateway:
```bash
python deploy.py --function-name dsan-assistant --role-arn YOUR_LAMBDA_ROLE_ARN --api-gateway
```
If you want to use Amazon Bedrock in a cross-account way, i.e. the Lambda function exists in, say, Account A but you want to use Amazon Bedrock in Account B, then use the following command line:
```bash
python deploy.py --function-name dsan-assistant --role-arn YOUR_LAMBDA_ROLE_ARN --bedrock-role-arn YOUR_ACCOUNT_B_BEDROCK_ROLE_ARN --api-gateway
```
The IAM role used by the AWS Lambda function needs Amazon Bedrock access (for example via [`AmazonBedrockFullAccess`](https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AmazonBedrockFullAccess.html)) to use the models available via Amazon Bedrock. The models also need to be enabled within your AWS account; see the instructions [here](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html).
This will:
1. Create/update a Lambda function using the Docker image
2. Set up an API Gateway with appropriate routes
3. Configure permissions and API keys
4. Output the deployed API URL
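
Under the hood this kind of deployment points Lambda at a container image in ECR. Whether `deploy.py` does exactly this is an assumption, but the image URI format is standard and `boto3` supports container-image updates:

```python
def ecr_image_uri(account_id: str, region: str, repo: str, tag: str = "latest") -> str:
    """Standard ECR image URI format used when pointing Lambda at a container image."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repo}:{tag}"

def update_function(function_name: str, image_uri: str) -> None:
    """Update an existing container-image Lambda; a sketch of what deploy.py
    plausibly calls, not necessarily its actual code path."""
    import boto3  # only needed when actually deploying
    boto3.client("lambda").update_function_code(
        FunctionName=function_name, ImageUri=image_uri
    )

print(ecr_image_uri("123456789012", "us-east-1", "dsan-assistant"))
# → 123456789012.dkr.ecr.us-east-1.amazonaws.com/dsan-assistant:latest
```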
### 3. Connect Streamlit to Deployed API
Once deployed, you can connect the Streamlit frontend to the deployed API:
```bash
streamlit run chatbot.py -- --api-server-url https://YOUR_API_ID.execute-api.us-east-1.amazonaws.com/prod/generate
```
## Project Structure
```
dsan-assistant/
├── app/                      # FastAPI application
│   ├── __init__.py
│   └── server.py             # FastAPI server implementation
├── data/                     # Source data
│   └── documents_1.json      # DSAN program information
├── indexes/                  # Vector indexes
│   └── dsan_index/           # FAISS index for DSAN data
├── .env                      # Environment variables (not in repo)
├── .gitignore                # Git ignore file
├── build_and_push.sh         # Script to build and push Docker image
├── build_index.py            # Script to build vector index
├── chatbot.py                # Streamlit frontend
├── deploy.py                 # AWS Lambda deployment script
├── Dockerfile                # Docker configuration
├── dsan_rag_setup.py         # RAG system setup
├── pyproject.toml            # Project configuration
├── README.md                 # Project documentation
└── requirements.txt          # Python dependencies
```
## Key Features
- **Conversation Memory**: Maintains chat history for contextual responses
- **Vector Search**: FAISS-based retrieval for efficient document search
- **AWS Bedrock Integration**: Leverages AWS's foundation models
- **Cross-Account Access**: Supports cross-account access to AWS Bedrock
- **Streamlit UI**: User-friendly interface with Georgetown branding
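
Conversation memory in a chat app usually means carrying a bounded message history back to the model on each turn. How this repo stores history is not shown in the README; one common scheme, sketched with hypothetical names:

```python
def trim_history(messages: list[dict], max_turns: int = 10) -> list[dict]:
    """Keep any system message plus the most recent user/assistant exchanges,
    so the prompt stays within the model's context window."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-2 * max_turns:]  # 2 messages per turn
```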
## License
This project is licensed under the MIT License; see [LICENSE](./LICENSE).
## Contributing
See [CONTRIBUTING.md](CONTRIBUTING.md) for contribution guidelines.