https://github.com/ricoledan/deloitte-insightbot

💬 Retrieval Augmented Generation (RAG) system based on Deloitte's Weekly Global Economic Update

chromadb deloitte langchain llm python retrieval-augmented-generation



Host: GitHub
Owner: Ricoledan
Created: 2024-09-09T14:00:52.000Z
Default Branch: main
Last Pushed: 2024-09-13T16:16:12.000Z
Last Synced: 2025-03-28T07:13:18.536Z
Language: Python
Size: 19.5 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


# Deloitte-Insightbot

## Overview

The `deloitte-insightbot` is a question-answering system that provides insights based on Deloitte's weekly economic updates. These updates give a brief overview of the global political and economic situation, summarizing key impacts and trends.

## Features

- **Data Ingestion**: Fetches content from a Deloitte weekly economic update ([source page](https://www2.deloitte.com/us/en/insights/economy/global-economic-outlook/weekly-update/weekly-update-2023-10.html?icid=archive_click)).

- **Embeddings Storage**: Stores embeddings of the content in a vector database (ChromaDB).

- **Retrieval-Augmented Generation**: Retrieves the passages most relevant to a user query and passes them to an LLM, which generates the answer.
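To make the retrieval-augmented generation flow concrete, here is a minimal, dependency-free sketch: passages and the question are represented as vectors, the passages most similar to the question are selected, and they are packed into a prompt for the LLM. The 3-dimensional vectors, placeholder passage texts, and prompt template are illustrative assumptions, not the project's data or prompt. Cosine similarity is used here for simplicity; the distance metric the project's Chroma collection uses may differ.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vec: Sequence[float],
             passages: List[Tuple[str, Sequence[float]]],
             k: int = 2) -> List[str]:
    """Return the texts of the k passages whose vectors are most similar to the query."""
    ranked = sorted(passages, key=lambda p: cosine_similarity(query_vec, p[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, context: List[str]) -> str:
    """Hypothetical prompt template: retrieved passages first, then the user's question."""
    joined = "\n\n".join(context)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{joined}\n\n"
        f"Question: {question}"
    )


# Toy 3-dimensional vectors standing in for real embeddings.
passages = [
    ("Example passage about inflation.", [0.9, 0.1, 0.0]),
    ("Example passage about trade.", [0.1, 0.9, 0.1]),
    ("Another example passage about inflation.", [0.8, 0.0, 0.3]),
]
query_vec = [1.0, 0.0, 0.1]  # pretend embedding of the question below
context = retrieve(query_vec, passages, k=2)
prompt = build_prompt("What is happening with inflation?", context)
# In the real system, `prompt` would be sent to the chat model.
```

Only the two inflation passages reach the prompt; the trade passage is filtered out by the similarity ranking, which is the point of retrieving before generating.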

## Components

- **Data Ingestion**: A module to scrape and parse content from the specified URL.

    - Uses the `UnstructuredURLLoader` class to fetch and parse the content from the URL.

- **Embeddings Model**: Utilizes an embedding model to convert content into vector representations.

    - Uses the `OpenAIEmbeddings` class with the model `text-embedding-3-large`.

- **VectorDB**: Stores the embeddings for efficient retrieval.

    - Uses the `Chroma` class from `langchain_chroma` to interact with ChromaDB.

- **LLM**: Generates answers based on the retrieved passages.

    - Uses the `ChatOpenAI` class with the model `gpt-3.5-turbo`.
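The embeddings model and VectorDB components can be pictured with a toy, in-memory stand-in that runs offline. This sketch replaces `OpenAIEmbeddings` with a sparse bag-of-words vector and Chroma with plain Python lists; `embed`, `cosine`, and `InMemoryVectorStore` are hypothetical names, and the method names `add_texts` and `similarity_search` only echo LangChain's vector-store interface. It is not the project's implementation.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy stand-in for OpenAIEmbeddings: a sparse bag-of-words vector
    (lowercased word -> count). Real embeddings are dense float vectors."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Missing keys in a Counter read as 0, so only shared words contribute.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryVectorStore:
    """Hypothetical in-memory stand-in for the Chroma vector store:
    keeps (text, embedding) pairs and returns the texts nearest a query."""

    def __init__(self) -> None:
        self._texts: List[str] = []
        self._vectors: List[Counter] = []

    def add_texts(self, texts: List[str]) -> None:
        # Embed each text once at insert time, as a vector DB would.
        for text in texts:
            self._texts.append(text)
            self._vectors.append(embed(text))

    def similarity_search(self, query: str, k: int = 1) -> List[str]:
        # Embed the query with the same model, then rank stored texts by similarity.
        q = embed(query)
        scores = [cosine(q, v) for v in self._vectors]
        order = sorted(range(len(self._texts)), key=lambda i: scores[i], reverse=True)
        return [self._texts[i] for i in order[:k]]


# Usage: store placeholder passages, then query them.
store = InMemoryVectorStore()
store.add_texts([
    "Example text about inflation in the euro area",
    "Example text about exports",
    "Example text about labor markets",
])
top = store.similarity_search("euro area inflation", k=1)
```

The key design point carried over from the real stack is that documents and queries must be embedded by the same model, so their vectors are comparable.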

## Usage

1. **Ingest Data**: Run the data ingestion script to fetch and parse the content.

2. **Store Embeddings**: Use the embeddings model to convert the content into vectors and store them in the VectorDB.

3. **Query System**: Input a user query to retrieve relevant passages and generate an answer using the LLM.
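Step 1 relies on `UnstructuredURLLoader` to fetch and parse the page. As a rough, dependency-free illustration of the parsing half only (not how the Unstructured library works internally), the sketch below pulls headings, paragraphs, and list items out of an HTML string with Python's built-in `html.parser`. `ParagraphExtractor` and `extract_paragraphs` are hypothetical names, and fetching is left out so the example runs offline.

```python
from html.parser import HTMLParser
from typing import List


class ParagraphExtractor(HTMLParser):
    """Collects text from <p>, <h1>-<h3> and <li> elements, skipping <script>/<style>."""

    TEXT_TAGS = {"p", "h1", "h2", "h3", "li"}
    SKIP_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.blocks: List[str] = []
        self._buffer: List[str] = []
        self._depth = 0  # > 0 while inside a text element
        self._skip = 0   # > 0 while inside script/style

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip += 1
        elif tag in self.TEXT_TAGS:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip:
            self._skip -= 1
        elif tag in self.TEXT_TAGS and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Collapse runs of whitespace into single spaces.
                text = " ".join("".join(self._buffer).split())
                if text:
                    self.blocks.append(text)
                self._buffer = []

    def handle_data(self, data):
        if self._depth and not self._skip:
            self._buffer.append(data)


def extract_paragraphs(html: str) -> List[str]:
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


sample_html = (
    "<html><head><style>p{}</style></head><body>"
    "<h1>Weekly update</h1><p>Growth  <b>slowed</b> in Q3.</p>"
    "<script>var x=1;</script><ul><li>Item one</li></ul></body></html>"
)
blocks = extract_paragraphs(sample_html)
```

Each extracted block is then a natural unit to embed and store in step 2.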

## Commands

Install the required packages

```bash
pip install -r requirements.txt
```

Start the ChromaDB container

```bash
docker compose up -d
```

Ping the ChromaDB container to check if it is running

```bash
curl localhost:8000/api/v1/heartbeat
```
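The same health check can be done from Python before starting the application. This is a minimal sketch using only the standard library; it assumes the host, port, and heartbeat path from the `curl` command above, and `chroma_is_up` is a hypothetical helper, not part of the project.

```python
import json
import urllib.request


def chroma_is_up(url: str = "http://localhost:8000/api/v1/heartbeat",
                 timeout: float = 2.0) -> bool:
    """Return True if the endpoint answers HTTP 200 with a JSON body, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            json.loads(resp.read().decode("utf-8"))  # raises ValueError if not JSON
            return resp.status == 200
    except (OSError, ValueError):
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False
```

Calling `chroma_is_up()` returns `False` instead of raising when the container is not running, which makes it easy to wait on in a retry loop.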

Run the application

```bash
python src/main.py
```
