https://github.com/sortphy/chatgpdune
Dune themed RAG based LLM ChatBot. | Using Ollama, DeepSeek, Neo4J and LangChain
chatbot deepseek dune langchain llm neo4j ollama rag
- Host: GitHub
- URL: https://github.com/sortphy/chatgpdune
- Owner: sortphy
- License: mit
- Created: 2025-06-24T23:21:48.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-07-03T02:11:19.000Z (about 1 year ago)
- Last Synced: 2025-07-03T03:23:55.730Z (about 1 year ago)
- Topics: chatbot, deepseek, dune, langchain, llm, neo4j, ollama, rag
- Language: Python
- Homepage:
- Size: 9.61 MB
- Stars: 2
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# ChatGPDune
### Dune themed RAG based LLM ChatBot. | Using Ollama, DeepSeek, Neo4J and LangChain
#### Group: Gustavo Henrique, Icaro Botelho, Maruan Biasi, Mauricio Nunes
-------------------------------
# To run:
## Ollama Setup:
- Install Ollama from https://ollama.com/download
- Pull the model you want to use. By default the project uses only deepseek-r1, which you can pull with the following command:
- ```ollama pull deepseek-r1:latest```
## Neo4j Setup:
- Install Neo4j Desktop from https://neo4j.com/download/
- Open Neo4j Desktop and create a database, preferably named chatgpdune
- After creating the database, add its username and password to the project's .env file
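The README does not say which keys the .env file must contain, so the sketch below is only an illustration of the idea: it reads a .env file with the standard library and checks that three hypothetical keys (`NEO4J_URI`, `NEO4J_USER`, `NEO4J_PASSWORD`) are present. Check the backend code for the key names the project actually expects.

```python
from pathlib import Path


def load_env(path):
    """Parse KEY=VALUE lines from a .env file into a dict."""
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def neo4j_settings(env):
    """Collect the Neo4j entries, failing early if one is missing.

    The key names are assumptions made for this example, not taken
    from the project.
    """
    required = ("NEO4J_URI", "NEO4J_USER", "NEO4J_PASSWORD")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError("Missing .env entries: " + ", ".join(missing))
    return {name: env[name] for name in required}
```

Calling `neo4j_settings(load_env(".env"))` before starting the backend would surface a missing credential immediately rather than as a connection error later.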
## Project Setup
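- Clone the git repo
- Create a venv via ```python -m venv venv```, then activate it via ```source venv/bin/activate``` (Linux/macOS) or ```venv\Scripts\activate``` (Windows)
- Install the Python requirements via ```pip install -r requirements.txt```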
## Database Setup
- You need a populated database to use the RAG.
- The easiest way is to import the pre-processed embeddings of the first Dune book from the provided CSV file.
- Open Neo4j Desktop, connect to the database, and import the file located inside our project at /database/Ingested/book-1-only/node-export.csv
- This CSV file contains all embeddings from the first Dune book, which would otherwise take hours to generate.
- If you want more data in your database, which is recommended, follow the "(Optional) How to process data locally for the RAG" section at the end of this README.
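Before importing, it can help to confirm that the export file is intact. The following is a minimal sketch using only the standard library; it makes no assumption about the column names in node-export.csv and simply reports whatever header and row count it finds. The function name `summarize_csv` is made up for this example.

```python
import csv


def summarize_csv(path):
    """Return (column_names, row_count) for a CSV export."""
    # Embedding columns can hold very long strings, so raise the
    # default per-field size limit before reading.
    csv.field_size_limit(2**31 - 1)
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        row_count = sum(1 for _ in reader)
    return header, row_count


if __name__ == "__main__":
    columns, rows = summarize_csv("database/Ingested/book-1-only/node-export.csv")
    print("columns:", columns)
    print("rows:", rows)
```

Run it from the project root; an empty header or a row count of zero suggests the file was not checked out correctly.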
## Run Backend
- From the project root, run the following:
- ```cd backend```
- ```uvicorn app:app --reload```
## Run Frontend
- From the project root, run the following:
- ```cd frontend/chatgpdune```
- ```npm i```
- ```npm run start```
## (Optional) How to process data locally for the RAG
- To process data locally, that is, to generate the embeddings for chunks of text on your own machine, do the following:
- Pull the nomic-embed-text model via Ollama using ```ollama pull nomic-embed-text```
- ```cd RAG```
- Everything inside the "data" folder will be processed and its embeddings will be added to the database.
- If you want to ignore a file, meaning skip generating its embeddings, put it inside /data/ignore. All other child folders inside /data will be processed; only /data/ignore won't.
- The following file formats are supported: [.txt, .pdf, .html, .htm, .md, .markdown]
- The ingestion system can also scrape the Dune Fandom wiki. You can turn this feature on or off when you run the data ingestion system.
- Inside the file "data_ingestion.py" you can customize different parameters in the "Tweakable settings" section.
- On line 294, you can customize how many wiki pages to scrape.
- To run the data ingestion system, use the following command: ```python data_ingestion.py```
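The folder rules above can be sketched as follows. This is an illustration written for this README, not code taken from data_ingestion.py: the function name `find_ingestable_files` and the constants are hypothetical, and the real script may select files differently. It walks the data folder recursively, skips everything under the `ignore` subfolder, and keeps only the listed file formats.

```python
from pathlib import Path

# File formats listed in this README as supported.
SUPPORTED_EXTENSIONS = {".txt", ".pdf", ".html", ".htm", ".md", ".markdown"}
# Name of the top-level subfolder whose contents are skipped.
IGNORED_FOLDER = "ignore"


def find_ingestable_files(data_dir):
    """Return supported files under data_dir, excluding data_dir/ignore."""
    root = Path(data_dir)
    selected = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        relative = path.relative_to(root)
        # Skip anything located under the top-level "ignore" folder.
        if relative.parts[0] == IGNORED_FOLDER:
            continue
        # Compare extensions case-insensitively (an assumption of this sketch).
        if path.suffix.lower() in SUPPORTED_EXTENSIONS:
            selected.append(path)
    return selected


if __name__ == "__main__":
    for file_path in find_ingestable_files("data"):
        print(file_path)
```

Running this from the RAG folder before the ingestion gives a quick preview of which files would be picked up under these assumptions.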