![MIT License](https://img.shields.io/badge/license-MIT-blue)
![Built with LangChain](https://img.shields.io/badge/Built%20with-LangChain-4b7bec)
![Offline AI](https://img.shields.io/badge/LLM-Ollama-green)
![last commit](https://img.shields.io/github/last-commit/EzioDEVio/ai-knowledge-bot?color=blue)
![repo size](https://img.shields.io/github/repo-size/EzioDEVio/ai-knowledge-bot)
![GitHub issues](https://img.shields.io/github/issues/EzioDEVio/ai-knowledge-bot)
![Forks](https://img.shields.io/github/forks/EzioDEVio/ai-knowledge-bot?style=social)
![Stars](https://img.shields.io/github/stars/EzioDEVio/ai-knowledge-bot?style=social)
![PRs](https://img.shields.io/github/issues-pr/EzioDEVio/ai-knowledge-bot)

# 🧠 AI Knowledge Bot

This is my own custom-built offline AI bot that lets you chat with PDFs and web pages using **local embeddings** and **local LLMs** like LLaMA 3.

I built it step by step using LangChain, FAISS, HuggingFace, and Ollama, without relying on OpenAI or DeepSeek APIs anymore (they kept failing or costing too much).

---

## 🚀 Features

- 📄 Chat with uploaded PDF files
- 🌐 Ask questions about a webpage URL
- 🧠 Uses local HuggingFace embeddings (`all-MiniLM-L6-v2`)
- 🦙 Powered by Ollama + LLaMA 3 (fully offline LLM)
- 🗃️ Built-in FAISS vectorstore
- 🧾 PDF inline preview
- 🧮 Built-in calculator + summarizer tools (via LangChain agents)
- 🧠 Page citation support (know where each answer came from)
- 📜 Chat history viewer with download button (JSON)
- 🎛️ Simple Streamlit UI with dark/light mode toggle
- 👨‍💻 Footer credit: *Developed by EzioDEVio*

---

## 📦 Tech Stack

- `langchain`, `langchain-community`
- `sentence-transformers` for local embeddings
- `ollama` for local LLMs (`llama3`)
- `PyPDF2` for PDF parsing
- `FAISS` for vector indexing
- `Streamlit` for frontend
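
A plausible `requirements.txt` for this stack might look like the following (package names only; any version pins in the actual repo are unknown, so none are shown):

```text
langchain
langchain-community
sentence-transformers
faiss-cpu
PyPDF2
streamlit
```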

---

## ๐Ÿ›  Setup Guide

### 1. Clone this repo

```bash
git clone https://github.com/EzioDEVio/ai-knowledge-bot.git
cd ai-knowledge-bot
```

---

### 2. Create and activate virtualenv (optional but recommended)

```bash
python -m venv venv
.\venv\Scripts\activate    # Windows (use `source venv/bin/activate` on macOS/Linux)
```

---

### 3. Install dependencies

```bash
pip install -r requirements.txt
```

Make sure `sentence-transformers` is installed; it's needed for local embeddings.

---

### 4. Install Ollama (for local LLM)

Download and install from:

👉 [https://ollama.com/download](https://ollama.com/download)

After installation, verify:

```bash
ollama --version
```

Then pull and run the model:

```bash
ollama run llama3
```

> This will download the LLaMA 3 model (approx. 4–8 GB). You can also try `mistral`, `codellama`, etc.

---

### 5. Run the app

```bash
streamlit run app.py
```

The app will open at:

```
http://localhost:8501
```

---

## ๐Ÿ“ Folder Structure

```
ai-knowledge-bot/
โ”œโ”€โ”€ app.py # Main Streamlit UI
โ”œโ”€โ”€ backend/
โ”‚ โ”œโ”€โ”€ pdf_loader.py # PDF text extraction
โ”‚ โ”œโ”€โ”€ web_loader.py # Webpage scraper
โ”‚ โ”œโ”€โ”€ vector_store.py # Embedding + FAISS
โ”‚ โ””โ”€โ”€ qa_chain.py # LLM QA logic (Ollama + tools)
โ”œโ”€โ”€ .env # Not used anymore (was for API keys)
โ”œโ”€โ”€ requirements.txt
โ””โ”€โ”€ README.md
```
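
The retrieval step that `vector_store.py` delegates to FAISS (embed chunks, index them, return the nearest neighbours for a query) boils down to similarity search over embedding vectors. A dependency-free sketch of the idea, with tiny toy vectors standing in for real MiniLM embeddings:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query_vec, index, k=2):
    """Return the ids of the k chunks most similar to the query vector.

    `index` maps chunk id -> embedding. FAISS does the same job with
    approximate nearest-neighbour structures, just much faster at scale.
    """
    ranked = sorted(index, key=lambda cid: cosine(query_vec, index[cid]),
                    reverse=True)
    return ranked[:k]

index = {"chunk-a": [1.0, 0.0], "chunk-b": [0.0, 1.0], "chunk-c": [0.7, 0.7]}
print(top_k([1.0, 0.1], index))  # → ['chunk-a', 'chunk-c']
```

The retrieved chunk ids are what lets the app attach page citations to each answer.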

---

## ✅ Working Setup Summary

| Component        | Mode                                 |
| ---------------- | ------------------------------------ |
| Embeddings       | Local (`HuggingFace`)                |
| Vectorstore      | Local (`FAISS`)                      |
| LLM Response     | Local (`Ollama` + `llama3`)          |
| Internet Needed? | ❌ Only for first-time model download |

---

## โš ๏ธ Why I Avoided OpenAI / DeepSeek

* **OpenAI** failed with `RateLimitError` and quota issues unless I added billing.
* **DeepSeek** embedding endpoints didnโ€™t work โ€” only chat models supported.

So I switched to:

* ๐Ÿ” Local `HuggingFaceEmbeddings` for vectorization
* ๐Ÿฆ™ `ChatOllama` for full offline AI answers
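
Before local vectorization, documents have to be split into overlapping chunks so that sentences cut at a boundary still appear intact in at least one chunk. A minimal stand-in for LangChain's text splitter (the character-based sizes here are illustrative, not the app's actual settings):

```python
def chunk_text(text, size=500, overlap=50):
    """Split text into fixed-size character chunks with overlap."""
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap  # step forward, keeping `overlap` chars shared
    return chunks

parts = chunk_text("a" * 1200, size=500, overlap=50)
print(len(parts), [len(p) for p in parts])  # → 3 [500, 500, 300]
```

Each chunk is then embedded once and stored in FAISS, so the chunking choice directly shapes retrieval quality.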

---

## ✅ Completed Features

* ✅ PDF upload + preview
* ✅ URL content QA
* ✅ Chat history with page citations
* ✅ Calculator + summarizer tools
* ✅ Footer attribution
* ✅ JSON export
* ✅ 100% offline functionality

---

## ๐Ÿณ Run with Docker (Secure Production Mode)

Build and run the app securely using a **multi-stage Dockerfile**:

1. Build the image:

```bash
docker build -t ai-knowledge-bot .
```

2. Run the container. Make sure Ollama is running on the host, then, in a separate terminal (PowerShell or otherwise), run:

```bash
docker run -p 8501:8501 \
  --add-host=host.docker.internal:host-gateway \
  ai-knowledge-bot
```
---
## ๐Ÿ” Dockerfile Security Highlights
โœ… Multi-stage build (separates dependencies from runtime)

โœ… Minimal base (python:3.10-slim)

โœ… Non-root appuser by default

โœ… .env, venv, logs excluded via .dockerignore

โœ… Exposes only necessary port (8501)

โœ… Automatically starts Streamlit app
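
A multi-stage Dockerfile matching these points might be structured like this (a sketch under the assumptions above, not the repo's actual file):

```dockerfile
# --- Stage 1: install dependencies ---
FROM python:3.10-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

# --- Stage 2: minimal runtime ---
FROM python:3.10-slim
RUN useradd --create-home appuser
WORKDIR /app
COPY --from=builder /install /usr/local
COPY . .
USER appuser
EXPOSE 8501
CMD ["streamlit", "run", "app.py", "--server.address=0.0.0.0"]
```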

---
## 💬 License

MIT. Feel free to fork, use, or improve it.

---

## 🔥 Built by EzioDEVio | 🇮🇶 | 🧠

From concept to offline AI, all step by step.

---