Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/wojciechpolak/ml-playground

Personal ML Playground (RAG, LangChain, LlamaIndex, Ollama)
https://github.com/wojciechpolak/ml-playground

langchain llamaindex ml ollama rag

Last synced: 26 days ago
JSON representation

Personal ML Playground (RAG, LangChain, LlamaIndex, Ollama)

Host: GitHub
URL: https://github.com/wojciechpolak/ml-playground
Owner: wojciechpolak
License: mit
Created: 2024-10-16T17:48:14.000Z (3 months ago)
Default Branch: master
Last Pushed: 2024-10-16T17:50:12.000Z (3 months ago)
Last Synced: 2024-10-18T16:29:14.580Z (3 months ago)
Topics: langchain, llamaindex, ml, ollama, rag
Language: Python
Homepage:
Size: 213 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

# ML Playground

This is a personal playground for experimenting with Machine
Learning tools, including Retrieval-Augmented Generation (RAG)
using either [LangChain](https://www.langchain.com/) or
[LlamaIndex](https://www.llamaindex.ai/).
This project is not intended for general use.

## Prerequisites

* Install and run [Ollama](https://ollama.com/)
* Download the necessary models (e.g. llama3.2)

## Installation

Clone the repository:

```shell
git clone https://github.com/wojciechpolak/ml-playground.git
```

Navigate to the project directory and set up a virtual environment:

```shell
cd ml-playground
python3 -m venv venv
source venv/bin/activate
poetry install
```

### Setting Up Environment Variables

Configure environment variables by creating a `.env` file
in the root directory. Here is an example of the variables
you may need:

```ini
APP_TITLE=My AI Assistant
CHROMA_HOST=
CHROMA_PATH=
CHROMA_PORT=
COLLECTION_NAME=default
COLLECTION_RESET=
DEBUG=
HF_EMBEDDING=
INDEXER_LOADER=dir|pdf|obsidian
INDEXER_SPLIT_DOCS=
LLAMA_INDEX_CACHE_DIR=./run/cache
NLTK_DATA=./run/nltk_data
OLLAMA_EMBEDDING=all-minilm
OLLAMA_HOST=http://localhost:11434
OLLAMA_MODEL=llama3.2
OLLAMA_TEMPERATURE=0.75
PDF_INCLUDE_SOURCE=
PDF_SOURCE_LINK_PREFIX=
QE=
SENTENCE_TRANSFORMERS_HOME=./run/cache
TOKENIZERS_PARALLELISM=true
TOP_K=5
WHISPER_LANG=
WL_BEARER_TOKEN=
WL_COOKIE_TOKEN=
WL_DEPTH_LIMIT=0
```

## RAG Chat

This project includes RAG implementations for answering questions from
personal knowledge bases like Obsidian notes, leveraging LangChain or
LlamaIndex.

### Indexing

Choose between LangChain or LlamaIndex for indexing your knowledge
base:

```shell
python -m rag.ver_langchain.indexer ~/Obsidian/Notes/
python -m rag.ver_llamaindex.indexer ~/Obsidian/Notes/
```

To test the retriever:

```shell
python -m rag.ver_langchain.retriever "your query here"
```

### Embedding Models

Ollama Models:
* all-minilm
* mxbai-embed-large
* nomic-embed-text

HuggingFace Models:
* intfloat/multilingual-e5-small
* ipipan/silver-retriever-base-v1.1
* sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2

### Running the Chat

Text-based chat version:

```shell
python -m rag.ver_langchain.chat_text
```

UI version using [Streamlit](https://streamlit.io/):

```shell
PYTHONPATH=. streamlit run rag/ver_langchain/chat_ui.py
```

## License

This project is licensed under the MIT License.
See the [LICENSE](LICENSE) file for more details.