https://github.com/idaholab/ranger
This repository contains a Python-based auto-response bot that uses the GitHub API and LlamaIndex package to monitor and generate relevant responses for new discussions in the GitHub MOOSE repository.
https://github.com/idaholab/ranger
discussions moose vector-database
Last synced: about 1 month ago
JSON representation
This repository contains a Python-based auto-response bot that uses the GitHub API and LlamaIndex package to monitor and generate relevant responses for new discussions in the GitHub MOOSE repository.
- Host: GitHub
- URL: https://github.com/idaholab/ranger
- Owner: idaholab
- License: mit
- Created: 2025-09-04T15:29:46.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2025-12-16T18:02:12.000Z (6 months ago)
- Last Synced: 2025-12-20T09:06:54.693Z (6 months ago)
- Topics: discussions, moose, vector-database
- Language: Python
- Homepage:
- Size: 13.7 MB
- Stars: 0
- Watchers: 0
- Forks: 1
- Open Issues: 3
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# RANGER
This repository contains a Python-based auto-response bot that leverages the GitHub API and the open-source LlamaIndex package to automatically generate responses to discussions on GitHub. The bot monitors GitHub MOOSE repository discussions and provides the most relevant posts when a new discussion is initiated by users. We call it R.A.N.G.E.R. – "Responsive Assistant for Navigating and Guiding Engineering with Rigor"
## How It Works
The bot uses the GitHub API to fetch discussions from a MOOSE repository and store the data in a vector database. When a new discussion is initiated, the algorithm compares the discussion title with the content of all previous discussions (title + discussions) in the database and provides the most relevant posts to the user. The database is updated regularly to include all new posts, potentially on a monthly basis.
## Repository Contents
The repository includes the source code for discussion data parsing, vector database generation, relevant post suggestions, and unit test scripts.
### GitHubAPI.py
This script fetches relevant information from the GitHub discussion forum using GraphQL queries.
**Environment Variable Required:**
- `GITHUB_TOKEN`: The token should be granted sufficient access to read the repository.
**Functionality:**
- Automatically traverses through pagination to fetch information from each discussion post.
- Stores each page in JSON format, including the original question and comments.
**Prerequisites:**
- `query.gql.in`
- `GITHUB_TOKEN`
### IndexGenerator.py
This script embeds the relevant discussion information into a vector database using LlamaIndex functions.
**Functionality:**
- Uses `SimpleDirectoryReader` to read JSON data from `GitHubAPI.py` and save it as a `Document` object.
- Uses `HuggingFaceEmbedding` to load the embedding model. The default model is "all-MiniLM-L6-v2".
- Uses `SemanticSplitterNodeParser` to chunk content into nodes according to their semantic similarity.
- Uses `VectorStoreIndex` to generate the vector database and save it locally.
**Prerequisites:**
- Transformer model (default: all-MiniLM-L6-v2)
### GitHubBot.py
This script loads the vector database, generates the most relevant posts according to the title of a new post, and posts the result as a reply.
**Functionality:**
- Uses cosine similarity search to find the similarity between the new post's title and previous posts' titles and discussion contents.
- Adjustable parameters: `top_n` (number of most relevant posts) and `threshold` (similarity cutoff).
**Prerequisites:**
- Transformer model (default: all-MiniLM-L6-v2)
- Vector database (`/db_dir`)
**Note:** It is recommended to use the same transformer model for vector database embedding and retrieval for best performance.
## Submodule
- `all-MiniLM-L12-v2` is a sentence-transformer model used to embed content into a vector index. It maps sentences and paragraphs to a 384-dimensional dense vector space for tasks like clustering or semantic search.
## Installation
To install and set up the `moose-discussion-bot`, follow these steps:
1. Install [Miniforge](https://github.com/conda-forge/miniforge)
2. Create your environment:
```bash
conda create -n RANGER python pip
conda activate RANGER
```
3. Clone the repository:
```bash
git clone https://github.com/idaholab/moose-discussion-bot.git
```
4. Navigate to the repository directory:
```bash
cd moose-discussion-bot
```
5. Install the required dependencies:
```bash
pip install -r requirements.txt
```
6. Configure the bot by creating a `.env` file with the necessary environment variables:
```plaintext
GITHUB_TOKEN=your_github_token
REPO_OWNER=your_repo_owner
REPO_NAME=your_repo_name
```
## Testing
Separate unit tests are developed for each class in the repository using `unittest`. The tests are organized as follows:
1. `test_GitHubAPI.py`: Contains unit tests for `GitHubAPI.py`.
2. `test_IndexGenerator.py`: Contains unit tests for `IndexGenerator.py`.
3. `test_GitHubBot.py`: Contains unit tests for `GitHubBot.py`.
To run the tests:
```bash
pytest
```
## Validation Mode
Use the `validation` subcommand to run a small, reproducible, **offline** check of the pipeline:
1. Read a pin file (e.g., `pinned.txt`) that lists discussions to fetch (`owner/repo#123` or full discussion URLs).
2. Fetch those discussions into a raw folder (`--val-out-dir`).
3. Build a fresh vector database (`--val-db`).
4. Answer a one-off `--prompt` using the offline index.
5. Optionally write and/or compare a golden result (`--write-golden`, `--golden`).
**Example**
```bash
python RANGER.py --config config.yaml validation
```
## References
1. [MOOSE GitHub Mining](https://github.com/hugary1995/moose-gh-mining)
2. [Discuss-Bot](https://github.com/dhrubasaha08/Discuss-Bot)