https://github.com/harimkang/docsense

An intelligent document assistant powered by Open-Source Large Language Models

document-qa llm nlp qwen qwen2

Host: GitHub
URL: https://github.com/harimkang/docsense
Owner: harimkang
License: mit
Created: 2024-12-17T09:49:25.000Z
Default Branch: main
Last Pushed: 2024-12-17T12:16:55.000Z
Last Synced: 2025-04-12T08:58:03.592Z
Topics: document-qa, llm, nlp, qwen, qwen2
Language: Python
Homepage: https://harimkang.github.io/docsense/
Size: 39.1 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# DocSense 📚

[![PyPI version](https://badge.fury.io/py/docsense.svg)](https://badge.fury.io/py/docsense)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python Versions](https://img.shields.io/pypi/pyversions/docsense.svg)](https://pypi.org/project/docsense/)
[![Tests](https://github.com/harimkang/docsense/actions/workflows/test.yml/badge.svg)](https://github.com/harimkang/docsense/actions/workflows/test.yml)
[![codecov](https://codecov.io/gh/harimkang/docsense/branch/main/graph/badge.svg)](https://codecov.io/gh/harimkang/docsense)

An intelligent document assistant powered by Open-Source Large Language Models 🤖

DocSense lets you interact with your documents using natural language. It uses the open-source Qwen language model (support for more open-source models is planned) to answer questions about your documents based on the passages it retrieves from them, and it is completely free to use.

## Features ✨

- 🔍 Advanced semantic search using FAISS
- 💡 Intelligent question answering with open-source LLMs (currently Qwen)
- 📝 Support for multiple document formats (txt, md, rst, etc.)
- ⚡ GPU acceleration for faster processing
- 🔄 Batch processing for memory efficiency
- 💾 Persistent vector storage
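Semantic search ranks document chunks by the similarity of their vectors to the query's vector. The sketch below is a minimal, dependency-free illustration of exact inner-product search, the operation a FAISS flat index such as `IndexFlatIP` performs; it is not DocSense's own code. The `embed` function is a hypothetical toy stand-in that builds L2-normalised bag-of-words vectors instead of calling a neural embedding model, so it matches shared words rather than meaning.

```python
import math
import re
from collections import Counter


def embed(text):
    """Hypothetical toy embedding: an L2-normalised bag-of-words vector (word -> weight)."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


def inner_product(a, b):
    """Dot product of two sparse vectors; words missing from either side contribute 0."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(weight * b.get(word, 0.0) for word, weight in a.items())


def search(query, chunks, k=3):
    """Exact top-k search: score every chunk, return (chunk_index, score) pairs, best first."""
    q = embed(query)
    scored = [(i, inner_product(q, embed(chunk))) for i, chunk in enumerate(chunks)]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # stable, so ties keep input order
    return scored[:k]


if __name__ == "__main__":
    chunks = [
        "Install DocSense with pip install docsense.",
        "FAISS stores dense vectors for similarity search.",
        "Use the ask command to query your documents.",
    ]
    print(search("How do I install DocSense?", chunks, k=1))
```

Because every vector is normalised to unit length, the inner product equals cosine similarity. A real index would embed the chunks once at indexing time and store the vectors (which is what persistent vector storage provides) rather than re-embedding them for every query as this sketch does for brevity.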

## Installation 🛠️

### CPU Version

```bash
pip install docsense
```

### GPU Version (Recommended)

First, install PyTorch with CUDA support:

```bash
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
```

Then install FAISS with GPU support:

```bash
conda install -c conda-forge faiss-gpu
```

Finally, install DocSense:

```bash
pip install docsense
```
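Before indexing, it can help to confirm that PyTorch and FAISS actually see a GPU. The helper below is a hypothetical convenience function, not part of DocSense; it calls `torch.cuda.is_available()` and, where the installed FAISS build provides it, `faiss.get_num_gpus()`, treating a missing package as "no GPU".

```python
def check_gpu_stack():
    """Hypothetical helper: report what GPU support PyTorch and FAISS can see."""
    status = {"torch_cuda": False, "faiss_gpus": 0}
    try:
        import torch
        status["torch_cuda"] = bool(torch.cuda.is_available())
    except ImportError:
        pass  # PyTorch not installed: treat as no CUDA
    try:
        import faiss
        # get_num_gpus() may be missing from CPU-only builds, so guard with hasattr.
        if hasattr(faiss, "get_num_gpus"):
            status["faiss_gpus"] = int(faiss.get_num_gpus())
    except ImportError:
        pass  # FAISS not installed: treat as zero GPUs
    return status


if __name__ == "__main__":
    status = check_gpu_stack()
    print(status)
    if not status["torch_cuda"]:
        print('No CUDA device visible to PyTorch; pass --device "cpu" to docsense.')
```

Since `--device` defaults to `"cuda"`, a `False` result for `torch_cuda` is a signal to pass `--device "cpu"` explicitly.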

## Usage 🚀

### Creating Document Index

Index a directory of documents:

```bash
docsense index /path/to/your/documents
```
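Indexing a directory generally means collecting the supported files and splitting their text into chunks small enough to embed. This README does not describe DocSense's internal splitter, so the sketch below is an assumption: the suffix set mirrors the formats listed under Features, and the word-window parameters `size` and `overlap` are illustrative values, not DocSense defaults.

```python
from pathlib import Path

# Assumed suffixes, based on the formats the README names (txt, md, rst).
SUPPORTED_SUFFIXES = {".txt", ".md", ".rst"}


def find_documents(root):
    """Recursively collect supported files under `root`, in sorted order."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
    )


def chunk_words(text, size=200, overlap=50):
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    # Stop once the remaining words are already covered by the previous window's overlap.
    return [
        " ".join(words[start:start + size])
        for start in range(0, max(len(words) - overlap, 1), step)
    ]


if __name__ == "__main__":
    for path in find_documents("."):
        chunks = chunk_words(path.read_text(encoding="utf-8", errors="ignore"))
        print(path, len(chunks))
```

The overlap means text near a chunk boundary appears in two neighbouring chunks, so a passage split across a boundary can still be retrieved with some surrounding context.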

### Asking Questions

Ask a question about your documents:

```bash
docsense ask "How to use this library?"
```
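In a retrieval-based question-answering pipeline like the one the Features list describes, the retrieved passages are combined with the question into a single prompt for the language model. The exact prompt DocSense sends to Qwen is not documented here; `build_prompt` is a hypothetical sketch of the general pattern, with made-up instruction text and a rough character budget (`max_chars`) standing in for a real token limit.

```python
def build_prompt(question, contexts, max_chars=2000):
    """Hypothetical retrieval-augmented prompt: numbered context passages, then the question."""
    parts, used = [], 0
    for i, ctx in enumerate(contexts, start=1):
        block = f"[{i}] {ctx.strip()}"
        if used + len(block) > max_chars:
            break  # stop adding passages once the rough size budget would be exceeded
        parts.append(block)
        used += len(block)
    context_text = "\n".join(parts)
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context_text}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    print(build_prompt("How to use this library?",
                       ["Run docsense index first.", "Then run docsense ask."]))
```

Passages are expected in ranked order, so when the budget runs out it is the lowest-ranked ones that get dropped.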

### Interactive Mode

Start an interactive session for multiple questions:

```bash
docsense daemon
```

### Command Line Options

All commands support the following options:

- `--model-name`: Specify the Qwen model to use (default: "Qwen/Qwen2-7B")
- `--device`: Choose the computing device ("cuda" or "cpu", default: "cuda"; use "cpu" if you installed the CPU version or have no CUDA-capable GPU)
- `--index-path`: Set a custom path for the vector index

Example with options:

```bash
docsense index /path/to/your/documents --model-name "Qwen/Qwen2-7B" --device "cuda" --index-path /path/to/your/index
```
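To drive DocSense from a Python script, one option is to call the CLI through `subprocess`. This sketch uses only the subcommands and options documented above; the helper names `docsense_command` and `run_docsense` are hypothetical, and the option placement after the positional arguments follows the example.

```python
import shutil
import subprocess


def docsense_command(subcommand, *args, model_name=None, device=None, index_path=None):
    """Build an argv list for the docsense CLI using only the documented options."""
    cmd = ["docsense", subcommand, *args]
    if model_name is not None:
        cmd += ["--model-name", model_name]
    if device is not None:
        if device not in ("cuda", "cpu"):
            raise ValueError("device must be 'cuda' or 'cpu'")
        cmd += ["--device", device]
    if index_path is not None:
        cmd += ["--index-path", str(index_path)]
    return cmd


def run_docsense(subcommand, *args, **options):
    """Run the CLI and return its stdout; raises if docsense is missing or exits non-zero."""
    if shutil.which("docsense") is None:
        raise FileNotFoundError("docsense executable not found on PATH")
    result = subprocess.run(
        docsense_command(subcommand, *args, **options),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    print(docsense_command("ask", "How to use this library?", device="cpu"))
```

Passing the arguments as a list rather than a shell string avoids quoting problems with questions that contain spaces or quotes.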

## License 📄

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Star History 🌟

[![Star History Chart](https://api.star-history.com/svg?repos=harimkang/docsense&type=Date)](https://star-history.com/#harimkang/docsense&Date)
