# OmniThink

Expanding Knowledge Boundaries in Machine Writing through Thinking


👏 Try OmniThink in our **[ModelScope online demo](https://www.modelscope.cn/studios/iic/OmniThink) and [🤗HuggingFace online demo](https://huggingface.co/spaces/zjunlp/OmniThink)**!









## Table of Contents
- 🚩[Acknowledgement](#Acknowledgement)
- 🌻[Quick Start](#quick-start)
- 🌟[Introduction](#Introduction)
- 🔧[Dependencies](#Dependencies)
- 🔍[Local Search Support](#-local-search-support)
- 📉[Results](#Results)
- 🧐[Evaluation](#evaluation)

# 🔔News
- `2025-08-24`, we added **offline local search** support using RAGFlow technology. You can now search local documents without an internet connection.
- `2025-03-12`, we optimized the Docker usage for OmniThink.
- `2025-02-20`, we added the evaluation methods from the paper to OmniThink. We will integrate more evaluation methods in the future.
- `2025-01-28`, we added support for the deepseek-reasoner model. Try running `./examples/deepseekr1.py` to test OmniThink's performance with deepseek-reasoner.

Previous News

- `2025-01-18`, we open-sourced OmniThink, a machine writing framework.

# 🌻Acknowledgement

- This work is built on [DSPy](https://github.com/stanfordnlp/dspy) and [STORM](https://github.com/stanford-oval/storm). Sincere thanks for their efforts.
- We are also very grateful to [Zhangjiabao-nudt](https://github.com/Zhangjiabao-nudt) and [techshoww](https://github.com/techshoww) for their contributions to this repository.
- If you have any questions, please feel free to contact us at xizekun.xzk@alibaba-inc.com, 1786594371@qq.com, or xizekun2023@zju.edu.cn, or create an issue.

## 📖 Quick Start

- 🌏 The **Online Demo** is available at [ModelScope](https://www.modelscope.cn/studios/iic/OmniThink) now!

# 📌 Introduction

Welcome to **OmniThink**, an innovative machine writing framework designed to replicate the human cognitive process of iterative expansion and reflection in generating insightful long-form articles.

- **Iterative Expansion and Reflection**: OmniThink uses a unique mechanism that simulates human cognitive behaviors to deepen the understanding of complex topics.
- **Enhanced Knowledge Density**: OmniThink focuses on expanding knowledge boundaries, resulting in articles that are rich in information and insights.
- **Comprehensive Article Generation**: OmniThink constructs outlines and generates articles, delivering high-quality content that is both coherent and contextually robust.
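
The expansion and reflection loop described above can be pictured with a toy sketch. This is not OmniThink's actual code: `TopicNode`, `expand`, `reflect`, `think`, and the `retrieve` callback are hypothetical names introduced only for illustration, and reflection is reduced to de-duplicating notes where a real system would call a language model.

```python
from dataclasses import dataclass, field


@dataclass
class TopicNode:
    """One node of a hypothetical topic tree (illustrative only)."""
    topic: str
    notes: list = field(default_factory=list)
    children: list = field(default_factory=list)


def expand(node, retrieve):
    """Expansion: attach retrieved notes and create one child per sub-topic."""
    for subtopic, note in retrieve(node.topic):
        node.notes.append(note)
        node.children.append(TopicNode(subtopic))
    return node.children


def reflect(notes, pool):
    """Reflection stand-in: add only notes that are not already in the pool."""
    new_notes = [note for note in notes if note not in pool]
    pool.extend(new_notes)
    return new_notes


def think(root_topic, retrieve, max_depth=2):
    """Alternate expansion and reflection, one tree level per iteration."""
    root = TopicNode(root_topic)
    pool = []
    frontier = [root]
    for _ in range(max_depth):
        next_frontier = []
        for node in frontier:
            next_frontier.extend(expand(node, retrieve))
            reflect(node.notes, pool)
        frontier = next_frontier
        if not frontier:
            break
    return root, pool
```

Here `retrieve(topic)` is assumed to return `(subtopic, note)` pairs; in practice it would wrap a search API, and the pool would feed outline and article generation.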



# 🛠 Dependencies

## 📦 Conda

```bash
conda create -n OmniThink python=3.11
# Activate the environment so the packages are installed into it
conda activate OmniThink
git clone https://github.com/zjunlp/OmniThink.git
cd OmniThink
# Install requirements
pip install -r requirements.txt
```

## 🔍 Local Search Support

OmniThink now supports **offline local search** using RAGFlow technology! This feature allows you to:

- **Search local documents** without an internet connection
- **Use vector embeddings** for semantic search
- **Index and retrieve** your own document collections
- **Maintain data privacy** with local-only processing

### Local Search Features

- **OfflineRAGFlow**: Core RAG engine with FAISS vector database
- **LocalSearch**: DSPy-compatible search interface
- **Sentence Transformers**: High-quality text embeddings
- **Smart Chunking**: Intelligent document segmentation
- **Semantic Retrieval**: Context-aware search results
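
As a rough illustration of what semantic retrieval means here, the sketch below ranks chunks by cosine similarity to a query vector and returns the top `k`. It is a simplified stand-in, not the `OfflineRAGFlow` implementation: the helper names `cosine` and `top_k` are hypothetical, and plain Python lists replace the Sentence Transformers embeddings and the FAISS index.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=3):
    """Return the indices of the k chunks most similar to the query."""
    scored = [(cosine(query_vec, vec), idx) for idx, vec in enumerate(chunk_vecs)]
    # Sort by similarity only; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]
```

A vector index such as FAISS does the same ranking far more efficiently on large collections.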

### Quick Local Search Setup

```python
from src.tools.rm import OfflineRAGFlow, LocalSearch

# Initialize the local RAG engine
rag_engine = OfflineRAGFlow(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    chunk_size=800,
    overlap=120,
    k=5
)

# Add documents to your local index
rag_engine.ingest(
    text="Your document content here...",
    meta={"title": "Document Title", "doc_id": "doc1"}
)

# Create DSPy-compatible search interface
local_search = LocalSearch(search=rag_engine, k=3)

# Use in your DSPy pipeline
results = local_search.forward("your search query")
```
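
To give a feel for what `chunk_size` and `overlap` control, here is a minimal character-based sliding-window chunker. It is only a sketch: `chunk_text` is a hypothetical helper, and the chunking inside `OfflineRAGFlow` may split on sentences or tokens instead.

```python
def chunk_text(text, chunk_size=800, overlap=120):
    """Split text into windows of chunk_size characters that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks
```

With the values used above, each chunk repeats the last 120 characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk.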

## 🐳 Docker
```bash
git clone https://github.com/zjunlp/OmniThink.git
docker pull zjunlp/omnithink:latest
docker run -it zjunlp/omnithink:latest
```

🔑 Before running, export the LM API key and the search key as environment variables:

```bash
export LM_KEY=YOUR_API_KEY
export SEARCHKEY=YOUR_SEARCHKEY
```
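
A small pre-flight check can catch a missing key before a long run starts. The sketch below assumes the keys are read from the environment under exactly the names shown above; `missing_keys` and `require_keys` are hypothetical helpers, not part of OmniThink.

```python
import os

# Variable names taken from the export commands above.
REQUIRED_KEYS = ("LM_KEY", "SEARCHKEY")


def missing_keys(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name)]


def require_keys(env=None):
    """Raise a clear error if any required key is missing."""
    missing = missing_keys(env)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Calling `require_keys()` at the top of a script fails fast with the names of the variables that still need to be exported.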

### Local Search Dependencies

For local search functionality, additional packages are required:

```bash
# Install local search dependencies
pip install sentence-transformers faiss-cpu numpy

# Or use the updated requirements.txt
pip install -r requirements.txt
```

> You can define your own [LM API](https://github.com/zjunlp/OmniThink/blob/main/src/tools/lm.py) and [SEARCH API](https://github.com/zjunlp/OmniThink/blob/main/src/tools/rm.py)

> Note that the output of the LM should be a LIST.
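
As an illustration of the list requirement, the sketch below wraps an arbitrary text backend so that every call returns a list of strings. `ListOutputLM` and its `backend` argument are hypothetical; the classes in `src/tools/lm.py` may use a different base class and call signature.

```python
from typing import Callable, List


class ListOutputLM:
    """Hypothetical wrapper that normalizes a backend's output to a list of strings."""

    def __init__(self, backend: Callable[[str], object]):
        # backend: any callable that takes a prompt and returns text or several texts
        self.backend = backend

    def __call__(self, prompt: str) -> List[str]:
        output = self.backend(prompt)
        if isinstance(output, str):
            return [output]  # a single completion becomes a one-element list
        return [str(item) for item in output]
```

For example, `ListOutputLM(lambda prompt: "hello")("hi")` returns `["hello"]`.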

# Results in OmniThink
The performance of OmniThink is shown below:



# Generate Article in OmniThink
Only one command is required:
```bash
sh run.sh
```
You can find the generated article, outline, and mind map in `./results/`.

# 🔍 Evaluation

We provide convenient scripts for evaluating your method. The evaluation is divided into three categories: **Rubric_Grading**, **Knowledge_Density**, and **Information_Diversity**.

We use the `factscore` library. Please run the following code before starting the evaluation.
```
cd eval
git clone https://github.com/shmsw25/FActScore.git
```

For Rubric Grading:
```bash
python Rubric_Grading.py \
  --articlepath articlepath \
  --modelpath modelpath
```

For Information Diversity:
```bash
python Information_Diversity.py \
  --mappath mappath \
  --model_path model_path
```

For Knowledge Density:
```bash
python Knowledge_Density.py \
  --articlepath articlepath \
  --api_path api_path \
  --threads threads
```
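
To run all three evaluations in one go, the commands can be assembled in Python. This is a sketch under assumptions: `eval_commands` is a hypothetical helper, the flags are copied from the commands above, and the scripts are assumed to live in the `eval` directory.

```python
import sys


def eval_commands(article_path, map_path, model_path, api_path, threads):
    """Build the argument lists for the three evaluation scripts."""
    py = sys.executable  # use the interpreter of the active environment
    return {
        "Rubric_Grading": [
            py, "Rubric_Grading.py",
            "--articlepath", article_path,
            "--modelpath", model_path,
        ],
        "Information_Diversity": [
            py, "Information_Diversity.py",
            "--mappath", map_path,
            "--model_path", model_path,
        ],
        "Knowledge_Density": [
            py, "Knowledge_Density.py",
            "--articlepath", article_path,
            "--api_path", api_path,
            "--threads", str(threads),
        ],
    }
```

Each command can then be executed with `subprocess.run(cmd, cwd="eval", check=True)`, which stops at the first script that fails.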

## Citation
If you find our repo useful in your research, please consider citing:
```bibtex
@misc{xi2025omnithinkexpandingknowledgeboundaries,
  title={OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking},
  author={Zekun Xi and Wenbiao Yin and Jizhan Fang and Jialong Wu and Runnan Fang and Ningyu Zhang and Jiang Yong and Pengjun Xie and Fei Huang and Huajun Chen},
  year={2025},
  eprint={2501.09751},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2501.09751},
}
```