https://github.com/edisonleeeee/in-context-rag
- Host: GitHub
- URL: https://github.com/edisonleeeee/in-context-rag
- Owner: EdisonLeeeee
- Created: 2025-01-28T05:14:30.000Z (over 1 year ago)
- Default Branch: master
- Last Pushed: 2025-05-21T01:44:09.000Z (about 1 year ago)
- Last Synced: 2025-12-27T14:15:15.840Z (5 months ago)
- Language: Python
- Size: 16.6 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# In-context RAG
## Environment Setup
+ Step 0: Create a new Conda virtual environment
```bash
conda create -n labelrag python=3.12 -c conda-forge -y
conda activate labelrag
```
+ Step 1: Install PyTorch 2.4 or [other versions](https://pytorch.org/)
```bash
pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0
```
+ Step 2: Install dependencies
```bash
pip install transformers peft numpy pandas tqdm nest_asyncio huggingface_hub sentence_transformers
pip install deepspeed trl tensorboard loguru triton bitsandbytes tiktoken modelscope ogb
pip install flash-attn
```
+ Step 3: Install the corresponding version of `vllm` (compatible with PyTorch 2.4)
```bash
pip install vllm==0.6.3.post1
```
+ Step 4: Install [swift](https://github.com/modelscope/ms-swift) (version 3.x is required)
```bash
pip install 'ms-swift[llm]>=3,<4'
```
+ Step 5: Install [PyG](https://pytorch-geometric.readthedocs.io/en/latest/install/installation.html)
```bash
pip install torch_geometric
```
+ Step 6: Install [DGL](https://www.dgl.ai/pages/start.html) (only for loading graph data)
+ Step 7 (optional): Install FAISS
```bash
conda install -c pytorch -c nvidia faiss-gpu=1.8.0 -y
# pip install faiss-gpu
```
+ Step 8 (optional): Install `git-lfs` for downloading large models
```bash
apt-get install git-lfs
```
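Because several of the steps above pin exact versions, it can help to confirm what actually got installed before running anything. The following is a minimal sketch, not part of the repository: the `EXPECTED` table and the `check_environment` helper are hypothetical names, the pins are copied from Steps 1 and 3, and packages the steps above leave unpinned are only checked for presence.

```python
from importlib import metadata

# Pins copied from the setup steps; None means "any installed version is accepted".
EXPECTED = {
    "torch": "2.4.0",
    "torchvision": "0.19.0",
    "torchaudio": "2.4.0",
    "vllm": "0.6.3.post1",
    "ms-swift": None,
    "torch_geometric": None,
}


def check_environment(expected=EXPECTED, get_version=metadata.version):
    """Return a list of (package, status) pairs: 'ok', 'missing', or a mismatch note."""
    report = []
    for name, pinned in expected.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            report.append((name, "missing"))
            continue
        # Drop local build tags such as "+cu121" before comparing against the pin.
        base = installed.split("+", 1)[0]
        if pinned is None or base == pinned:
            report.append((name, "ok"))
        else:
            report.append((name, f"found {installed}, expected {pinned}"))
    return report
```

Running `for name, status in check_environment(): print(name, status)` inside the `labelrag` environment prints one line per package. A mismatch is not necessarily an error if you deliberately chose another PyTorch version in Step 1.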
## Download Datasets
+ Download the CSTAG-related datasets. The `ogbn-arxiv` dataset is downloaded automatically by the code.
```bash
huggingface-cli download --repo-type dataset --resume-download Sherirto/CSTAG --local-dir CSTAG --local-dir-use-symlinks False
```
If the download fails in regions with restricted access, use a mirror source:
```bash
HF_ENDPOINT=https://hf-mirror.com huggingface-cli download --repo-type dataset --resume-download Sherirto/CSTAG --local-dir CSTAG --local-dir-use-symlinks False
```
+ Download DTGB from the [DTGB](https://github.com/zjs123/DTGB) repository
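If you prefer to fetch CSTAG from Python rather than the CLI, the same download can be sketched with `huggingface_hub.snapshot_download`. This is an illustrative alternative, not something the repository ships: the helper names `cstag_download_kwargs` and `download_cstag` are hypothetical, and the mirror URL is the one used in the `HF_ENDPOINT` example above.

```python
def cstag_download_kwargs(local_dir="CSTAG", use_mirror=False):
    """Build keyword arguments that mirror the CLI flags used above."""
    kwargs = {
        "repo_id": "Sherirto/CSTAG",
        "repo_type": "dataset",
        "local_dir": local_dir,
    }
    if use_mirror:
        # Same mirror as the HF_ENDPOINT variant of the CLI command.
        kwargs["endpoint"] = "https://hf-mirror.com"
    return kwargs


def download_cstag(local_dir="CSTAG", use_mirror=False):
    """Download the dataset snapshot and return the local path."""
    # Imported lazily so the kwargs helper works without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(**cstag_download_kwargs(local_dir, use_mirror))
```

Calling `download_cstag(use_mirror=True)` is intended to match the mirror command, and `download_cstag()` the default one.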
## Download Models from ModelScope
```bash
git lfs clone https://www.modelscope.cn/LLM-Research/Meta-Llama-3.1-8B-Instruct.git
git lfs clone https://www.modelscope.cn/qwen/Qwen2.5-7B-Instruct.git
git lfs clone https://www.modelscope.cn/LLM-Research/gemma-2-9b-it.git
git lfs clone https://www.modelscope.cn/LLM-Research/Mistral-7B-Instruct-v0.3.git
git lfs clone https://www.modelscope.cn/LLM-Research/Phi-3.5-mini-instruct.git
huggingface-cli download --resume-download sentence-transformers/all-mpnet-base-v2 --local-dir ./all-mpnet-base-v2
```
**Note**: After cloning with `git lfs`, the `.git` folder inside each model directory can take up significant disk space. Remove it if you need to reclaim that space:
```bash
cd Meta-Llama-3.1-8B-Instruct && rm -rf .git
```
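With five cloned models, repeating the cleanup by hand gets tedious. The sketch below does the same thing for every model directory under one parent folder and reports how much space each `.git` folder holds. It assumes all models were cloned side by side into a single directory. The function names are hypothetical, and `dry_run` defaults to `True` so nothing is deleted until you opt in.

```python
import shutil
from pathlib import Path


def dir_size_bytes(path):
    """Sum the sizes of all regular files below `path`."""
    return sum(p.stat().st_size for p in Path(path).rglob("*") if p.is_file())


def strip_git_dirs(models_root, dry_run=True):
    """Find `<model>/.git` one level below `models_root`; delete them unless dry_run.

    Returns a dict mapping each model directory name to the bytes in its `.git` folder.
    """
    freed = {}
    for model_dir in sorted(Path(models_root).iterdir()):
        git_dir = model_dir / ".git"
        if model_dir.is_dir() and git_dir.is_dir():
            freed[model_dir.name] = dir_size_bytes(git_dir)
            if not dry_run:
                shutil.rmtree(git_dir)
    return freed
```

Call `strip_git_dirs(".")` first to review the sizes, then `strip_git_dirs(".", dry_run=False)` to delete. Removing `.git` means the directory can no longer be updated with `git pull`.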
## Run the Code
+ Preprocess datasets and construct prompts:
```bash
python generate_prompt_node.py
python generate_prompt_edge.py
```
+ Set `gpu_ids` in each shell script, then run all models with one command per task (multi-GPU inference with native PyTorch):
```bash
bash scripts/run_node.sh
bash scripts/run_edge.sh
```
For more parameter details, refer to the Swift documentation: [Command Line Parameters](https://swift.readthedocs.io/en/latest/Instruction/Command-line-parameters.html)
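The preprocessing and inference commands above can also be chained so that a failure in one step stops the rest. The following is a minimal sketch under two assumptions: that the commands are run from the repository root, and that the node and edge steps may run in the order listed. `PIPELINE` and `run_pipeline` are hypothetical names, not part of the repository.

```python
import subprocess

# Commands exactly as listed in this section, in the order shown.
PIPELINE = [
    ["python", "generate_prompt_node.py"],
    ["python", "generate_prompt_edge.py"],
    ["bash", "scripts/run_node.sh"],
    ["bash", "scripts/run_edge.sh"],
]


def run_pipeline(steps=PIPELINE, dry_run=False, runner=subprocess.run):
    """Run each step in order and stop at the first failure.

    Returns the commands that were run (or would be run in dry-run mode).
    """
    done = []
    for cmd in steps:
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code.
            runner(cmd, check=True)
        done.append(cmd)
    return done
```

`run_pipeline(dry_run=True)` only returns the command list, which is a cheap way to confirm the order before committing GPU time.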