{"id":19932083,"url":"https://github.com/amazon-science/piperag","last_synced_at":"2025-09-19T07:31:29.534Z","repository":{"id":244475844,"uuid":"813827664","full_name":"amazon-science/piperag","owner":"amazon-science","description":"PipeRAG: Fast Retrieval-Augmented Generation via Algorithm-System Co-design (KDD 2025)","archived":false,"fork":false,"pushed_at":"2024-06-14T22:54:52.000Z","size":151,"stargazers_count":24,"open_issues_count":1,"forks_count":3,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-09T05:11:42.136Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/amazon-science.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-11T20:25:35.000Z","updated_at":"2025-08-21T12:49:49.000Z","dependencies_parsed_at":"2024-06-15T00:42:51.250Z","dependency_job_id":"73a8c05d-faf8-449c-9e79-00e3f3cc22cf","html_url":"https://github.com/amazon-science/piperag","commit_stats":null,"previous_names":["amazon-science/piperag"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/amazon-science/piperag","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amazon-science%2Fpiperag","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amazon-science%2Fpiperag/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amazon-science%2Fpiperag/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/
repositories/amazon-science%2Fpiperag/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/amazon-science","download_url":"https://codeload.github.com/amazon-science/piperag/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amazon-science%2Fpiperag/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":275896454,"owners_count":25548198,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-19T02:00:09.700Z","response_time":108,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-12T23:08:58.651Z","updated_at":"2025-09-19T07:31:29.165Z","avatar_url":"https://github.com/amazon-science.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# PipeRAG: Fast Retrieval-Augmented Generation via Algorithm-System Co-design\n\nWe developed our project based on [this repository](https://github.com/TobiasNorlund/retro).\n\n## Environment\n\n```\nconda create -n retro python=3.9 -y\nconda activate retro\n\n# to use torch 2.x (e.g., for torch.compile; see https://pytorch.org/get-started/locally/)\npip3 install torch==2.1.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118\n# or, to stay on the older version:\npip install --extra-index-url https://download.pytorch.org/whl/cu113 \\\n                torch==1.12.1+cu113 \\\n           
     torchvision==0.13.1+cu113 \\\n                torchaudio==0.12.1+cu113\n\n# The CUDA \u0026 cuDNN versions must match the onnxruntime version (see the links below).\n# Note (Wenqi): even with CUDA 12.0 installed, installing the PyTorch build for CUDA 11.8 appears to work, so there is no need to reinstall CUDA.\n# https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements\n# https://developer.nvidia.com/cuda-11-8-0-download-archive?target_os=Linux\u0026target_arch=x86_64\u0026Distribution=Ubuntu\u0026target_version=20.04\u0026target_type=deb_local\n\n# On an AWS AMI, the error ‘Failed to initialize NVML: Driver/library version mismatch’ may appear later, because the system forces a different CUDA version by default.\n# Solution: https://stackoverflow.com/questions/43022843/nvidia-nvml-driver-library-version-mismatch\nsudo add-apt-repository ppa:graphics-drivers/ppa\nsudo apt-get update\nsudo apt-get upgrade  # may report: The following packages have unmet dependencies:  nvidia-driver-520 : Depends: nvidia-driver-525 but it is not installed\nsudo apt --fix-broken install\n# then reboot\n\n# If the fabric manager version does not match the NVIDIA driver version, update the driver, e.g.:\nsudo apt install nvidia-driver-525\nsudo systemctl start nvidia-fabricmanager\nsystemctl status nvidia-fabricmanager.service\n/usr/bin/nv-fabricmanager --version\n\n# check that PyTorch can access the GPU\n
python -c \"import torch; print(torch.cuda.is_available())\"\n# sudo apt-get install -y cuda-compat-11-8\n\npip install transformers==4.21.0\npip install pytorch-lightning==1.7.4\npip install einops==0.6.0\npip install pytest==7.2.1\npip install sentence-transformers==2.2.2\npip install matplotlib==3.6.3\npip install seaborn==0.12.2\npip install torchmetrics==0.11.4\n\npip install onnx==1.15\npip install onnxruntime-gpu==1.16\n# pip install onnxconverter-common==1.14.0\npip install grpcio-tools\n\nconda install -c pytorch -c nvidia faiss-gpu=1.7.4 mkl=2021 blas=1.0=mkl\n# or, on a CPU-only server:\nconda install -c pytorch faiss-cpu=1.7.4 mkl=2021 blas=1.0=mkl\n```\n\nAdd the following to `~/.bashrc`:\n\n```\nWORKSPACE=/fsx/PipeRAG\nconda activate retro\n```\n\n## Model Download\n\nDownload [retro.zip](https://chalmersuniversity.box.com/s/d7qijjdyfv6ubdy1ux10syrq4ep3ca6e) and extract it into the `data/model` folder.\n\n## Folder Organization\n\n```\n├── Dockerfile\n├── LICENSE\n├── README.md\n├── data\n│   ├── datasets\n│   └── model\n├── inference\n│   ├── README.md\n│   ├── __pycache__\n│   ├── demo_retrieval_client.py\n│   ├── demo_retrieval_server.py\n│   ├── evaluate_rag_performance.py\n│   ├── faiss_server.py\n│   ├── grpc_test\n│   ├── inference_client.py\n│   ├── performance\n│   ├── performance_model.py\n│   ├── proto\n│   ├── retrieval_pb2.py\n│   ├── retrieval_pb2.pyi\n│   ├── retrieval_pb2_grpc.py\n│   ├── retriever_client.py\n│   ├── test_retrieval_performance.py\n│   └── test_sbert_performance.py\n├── logs\n├── plots\n├── src\n│   ├── data\n│   ├── dataset_retro.py\n│   ├── evaluate_retro_realtime_retrieval.py\n│   ├── evaluate_staleness_query_doc_similarity.py\n│   ├── evaluation_perplexity_all.py\n│   ├── evaluation_suite.py\n│   ├── generate_retro_greedy.py\n│   ├── generate_retro_onnx.py\n│   ├── generate_retro_original.py\n│   ├── modeling_retro.py\n│   ├── modeling_retro_inference.py\n│   ├── modeling_retro_original.py\n│   ├── onnx_retro_decoder\n│   ├── 
onnx_retro_encoder\n│   ├── out_onnx\n│   ├── retrieval.py\n│   ├── traces\n│   ├── train_retro.py\n│   └── unused\n└── test_funcs\n```\n\n### data\n\nThis folder stores the models and datasets for evaluation.\n\n```\n├── datasets\n│   ├── MassiveOpenText\n│   ├── Pile\n│   ├── README.md\n│   ├── RealNews\n│   ├── c4-en\n│   ├── generate_index_config.py\n│   ├── index.spec.json\n│   ├── indexes_c4\n│   ├── indexes_mix\n│   ├── indexes_realnews\n│   ├── indexes_wikipedia\n│   ├── process_data.py\n│   ├── val_c4\n│   ├── val_realnews\n│   ├── val_wikipedia\n│   ├── wikipedia-downloader\n│   └── wikipedia-en\n└── model\n    ├── README.md\n    ├── model.ckpt\n    └── retro.json\n```\n\n`process_data.py` is the key script for processing the document datasets: it encodes and indexes them.\n\n### inference\n\nThis folder contains the scripts for performance evaluation (of both inference and retrieval).\n\nThe key files are:\n\n```\n├── evaluate_rag_performance.py\n├── faiss_server.py\n├── inference_client.py\n├── performance\n├── performance_model.py\n├── test_retrieval_performance.py\n└── test_sbert_performance.py\n```\n\n`evaluate_rag_performance.py` automatically runs all the generation performance evaluations; the search service must already be running.\n\n`faiss_server.py` starts the vector search service.\n\n`inference_client.py` is the ONNX-based inference program. Its modules are invoked by `evaluate_rag_performance.py`. 
It also contains the code for building the performance model of inference.\n\n`performance` is a folder storing the trained performance models.\n\n`performance_model.py` contains the performance-model modules used to predict the maximum `nprobe`, based on the profiling results of the generation model, the retriever, and the SBERT model.\n\n`test_retrieval_performance.py` models the retrieval performance.\n\n`test_sbert_performance.py` models the SBERT performance.\n\n### src\n\nThis folder stores all the scripts for perplexity evaluation.\n\n```\n├── data\n│   ├── embed_chunks.py\n│   ├── merge_populated_indexes.py\n│   ├── populate_faiss_index.py\n│   ├── retrieve_neighbours.py\n│   ├── tokenize_and_chunk.py\n│   └── train_faiss_index.py\n├── dataset_retro.py\n├── evaluate_retro_realtime_retrieval.py\n├── evaluate_staleness_query_doc_similarity.py\n├── evaluation_perplexity_all.py\n├── evaluation_suite.py\n├── generate_retro_greedy.py\n├── generate_retro_onnx.py\n├── generate_retro_original.py\n├── modeling_retro.py\n├── modeling_retro_inference.py\n├── modeling_retro_original.py\n├── onnx_retro_decoder\n├── onnx_retro_encoder\n├── retrieval.py\n└── train_retro.py\n```\n\nThe `data` folder contains some data-processing scripts. Instead of these scripts, however, we recommend `process_data.py` in `data/datasets`, which offers a more user-friendly implementation of data preprocessing.\n\n#### Evaluating Perplexity and Quality\n\n`evaluation_perplexity_all.py` is the main script for evaluating the perplexity of all experiments.\n\n`evaluate_retro_realtime_retrieval.py` evaluates the perplexity of a single algorithm setting; it is invoked by `evaluation_perplexity_all.py` and `evaluation_suite.py`.\n\n`evaluate_staleness_query_doc_similarity.py` evaluates the cosine similarity between the content retrieved by stale and by non-stale queries, using Sentence Transformers.\n\n`evaluation_suite.py` is a deprecated script. 
It was previously used to evaluate some perplexity numbers, but `evaluation_perplexity_all.py` now offers more comprehensive functionality.\n\n`dataset_retro.py` defines various data loaders and iterators for perplexity evaluation with stale and non-stale queries under various settings.\n\n`train_retro.py` is a top-level abstraction; it contains the function `get_realtime_retrieval_retro_dataset_from_spec`, which uses the modules in `dataset_retro.py`.\n\nThe perplexity evaluation scripts are invoked in the following order: `evaluation_perplexity_all.py` -\u003e `evaluate_retro_realtime_retrieval.py` -\u003e `train_retro.py` -\u003e `dataset_retro.py`.\n\n#### ONNX Processing\n\n`generate_retro_onnx.py` exports the PyTorch model to ONNX format and also implements generation. The ONNX models are stored in the following folders:\n\n```\n├── onnx_retro_decoder\n└── onnx_retro_encoder\n```\n\n`generate_retro_original.py` is the original PyTorch generation script using HuggingFace.\n\n`generate_retro_greedy.py` is the PyTorch script with our own greedy decoding implementation.\n\n#### Attention Mechanisms\n\nThe following scripts specify the model architecture and the attention mechanisms:\n\n```\n├── modeling_retro.py\n├── modeling_retro_inference.py\n└── modeling_retro_original.py\n```\n\n`modeling_retro_original.py` is the original RETRO implementation.\n\n`modeling_retro.py` is the PipeRAG attention implementation used for perplexity evaluation.\n\n`modeling_retro_inference.py` is the PipeRAG attention implementation used for fast inference.\n\n### plots\n\nThis folder stores all the performance numbers as well as the plotting scripts for the figures in the paper.
\n\nThe following are the recorded performance and perplexity numbers:\n\n```\n├── generation_join_perplexity_and_performance_df.pickle\n├── generation_performance_df.pickle\n└── generation_perplexity_df.pickle\n```\n\nThe following utility scripts are important:\n\n```\n├── join_df.py\n└── print_df.py\n```\n\n`join_df.py` joins the performance and perplexity dataframes. Be sure to run this script to regenerate the latest `generation_join_perplexity_and_performance_df.pickle`, which is used for plotting.\n\n`print_df.py` prints the content of a pickle-stored dataframe.\n\nThe following scripts generate the plots used in the paper:\n\n```\n├── plot_alternative_system_performance.py\n├── plot_dynamic_nprobe.py\n├── plot_pareto.py\n├── plot_pareto_allow_RETRO_flexible_interval.py\n├── plot_ppl_db_sizes_paper.py\n└── plot_ppl_nprobe_interval_paper.py\n```\n\n`plot_alternative_system_performance.py` projects the performance-perplexity trend on future hardware.\n\n`plot_dynamic_nprobe.py` reports the performance-perplexity numbers (used in a table) for dynamic, performance-model-driven retrieval.\n\n`plot_pareto.py` compares the Pareto performance-perplexity curves of PipeRAG and RETRO.\n\n`plot_pareto_allow_RETRO_flexible_interval.py` compares the Pareto performance-perplexity curves of PipeRAG and a RETRO variant that supports flexible retrieval intervals.\n\n`plot_ppl_db_sizes_paper.py` shows the effect of different database sizes on perplexity.
\n\n`plot_ppl_nprobe_interval_paper.py` shows the effect of `nprobe` and `intervals` on perplexity.\n\n### (Not important) logs\n\nStores some logs from past runs.\n\n### (Not important) test_funcs\n\nSome test scripts for ONNX, SBERT, etc.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famazon-science%2Fpiperag","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Famazon-science%2Fpiperag","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famazon-science%2Fpiperag/lists"}