# PaperQA UI

This project lets users explore [arXiv](https://arxiv.org/) papers with the following features, powered by a large language model (LLM):
- exploring collected papers.
- searching collected papers with auto-complete, as well as filtering by year, month, and day.
- reading (question, answer) pairs smoothly.
- requesting (question, answer) pair generation for specified arXiv IDs.
- chatting directly about each paper; chat history is persisted per paper in the browser's local storage.

## Instructions

0. Prerequisites

- Currently, the underlying LLM is Google's [Gemini 1.0 Pro](https://deepmind.google/technologies/gemini/), chosen for its large context window; most existing open source LLMs have a comparatively small context length. If you wish, you can try an open source model instead, since the codebase for interacting with Text Generation Inference is already integrated (tested with [`mistralai/Mistral-7B-Instruct-v0.2`](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) and [`meta-llama/Llama-2-70b-hf`](https://huggingface.co/meta-llama/Llama-2-70b-hf)).

- Sign up for [Google AI Studio](https://ai.google.dev) and obtain your own API key. For personal use, the free tier should be sufficient (60 messages per minute).

- Sign up for [Hugging Face](https://huggingface.co/) and obtain your own access token. It is used to create and modify Dataset repositories, since **PaperQA UI** manages its underlying data (generated Q&As and arXiv ID requests) on the Hugging Face Hub. A sketch of both prerequisites follows this list.
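
For orientation, here is a minimal sketch (not this repository's actual code) of how these two prerequisites fit together, using the `google-generativeai` and `huggingface_hub` Python packages. The Dataset repo ID `your-username/paperqa-data` is a placeholder.

```python
# A minimal sketch (not this repository's actual code) of the two
# prerequisites above. "your-username/paperqa-data" is a placeholder.
import os

import google.generativeai as genai
from huggingface_hub import HfApi

# Gemini: authenticate with the API key obtained from Google AI Studio.
genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-1.0-pro")
response = model.generate_content("Summarize attention mechanisms in one sentence.")
print(response.text)

# Hugging Face Hub: create (or reuse) a Dataset repository with the access token.
api = HfApi(token=os.environ["HF_TOKEN"])
api.create_repo("your-username/paperqa-data", repo_type="dataset", exist_ok=True)
```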

1. Install the dependencies

```bash
$ pip install -r requirements.txt
```

2. Set up the environment variables

```bash
$ export GEMINI_API_KEY=""
$ export HF_TOKEN=""
$ export SOURCE_DATA_REPO_ID="..."
$ export REQUEST_DATA_REPO_ID="..."
$ export RESTART_TARGET_SPACE_REPO_ID="..."
```

- `SOURCE_DATA_REPO_ID` and `REQUEST_DATA_REPO_ID` are the IDs of Hugging Face Dataset repositories. Name them like `username/repo_name`. On the first run, the app creates these repositories automatically; afterwards, it updates them.

- `RESTART_TARGET_SPACE_REPO_ID` is the ID of a Hugging Face Space repository. If you are running this app on a local machine, you can ignore it. If you are running this app on a Hugging Face Space, specify the Space repository ID here; when a chunk of automated Q&A generation is finished, the Space will be restarted automatically (see the sketch after the note below).

> To reflect updated data from `SOURCE_DATA_REPO_ID`, you need to restart this application.
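
As a minimal sketch of how these variables might be consumed (this is not the repository's actual code; `huggingface_hub`'s `HfApi.restart_space` is used for the restart):

```python
# A minimal sketch (not this repository's actual code) of consuming the
# environment variables above, including the optional Space restart.
import os

from huggingface_hub import HfApi

source_repo = os.environ["SOURCE_DATA_REPO_ID"]          # Dataset with generated Q&As
request_repo = os.environ["REQUEST_DATA_REPO_ID"]        # Dataset with arXiv ID requests
space_repo = os.getenv("RESTART_TARGET_SPACE_REPO_ID")   # optional on local runs

api = HfApi(token=os.environ["HF_TOKEN"])

# When a chunk of automated Q&A generation finishes on a Space, restarting
# the Space makes the app reload the updated data from SOURCE_DATA_REPO_ID.
if space_repo:
    api.restart_space(repo_id=space_repo)
```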

3. Run the application

```bash
$ python app.py
```

If you are curious how this application works in action, please visit the Hugging Face Space where I host this app [[LINK](https://huggingface.co/spaces/chansung/paper_qa)].

## Todos

- [ ] more LLM support (GPT-4, ...)
- [ ] RAG system (currently **PaperQA UI** simply injects 30,000 characters of the full paper into the prompt in chatting mode; sketched below)
- [ ] ...
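
For illustration, a minimal sketch of that naive context injection; the function name and prompt wording are hypothetical, and it is assumed the truncation keeps the first 30,000 characters of the paper.

```python
# A minimal sketch of the naive context injection mentioned above; the
# function name and prompt wording are hypothetical, and it is assumed
# the truncation keeps the first 30,000 characters of the paper.
MAX_CONTEXT_CHARS = 30_000

def build_chat_prompt(paper_text: str, question: str) -> str:
    """Embed a truncated copy of the paper directly in the chat prompt."""
    context = paper_text[:MAX_CONTEXT_CHARS]
    return (
        "You are answering questions about the following paper.\n\n"
        f"{context}\n\n"
        f"Question: {question}"
    )
```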

## Acknowledgements

This is a project built during the Gemini sprint held by Google's ML Developer Programs team. I am thankful for the generous amount of GCP credits granted to finish this project.