https://github.com/deep-diver/auto-paper-analysis
- Host: GitHub
- URL: https://github.com/deep-diver/auto-paper-analysis
- Owner: deep-diver
- License: apache-2.0
- Created: 2024-02-13T04:16:45.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-03-24T01:24:53.000Z (over 1 year ago)
- Last Synced: 2025-05-05T23:08:16.848Z (2 months ago)
- Language: Jupyter Notebook
- Size: 465 KB
- Stars: 35
- Watchers: 3
- Forks: 5
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Auto Paper Analysis
This project automatically generates question-and-answer (QA) pairs for a given set of arXiv IDs. The CLI tool can either collect arXiv IDs from [Hugging Face 🤗 Daily Papers](https://huggingface.co/papers) or generate QAs directly from a list of arXiv IDs that you provide.
You can browse the generated QA dataset in the [chansung/auto-paper-qa2](https://huggingface.co/datasets/chansung/auto-paper-qa2) repository. You can also see how this dataset can be used in the [PaperQA](https://huggingface.co/spaces/chansung/paper_qa) Space application.
## Instructions
If you want to do prompt engineering, modify the [prompts.toml](https://github.com/deep-diver/auto-paper-analysis/tree/main/app/constants/prompts.toml) file. There are two prompts to play with.
### Hugging Face 🤗 Daily Papers
To generate QAs of arXiv papers on a specific date, run:
```shell
# Set these to your own Gemini API key and Hugging Face access token
export GEMINI_API_KEY=
export HF_ACCESS_TOKEN=

python app.py --target-date $current_date \
--gemini-api $GEMINI_API_KEY \
--hf-token $HF_ACCESS_TOKEN \
--hf-repo-id $hf_repo_id \
--hf-daily-papers
```

If you want to generate QAs for arXiv papers over a range of dates, run:
```shell
# Set these to your own Gemini API key, Hugging Face access token, and dataset repo ID
export GEMINI_API_KEY=
export HF_ACCESS_TOKEN=
export HF_DATASET_REPO_ID=

./date_iterator.sh "2024-03-01" "2024-03-03" $HF_DATASET_REPO_ID
```

### arXiv IDs
To generate QAs for a given list of arXiv IDs, run:
```shell
# Set these to your own Gemini API key and Hugging Face access token
export GEMINI_API_KEY=
export HF_ACCESS_TOKEN=

python app.py \
--gemini-api $GEMINI_API_KEY \
--hf-token $HF_ACCESS_TOKEN \
--hf-repo-id $hf_repo_id \
--arxiv-ids ...
```

## Acknowledgements
This project was built during the Gemini sprint held by Google's ML Developer Programs team. I am thankful to have been granted a good amount of GCP credits to finish this project.