{"id":21407100,"url":"https://github.com/daveshap/quickly_extract_science_papers","last_synced_at":"2026-01-03T11:48:16.994Z","repository":{"id":179200437,"uuid":"663118925","full_name":"daveshap/Quickly_Extract_Science_Papers","owner":"daveshap","description":"Scientific papers are coming out TOO DAMN FAST so we need a way to very quickly extract useful information.","archived":false,"fork":false,"pushed_at":"2023-07-09T23:59:54.000Z","size":30458,"stargazers_count":145,"open_issues_count":1,"forks_count":47,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-01-23T04:12:40.495Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"HTML","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/daveshap.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-07-06T15:35:46.000Z","updated_at":"2025-01-04T16:13:13.000Z","dependencies_parsed_at":null,"dependency_job_id":"c669b137-c369-4bbd-89d4-bcfb2dd8ff61","html_url":"https://github.com/daveshap/Quickly_Extract_Science_Papers","commit_stats":null,"previous_names":["daveshap/quickly_extract_science_papers"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/daveshap%2FQuickly_Extract_Science_Papers","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/daveshap%2FQuickly_Extract_Science_Papers/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/daveshap%2FQuickly_Extract_Science_Papers/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/daveshap%2FQuickly_Extract_Science_Papers/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/daveshap","download_url":"https://codeload.github.com/daveshap/Quickly_Extract_Science_Papers/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243902320,"owners_count":20366262,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-22T16:44:39.127Z","updated_at":"2026-01-03T11:48:16.950Z","avatar_url":"https://github.com/daveshap.png","language":"HTML","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Quickly_Extract_Science_Papers\n\nScientific papers are coming out TOO DAMN FAST so we need a way to very quickly extract useful information.\n\n## Repo Contents\n\n- `chat.py` - this file is a simple chatbot that will chat with you about the contents of `input.txt` (you can copy/paste anything into this text file). Very useful for quickly discussing papers.\n- `generate_multiple_reports.py` - this will consume all PDFs in the `input/` folder and generate summaries in the `output/` folder. This is helpful for bulk processing, such as literature reviews.\n- `render_report.py` - this will render all the reports in `output/` into an *easier*-to-read file, `report.html`.\n\n## EXECUTIVE SUMMARY\n\nThis repository contains Python scripts that automate the process of generating reports from PDF files using OpenAI's\nGPT-4 model. The scripts extract text from PDF files, send the text to the GPT-4 model for processing, and save the\ngenerated reports as text files. 
The scripts also include functionality to render the generated reports as an HTML\ndocument for easy viewing.\n\n## SETUP\n\n1. Clone the repository to your local machine.\n2. Install the required Python packages by running `pip install -r requirements.txt` in your terminal.\n3. Obtain an API key from OpenAI and save it in a file named `key_openai.txt` in the root directory of the repository.\n4. Place the PDF files you want to generate reports from in the `input/` directory.\n\n## USAGE\n\n1. Run the `generate_multiple_reports.py` script to generate reports from the PDF files in the `input/` directory. The\ngenerated reports will be saved as text files in the `output/` directory.\n2. Run the `render_report.py` script to render the generated reports as an HTML document. The HTML document will be\nsaved as `report.html` in the root directory of the repository.\n3. You can modify the `prompts` in `generate_multiple_reports.py` to focus on any questions you would like to ask. In other words, you can automatically ask any set of questions in bulk against any set of papers. This can greatly accelerate your literature reviews and surveys.\n\n## NOTE\n\nThe scripts are designed to handle errors and retries when communicating with the OpenAI API. If the API returns an\nerror due to the maximum context length being exceeded, the scripts will automatically trim the oldest message and retry\nthe API call. If the API returns any other type of error, the scripts will retry the API call after a delay, with the\ndelay increasing exponentially for each consecutive error. 
If the API returns errors for seven consecutive attempts, the\nscripts will stop and exit.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdaveshap%2Fquickly_extract_science_papers","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdaveshap%2Fquickly_extract_science_papers","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdaveshap%2Fquickly_extract_science_papers/lists"}