{"id":31445951,"url":"https://github.com/semanticclimate/pdf_summarization_demo","last_synced_at":"2025-09-30T23:53:03.659Z","repository":{"id":306890390,"uuid":"1027558281","full_name":"semanticClimate/PDF_Summarization_demo","owner":"semanticClimate","description":null,"archived":false,"fork":false,"pushed_at":"2025-07-28T11:11:56.000Z","size":69,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-05T15:18:52.528Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/semanticClimate.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-07-28T07:34:19.000Z","updated_at":"2025-07-28T11:12:00.000Z","dependencies_parsed_at":"2025-07-28T09:30:13.769Z","dependency_job_id":"af1136cd-4f3e-4bb9-b7e3-13052599e164","html_url":"https://github.com/semanticClimate/PDF_Summarization_demo","commit_stats":null,"previous_names":["semanticclimate/pdf_summarization_demo"],"tags_count":0,"template":false,"template_full_name":"semanticClimate/notebook-template","purl":"pkg:github/semanticClimate/PDF_Summarization_demo","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/semanticClimate%2FPDF_Summarization_demo","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/semanticClimate%2FPDF_Summarization_demo/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/semanticClimate%2FPDF_Summariza
tion_demo/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/semanticClimate%2FPDF_Summarization_demo/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/semanticClimate","download_url":"https://codeload.github.com/semanticClimate/PDF_Summarization_demo/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/semanticClimate%2FPDF_Summarization_demo/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":277773147,"owners_count":25874567,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-30T02:00:09.208Z","response_time":75,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-09-30T23:53:01.715Z","updated_at":"2025-09-30T23:53:03.651Z","avatar_url":"https://github.com/semanticClimate.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Demonstration of PDF Summarization\n\n\u003ca href=\"https://colab.research.google.com/github/semanticClimate/PDF_Summarization_demo/blob/main/RevLit_PDF_Summarization.ipynb\" target=\"_parent\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/\u003e\u003c/a\u003e\n\nDOI Zenodo badge: 
\n\n[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.16526790.svg)](https://doi.org/10.5281/zenodo.16526790)\n\nCitation:\n\nBarbhuiya, S., S, A., Jawed, M., Kumari, R., Simon, W., Yadav, G., \u0026 Murray-Rust, P. (2025). Demonstration of PDF Summarization (0.1a). Zenodo. https://doi.org/10.5281/zenodo.16526790\n\n**Description:**\n\nThis Jupyter notebook provides an end-to-end pipeline for summarizing scientific PDFs using Natural Language Processing (NLP) techniques. It extracts text from uploaded PDFs and generates concise summaries using transformer-based models.\n\n#### Features\n- Upload and parse PDF documents\n- Extract meaningful text content\n- Generate summaries using Hugging Face Transformers (e.g., BART, T5)\n- Optionally view original and summarized text side-by-side\n- Includes visualization support with PyMuPDF and IPython.display\n\n#### Requirements\nInstall the following packages:\n- pip install transformers\n- pip install PyPDF2\n- pip install PyMuPDF (provides the fitz module; the separate fitz package on PyPI is unrelated and should not be installed alongside it)\n- pip install nltk\n- pip install torch\n\n#### How to Use\n1.\tClone this repository or download the notebook.\n2.\tLaunch Jupyter Notebook or Google Colab.\n3.\tUpload your scientific or research-based PDF.\n4.\tRun all cells to:\n        - Extract the full text\n        - Preprocess and chunk the content\n        - Generate a summary using a transformer model\n\n#### Structure\n- upload_pdf() – Upload and read PDF files\n- extract_text() – Extract text from all pages\n- summarize_text() – Use pre-trained summarization models\n- visualize() – Display original vs. 
summarized content\n  \n#### Applications\n- Research paper summarization\n- Literature review automation\n- Information extraction for large documents\n \n \n#### Notes\n- Pretrained models like facebook/bart-large-cnn or t5-base are used.\n- Results depend on PDF formatting quality.\n\nReviewers \u0026 review process: \\\u003cAdd reviewers and review process link\\\u003e \n\n---\n\nSoftware citation information: [CITATION.cff](CITATION.cff)\n\nLicense: Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ | License information: [LICENSE](LICENSE)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsemanticclimate%2Fpdf_summarization_demo","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsemanticclimate%2Fpdf_summarization_demo","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsemanticclimate%2Fpdf_summarization_demo/lists"}