{"id":29008447,"url":"https://github.com/syaagalib/project_ocr","last_synced_at":"2025-06-25T14:05:44.894Z","repository":{"id":298729611,"uuid":"859741281","full_name":"SYAAGalib/project_ocr","owner":"SYAAGalib","description":"This code can do ocr","archived":false,"fork":false,"pushed_at":"2025-06-12T14:20:30.000Z","size":1294,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-12T15:32:03.453Z","etag":null,"topics":["ocr-python","python","streamlit","ui"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SYAAGalib.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-09-19T07:46:32.000Z","updated_at":"2025-06-12T14:28:14.000Z","dependencies_parsed_at":"2025-06-12T15:47:22.975Z","dependency_job_id":null,"html_url":"https://github.com/SYAAGalib/project_ocr","commit_stats":null,"previous_names":["syaagalib/project_ocr"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/SYAAGalib/project_ocr","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SYAAGalib%2Fproject_ocr","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SYAAGalib%2Fproject_ocr/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SYAAGalib%2Fproject_ocr/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SYAAGalib%2Fproject_ocr/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/o
wners/SYAAGalib","download_url":"https://codeload.github.com/SYAAGalib/project_ocr/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SYAAGalib%2Fproject_ocr/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261888088,"owners_count":23225141,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ocr-python","python","streamlit","ui"],"created_at":"2025-06-25T14:05:40.120Z","updated_at":"2025-06-25T14:05:44.885Z","avatar_url":"https://github.com/SYAAGalib.png","language":"Python","funding_links":["https://www.buymeacoffee.com/nainiayoub"],"categories":[],"sub_categories":[],"readme":"# PDF to Text\r\n[![Open in Streamlit](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://share.streamlit.io/nainiayoub/pdf-text-data-extractor/main/app.py)\r\n![visitor badge](https://visitor-badge.glitch.me/badge?page_id=nainiayoub.pdf-text-data-extractor)\r\n![forks badge](https://img.shields.io/github/forks/nainiayoub/pdf-text-data-extractor)\r\n![stars badge](https://img.shields.io/github/stars/nainiayoub/pdf-text-data-extractor?style=social)\r\n\r\nPDF text data extraction app that takes a PDF document as input and returns either a txt file that contains all pages or a compressed folder of txt files representing the document pages. 
OCR can also be enabled for scanned documents.\r\n\r\n\r\n![pdf_text_image](https://user-images.githubusercontent.com/50157142/214037439-448fafb8-5363-46cb-849e-6132f9bc0fb2.PNG)\r\n\r\n\r\n\r\n\r\n## How does it work?\r\n\r\n```mermaid\r\nflowchart LR\r\n\r\nA[PDF] --\u003e |text conversion / OCR| B(Text)\r\nB --\u003e |Option 1| D[txt file]\r\nB --\u003e |Option 2| E[ZIP folder of txt files for pages]\r\n\r\n```\r\n1. Upload your PDF.\r\n2. Enable OCR (for scanned documents).\r\n3. Select the PDF language.\r\n4. Download your output file (zip/txt).\r\n\r\n## How to support the project\r\nYou can help support the project through feedback and/or [buy me a coffee](https://www.buymeacoffee.com/nainiayoub).\r\n\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsyaagalib%2Fproject_ocr","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsyaagalib%2Fproject_ocr","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsyaagalib%2Fproject_ocr/lists"}