{"id":26881582,"url":"https://github.com/pilarcode/pdf_lab","last_synced_at":"2025-10-04T03:53:46.969Z","repository":{"id":283713598,"uuid":"906618926","full_name":"pilarcode/pdf_lab","owner":"pilarcode","description":"Having fun with pdf document processing libraries 🧐","archived":false,"fork":false,"pushed_at":"2025-03-22T15:44:59.000Z","size":6139,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-05-30T01:38:45.713Z","etag":null,"topics":["pdf-document","pdf2csv","pdf2txt"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pilarcode.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-12-21T12:14:56.000Z","updated_at":"2025-03-23T13:11:44.000Z","dependencies_parsed_at":"2025-05-30T01:33:35.443Z","dependency_job_id":"9a4ac898-c88a-4c5b-82f0-4774ed743374","html_url":"https://github.com/pilarcode/pdf_lab","commit_stats":null,"previous_names":["pilarcode/pdf_lab"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/pilarcode/pdf_lab","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pilarcode%2Fpdf_lab","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pilarcode%2Fpdf_lab/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pilarcode%2Fpdf_lab/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pilarcode%2Fpdf_lab/manifests","owner_url":"https://re
pos.ecosyste.ms/api/v1/hosts/GitHub/owners/pilarcode","download_url":"https://codeload.github.com/pilarcode/pdf_lab/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pilarcode%2Fpdf_lab/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278262445,"owners_count":25957938,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-04T02:00:05.491Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["pdf-document","pdf2csv","pdf2txt"],"created_at":"2025-03-31T15:57:33.595Z","updated_at":"2025-10-04T03:53:46.938Z","avatar_url":"https://github.com/pilarcode.png","language":"Jupyter Notebook","readme":"\n# Pdf \u0026 images lab\nA project that explores libraries for extracting text from PDFs, such as:\n* pdfminer\n* PyMuPDF\n* PyPDF2\n* pypdfium2\n\nIt also explores libraries for extracting text from images, such as:\n* pytesseract\n* easyocr\n* Transformers models from Hugging Face\n\nFinally, it explores extracting text from PDFs with LLMs:\n* Gemini\n\n\n## Setup\n\n**Step 1**. Navigate to the root directory of the repository and create a new virtual environment for development with uv:\n\n```bash\nuv venv .venv\n```\n\n**Step 2**. Activate the environment (this path applies to Git Bash on Windows; on Linux or macOS use `.venv/bin/activate`):\n\n```bash\nsource .venv/Scripts/activate\n```\n\n**Step 3**. 
Install the dependencies:\n\n```bash\nuv pip install -e .\n```\n\n## Usage\nGo to the notebook and select your environment to run the cells.","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpilarcode%2Fpdf_lab","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpilarcode%2Fpdf_lab","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpilarcode%2Fpdf_lab/lists"}