{"id":20372901,"url":"https://github.com/sidmishraw/docpruner","last_synced_at":"2025-10-26T10:47:23.996Z","repository":{"id":86400535,"uuid":"91619624","full_name":"sidmishraw/docpruner","owner":"sidmishraw","description":"DocPruner is an utility for pruning bad PDFs for cs 267 project and PDF processor","archived":false,"fork":false,"pushed_at":"2017-05-17T20:59:28.000Z","size":33,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-01-15T06:48:41.756Z","etag":null,"topics":["docpruner"],"latest_commit_sha":null,"homepage":null,"language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sidmishraw.png","metadata":{"files":{"readme":"Readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-05-17T20:58:55.000Z","updated_at":"2017-07-08T20:50:25.000Z","dependencies_parsed_at":null,"dependency_job_id":"1fa65fcb-314b-491f-99e5-f5c6987a1f96","html_url":"https://github.com/sidmishraw/docpruner","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sidmishraw%2Fdocpruner","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sidmishraw%2Fdocpruner/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sidmishraw%2Fdocpruner/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sidmishraw%2Fdocpruner/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sidmishraw","download_url":"https:
//codeload.github.com/sidmishraw/docpruner/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241921836,"owners_count":20042763,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["docpruner"],"created_at":"2024-11-15T01:15:31.347Z","updated_at":"2025-10-26T10:47:18.945Z","avatar_url":"https://github.com/sidmishraw.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"### DocPruner\n\n\nDocPruner prunes bad PDFs (probably scanned images of IEEE documents from IEEE Xplore): it moves them out of the `input_pdfs` folder and \nmoves the `pdf_jsons` and `pdf_grouped_jsons` folders out of the `cs267_project` folder, so that the PDF-to-JSON generation \nprocess can be started from scratch.\n\nThe executable JAR is located [here](./out/artifacts/DocPruner_jar/DocPruner.jar).\n\n#### Usage:\n```\njava -jar path_to_DocPruner.jar \u003cpath-to-pdfprocessor.log\u003e \u003cpath-to-pdf_jsons\u003e \u003cpath-to-pdf_grouped_jsons\u003e\n```\n\nIn case of concerns, contact: sidharth.mishra@sjsu.edu","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsidmishraw%2Fdocpruner","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsidmishraw%2Fdocpruner","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsidmishraw%2Fdocpruner/lists"}