{"id":24794127,"url":"https://github.com/chris-cozy/pdf-interpreter","last_synced_at":"2026-04-15T14:02:12.563Z","repository":{"id":223922456,"uuid":"761390157","full_name":"chris-cozy/pdf-interpreter","owner":"chris-cozy","description":"The PDF Text Analysis Tool is a Python application that extracts and analyzes text from PDF files, focusing on extracting LOD (Limit of Detection) values and associated units, as well as DOIs (Digital Object Identifiers). It provides functionalities to clean and process the extracted data, generating a cleaned and normalized csv output.","archived":false,"fork":false,"pushed_at":"2024-04-08T15:53:13.000Z","size":18069,"stargazers_count":0,"open_issues_count":2,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-01-29T22:44:24.159Z","etag":null,"topics":["css","electron","glucose","html","javascript","lod","pdf","python","tailwind"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/chris-cozy.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-02-21T19:27:03.000Z","updated_at":"2024-04-15T18:09:35.000Z","dependencies_parsed_at":"2024-04-08T16:39:40.502Z","dependency_job_id":null,"html_url":"https://github.com/chris-cozy/pdf-interpreter","commit_stats":null,"previous_names":["chris-cozy/pdf-interpreter"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chris-cozy%2Fpdf-interpreter","tags_url":"https://repos.ecosyste.ms/api/v1/
hosts/GitHub/repositories/chris-cozy%2Fpdf-interpreter/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chris-cozy%2Fpdf-interpreter/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chris-cozy%2Fpdf-interpreter/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/chris-cozy","download_url":"https://codeload.github.com/chris-cozy/pdf-interpreter/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245321449,"owners_count":20596353,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["css","electron","glucose","html","javascript","lod","pdf","python","tailwind"],"created_at":"2025-01-29T22:33:15.118Z","updated_at":"2026-04-15T14:02:12.415Z","avatar_url":"https://github.com/chris-cozy.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# PDF Text Analysis Tool\n\nThe PDF Text Analysis Tool is a Python application that extracts and analyzes text from PDF files, focusing on extracting LOD (Limit of Detection) values and associated units, as well as Sensitivity values and DOIs (Digital Object Identifiers). 
It also cleans and processes the extracted data, generating cleaned and normalized output for further analysis.\n\n## Features\n- Extracts text from PDF files\n- Normalizes text for consistent processing\n- Identifies LOD values and associated units\n- Identifies Sensitivity values\n- Extracts DOIs from PDF text\n- Cleans and processes extracted data\n- Generates cleaned and normalized output in CSV format\n\n## Installation\n1. Clone the repository:\n\n```\ngit clone https://github.com/chris-cozy/pdf-interpreter.git\n```\n2. Install the required dependencies:\n\n```\npip install PyPDF2 pandas\n```\n\n3. Run the program:\n```\npython main.py\n```\n## Usage\n1. Place your PDF files in the pdfs directory.\n2. Run the application as described in the installation steps above.\n3. The application extracts text from the PDFs, analyzes it for LOD values and units, sensitivity values, and DOIs, and generates both raw CSV files (raw_lod_table.csv, raw_sensitivity_table.csv) and cleaned, normalized CSV files (cleaned_lod_table.csv, cleaned_sensitivity_table.csv).\n\n## Sample Output (LOD)\n```\nDOI,Value,Units,Count\n10.1016/j.snb.2018.11.055,0.04,μm,2\n10.1016/j.snb.2018.11.055,0.08,μm,2\n10.1016/j.bios.2014.09.042,22.2,mg/dl,3\n10.1039/d1an00283j,0.01,mm,2\n10.1021/acs.analchem.5b00012,1.07,μm,1\n```\n## Sample Output (Sensitivity)\n```\nDOI,Value,Count\n10.1016/j.snb.2018.11.055,0.0,1\n10.1016/j.bios.2014.09.042,0.0033,1\n10.1021/acs.analchem.5b00012,6.1,1\n```\n\n## Contributing\nContributions are welcome! 
If you have ideas for improvements or new features, please open an issue or submit a pull request.\n\n## License\nThis project is licensed under the MIT License - see the LICENSE file for details.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchris-cozy%2Fpdf-interpreter","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchris-cozy%2Fpdf-interpreter","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchris-cozy%2Fpdf-interpreter/lists"}