{"id":31114671,"url":"https://github.com/gracefullight/paper-verbs","last_synced_at":"2025-09-17T10:52:26.808Z","repository":{"id":313116511,"uuid":"1050105630","full_name":"gracefullight/paper-verbs","owner":"gracefullight","description":null,"archived":false,"fork":false,"pushed_at":"2025-09-04T01:01:17.000Z","size":111,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-04T02:39:56.884Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gracefullight.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2025-09-04T00:43:01.000Z","updated_at":"2025-09-04T01:01:21.000Z","dependencies_parsed_at":"2025-09-04T02:39:59.960Z","dependency_job_id":"e244857e-500b-46f7-8738-aedf9de048da","html_url":"https://github.com/gracefullight/paper-verbs","commit_stats":null,"previous_names":["gracefullight/paper-verbs"],"tags_count":null,"template":false,"template_full_name":"gracefullight/py-starter","purl":"pkg:github/gracefullight/paper-verbs","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gracefullight%2Fpaper-verbs","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gracefullight%2Fpaper-verbs/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gracefullight%2Fpaper-verbs/releases","manifests_url":"https://repos.ec
osyste.ms/api/v1/hosts/GitHub/repositories/gracefullight%2Fpaper-verbs/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gracefullight","download_url":"https://codeload.github.com/gracefullight/paper-verbs/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gracefullight%2Fpaper-verbs/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":275583374,"owners_count":25490651,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-17T02:00:09.119Z","response_time":84,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-09-17T10:52:24.700Z","updated_at":"2025-09-17T10:52:26.801Z","avatar_url":"https://github.com/gracefullight.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# paper-verbs\n\nAnalyze English verb usage from PDFs. Place your papers under `src/assets/` and run the analyzer to extract verb lemma frequencies, simple verb phrases, and tense/voice distributions. 
Results are printed to the console and saved as CSVs under `src/`.\n\n## Quick Start\n\n1) Use Python 3.12 (3.13 is not supported due to spaCy/NumPy ABI constraints):\n\n```sh\nuv python install 3.12     # once\nuv sync                     # install dependencies + model\n```\n\n2) Put PDFs under `src/assets/` (recursive; `.pdf`/`.PDF` both supported).\n\n3) Run the analyzer:\n\n```sh\nuv run python src/main.py\n```\n\n## What It Does\n\n- Scans `src/assets/` for PDFs and extracts text (PyMuPDF).\n- Removes trailing References/Bibliography/Acknowledgements heuristically.\n- Uses spaCy (en_core_web_sm) to parse and collect:\n  - Verb lemmas (AUX/modals excluded by default; e.g., can/would/have)\n  - Simple verb phrases around each counted verb\n  - Tense and voice distributions\n- Prints the top 100 verb lemmas to the console.\n- Saves CSVs under `src/`:\n  - `src/verbs.csv` with columns: `rank, verb, count`\n  - `src/phrases.csv` with columns: `rank, verb_phrase, count`\n\nNotes\n- The lemma `preprint` is excluded from counts.\n- No CLI flags: defaults are baked in. Just run the script.\n\n## Development\n\nCommon tasks (via poe):\n\n```sh\nuv run poe run         # run app\nuv run poe test        # tests (pytest)\nuv run poe test-cov    # tests with coverage\nuv run poe lint        # ruff check\nuv run poe format      # ruff format\nuv run poe type-check  # mypy (strict)\nuv run poe all-checks  # all of the above\n```\n\n## Troubleshooting\n\n- spaCy model not found / load fails:\n  - This project pins and installs `en_core_web_sm` via `pyproject.toml`. Run `uv sync`.\n  - Verify: `uv run python -c \"import en_core_web_sm; en_core_web_sm.load(); print('OK')\"`\n- NumPy/Thinc/Blis errors (e.g., dtype size changed or build failures):\n  - Use Python 3.12. Run `uv python install 3.12`, then `uv sync --reinstall`.\n- PDFs with only scanned images will have little or no text extracted. 
OCR is required to analyze such files.\n\n## Requirements\n\n- Python \u003e=3.12,\u003c3.13\n- uv (install via `pip install uv`)\n\n## Project Layout\n\n- `src/main.py` – entrypoint that runs `paper_verbs.main()`\n- `src/paper_verbs.py` – analyzer logic\n- `src/assets/` – put your PDFs here\n- `src/verbs.csv`, `src/phrases.csv` – outputs\n- `tests/` – pytest suite\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgracefullight%2Fpaper-verbs","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgracefullight%2Fpaper-verbs","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgracefullight%2Fpaper-verbs/lists"}