{"id":13564417,"url":"https://github.com/philgooch/abbreviation-extraction","last_synced_at":"2026-01-16T12:00:09.778Z","repository":{"id":57407746,"uuid":"108258975","full_name":"philgooch/abbreviation-extraction","owner":"philgooch","description":"Python3 implementation of the Schwartz-Hearst algorithm for extracting abbreviation-definition pairs","archived":false,"fork":false,"pushed_at":"2023-10-20T21:33:57.000Z","size":59,"stargazers_count":88,"open_issues_count":3,"forks_count":20,"subscribers_count":5,"default_branch":"develop","last_synced_at":"2025-10-20T12:44:46.708Z","etag":null,"topics":["abbreviations","information-extraction","keyword-extraction","nlp","python3"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/philgooch.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2017-10-25T11:09:16.000Z","updated_at":"2025-04-20T04:56:43.000Z","dependencies_parsed_at":"2022-09-26T17:10:47.381Z","dependency_job_id":"31726c77-9ba8-4a79-8f0f-415bbb1a63e3","html_url":"https://github.com/philgooch/abbreviation-extraction","commit_stats":{"total_commits":40,"total_committers":4,"mean_commits":10.0,"dds":0.09999999999999998,"last_synced_commit":"a9cae045dd506f0cc8d36c512a9d49107c758da5"},"previous_names":[],"tags_count":12,"template":false,"template_full_name":null,"purl":"pkg:github/philgooch/abbreviation-extraction","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/philgooch%2Fabbreviation-extraction","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/philgooch%2Fabbreviation-extraction/tags","releas
es_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/philgooch%2Fabbreviation-extraction/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/philgooch%2Fabbreviation-extraction/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/philgooch","download_url":"https://codeload.github.com/philgooch/abbreviation-extraction/tar.gz/refs/heads/develop","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/philgooch%2Fabbreviation-extraction/sbom","scorecard":{"id":731207,"data":{"date":"2025-08-11","repo":{"name":"github.com/philgooch/abbreviation-extraction","commit":"db38d87e83f881fbb453f6d8f74173c832a19226"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":3.8,"checks":[{"name":"Code-Review","score":2,"reason":"Found 3/15 approved changesets -- score normalized to 2","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows 
found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security 
policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE.txt:0","Info: FSF or OSI recognized license: MIT License: LICENSE.txt:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Vulnerabilities","score":10,"reason":"0 existing vulnerabilities detected","details":null,"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":-1,"reason":"internal error: error during GetBranch(master): error during branchesHandler.query: internal error: githubv4.Query: Resource not accessible by integration","details":null,"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 
0 commits out of 30 are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}}]},"last_synced_at":"2025-08-22T14:28:42.466Z","repository_id":57407746,"created_at":"2025-08-22T14:28:42.466Z","updated_at":"2025-08-22T14:28:42.466Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28478394,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-16T11:59:17.896Z","status":"ssl_error","status_checked_at":"2026-01-16T11:55:55.838Z","response_time":107,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["abbreviations","information-extraction","keyword-extraction","nlp","python3"],"created_at":"2024-08-01T13:01:31.006Z","updated_at":"2026-01-16T12:00:09.705Z","avatar_url":"https://github.com/philgooch.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# Extraction of abbreviation-definition pairs\n\n[![Build Status](https://travis-ci.org/philgooch/abbreviation-extraction.svg)](https://travis-ci.org/philgooch/abbreviation-extraction)\n\n## Version: 0.2.5\n\nThis is a Python3 implementation of the [Schwartz-Hearst algorithm](https://psb.stanford.edu/psb-online/proceedings/psb03/schwartz.pdf)\nfor identifying abbreviations and their corresponding definitions in free 
text[1].\n\nThe [original implementation is in Java](http://biotext.berkeley.edu/software.html), and Vincent Van Asch created a Python2 implementation at\n\nhttp://www.cnts.ua.ac.be/~vincent/scripts/abbreviations.py\n\n* NB: As of March 2019 this link appears to be dead. \n\nI have simplified, refactored it for Python 3 and added some tests.\n\nThis version outputs a Python dictionary of abbreviation:definition pairs.\n\n\n## Installation for command-line use\n    pip install -r requirements.txt\n    \n### Usage\n\nFrom the command line\n\n    python abbreviations/schwartz_hearst.py \u003cinput file\u003e\n    \n## Installation as a module\n\n    python3 setup.py install\n    \nor\n\n    pip install abbreviations\n    \n### Usage\n\n    from abbreviations import schwartz_hearst\n    \n    # By default, the most recently encountered definition for each term is returned\n    pairs = schwartz_hearst.extract_abbreviation_definition_pairs(doc_text='The emergency room (ER) was busy')\n    pairs = schwartz_hearst.extract_abbreviation_definition_pairs(file_path='\u003cpath_to_file\u003e')\n\n    # If multiple definitions are encountered for each term, you might want to return the most common for each\n    pairs = schwartz_hearst.extract_abbreviation_definition_pairs(doc_text='...', most_common_definition=True)\n    \n    # ... or you might want to return the first encountered definition for each\n    pairs = schwartz_hearst.extract_abbreviation_definition_pairs(doc_text='...', first_definition=True)\n    \n    # when using a longer text, the format is line-separated sentences:\n    import nltk\n    sentences = nltk.sent_tokenize(longer_text)\n    pairs = schwartz_hearst.extract_abbreviation_definition_pairs(doc_text='\\n'.join(sentences))\n\n[1] A. Schwartz and M. 
Hearst (2003) A Simple Algorithm for Identifying Abbreviation Definitions in Biomedical Text.\nBiocomputing, 451-462.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fphilgooch%2Fabbreviation-extraction","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fphilgooch%2Fabbreviation-extraction","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fphilgooch%2Fabbreviation-extraction/lists"}