{"id":13482355,"url":"https://github.com/chartbeat-labs/textacy","last_synced_at":"2025-05-14T07:11:03.953Z","repository":{"id":41403603,"uuid":"51014761","full_name":"chartbeat-labs/textacy","owner":"chartbeat-labs","description":"NLP, before and after spaCy","archived":false,"fork":false,"pushed_at":"2023-09-22T23:38:28.000Z","size":32964,"stargazers_count":2219,"open_issues_count":35,"forks_count":248,"subscribers_count":86,"default_branch":"main","last_synced_at":"2025-04-11T02:51:42.869Z","etag":null,"topics":["natural-language-processing","nlp","python","spacy"],"latest_commit_sha":null,"homepage":"https://textacy.readthedocs.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/chartbeat-labs.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGES.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.txt","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2016-02-03T16:52:45.000Z","updated_at":"2025-04-09T05:27:04.000Z","dependencies_parsed_at":"2023-02-12T13:18:12.630Z","dependency_job_id":"69c58a21-aad1-4911-b339-e9b97b985c33","html_url":"https://github.com/chartbeat-labs/textacy","commit_stats":{"total_commits":1724,"total_committers":35,"mean_commits":49.25714285714286,"dds":0.345707656612529,"last_synced_commit":"f08ecbc46020f514b8cbb024778ec4f80456291f"},"previous_names":[],"tags_count":29,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chartbeat-labs%2Ftextacy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chartbeat-labs%2Ftextacy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chartbeat-labs%2Ftextacy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chartbeat-labs%2Ftextacy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/chartbeat-labs","download_url":"https://codeload.github.com/chartbeat-labs/textacy/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254092798,"owners_count":22013292,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["natural-language-processing","nlp","python","spacy"],"created_at":"2024-07-31T17:01:01.174Z","updated_at":"2025-05-14T07:10:58.938Z","avatar_url":"https://github.com/chartbeat-labs.png","language":"Python","funding_links":[],"categories":["Python","Resources and Frameworks","文本数据和NLP","Libraries","Tools and codes","Natural Language Processing","函式庫","Feature Extraction","Chinese NLP Toolkits 中文NLP工具","Packages"],"sub_categories":["Videos and Online Courses","Plain text","General Purpose NLP","書籍","Text/NLP","General-Purpose Machine Learning","Popular NLP Toolkits for English/Multi-Language 常用的英文或支持多语言的NLP工具包","Libraries"],"readme":"## textacy: NLP, before and after spaCy\n\n`textacy` is a Python library for performing a variety of natural language processing (NLP) tasks, built on the high-performance spaCy library. With the fundamentals --- tokenization, part-of-speech tagging, dependency parsing, etc. --- delegated to another library, `textacy` focuses primarily on the tasks that come before and follow after.\n\n[![build status](https://img.shields.io/travis/chartbeat-labs/textacy/master.svg?style=flat-square)](https://travis-ci.org/chartbeat-labs/textacy)\n[![current release version](https://img.shields.io/github/release/chartbeat-labs/textacy.svg?style=flat-square)](https://github.com/chartbeat-labs/textacy/releases)\n[![pypi version](https://img.shields.io/pypi/v/textacy.svg?style=flat-square)](https://pypi.python.org/pypi/textacy)\n[![conda version](https://anaconda.org/conda-forge/textacy/badges/version.svg)](https://anaconda.org/conda-forge/textacy)\n\n### features\n\n- Access and extend spaCy's core functionality for working with one or many documents through convenient methods and custom extensions\n- Load prepared datasets with both text content and metadata, from Congressional speeches to historical literature to Reddit comments\n- Clean, normalize, and explore raw text before processing it with spaCy\n- Extract structured information from processed documents, including n-grams, entities, acronyms, keyterms, and SVO triples\n- Compare strings and sequences using a variety of similarity metrics\n- Tokenize and vectorize documents then train, interpret, and visualize topic models\n- Compute text readability and lexical diversity statistics, including Flesch-Kincaid grade level, multilingual Flesch Reading Ease, and Type-Token Ratio\n\n... *and much more!*\n\n### links\n\n- Download: https://pypi.org/project/textacy\n- Documentation: https://textacy.readthedocs.io\n- Source code: https://github.com/chartbeat-labs/textacy\n- Bug Tracker: https://github.com/chartbeat-labs/textacy/issues\n\n### maintainer\n\nHowdy, y'all. 👋\n\n- Burton DeWilde (\u003cburtdewilde@gmail.com\u003e)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchartbeat-labs%2Ftextacy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchartbeat-labs%2Ftextacy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchartbeat-labs%2Ftextacy/lists"}