{"id":15227109,"url":"https://github.com/fastent/fastent","last_synced_at":"2025-04-09T23:41:03.824Z","repository":{"id":53302693,"uuid":"118632277","full_name":"fastent/fastent","owner":"fastent","description":"custom models for named-entity recognition","archived":false,"fork":false,"pushed_at":"2021-03-31T18:35:23.000Z","size":2704,"stargazers_count":6,"open_issues_count":10,"forks_count":2,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-24T01:35:15.476Z","etag":null,"topics":["data-annotation","data-generation","named-entities","named-entity-recognition","natural-language-processing","nlp","spacy"],"latest_commit_sha":null,"homepage":"https://fastent.github.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fastent.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-01-23T15:50:40.000Z","updated_at":"2025-02-04T22:08:56.000Z","dependencies_parsed_at":"2022-09-02T02:00:41.632Z","dependency_job_id":null,"html_url":"https://github.com/fastent/fastent","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fastent%2Ffastent","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fastent%2Ffastent/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fastent%2Ffastent/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fastent%2Ffastent/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fastent","download_url":"https://codeload.github.com/fastent/fastent/tar
.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248131468,"owners_count":21052819,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-annotation","data-generation","named-entities","named-entity-recognition","natural-language-processing","nlp","spacy"],"created_at":"2024-09-28T22:05:14.126Z","updated_at":"2025-04-09T23:41:03.806Z","avatar_url":"https://github.com/fastent.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# fastent\nThe **fastent** Python library is a tool for end-to-end creation of **custom models for [named-entity recognition](https://en.wikipedia.org/wiki/Named-entity_recognition)**.\n\n#### Custom Models\nTo train a model for a new type of entity, you just need a list of examples.\n\nYou are not limited to only predefined types like person, location and organization.\n\n#### How It Works\nfastent does end-to-end creation: **dataset generation**, **annotation**, **contextualization** and **training** a model.\n\nYou can also use fastent modules as standalone tools.\n\n#### Made for Prod\nfastent includes integrations with tools like spaCy, fastText pre-trained models and NLTK.\n\nfastent is built to scale to very large text datasets in many languages.\n\n---\n\n\u003c!--ts--\u003e\n* [Installation](#installation)\n* [How To](#how-to)\n    * [Dataset Generation](#generation)\n    * [Annotation](#annotation)\n    * [Contextualization](#contextualization)\n    * [Training](#training)\n    * [Testing](#testing)\n* [Integrations](#integrations)\n    * 
[Pre-trained Models](#pre-trained-models)\n    * [Text utilities](#text-utilities)\n    * [WordNet](#wordnet)\n    * [Poincaré embeddings](#poincare-embeddings)\n* [More](#more)\n\u003c!--te--\u003e\n\n### Installation\n\nfastent is developed for Python 3 on Unix systems.\n\nClone this repo or install from PyPI:\n```\npip install fastent\n```\n\nDownload NLTK data:\n```\npython -m nltk.downloader stopwords\n```\n\nInstall and set up CouchDB:\n```\nwget -O - https://raw.githubusercontent.com/fastent/fastent/master/install.sh | bash\n```\n\n\n#### Downloading data files\nTODO: fastText stuff\n\n## How To\n\n### Generation\nfastent can generate a dataset from a list.\n\nTODO\n\nfastent can even generate a list from one or two examples.\n```\nfrom fastent import dataset_pseudo_generator\n\nmodel = dataset_pseudo_generator.spacy_initialize('en_core_web_lg')\ndataset_pseudo_generator.dataset_generate(model, ['cocaine', 'heroin'], 100)\n```\n\nThe equivalent on the command line:\n```\npython dataset_pseudo_generator.py -m en_core_web_lg -s cocaine,heroin\n```\n\n### Annotation\nTODO\n\n### Contextualization\nTODO\n\n### Training\nTo train a model from the annotated and contextualized dataset:\n\nFor now, the only supported learning framework is spaCy.\n\n[Request support for a new learning framework](https://github.com/fastent/fastent/issues/new?labels=Models\u0026title=New+learning+framework+support+request:)\n\nTODO: sample output\n\n### Testing\nComing soon!\n\n## Integrations\nfastent includes integrations for downloading datasets and pre-trained models.\n\nTODO\n\n## More\nSee how fastent performs on [benchmarks](/benchmarks)\n\nTry the [tutorial](/tutorial) or fork [examples](/examples)\n\nBrowse [frequently asked questions](/faq)\n\n[Report bugs or request new 
features](https://github.com/fastent/fastent/issues/new)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffastent%2Ffastent","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffastent%2Ffastent","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffastent%2Ffastent/lists"}