{"id":19435683,"url":"https://github.com/code-kern-ai/sequence-learn","last_synced_at":"2025-04-24T21:30:36.920Z","repository":{"id":40464063,"uuid":"446551001","full_name":"code-kern-ai/sequence-learn","owner":"code-kern-ai","description":"With sequence-learn, you can build models for named entity recognition as quickly as if you were building a sklearn classifier.","archived":false,"fork":false,"pushed_at":"2022-10-20T15:02:19.000Z","size":564,"stargazers_count":22,"open_issues_count":0,"forks_count":2,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-04-03T11:36:24.586Z","etag":null,"topics":["machine-learning","named-entity-recognition","natural-language-processing","ner","nlp","python"],"latest_commit_sha":null,"homepage":"https://www.kern.ai","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/code-kern-ai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-01-10T19:11:52.000Z","updated_at":"2024-03-24T08:28:39.000Z","dependencies_parsed_at":"2022-08-25T03:00:32.738Z","dependency_job_id":null,"html_url":"https://github.com/code-kern-ai/sequence-learn","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/code-kern-ai%2Fsequence-learn","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/code-kern-ai%2Fsequence-learn/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/code-kern-ai%2Fsequence-learn/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/code-kern-ai%2Fsequence-learn
/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/code-kern-ai","download_url":"https://codeload.github.com/code-kern-ai/sequence-learn/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250712871,"owners_count":21475092,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["machine-learning","named-entity-recognition","natural-language-processing","ner","nlp","python"],"created_at":"2024-11-10T15:07:36.523Z","updated_at":"2025-04-24T21:30:36.585Z","avatar_url":"https://github.com/code-kern-ai.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"![sequence-learn](https://uploads-ssl.webflow.com/61e47fafb12bd56b40022a49/6274762101c203108c785958_banner.png)\n[![Python 3.9](https://img.shields.io/badge/python-3.9-blue.svg)](https://www.python.org/downloads/release/python-390/)\n[![pypi 0.0.9](https://img.shields.io/badge/pypi-0.0.9-green.svg)](https://pypi.org/project/sequencelearn/0.0.9/)\n\n# ➡️ sequence-learn\nWith `sequence-learn`, you can build models for named entity recognition as quickly as if you were building a sklearn classifier.\n\nIt takes as input embedded token lists, which you can create within a few lines of code using the [embedders library](https://github.com/code-kern-ai/embedders). 
The labels are token-level, i.e., you must provide one label per token as a simple list.\n\n## Installation\nYou can set up this library either by running `$ pip install sequencelearn`, or by cloning this repository and running `$ pip install -r requirements.txt` inside it.\n\nA sample installation that also includes `embedders` (and [spaCy](https://github.com/explosion/spaCy) for tokenization) would be:\n```\n$ conda create --name sequence-learn python=3.9\n$ conda activate sequence-learn\n$ pip install sequencelearn\n$ pip install embedders\n$ python -m spacy download en_core_web_sm\n```\n\n## Usage\nOnce you have installed the package(s), you can create the input for a text corpus and pass it, together with the required labels, to model training.\n\n```python\nfrom embedders.extraction.contextual import TransformerTokenEmbedder\nfrom sequencelearn.sequence_tagger import CRFTagger\n\ncorpus = [\n    \"I went to Cologne in 2009\",\n    \"My favorite number is 41\",\n    # ...\n]\n\nlabels = [\n    [\"OUTSIDE\", \"OUTSIDE\", \"OUTSIDE\", \"CITY\", \"OUTSIDE\", \"YEAR\"],\n    [\"OUTSIDE\", \"OUTSIDE\", \"OUTSIDE\", \"OUTSIDE\", \"DIGIT\"],\n    # ...\n]\n\n# use embedders to easily convert your raw data\nembedder = TransformerTokenEmbedder(\"distilbert-base-uncased\", \"en_core_web_sm\")\n\nembeddings = embedder.fit_transform(corpus)\n# embeddings is a ragged list of shape [num_texts, num_tokens (text-specific), embedding_dimension]\n\ntagger = CRFTagger()\ntagger.fit(embeddings, labels)\n```\n\nNow that you've trained a tagger model, you can apply it to new text data.\n\n```python\nsentence = [\"My birthyear is 2002\"]\nprint(tagger.predict(embedder.transform(sentence)))\n# prints [['OUTSIDE', 'OUTSIDE', 'OUTSIDE', 'YEAR']]\n```\n\n## Roadmap\n- [x] Add documentation to existing models\n- [x] Add sequence-based models (e.g. 
CRF-based)\n- [x] Add sample projects\n- [ ] Enable models to be converted to bytes / stored to disk\n- [ ] Add test cases\n\nIf you want to have something added, feel free to open an [issue](https://github.com/code-kern-ai/sequence-learn/issues).\n\n## Contributing\nContributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are **greatly appreciated**.\n\nIf you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag \"enhancement\".\nDon't forget to give the project a star! Thanks again!\n\n1. Fork the Project\n2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`)\n3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`)\n4. Push to the Branch (`git push origin feature/AmazingFeature`)\n5. Open a Pull Request\n\nAnd please don't forget to leave a ⭐ if you like the work!\n\n## License\nDistributed under the Apache 2.0 License. See `LICENSE` for more information.\n\n## Contact\nThis library is developed and maintained by [kern.ai](https://github.com/code-kern-ai). If you have feedback or questions, don't hesitate to contact us. We're super happy to help ✌️\n\n## Acknowledgements\nHuge thanks to [Erik Ziegler](https://github.com/erksch) for helping with the CRF implementation!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcode-kern-ai%2Fsequence-learn","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcode-kern-ai%2Fsequence-learn","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcode-kern-ai%2Fsequence-learn/lists"}